Epicure: Multilingual Ingredient Embeddings from 4.14M Recipes Using Skip-Gram and Metapath2Vec
[Submitted on 21 May 2026]
Summary
Epicure is a family of three sibling skip-gram ingredient embeddings trained from scratch on a multilingual corpus of 4.14M recipes drawn from 11 sources across 9 languages. The researchers normalize raw ingredient strings to 1,790 canonical entries with an LLM-augmented pipeline, then build two graphs: an ingredient-ingredient NPMI (normalized pointwise mutual information) graph with 203,508 edges, and a typed FlavorDB ingredient-compound graph with 80,019 edges and 2,247 compound nodes in 15 categories. Three Metapath2Vec variants (Cooc, Chem, and Core) share architecture and hyperparameters and differ only in their random-walk schema, which lets the authors compare embeddings along a spectrum from purely recipe-context-based (Cooc) to purely chemistry-based (Chem), with Core blending the two.
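The NPMI weighting mentioned in the summary can be illustrated with a minimal sketch. It assumes probabilities are estimated per recipe (an ingredient or pair counts once per recipe) and uses the standard definition NPMI(a, b) = PMI(a, b) / -log p(a, b). The function name `npmi_edges` and the `min_count` cutoff are hypothetical; the paper's actual estimation details and edge filter are not given here.

```python
import math
from collections import Counter
from itertools import combinations


def npmi_edges(recipes, min_count=1):
    """Return {(a, b): npmi} for ingredient pairs that co-occur in recipes.

    `recipes` is a list of ingredient lists. Pairs are keyed in sorted order.
    `min_count` is an illustrative cutoff, not a value from the paper.
    """
    n = len(recipes)
    ing_counts = Counter()
    pair_counts = Counter()
    for recipe in recipes:
        items = sorted(set(recipe))  # count each ingredient once per recipe
        ing_counts.update(items)
        pair_counts.update(combinations(items, 2))

    edges = {}
    for (a, b), c in pair_counts.items():
        if c < min_count:
            continue
        p_ab = c / n
        if p_ab == 1.0:
            # Both ingredients appear in every recipe: 0/0, set to 1 by convention.
            edges[(a, b)] = 1.0
            continue
        pmi = math.log(p_ab / ((ing_counts[a] / n) * (ing_counts[b] / n)))
        edges[(a, b)] = pmi / -math.log(p_ab)
    return edges


# Toy data, invented for illustration only.
toy = [["tomato", "basil"], ["tomato", "basil"], ["flour", "egg"], ["flour", "egg"]]
print(npmi_edges(toy))
```

NPMI ranges from -1 (never together) through 0 (independent) to 1 (always together), which makes edge weights comparable across frequent and rare ingredients.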
Key quotes
We present Epicure, a family of three sibling skip-gram ingredient embeddings retrained from scratch on a multilingual recipe corpus.
We aggregate 4.14M recipes from 11 sources spanning seven languages, English, Chinese, Russian, Vietnamese, Spanish, Turkish, Indonesian, German, and Indian-English.
Three Metapath2Vec variants that share architecture and hyperparameters and differ only in the random-walk schema: Cooc walks the co-occurrence graph only, Chem walks the typed compound metapaths only, and Core blends both.
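To make the difference between the three walk schemas concrete, here is a rough sketch under stated assumptions. Each hop either follows a co-occurrence edge (ingredient to ingredient) or a compound metapath (ingredient to compound to ingredient), chosen with probability `p_chem`. Modeling Cooc as `p_chem = 0.0`, Chem as `1.0`, and Core as `0.5` is an assumption made for illustration. The paper's actual blending rule, its typed metapaths over the 15 compound categories, and its hyperparameters are not reproduced here. All names and the toy graph are hypothetical.

```python
import random

# Assumed mixing probabilities; only the 0.0 and 1.0 endpoints follow from the quote.
SCHEMAS = {"Cooc": 0.0, "Chem": 1.0, "Core": 0.5}


def metapath_walk(start, hops, cooc, ing2cmp, cmp2ing, p_chem, rng):
    """Generate one walk starting at ingredient `start`.

    cooc:    ingredient -> list of co-occurring ingredients
    ing2cmp: ingredient -> list of compound nodes
    cmp2ing: compound   -> list of ingredients containing it
    The walk stops early if the chosen edge type has no neighbours.
    """
    path = [start]
    cur = start
    for _ in range(hops):
        if rng.random() < p_chem:
            compounds = ing2cmp.get(cur, [])
            if not compounds:
                break
            compound = rng.choice(compounds)
            cur = rng.choice(cmp2ing[compound])
            path += [compound, cur]  # ingredient -> compound -> ingredient
        else:
            neighbours = cooc.get(cur, [])
            if not neighbours:
                break
            cur = rng.choice(neighbours)
            path.append(cur)  # ingredient -> ingredient
    return path


# Toy graph, invented for illustration; compound nodes carry a "cmp:" prefix.
cooc = {"tomato": ["basil"], "basil": ["tomato"]}
ing2cmp = {"tomato": ["cmp:linalool"], "basil": ["cmp:linalool"]}
cmp2ing = {"cmp:linalool": ["tomato", "basil"]}

rng = random.Random(0)
for name, p in SCHEMAS.items():
    print(name, metapath_walk("tomato", 4, cooc, ing2cmp, cmp2ing, p, rng))
```

The resulting walks would then be treated as sentences for a skip-gram model (for example gensim's `Word2Vec` with `sg=1`), so that the three variants differ only in which walks they see.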
