Epicure: Multilingual Ingredient Embeddings from 4.14M Recipes Using Skip-Gram and Metapath2Vec

[Submitted on 21 May 2026]

Summary

Epicure is a family of three sibling skip-gram ingredient embeddings, trained from scratch on a multilingual corpus of 4.14M recipes drawn from 11 sources across nine languages. The authors normalize raw ingredient strings to 1,790 canonical entries with an LLM-augmented pipeline and then build two graphs: a 203,508-edge ingredient-ingredient co-occurrence graph scored with normalized pointwise mutual information (NPMI), and an 80,019-edge ingredient-compound graph derived from FlavorDB, whose 2,247 compound nodes are typed into 15 categories. The three Metapath2Vec variants (Cooc, Chem, and Core) share architecture and hyperparameters and differ only in their random-walk schema, which lets the authors compare embeddings driven by flavor chemistry, by recipe context, or by a blend of the two.
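An NPMI co-occurrence graph scores how much more often two ingredients share a recipe than chance would predict. The following is a minimal sketch, assuming recipe-level counts and the standard definition NPMI(a, b) = log[p(a, b) / (p(a) p(b))] / (-log p(a, b)). The function name `npmi_edges` and the `min_pair_count` / `min_npmi` thresholds are hypothetical, since the summary does not say how Epicure counts or filters pairs.

```python
import math
from collections import Counter
from itertools import combinations


def npmi_edges(recipes, min_pair_count=1, min_npmi=0.0):
    """Return {(a, b): npmi} for ingredient pairs that share at least one recipe.

    recipes: iterable of iterables of canonical ingredient names.
    min_pair_count, min_npmi: hypothetical filters (not Epicure's settings).
    """
    n = 0
    single = Counter()  # number of recipes containing each ingredient
    pair = Counter()    # number of recipes containing each ingredient pair
    for recipe in recipes:
        items = sorted(set(recipe))  # dedupe; sorting gives a stable pair order
        n += 1
        single.update(items)
        pair.update(combinations(items, 2))

    edges = {}
    for (a, b), c_ab in pair.items():
        if c_ab < min_pair_count:
            continue
        p_ab = c_ab / n
        p_a = single[a] / n
        p_b = single[b] / n
        if p_ab == 1.0:
            # Pair occurs in every recipe: -log p(a, b) = 0, NPMI is 1 by convention.
            score = 1.0
        else:
            score = math.log(p_ab / (p_a * p_b)) / -math.log(p_ab)
        if score > min_npmi:  # keep only pairs scoring above the threshold
            edges[(a, b)] = score
    return edges
```

Pairs that never share a recipe would have NPMI of -1 and simply get no edge here; a positive `min_npmi` keeps only pairs that co-occur more often than independence would predict.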

Key quotes

We present Epicure, a family of three sibling skip-gram ingredient embeddings retrained from scratch on a multilingual recipe corpus.

We aggregate 4.14M recipes from 11 sources spanning seven languages, English, Chinese, Russian, Vietnamese, Spanish, Turkish, Indonesian, German, and Indian-English.

Three Metapath2Vec variants that share architecture and hyperparameters and differ only in the random-walk schema: Cooc walks the co-occurrence graph only, Chem walks the typed compound metapaths only, and Core blends both.
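The three walk schemas can be sketched as metapath-guided random walks over a graph with ingredient ("I") and compound ("C") nodes, followed by the (center, context) pair extraction a skip-gram model trains on. This is a minimal sketch under stated assumptions: Cooc is modeled as the schema I-I, Chem as I-C-I with a single untyped compound label (the paper's compound nodes actually carry 15 type categories), and Core as a per-walk random choice between the two with hypothetical weights. The function names, toy graph, walk length, and window size are illustrative, not Epicure's settings.

```python
import random


def metapath_walk(start, schema, neighbors, node_type, length, rng):
    """Random walk whose node types follow a repeating metapath schema.

    schema: e.g. ["I", "I"] or ["I", "C", "I"]; its first and last types must
    equal the start node's type so the pattern can repeat.
    neighbors: node -> list of adjacent nodes of any type.
    node_type: node -> type label.
    """
    if not (schema[0] == schema[-1] == node_type[start]):
        raise ValueError("schema must start and end with the start node's type")
    cycle = schema[:-1]  # I-C-I repeats as I, C, I, C, ...
    walk = [start]
    while len(walk) < length:
        wanted = cycle[len(walk) % len(cycle)]
        candidates = [v for v in neighbors[walk[-1]] if node_type[v] == wanted]
        if not candidates:
            break  # no neighbor of the required type: stop this walk early
        walk.append(rng.choice(candidates))
    return walk


def generate_walks(nodes, neighbors, node_type, schemas, weights,
                   walks_per_node, length, seed=0):
    """Start walks from every ingredient node, picking a schema per walk.

    One schema mimics a single-schema variant (Cooc or Chem); several weighted
    schemas give a blended, Core-like variant (assumption).
    """
    rng = random.Random(seed)
    walks = []
    for node in nodes:
        if node_type[node] != "I":
            continue  # walks start from ingredients only (assumption)
        for _ in range(walks_per_node):
            schema = rng.choices(schemas, weights=weights, k=1)[0]
            walks.append(metapath_walk(node, schema, neighbors, node_type, length, rng))
    return walks


def skipgram_pairs(walk, window):
    """(center, context) pairs within +/- window positions, as in skip-gram."""
    pairs = []
    for i, center in enumerate(walk):
        for j in range(max(0, i - window), min(len(walk), i + window + 1)):
            if j != i:
                pairs.append((center, walk[j]))
    return pairs


COOC = ["I", "I"]       # ingredient -> ingredient
CHEM = ["I", "C", "I"]  # ingredient -> compound -> ingredient

# Toy graph; c1 and c2 are hypothetical compound nodes.
neighbors = {
    "garlic": ["onion", "c1"],
    "onion": ["garlic", "c1"],
    "tomato": ["c2"],
    "c1": ["garlic", "onion"],
    "c2": ["tomato"],
}
node_type = {"garlic": "I", "onion": "I", "tomato": "I", "c1": "C", "c2": "C"}

core_walks = generate_walks(list(neighbors), neighbors, node_type,
                            [COOC, CHEM], [0.5, 0.5], walks_per_node=2, length=5)
pairs = [p for w in core_walks for p in skipgram_pairs(w, window=2)]
```

The resulting walks (or pairs) would then go to a skip-gram trainer with negative sampling, for example gensim's `Word2Vec` with `sg=1`. Keeping architecture and hyperparameters fixed while swapping only the schema list mirrors the controlled comparison the quote describes.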
