T5Gemma 2: Next-Generation Encoder-Decoder Models with Multi-Modal Capabilities

milomg

5mo ago· 5 min readenNews

75/100

Toasty

Bagelometer↗

Lightly toasted, lightly seasoned, mostly correct.

Score75TypenewsSentimentpositive

Summary

T5Gemma 2 is the next generation of encoder-decoder models based on Gemma 3, featuring multi-modal and long-context capabilities. The model introduces architectural improvements including tied word embeddings across encoder and decoder, and merged decoder self- and cross-attention to reduce parameter count. It offers compact pre-trained models at sizes of 270M-270M (~3 billion parameters) and demonstrates strong performance across various benchmarks including natural language understanding, generation, and multi-modal tasks.

Key quotes

· 3 pulled

T5Gemma 2 is the next evolution of our encoder-decoder family based on Gemma 3, featuring the first multi-modal and long-context encoder-decoder models.

Unlike T5Gemma, T5Gemma 2 adopts tied word embeddings (over encoder and decoder) and merged decoder self- and cross-attention to save model parameters.

It offers compact pre-trained models at sizes of 270M-270M (~3 billion parameters) and demonstrates strong performance across various benchmarks.

Snippet from the RSS feed

T5Gemma 2 is the next evolution of our encoder-decoder family based on Gemma 3.

You might also wanna read

TranslateGemma: Open AI Translation Models Based on Google's Gemma 3 Support 55 Languages

TranslateGemma is a new suite of open AI translation models built on Google's Gemma 3 framework, supporting 55 languages with high accuracy

Product Hunt·4mo ago

Google DeepMind Releases Gemma 4: Most Advanced Open AI Model Family

Google DeepMind has released Gemma 4, its most advanced open AI model family to date. The models feature enhanced reasoning capabilities, mu

Product Hunt·1mo ago

Google Releases Gemini Embedding 2: First Natively Multimodal Embedding Model

Google has released Gemini Embedding 2, its first natively multimodal embedding model that can map text, images, video, audio, and documents

Product Hunt·2mo ago

Google Unveils Gemini: A Multimodal AI Model to Rival GPT-4

Google's Gemini is introduced as its largest and most capable AI model, designed to be multimodal and capable of understanding and combining

Product Hunt·9mo ago