T5Gemma: A new collection of encoder-decoder Gemma models

deepmind.google·Biao Zhang, Paul Suganthan, Ben Hora·about 1 month ago

In the rapidly evolving landscape of large language models (LLMs), the spotlight has largely focused on the decoder-only architecture. While these models have shown impressive capabilities across a wide range of generation tasks, the classic encoder-decoder architecture, exemplified by T5 (the Text-to-Text Transfer Transformer), remains a popular choice for many real-world applications. Encoder-decoder models often excel at summarization, translation, question answering (QA), and more, thanks to their high inference efficiency, design flexibility, and richer encoder representations for understanding input. Nevertheless, the encoder-decoder architecture has received comparatively little attention. Today, we revisit it and introduce T5Gemma, a new collection of encoder-decoder LLMs developed by converting pretrained decoder-only models into the encoder-decoder architecture through a technique called adaptation.…
