T5Gemma: A new collection of encoder-decoder Gemma models

deepmind.google·Biao Zhang, Paul Suganthan, Ben Hora·about 1 month ago

In the rapidly evolving landscape of large language models (LLMs), the spotlight has largely been on the decoder-only architecture. While these models have shown impressive capabilities across a wide range of generation tasks, the classic encoder-decoder architecture, exemplified by T5 (the Text-to-Text Transfer Transformer), remains a popular choice for many real-world applications. Encoder-decoder models often excel at summarization, translation, question answering (QA), and more, thanks to their high inference efficiency, design flexibility, and richer encoder representation for understanding input. Nevertheless, this powerful architecture has received relatively little attention. Today, we revisit it and introduce T5Gemma, a new collection of encoder-decoder LLMs developed by converting pretrained decoder-only models into the encoder-decoder architecture through a technique called adaptation.…
