Understanding Decoder-Only Transformers Part 2: Decoder-Only vs Regular Transformers

DEV Community·Rijul Rajesh·26 days ago

In this article, we will look at the differences between a decoder-only transformer and a standard (encoder–decoder) transformer.

How Decoder-Only Transformers Work

A decoder-only transformer uses the same components both to process the input prompt and to generate the output. It relies on masked self-attention, in which each word can attend only to itself and to the words that came before it, never to the words that follow it.

Masked self-attention is applied to both the input prompt and the generated output. This means the entire process is handled by a single stack of decoder layers.

How Regular Transformers Work

A regular transformer has two separate parts: an encoder that processes the input and a decoder that generates the output.

When encoding the input, it uses ordinary self-attention rather than masked self-attention. This allows each word to attend to all other words in the input, not just the previous ones.

The decoder then uses encoder–decoder attention (often called cross-attention), in which the decoder's queries attend to the encoder's output, to stay connected to the input.…
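The three attention patterns described above differ mainly in where the queries, keys and values come from and whether a mask is applied. Here is a minimal, pure-Python sketch of scaled dot-product attention that shows this. It makes some simplifying assumptions. There are no learned query/key/value projections and only a single head. The toy embeddings are made-up numbers. The `causal` flag is a hypothetical name used here to switch on the masked self-attention rule.

```python
import math


def softmax(scores):
    """Numerically stable softmax; -inf scores receive exactly zero weight."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values, causal=False):
    """Scaled dot-product attention over plain Python lists of vectors.

    causal=True (hypothetical flag) applies the masked self-attention rule:
    query position i may only look at key positions 0..i.
    Returns (outputs, weights).
    """
    d = len(keys[0])
    outputs, weights = [], []
    for i, q in enumerate(queries):
        scores = []
        for j, k in enumerate(keys):
            if causal and j > i:
                scores.append(float("-inf"))  # mask out words that come later
            else:
                scores.append(sum(a * b for a, b in zip(q, k)) / math.sqrt(d))
        w = softmax(scores)
        weights.append(w)
        # Each output is the attention-weighted average of the value vectors.
        outputs.append([sum(w[j] * v[c] for j, v in enumerate(values))
                        for c in range(len(values[0]))])
    return outputs, weights


# Toy 2-dimensional "embeddings" for a 3-word input (made-up numbers,
# used directly as queries, keys and values; no learned projections).
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Decoder-only: masked self-attention over a single sequence.
_, causal_w = attention(x, x, x, causal=True)

# Encoder of a regular transformer: unmasked self-attention.
enc_out, enc_w = attention(x, x, x)

# Decoder of a regular transformer: encoder-decoder (cross) attention,
# with queries from the decoder and keys/values from the encoder output.
dec = [[0.5, 0.5], [1.0, 0.0]]
cross_out, cross_w = attention(dec, enc_out, enc_out)

for row in causal_w:
    print([round(w, 2) for w in row])
```

In the masked case, every weight above the diagonal is zero, so the first word attends only to itself (`[1.0, 0.0, 0.0]`). In the encoder case, every word gets a nonzero weight on every other word. In the cross-attention case, there is one row per decoder position and one column per encoder position, which is how the decoder stays connected to the input.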
