Understanding Transformers Part 14: Calculating Encoder–Decoder Attention

DEV Community·Rijul Rajesh·about 1 month ago

#ai #machinelearning #software #coding #word #decoder

In the previous article, we began introducing the concept of encoder-decoder attention. Now let's dig into the details.

Encoder–Decoder Attention in Action

Just like in self-attention, we start by creating query values. In this case, we create two values to represent the query for the <EOS> token in the decoder.

Next, we create key values for each word in the encoder output.

Calculating Similarity

Now we calculate the similarity between the <EOS> token in the decoder and each word in the encoder. This is done using the dot product of the query with each key.

Applying Softmax

We then pass these similarity scores through a softmax function. This gives us weights that determine how much attention the decoder should pay to each input word.

In this example, the first input word gets 100% attention and the second word gets 0% attention. This means the decoder will focus entirely on the first input word when deciding the first translated word.

What’s Next?…
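The query, key, dot-product, and softmax steps described above can be sketched in a few lines of Python. This is only an illustration: the numbers in `query_eos` and `encoder_keys` are made up (the article's actual values are not shown in this excerpt), and the helper names `dot` and `softmax` are hypothetical. Following the article, the sketch uses a plain dot product as the similarity score.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    largest = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical values, chosen only for illustration.
# Two numbers represent the query for the <EOS> token in the decoder.
query_eos = [2.0, -1.5]

# One key (two numbers each) per word in the encoder output.
encoder_keys = [
    [3.0, -2.0],   # first input word
    [-2.5, 1.0],   # second input word
]

# Similarity between the <EOS> query and each encoder key.
scores = [dot(query_eos, key) for key in encoder_keys]

# Attention weights over the input words.
weights = softmax(scores)

print(scores)                          # [9.0, -6.5]
print([round(w, 4) for w in weights])  # [1.0, 0.0]
```

With these made-up numbers the gap between the two scores is large, so softmax pushes almost all of the weight onto the first input word, which mirrors the 100% / 0% split in the example.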
