#HeadDimension

1 post

Chapter 10: Multi-Head Attention and the MLP Block

DEV Community·Gary Jackson·about 1 month ago

Run several attention heads in parallel on embedding slices, add a two-layer MLP for per-position computation, and assemble a transformer block.
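As a rough illustration of what that summary describes, here is a minimal pure-Python sketch. It is not the chapter's code. The summary does not specify any of the following, so treat them as assumptions: the causal mask, the ReLU activation, the 4x hidden width, the residual connections, the omission of normalization, the choice to slice the projected Q/K/V (rather than the raw embedding) per head, and every function and parameter name.

```python
import math
import random

def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix, both stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def add(a, b):
    """Element-wise sum of two matrices of the same shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def softmax(row):
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def column_slice(m, lo, hi):
    """Keep columns lo..hi-1 of every row: the slice one head works on."""
    return [row[lo:hi] for row in m]

def attention_head(q, k, v):
    """Scaled dot-product attention for one head (causal mask is an assumption)."""
    head_dim = len(q[0])
    out = []
    for t, q_t in enumerate(q):
        # Position t attends only to positions 0..t.
        scores = [sum(a * b for a, b in zip(q_t, k[s])) / math.sqrt(head_dim)
                  for s in range(t + 1)]
        weights = softmax(scores)
        out.append([sum(w * v[s][j] for s, w in enumerate(weights))
                    for j in range(head_dim)])
    return out

def multi_head_attention(x, wq, wk, wv, wo, n_heads):
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    head_dim = len(x[0]) // n_heads
    heads = []
    for h in range(n_heads):
        lo, hi = h * head_dim, (h + 1) * head_dim
        heads.append(attention_head(column_slice(q, lo, hi),
                                    column_slice(k, lo, hi),
                                    column_slice(v, lo, hi)))
    # Concatenate the head outputs per position, then mix them with wo.
    concat = [sum((head[t] for head in heads), []) for t in range(len(x))]
    return matmul(concat, wo)

def mlp(x, w1, w2):
    """Two-layer MLP applied to each position independently (ReLU is an assumption)."""
    hidden = [[max(0.0, v) for v in row] for row in matmul(x, w1)]
    return matmul(hidden, w2)

def transformer_block(x, p, n_heads):
    """Attention then MLP, each wrapped in a residual connection; no normalization."""
    x = add(x, multi_head_attention(x, p["wq"], p["wk"], p["wv"], p["wo"], n_heads))
    return add(x, mlp(x, p["w1"], p["w2"]))

def random_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
n_embd, n_heads, seq_len = 8, 2, 4  # head dimension = n_embd // n_heads = 4
params = {name: random_matrix(n_embd, n_embd, rng) for name in ("wq", "wk", "wv", "wo")}
params["w1"] = random_matrix(n_embd, 4 * n_embd, rng)
params["w2"] = random_matrix(4 * n_embd, n_embd, rng)
x = random_matrix(seq_len, n_embd, rng)
y = transformer_block(x, params, n_heads)
print(len(y), len(y[0]))  # prints: 4 8
```

The block maps a `seq_len x n_embd` input to an output of the same shape, which is what lets blocks like this be stacked. The head dimension named in the tag is simply `n_embd // n_heads`, so `n_embd` has to be divisible by `n_heads`.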