I started an Arabic educational series explaining the Transformer architecture visually. The first episode is a high-level roadmap of the original encoder-decoder Transformer from “Attention Is All You Need.”

It covers:

- Encoder and Decoder
- Encoder Memory
- Cross-Attention
- Linear + Softmax
- Next Token Prediction

The video is in Arabic, but the technical terms are kept in English where useful. I’d appreciate feedback on the structure and on whether the visual flow is clear for beginners before I go deeper into tokenization, embeddings, self-attention, and Q/K/V.

Video: https://youtu.be/hPvE-ttBkn0

submitted by /u/Logical_Respect_2381