Learn Machine Learning·/u/LucienHugo·3 days ago
Hi everyone,

I wanted to better understand how transformers and backpropagation work internally, so I spent the last few weeks building two small projects from scratch using only Python and NumPy:

- ReverseGrad: a reverse-mode automatic differentiation engine.
- nanoGPT: a small GPT-style Transformer built on top of ReverseGrad.

ReverseGrad implements a Tensor class that tracks:

- data
- grad
- _children
- _backward closures

and performs a topological traversal of the computation graph during backward().

The Transformer currently includes:

- embeddings
- multi-head attention with causal masking
- layer normalization
- feed-forward layers
- projection layer
- simple optimizer
- text generation with temperature sampling

One of the most interesting challenges was debugging memory growth during training. I discovered that parts of the computation graph were being retained through references between nodes. Working through that taught me much more about graph lifecycles and automatic differentiation than I expected.…
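
For readers who have not seen this pattern, here is a minimal sketch of a Tensor node with data, grad, _children and a _backward closure, plus a backward() that walks the graph in topological order. It is not the actual ReverseGrad code: it uses plain Python floats instead of NumPy arrays to stay short, supports only addition and multiplication, and the free_graph flag is a hypothetical name for one possible way of releasing node references after the backward pass.

```python
class Tensor:
    """Scalar node in a computation graph (illustrative sketch only)."""

    def __init__(self, data, _children=()):
        self.data = float(data)
        self.grad = 0.0
        self._children = tuple(_children)   # nodes this value was computed from
        self._backward = lambda: None       # leaf nodes have nothing to propagate

    def __add__(self, other):
        other = other if isinstance(other, Tensor) else Tensor(other)
        out = Tensor(self.data + other.data, (self, other))

        def _backward():
            # d(out)/d(self) = 1 and d(out)/d(other) = 1
            self.grad += out.grad
            other.grad += out.grad

        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Tensor) else Tensor(other)
        out = Tensor(self.data * other.data, (self, other))

        def _backward():
            # d(out)/d(self) = other.data and d(out)/d(other) = self.data
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad

        out._backward = _backward
        return out

    def backward(self, free_graph=True):
        # Depth-first post-order: every node is appended after its children.
        topo, visited = [], set()

        def build(node):
            if id(node) not in visited:
                visited.add(id(node))
                for child in node._children:
                    build(child)
                topo.append(node)

        build(self)
        self.grad = 1.0
        # Reverse order guarantees a node's grad is complete before it propagates.
        for node in reversed(topo):
            node._backward()
            if free_graph:
                # Assumption: dropping these references lets intermediate nodes
                # be reclaimed once the caller releases the output.
                node._children = ()
                node._backward = lambda: None


x = Tensor(2.0)
y = Tensor(3.0)
z = x * y + x      # x is used twice, so its gradient accumulates
z.backward()
print(x.grad, y.grad)
```

Note that each _backward closure captures its own output node, so a node and its closure refer to each other. Retained references of this kind are one plausible source of the memory growth described above, although the post does not say which references were responsible in ReverseGrad.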