What You'll Build

Embedding tables that give each token and each position a learned vector, a minimal forward pass that produces logits, and the loss function that measures how wrong the predictions are.

Depends On

Chapters 1-3, 5 (Value, Tokenizer, Helpers).

Embeddings: Giving Tokens an Identity

The model needs two pieces of information about each token: what the token is, and where it appears in the sequence. Each piece gets its own embedding. We'll start with the first one (token embeddings) and cover position embeddings in the next section.

So far, each token is just an integer: a is 0, b is 1, z is 25. A neural network can't do anything useful with a raw integer. It needs a richer representation, a list of numbers that captures something meaningful about each token. Maybe the first number captures "how often this letter starts a name" and the second captures "how vowel-like it is". We don't hand-pick these meanings. The network discovers them during training.…
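
To make the idea of a token embedding concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the chapter's actual implementation: the names `make_embedding_table` and `embed`, the vocabulary size of 26, the embedding width of 4, and the small random initialization are all hypothetical choices. It also uses plain floats; a trainable version would presumably wrap each number in the `Value` class from the earlier chapters so that training can adjust it.

```python
import random

# Assumed for illustration: 26 lowercase letters, 4 numbers per token.
VOCAB_SIZE = 26
EMBED_DIM = 4


def make_embedding_table(vocab_size, embed_dim, seed=0):
    """Return vocab_size rows, each a list of embed_dim small random floats."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    return [
        [rng.gauss(0.0, 0.02) for _ in range(embed_dim)]
        for _ in range(vocab_size)
    ]


def embed(table, token_id):
    """Look up the vector for one token; this is plain row indexing."""
    return table[token_id]


token_embeddings = make_embedding_table(VOCAB_SIZE, EMBED_DIM)

# Token ids follow the mapping in the text: a is 0, b is 1, z is 25.
token_id = ord("b") - ord("a")
vector = embed(token_embeddings, token_id)
print(token_id, len(vector))  # prints "1 4"
```

The lookup involves no arithmetic: the integer only selects a row. The numbers in that row start out random and carry no meaning until training shapes them.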