Chapter 7: The Training Loop and Adam Optimiser


DEV Community·Gary Jackson·about 1 month ago



What You'll Build

A complete training loop that processes documents, computes loss, backpropagates gradients, and updates parameters using the Adam optimiser.

Depends On

All previous chapters.

The Training Loop

A training step is just five things in a row:

1. Pick a document and tokenize it
2. Forward pass for each token, building up the loss
3. Backward pass to fill in every gradient
4. Nudge the parameters using those gradients
5. Zero the gradients out before the next step

Step 4 is where Adam lives. Before we look at the code, it's worth slowing down on what Adam actually does and why we use it.

Understanding Adam

You could update parameters with simple gradient descent: `p.Data -= learningRate * p.Grad`. Adam is smarter in two ways.

Momentum (`momentum`). Instead of reacting to each individual gradient, Adam tracks a running average of recent gradients. This smooths out noisy updates, like a rolling ball that doesn't reverse direction every time it hits a bump.

Squared gradient average (`squaredGradAvg`).…
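To make step 4 concrete, here is a minimal sketch of the two update rules side by side. It is not the chapter's actual code. The `Param` class, the `AdamSketch` helper, the hyperparameter names (`beta1`, `beta2`, `epsilon`) and their default values are all assumptions for illustration, following the standard textbook form of Adam. Only `p.Data`, `p.Grad`, `learningRate`, `momentum` and `squaredGradAvg` come from the text above. In the textbook form, the squared gradient average scales each parameter's step by the typical size of its recent gradients.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the series' parameter type: a value plus its gradient.
public sealed class Param
{
    public double Data;
    public double Grad;
    public Param(double data) { Data = data; }
}

public static class AdamSketch
{
    // Plain gradient descent, as in the text: step against the gradient.
    public static void SgdStep(IList<Param> parameters, double learningRate)
    {
        foreach (var p in parameters)
            p.Data -= learningRate * p.Grad;
    }

    // One textbook Adam update. `step` is 1-based; the default values are assumed.
    public static void AdamStep(
        IList<Param> parameters, double[] momentum, double[] squaredGradAvg, int step,
        double learningRate = 0.01, double beta1 = 0.9, double beta2 = 0.999,
        double epsilon = 1e-8)
    {
        for (int i = 0; i < parameters.Count; i++)
        {
            double g = parameters[i].Grad;

            // Running average of recent gradients (the "rolling ball").
            momentum[i] = beta1 * momentum[i] + (1 - beta1) * g;
            // Running average of recent squared gradients.
            squaredGradAvg[i] = beta2 * squaredGradAvg[i] + (1 - beta2) * g * g;

            // Bias correction: both averages start at zero, so early values are too small.
            double mHat = momentum[i] / (1 - Math.Pow(beta1, step));
            double vHat = squaredGradAvg[i] / (1 - Math.Pow(beta2, step));

            // Step 4: nudge the parameter. Step 5: zero its gradient.
            parameters[i].Data -= learningRate * mHat / (Math.Sqrt(vHat) + epsilon);
            parameters[i].Grad = 0;
        }
    }

    // Toy usage: minimise (x - 3)^2, whose gradient is 2 * (x - 3).
    public static double MinimiseToy(int steps)
    {
        var parameters = new List<Param> { new Param(0.0) };
        var momentum = new double[parameters.Count];
        var squaredGradAvg = new double[parameters.Count];

        for (int step = 1; step <= steps; step++)
        {
            // Stands in for the forward and backward passes (steps 2 and 3).
            parameters[0].Grad = 2 * (parameters[0].Data - 3.0);
            AdamStep(parameters, momentum, squaredGradAvg, step, learningRate: 0.1);
        }
        return parameters[0].Data;
    }
}
```

Calling `AdamSketch.MinimiseToy(500)` should return a value close to 3. Note that on the very first step the bias correction makes the update almost exactly `learningRate` in size, whatever the gradient's magnitude, which is one way Adam differs from the plain `p.Data -= learningRate * p.Grad` rule.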
