One model objective, stated simply: given all previous words, predict the next word. (Strictly, the model works on tokens, which are words or pieces of words, but the idea is the same.) That is the complete description of GPT's pretraining. No labels. No human annotations. No task-specific setup. Just take any text, hide the last word, and train the model to predict it. Then hide the last two words and predict the first of them. Then the last three. In effect, every position in the text becomes a training example. Repeat on roughly three hundred billion tokens of mostly internet text. The result is a model that picks up grammar, facts, reasoning, coding conventions, mathematical patterns, writing styles, and argumentation structures, not because anyone taught it these things explicitly, but because all of them help it predict the next word well. To predict the next word with high accuracy, a model has to learn something deep about language and the world it describes.…
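To make the objective concrete, here is a minimal sketch of how a piece of text turns into next-word training examples, and how a prediction is scored. It rests on several simplifying assumptions: whitespace splitting stands in for a real subword tokenizer, the function names (`next_word_examples`, `average_loss`, `uniform_predictor`) are hypothetical, and the "model" is a placeholder that guesses every word with equal probability. A real GPT computes all positions in parallel with a neural network; the sketch only shows the shape of the data and the loss.

```python
import math
from typing import Callable, Dict, List, Tuple

Example = Tuple[List[str], str]  # (context words, word to predict)


def next_word_examples(text: str) -> List[Example]:
    """Turn raw text into (context, target) pairs, one per position."""
    # Assumption: whitespace splitting stands in for a real tokenizer.
    words = text.split()
    examples = []
    for i in range(1, len(words)):
        context = words[:i]  # everything the model is allowed to see
        target = words[i]    # the hidden word it must predict
        examples.append((context, target))
    return examples


def average_loss(
    examples: List[Example],
    predict: Callable[[List[str]], Dict[str, float]],
) -> float:
    """Mean negative log-probability assigned to the correct next word."""
    if not examples:
        raise ValueError("need at least one example")
    total = 0.0
    for context, target in examples:
        # Tiny floor avoids log(0) if the predictor omits the target.
        p = predict(context).get(target, 1e-12)
        total += -math.log(p)
    return total / len(examples)


def uniform_predictor(vocab: List[str]) -> Callable[[List[str]], Dict[str, float]]:
    """Placeholder 'model': ignores the context, guesses uniformly."""
    p = 1.0 / len(vocab)
    dist = {w: p for w in vocab}
    return lambda context: dist


if __name__ == "__main__":
    text = "the cat sat on the mat"
    examples = next_word_examples(text)
    for context, target in examples:
        print(" ".join(context), "->", target)

    vocab = sorted(set(text.split()))
    print("uniform loss:", average_loss(examples, uniform_predictor(vocab)))
```

A six-word sentence yields five examples, one for each word that has something before it. The uniform guesser scores the logarithm of the vocabulary size; training amounts to adjusting a model so this number goes down, which it can only do by using the context.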