Applying Karpathy's autoresearch to a 33M-token public transit dataset (14% improvement, replication notes) [P]

Hello r/MachineLearning! I work in the US transit industry, and I went all-in on learning AI & ML a few months ago. When I heard about Andrej Karpathy's autoresearch framework, I thought it was really cool. I decided to use the same transit dataset from an earlier GPT-2 XL fine-tuning project to train a small 80M model from scratch. Autoresearch is designed for from-scratch pretraining (not fine-tuning), so I started a new project rather than retrofitting the GPT-2 XL one.

I would love to hear from you …

1. Where did I mess up?
2. What’s interesting here?
3. What should I focus on learning? What do I do next? (I have some thoughts at the end of the post.)

# Why did I do this?…