Applying Karpathy's autoresearch to a 33M-token public transit dataset (14% improvement, replication notes) [P]

Reddit r/MachineLearning·u/MarsPassenger·about 1 month ago

Hello r/MachineLearning! I work in the US transit industry and I went all-in on learning AI & ML a few months ago. When I heard about Andrej Karpathy's autoresearch framework, I thought it was really cool. I decided to use the same transit dataset from an earlier GPT-2 XL fine-tuning project to train a small 80M model from scratch. Autoresearch is designed for from-scratch pretraining (not fine-tuning) so I started a new project rather than retrofitting the GPT-2 XL one.

I would love to hear from you …

1. Where did I mess up?
2. What’s interesting here?
3. What should I focus on learning? What do I do next? (I have some thoughts at end of post)

# Why did I do this?…
