How I Let an AI Agent Run 100 ML Experiments Overnight on a $500 GPU

Last week I let an AI agent run 100 machine learning experiments overnight on my RTX 3070. I woke up to a 25% model improvement. Here's exactly how it works.

The Setup

The agent is built on Karpathy's autoresearch concept, powered by Claude Sonnet. It runs in a loop:

1. Propose: The agent analyzes current model performance and proposes a specific code change
2. Implement: It writes the actual Python code to modify the neural network
3. Train: The modified model trains on PubMed medical text data
4. Evaluate: Loss metrics are compared against the baseline
5. Decide: If improvement > threshold, keep the change. Otherwise, revert.
6. Repeat: Go back to step 1 with updated context

The Results

Out of 100 experiments:

- 93 failed: proposed changes made the model worse or had no effect
- 7 succeeded: measurable improvements that the agent kept
- Net result: 25% improvement in model performance

The 7% hit rate sounds low, but that's the point.…
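
To make the loop from The Setup concrete, here is a minimal sketch of the propose / train / evaluate / decide cycle. It is an illustration under stated assumptions, not the actual agent: `run_experiments`, `LoopResult`, `propose`, and `train_and_eval` are hypothetical names, the LLM call and the PubMed training run are replaced by plain callables, and "revert" is modeled simply as not adding a change to the accepted list.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class LoopResult:
    """Summary of one overnight run (hypothetical structure)."""
    baseline_loss: float
    final_loss: float
    kept: List[int] = field(default_factory=list)  # indices of kept experiments
    attempts: int = 0


def run_experiments(
    propose: Callable[[List[str]], str],
    train_and_eval: Callable[[List[str]], float],
    baseline_loss: float,
    n_experiments: int = 100,
    threshold: float = 0.0,
) -> LoopResult:
    """Keep a proposed change only if it lowers the loss by more than `threshold`."""
    accepted: List[str] = []  # changes kept so far; this is the "updated context"
    best = baseline_loss
    result = LoopResult(baseline_loss=baseline_loss, final_loss=baseline_loss)

    for i in range(n_experiments):
        change = propose(accepted)                   # Propose + Implement
        loss = train_and_eval(accepted + [change])   # Train + Evaluate
        if best - loss > threshold:                  # Decide: keep the change
            accepted.append(change)
            best = loss
            result.kept.append(i)
        # Otherwise revert: the change is never added to `accepted`.
        result.attempts += 1

    result.final_loss = best
    return result


if __name__ == "__main__":
    # Toy stand-ins (assumptions): each change has a random effect on the loss,
    # and most proposals make things worse.
    rng = random.Random(0)
    effects = {}

    def toy_propose(accepted: List[str]) -> str:
        name = f"change-{len(effects)}"
        effects[name] = rng.uniform(-0.05, 0.02)
        return name

    def toy_train_and_eval(changes: List[str]) -> float:
        loss = 3.0
        for c in changes:
            loss *= 1.0 - effects[c]
        return loss

    run = run_experiments(toy_propose, toy_train_and_eval, baseline_loss=3.0)
    hit_rate = len(run.kept) / run.attempts
    print(f"kept {len(run.kept)} of {run.attempts} ({hit_rate:.0%}), "
          f"loss {run.baseline_loss:.3f} -> {run.final_loss:.3f}")
```

Because rejected changes are discarded before the next iteration, a failed experiment costs only training time. That is why a low hit rate can still add up to a net improvement.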