
A Hackable ML Compiler Stack in 5,000 Lines of Python [P]


Reddit r/MachineLearning·u/NoVibeCoding·about 1 month ago

Hey r/MachineLearning,

The modern ML (LLM) compiler stack is brutal. TVM is 500K+ lines of C++. PyTorch piles Dynamo, Inductor, and Triton on top of each other. Then there's XLA, MLIR, Halide, Mojo. There is no tutorial that covers the high-level design of an ML compiler without dropping you straight into the guts of one of these frameworks.

I built a reference compiler from scratch in ~5K lines of pure Python that emits raw CUDA. It takes a small model (TinyLlama, Qwen2.5-7B) and lowers it to a sequence of CUDA kernels through six IRs. The goal isn't to beat Triton; it is to build a hackable, easy-to-follow compiler.

Full article: [A Principled ML Compiler Stack in 5,000 Lines of Python](https://medium.com/data-science-collective/a-principled-ml-compiler-stack-in-5-000-lines-of-python-17f2db9549d4)

Repo: [deplodock](https://github.com/cloudrift-ai/deplodock)

The pipeline consists of six IRs, each closer to the hardware than the last.…
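
To give a feel for what "lowering through IRs" means, here is a minimal sketch with just two toy stages. It is not taken from deplodock and does not reproduce any of its six IRs: the names `Op`, `FusedKernel`, `fuse_elementwise`, and `emit_cuda` are hypothetical and made up for illustration. It takes a tiny graph for `relu(x + y)`, fuses it into one per-element expression, and prints that as CUDA source text.

```python
from dataclasses import dataclass

# Hypothetical toy IRs for illustration only; these are NOT deplodock's IRs.

@dataclass(frozen=True)
class Op:
    """Graph-level node: one elementwise op that writes the value `out`."""
    name: str      # "add" or "relu"
    inputs: tuple  # names of the values it reads
    out: str       # name of the value it produces


@dataclass(frozen=True)
class FusedKernel:
    """Lower-level node: one scalar expression applied to every element."""
    params: tuple  # names of the input buffers
    expr: str      # C expression over the inputs, indexed by i
    out: str       # name of the output buffer


# How each graph-level op is spelled as a scalar C expression.
SCALAR = {
    "add": lambda a, b: f"({a} + {b})",
    "relu": lambda a: f"fmaxf({a}, 0.0f)",
}


def fuse_elementwise(ops):
    """Lowering 1: inline a chain of elementwise ops into one expression."""
    exprs = {}   # value name -> scalar expression that computes it
    params = []  # graph inputs, in first-use order
    for op in ops:
        args = []
        for name in op.inputs:
            if name not in exprs:  # not produced by an earlier op: a graph input
                exprs[name] = f"{name}[i]"
                params.append(name)
            args.append(exprs[name])
        exprs[op.out] = SCALAR[op.name](*args)
    last = ops[-1].out
    return FusedKernel(tuple(params), exprs[last], last)


def emit_cuda(k, kernel_name="fused"):
    """Lowering 2: print the fused kernel as CUDA C source text."""
    args = ", ".join(f"const float* {p}" for p in k.params)
    return (
        f'extern "C" __global__ void {kernel_name}({args}, float* {k.out}, int n) {{\n'
        f"    int i = blockIdx.x * blockDim.x + threadIdx.x;\n"
        f"    if (i < n) {k.out}[i] = {k.expr};\n"
        f"}}\n"
    )


graph = [Op("add", ("x", "y"), "t"), Op("relu", ("t",), "z")]
kernel = fuse_elementwise(graph)
source = emit_cuda(kernel)
print(source)
```

Each stage only knows about the representation directly above it, which is what makes this kind of pipeline easy to poke at: you can print the intermediate `FusedKernel` or swap out one lowering without touching the other. A real stack would also need shapes, dtypes, memory planning, and launch configuration, none of which this sketch attempts.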
