A Hackable ML Compiler Stack in 5,000 Lines of Python [P]

Reddit r/MachineLearning·u/NoVibeCoding·about 1 month ago

Hey r/MachineLearning,

The modern ML (LLM) compiler stack is brutal. TVM is 500K+ lines of C++. PyTorch piles Dynamo, Inductor, and Triton on top of each other. Then there's XLA, MLIR, Halide, Mojo. There is no tutorial that covers the high-level design of an ML compiler without dropping you straight into the guts of one of these frameworks.

I built a reference compiler from scratch in ~5K lines of pure Python that emits raw CUDA. It takes a small model (TinyLlama, Qwen2.5-7B) and lowers it to a sequence of CUDA kernels through six IRs. The goal isn't to beat Triton; it is to build a hackable, easy-to-follow compiler.

Full article: [A Principled ML Compiler Stack in 5,000 Lines of Python](https://medium.com/data-science-collective/a-principled-ml-compiler-stack-in-5-000-lines-of-python-17f2db9549d4)

Repo: [deplodock](https://github.com/cloudrift-ai/deplodock)

The pipeline consists of six IRs, each closer to the hardware than the last.…
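To give a rough feel for what "lowering through IRs down to CUDA" means, here is a toy sketch in pure Python. It is not taken from the repo and does not reproduce its six IRs: it uses just two made-up levels, and every name in it (`TensorOp`, `Kernel`, `SCALAR_EXPR`, `lower_op`, `emit_cuda`, `compile_graph`) is hypothetical. It only handles elementwise ops and returns CUDA source as a string, without compiling or launching anything.

```python
from dataclasses import dataclass


# Hypothetical high-level IR: one elementwise op between two named tensors.
@dataclass(frozen=True)
class TensorOp:
    name: str  # op name, e.g. "relu"
    src: str   # input tensor name
    dst: str   # output tensor name


# Hypothetical low-level IR: one kernel described by a scalar C expression over `x`.
@dataclass(frozen=True)
class Kernel:
    name: str
    expr: str


# Assumed mapping from op names to per-element C expressions.
SCALAR_EXPR = {
    "relu": "fmaxf(x, 0.0f)",
    "neg": "-x",
}


def lower_op(op: TensorOp, index: int) -> Kernel:
    """Lower one high-level op to a kernel description."""
    return Kernel(name=f"k{index}_{op.name}", expr=SCALAR_EXPR[op.name])


def emit_cuda(kernel: Kernel) -> str:
    """Emit CUDA source text for one elementwise kernel."""
    return (
        f'extern "C" __global__ void {kernel.name}'
        f"(const float* src, float* dst, int n) {{\n"
        f"    int i = blockIdx.x * blockDim.x + threadIdx.x;\n"
        f"    if (i < n) {{ float x = src[i]; dst[i] = {kernel.expr}; }}\n"
        f"}}\n"
    )


def compile_graph(ops: list[TensorOp]) -> str:
    """Run both lowering steps and join the kernels into one source string."""
    kernels = [lower_op(op, i) for i, op in enumerate(ops)]
    return "\n".join(emit_cuda(k) for k in kernels)


if __name__ == "__main__":
    graph = [TensorOp("relu", "a", "b"), TensorOp("neg", "b", "c")]
    print(compile_graph(graph))
```

A real stack would add more levels between these two (fusion, tiling, memory planning and so on), but the shape stays the same: each pass takes one small data structure and returns another that is closer to the hardware.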
