The title is clickbait, forgive me. You clicked, so now it's time to read :D

Not really sure how to start this blog post, so I'm just gonna wing it.

I have 32 GB of RAM, a Ryzen 9 9950X and an RTX 3060. That's not enough to run a 14B model at Q8 without CPU offload, and definitely not enough to let Python eat half my memory before the model is even loaded.

I've been writing Lua for nearly 14 years, and I have a strong taste for tools that don't leave a huge footprint on my system: few packages, little RAM, little noise. Pretty much the opposite of the Python ecosystem.

So while looking for an inference pipeline that fits these constraints, I went through the usual suspects: llama.cpp at the bottom of the stack, llama-cpp-python right above it, and then the holy grail... llama-cpp-lua... nope. Nothing. No serious Lua wrapper, no complete pipeline, just an old abandoned POC rotting in some corner of GitHub. Strange...…