3GB of Intelligence in Your Pocket 📱

So someone on Instagram posted a video of an LLM running on their phone. No internet. No API calls. No cloud. Just a model sitting on the device, answering questions, understanding images, processing audio. Offline. On a phone.

That's Google's new Gemma 4. Dropped April 2nd. Four model sizes, all open weight, Apache 2.0 licensed. The one that matters for this conversation is the E2B (Effective 2 Billion parameters), which is small enough to run on a phone and fast enough to actually be useful. Google built it from the same research behind Gemini 3, then squeezed it down for edge devices.

The comments on the post split exactly how you'd expect. Half the replies are "that's insane 🔥" and the other half are "battery go bye". Both camps are right.

Why This Is Actually A Big Deal

I've been mucking about with local models for a while. Ollama on the Mac. The occasional experiment with llama.cpp on a beefy Linux box. …