TLDR: Modern Android flagships can run 7B parameter models locally. Here's the threshold, the app, and the one setting that matters.

The setup I tested: ROG Phone 7 Ultimate, Snapdragon 8 Gen 2, 16GB RAM.
App: Off Grid.
Model: Qwen 3 4B, Q4_K_M quantization.
Speed: 15 to 30 tokens per second.
Use case: lightweight workflow triggers without touching cloud tokens.

RAM thresholds

6GB → 1B to 3B models. Technically works. Not practically useful for anything beyond autocomplete.
8GB + Snapdragon 8 Gen 2 → 3B to 7B models. This is the useful tier.
12GB+ → Llama 3.2 7B and Qwen 3 4B without thermal throttle issues.

The app

Off Grid handles NPU routing automatically on supported Snapdragon hardware. Supports Qwen 3, Llama 3.2, Gemma 3, Phi-4, and any GGUF you want to import from local storage.

First thing to do after install: go to settings, switch KV cache to q4_0. That's it. Biggest single performance gain you'll get.…
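To see roughly why the KV cache setting and the RAM tiers matter, here is a back-of-the-envelope sketch in Python. It is not anything Off Grid exposes. The function names are hypothetical, the layer and head counts are illustrative assumptions for a 4B-class model rather than verified specs, and the ~4.85 bits per weight figure for Q4_K_M is an approximation. The q4_0 and q8_0 sizes follow from the GGML block layout (32 values stored in 18 and 34 bytes respectively).

```python
# Rough memory estimate for on-device inference.
# All model-shape numbers below are illustrative assumptions, not verified specs.

# Effective bits per stored element for each KV cache type.
# q4_0 packs 32 values into 18 bytes (4.5 bits each); q8_0 uses 34 bytes (8.5 bits each).
BITS_PER_ELEMENT = {"f16": 16.0, "q8_0": 8.5, "q4_0": 4.5}


def kv_cache_bytes(n_layers, n_kv_heads, head_dim, n_ctx, cache_type):
    """Bytes needed to hold keys and values for n_ctx tokens."""
    # Factor 2 = one K tensor plus one V tensor per layer.
    elements = 2 * n_layers * n_kv_heads * head_dim * n_ctx
    return elements * BITS_PER_ELEMENT[cache_type] / 8


def weight_bytes(n_params, bits_per_weight):
    """Approximate size of the quantized weights."""
    return n_params * bits_per_weight / 8


def gib(n_bytes):
    """Convert bytes to GiB for display."""
    return n_bytes / 2**30


if __name__ == "__main__":
    # Assumed shape for a 4B-class model with grouped-query attention.
    shape = dict(n_layers=36, n_kv_heads=8, head_dim=128, n_ctx=8192)

    weights = weight_bytes(4e9, 4.85)  # ~4.85 bits/weight is an approximation for Q4_K_M
    print(f"weights (approx): {gib(weights):.2f} GiB")

    for cache_type in ("f16", "q8_0", "q4_0"):
        cache = kv_cache_bytes(cache_type=cache_type, **shape)
        print(f"KV cache {cache_type}: {gib(cache):.2f} GiB, "
              f"total {gib(weights + cache):.2f} GiB")
```

Under these assumptions the q4_0 cache is about 3.6x smaller than an f16 cache at the same context length. That leaves more headroom on an 8GB phone, where the OS and other apps are already using a good share of RAM. Whether that fully explains the speedup on a given device is something the numbers alone can't tell you.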