Run a Local LLM on Android: What RAM Tier You Need and Which Models Actually Work


DEV Community·EngineeredAI·about 1 month ago

#localllm #android #llm #model #models #phone


TL;DR: Modern Android flagships can run 7B-parameter models locally. Here's the threshold, the app, and the one setting that matters.

The setup I tested: ROG Phone 7 Ultimate, Snapdragon 8 Gen 2, 16GB RAM. App: Off Grid. Model: Qwen 3 4B, Q4_K_M quantization. Speed: 15–30 tokens per second. Use case: lightweight workflow triggers without touching cloud tokens.

RAM thresholds

6GB: 1B to 3B models. Technically works, but not practically useful for anything beyond autocomplete.
8GB + Snapdragon 8 Gen 2: 3B to 7B models. This is the useful tier.
12GB+: Llama 3.2 7B and Qwen 3 4B without thermal throttle issues.

The app

Off Grid handles NPU routing automatically on supported Snapdragon hardware. It supports Qwen 3, Llama 3.2, Gemma 3, Phi-4, and any GGUF you want to import from local storage.

First thing to do after install: go to settings and switch the KV cache to q4_0. That's it. It's the biggest single performance gain you'll get.…
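The RAM tiers above can be read as a simple lookup. The following is a minimal sketch, not anything taken from Off Grid: the function names are hypothetical, the fallback for an 8GB phone without a Snapdragon 8 Gen 2 is an assumption (the article does not cover that case), and the figure of roughly 4.85 bits per weight for Q4_K_M is an approximation used only to estimate the size of the weights. It ignores the KV cache and runtime overhead.

```python
from typing import Optional, Tuple

# Assumption: approximate average bits per weight for Q4_K_M quantization.
Q4_K_M_BITS_PER_WEIGHT = 4.85


def recommended_model_range(ram_gb: int, has_sd8gen2: bool = False) -> Optional[Tuple[int, int]]:
    """Return (min, max) model size in billions of parameters for a RAM tier.

    Tiers follow the article. Returns None below 6GB, which the article
    does not cover.
    """
    if ram_gb >= 12:
        # 12GB+: up to 7B, without thermal throttle issues per the article.
        return (3, 7)
    if ram_gb >= 8 and has_sd8gen2:
        # 8GB + Snapdragon 8 Gen 2: the "useful tier".
        return (3, 7)
    if ram_gb >= 6:
        # 6GB tier. Assumption: 8GB phones without the 8 Gen 2 also land here.
        return (1, 3)
    return None


def estimate_weight_gb(params_billions: float,
                       bits_per_weight: float = Q4_K_M_BITS_PER_WEIGHT) -> float:
    """Rough size of the quantized weights in decimal gigabytes."""
    return params_billions * bits_per_weight / 8


if __name__ == "__main__":
    # The tested setup: 16GB RAM, Snapdragon 8 Gen 2, Qwen 3 4B at Q4_K_M.
    print(recommended_model_range(16, has_sd8gen2=True))
    print(round(estimate_weight_gb(4.0), 2), "GB of weights (approximate)")
```

Under these assumptions, a 4B model at Q4_K_M needs roughly 2.4GB for weights alone, which is consistent with it fitting comfortably on the 16GB test device.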
