I want to run an LLM on iPhone. But there are several runtimes, it isn't obvious which one to pick, and I couldn't find many head-to-head benchmarks.

| Runtime | In a nutshell |
|---|---|
| MLX | Apple charging into the on-device LLM scene and pushing hard. |
| llama.cpp | The mature, battle-tested community standard for local LLMs. |
| LiteRT-LM | Gemma 4 only, but it's Google's heavyweight, finally shipped. |
| CoreML-LLM | Lets you use the Apple Neural Engine, which the GPU/Metal-dominated LLM world tends to overlook. |

I built CoreML-LLM myself. Can it even compete...? Fine, let's just do it.

On an iPhone 17 Pro (A19 Pro), I ran the same models on four on-device inference runtimes and measured decode speed and memory. The conclusion: "For local LLMs on iPhone, default to MLX." and "For Gemma 4 specifically, LiteRT-LM is unbeatable."

Conclusion first

- Decode speed: Qwen 3.5 2B is fastest on MLX (61 tok/s). Gemma 4 E2B is a decisive win for LiteRT-LM (55 tok/s).
- Memory: CoreML / ANE (Apple Neural Engine) wins by a landslide.…
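The decode-speed figures are throughput in tokens per second. The exact measurement method isn't spelled out here, so the sketch below assumes one common convention: prefill (time to first token) is excluded, and decode speed is the number of tokens generated after the first one divided by the time spent generating them. `DecodeRun`, `decode_tokens_per_second`, and `ms_per_token` are hypothetical helpers for illustration, not APIs from MLX, llama.cpp, LiteRT-LM, or CoreML-LLM.

```python
from dataclasses import dataclass


@dataclass
class DecodeRun:
    """Hypothetical record of one generation run (assumed, not from any runtime).

    Timestamps are in seconds from any monotonic clock, e.g. time.perf_counter().
    """
    first_token_s: float   # when the first generated token was emitted (end of prefill)
    last_token_s: float    # when the last generated token was emitted
    generated_tokens: int  # total tokens generated, including the first one


def decode_tokens_per_second(run: DecodeRun) -> float:
    # Assumed convention: exclude prefill, so count only the tokens produced
    # after the first one, over the time between first and last token.
    decode_tokens = run.generated_tokens - 1
    elapsed = run.last_token_s - run.first_token_s
    if decode_tokens <= 0 or elapsed <= 0:
        raise ValueError("need at least two tokens and a positive decode window")
    return decode_tokens / elapsed


def ms_per_token(tok_per_s: float) -> float:
    # Invert throughput into average per-token latency in milliseconds.
    return 1000.0 / tok_per_s


if __name__ == "__main__":
    # Illustrative numbers only: 122 decode tokens over 2.0 s.
    run = DecodeRun(first_token_s=0.5, last_token_s=2.5, generated_tokens=123)
    print(decode_tokens_per_second(run))  # 61.0
    print(round(ms_per_token(61), 1))     # 16.4
    print(round(ms_per_token(55), 1))     # 18.2
```

Inverting the rates makes the gap concrete: 61 tok/s is about 16.4 ms per token and 55 tok/s about 18.2 ms per token. Excluding prefill keeps prompt length from skewing a number that is meant to describe steady-state generation.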