
Apple's On-Device Model is Terrible for Chat But Surprisingly Good at Structured Output and Tool Calling

DEV Community·Fernando Rodriguez·about 1 month ago

I've spent weeks stress-testing Apple's on-device model, the ~3B parameter one that runs on the Neural Engine of any Apple Silicon Mac. To test it thoroughly, I built Think Local, a macOS app that exercises every capability of the model: chat, image generation, structured output, tool calling, and parameter comparison.

My conclusion: as a chatbot, the model is terrible. As a structured output and tool calling engine, it's surprisingly good. This distinction matters because it completely changes what you should use this model for.

Chat is disappointing, and that's fine

Apple's model has a 4,096-token context window. To put this in perspective: Claude has 1M tokens and GPT-4o has 128K. With Apple, add a 200-token system prompt, a 150-token schema, and three conversation turns, and you're already at 70% capacity. Free-form text quality isn't impressive either.…
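To make the context-budget arithmetic concrete, here is a rough sketch in Python. The 4,096-token window, the 200-token system prompt, and the 150-token schema come from the text above. The per-turn size of 840 tokens is my own assumption, chosen only because it lands near the 70% figure. The function name `context_usage` is hypothetical and not part of any Apple API.

```python
# Context window size stated in the article.
CONTEXT_WINDOW = 4096


def context_usage(system_prompt, schema, turns, tokens_per_turn,
                  window=CONTEXT_WINDOW):
    """Return the fraction of the context window consumed.

    tokens_per_turn is an assumed average covering one user message
    plus one model reply; the article does not give this number.
    """
    used = system_prompt + schema + turns * tokens_per_turn
    return used / window


# 200-token system prompt, 150-token schema, three turns.
# 840 tokens per turn is an illustrative assumption.
usage = context_usage(200, 150, 3, 840)
print(f"{usage:.0%} of the context window used")
```

Under that assumption the fixed overhead (prompt plus schema) is under 9% of the window, so almost all of the budget goes to the conversation itself. That is why multi-turn chat runs out of room so quickly.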
