I want a real local pipeline: image in, structured JSON out, no cloud dependency. Ideally it runs on Metal, the ANE, or whatever acceleration Apple exposes. My goal is to infer a JSON struct of variables from an image using an FM. Sounds simple, but as of May 2026 it ain't. And I really want it.

After a bit of research, I found that llama.cpp provides the optimizations and all the necessary low-level work. I just need to write Swift bindings that are worth the trouble... This is a complete tutorial on how I did it. I will use a QuickBooks / Wise-style receipt capture example to keep it real and safe. Bon courage!

What We’re Building

A local inference stack with clear separation of concerns:

- llama.cpp as an iOS XCFramework (vendor/llama.cpp/build-apple/llama.xcframework)
- Objective-C++ bridge (Controllers/LlamaBridge.h, Controllers/LlamaBridge.mm)
- Swift-facing API in Controllers/LLMFunctionsController.swift
- Typed decode API: let result: ReceiptResult = try await LLMFunctionsController.shared.…