LLMs read text from images now. So why ship a Machine Learning OCR model? Because the receipt your reconciliation job processed last night will be processed again next quarter, and the totals had better match. A GPT-class vision model can hallucinate a 5 into an 8 , drop a decimal, or reorder line items the second time you ask. Cloud OCR also costs money per page, leaks the document outside your network, and breaks the moment the vendor deprecates a model id. I maintain ppu-paddle-ocr , an open-source TypeScript SDK for PaddleOCR. It runs the PP-OCRv5 family directly on ONNX Runtime in Node.js, Bun, Deno, the browser, and browser extensions, with the same package and the same API. This post walks through what that buys you, how it compares against the official PaddleOCR JS SDK, Tesseract.js, and LLM OCR, and what is shipping next. Why deterministic OCR still matters in the LLM era Production pipelines need three properties that LLM OCR fights against: Reproducibility.…