The Demo Document Is Not the Evaluation

Every document extraction API looks good on the vendor's sample invoice. The problem starts when your documents arrive: a scanned supplier invoice with a faint stamp, a contract with an annex table, a delivery note photographed from a truck cab, a receipt in another language, a PDF where the text layer exists but does not match the visual reading order.

If your evaluation is "upload three clean PDFs and check whether JSON comes back," you are not evaluating production behavior. You are evaluating the happy path.

A useful evaluation asks a different question: can this API support the workflow you are actually shipping? That means testing accuracy, but not only accuracy. It also means testing schemas, confidence scores, source evidence, validation behavior, failure modes, cost shape, compliance fit, and what happens after extraction.

Start With the Workflow, Not the Vendor Matrix

Before comparing APIs, write down the workflow.…