Mixed Documents Need Mixed Representations

Many document workflows start with a false simplification: this upload is a PDF, so it needs one PDF extraction strategy.

Then the file arrives. The first two pages are a structured form. The next five pages are invoices with tables. Then there is a narrative explanation, a signed approval page, a few photos, and a contract excerpt with dense paragraphs.

The user thinks of it as one submission. The storage layer thinks of it as one file. But the content inside it is not one thing.

If every page is treated the same, the workflow loses meaning. Forms, tables, and free text carry information differently. A form asks for named fields. A table repeats rows. A narrative section preserves context through paragraphs, headings, and argument structure.…
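
One way to picture the difference between forms, tables, and narrative text is a submission that holds typed segments instead of a uniform list of pages. The sketch below is a minimal illustration in Python, not a prescribed design: the class names (`FormSegment`, `TableSegment`, `NarrativeSegment`, `Submission`), their fields, and the sample values are all hypothetical and do not come from any particular library.

```python
from dataclasses import dataclass, field
from typing import Union


@dataclass
class FormSegment:
    pages: range
    fields: dict[str, str]  # named fields, keyed by label (hypothetical shape)


@dataclass
class TableSegment:
    pages: range
    columns: list[str]
    rows: list[list[str]]  # each row repeats the same column layout


@dataclass
class NarrativeSegment:
    pages: range
    heading: str
    paragraphs: list[str]  # order is preserved so context survives


Segment = Union[FormSegment, TableSegment, NarrativeSegment]


@dataclass
class Submission:
    """One uploaded file, represented as a sequence of typed segments."""

    filename: str
    segments: list[Segment] = field(default_factory=list)

    def kinds(self) -> list[str]:
        # Report which representation each segment uses, in page order.
        return [type(segment).__name__ for segment in self.segments]


# Illustrative data only: two form pages, five invoice pages, one narrative page.
submission = Submission(
    filename="upload.pdf",
    segments=[
        FormSegment(pages=range(1, 3), fields={"applicant": "A. Example"}),
        TableSegment(
            pages=range(3, 8),
            columns=["item", "amount"],
            rows=[["widget", "10.00"]],
        ),
        NarrativeSegment(
            pages=range(8, 9),
            heading="Explanation",
            paragraphs=["Placeholder paragraph text."],
        ),
    ],
)

print(submission.kinds())
```

The file is still one upload, but each page range keeps the shape that matches how its content carries information: a mapping for the form, rows for the table, ordered paragraphs for the narrative.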