The Blank Form Lies to You

A blank form looks like the easiest document in the world to automate. The labels are printed. The boxes are aligned. The field order is predictable. The template has structure. A developer can open the PDF and think the extraction problem is mostly solved before the first user touches it.

Then real submissions arrive. Names overflow the boxes. Dates use local formats. Checkboxes are ticked, crossed, circled, corrected, or left half-marked. Handwriting runs into printed labels. Optional sections are partly completed. Someone scans the form at an angle. Someone else photographs it on a kitchen table with shadows across the page.

The template was structured. The submitted form is not.

That is the central problem with form extraction. Teams design around the clean version they control, then ship into the messy version their users create. The difference between those two documents is where most production failures live.…