Every few weeks someone posts a victory-lap screenshot: "uploaded my bank statement to ChatGPT, got a clean spreadsheet back in 30 seconds." The spreadsheet looks pristine. Every date is formatted consistently, every merchant name is title-cased, every amount has two decimal places. The poster is happy. The replies are happy. The feature is "solved." It isn't. Those screenshots are the single most dangerous shape of LLM output for financial work, because the errors that matter in bank-statement extraction are almost never visible. They aren't hallucinated merchants or wildly wrong dates you'd notice in review. They're one signed amount flipped on page 4 of a twenty-page statement, or a single transaction missing from a block that spanned a page break, or a Chase Zelle credit that the model confidently reclassified as a debit. You can't catch this by eye.…
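One way to see why these errors survive review: they leave the formatting untouched but break the arithmetic. The sketch below is a minimal illustration, assuming the statement prints an opening and closing balance and that extracted amounts are signed (credits positive, debits negative). The `reconcile` function and the sample figures are hypothetical, not taken from any real statement or tool.

```python
from decimal import Decimal


def reconcile(opening: Decimal, closing: Decimal, amounts: list[Decimal]) -> Decimal:
    """Return the printed closing balance minus the balance implied by the
    extracted rows. Zero means the rows are arithmetically consistent.

    Hypothetical helper: assumes credits are positive and debits negative.
    """
    implied = opening + sum(amounts, Decimal("0"))
    return closing - implied


# Hypothetical statement: opening 1000.00, closing 1191.81.
opening = Decimal("1000.00")
closing = Decimal("1191.81")

# Correct extraction: a card debit, a 250.00 Zelle credit, another debit.
good = [Decimal("-45.20"), Decimal("250.00"), Decimal("-12.99")]

# Same rows, but the Zelle credit came back as a debit (sign flipped).
flipped = [Decimal("-45.20"), Decimal("-250.00"), Decimal("-12.99")]

# Same rows, but the last transaction was dropped (e.g. at a page break).
missing = [Decimal("-45.20"), Decimal("250.00")]

for name, rows in [("good", good), ("flipped", flipped), ("missing", missing)]:
    print(name, reconcile(opening, closing, rows))
```

All three row lists would render as equally tidy spreadsheets, but the flipped credit shows up as a discrepancy of twice its amount (500.00) and the dropped row as exactly that row's amount (-12.99). A check like this only catches errors that change the total: two mistakes that happen to cancel, or a correct amount attached to the wrong date, still pass.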