I’m Matthew, building Arbiter Briefs, an AI engine that helps founders make high-stakes decisions. This week we shipped financial PDF ingestion, and I want to walk through the architecture, the gotchas, and why we chose regex over ML for extraction.

The Problem

Our v1 was generating rulings based on web research + user input. But founders kept saying the same thing: “This would be way more useful if you actually read my financial data.”

So we added PDF upload. But now we had a new problem: how do you reliably extract structured financial metrics from PDFs that could be formatted a hundred different ways?

We could’ve gone full ML pipeline. Instead, we went pragmatic.

Architecture Overview

PDF Upload (multer)
↓
Storage (Railway volume)
↓
Parse (pdf-parse)
↓
Extract (regex + heuristics)
↓
Store (PostgreSQL JSONB)
↓
Use in Ruling (context injection)

Simple. Async. Testable.

Step 1: Upload (Multer)

We use multer for file handling: it’s simple, battle-tested, and handles multipart form data without fuss.…
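To make the "Extract (regex + heuristics)" stage from the overview concrete, here is a minimal sketch of what label-based regex extraction over parsed PDF text could look like. It is not the production code: the metric names, label patterns, and the helpers `parseAmount` and `extractMetrics` are all hypothetical, and it assumes the Parse stage has already turned the PDF into a plain string.

```javascript
// Hypothetical label patterns for illustration; real statements need many more variants.
const LABELS = {
  revenue: "(?:total\\s+)?revenues?",
  netIncome: "net\\s+(?:income|profit|loss)",
  cash: "cash(?:\\s+and\\s+cash\\s+equivalents)?",
};

// An amount: optional "(" or "-" for negatives, optional "$", digits with
// commas, optional decimals, optional K/M/B suffix not followed by a letter.
const AMOUNT = "(\\(?-?\\$?\\d[\\d,]*(?:\\.\\d+)?\\)?)\\s*([kmb])?(?![a-z])";

// Convert a matched amount such as "(12,345)" or "$2.5" plus a suffix into a number.
function parseAmount(raw, suffix) {
  // Heuristic: accounting-style parentheses or a leading minus mean negative.
  const negative = raw.startsWith("(") || raw.includes("-");
  const digits = raw.replace(/[^\d.]/g, "");
  const scale = { k: 1e3, m: 1e6, b: 1e9 }[(suffix || "").toLowerCase()] || 1;
  const value = parseFloat(digits) * scale;
  return negative ? -value : value;
}

// Return one number per metric, or null when no label/amount pair is found.
function extractMetrics(text) {
  const metrics = {};
  for (const [name, label] of Object.entries(LABELS)) {
    const pattern = new RegExp(`\\b${label}\\s*[:\\-]?\\s*${AMOUNT}`, "i");
    const match = text.match(pattern);
    metrics[name] = match ? parseAmount(match[1], match[2]) : null;
  }
  return metrics;
}

// Usage with a made-up snippet of statement text.
const sampleText = [
  "Income Statement",
  "Total Revenue   $1,250,000",
  "Net income: (12,345)",
  "Cash and cash equivalents: $2.5M",
].join("\n");

console.log(extractMetrics(sampleText));
```

Returning `null` for a missing metric, rather than guessing, keeps the downstream steps honest about what was actually found in the document. Each pattern only takes the first match, which is one of the places where extra heuristics (section headers, column position, most recent period) would have to come in.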