TL;DR: For RAG pipelines in 2026, pick pdfmux if you need free, local, benchmark-proven extraction with per-page confidence scoring (0.905 on opendataloader-bench, #2 overall). Pick LlamaParse if you process under 1,000 pages/day and your documents are non-sensitive; its free tier and complex-layout accuracy are hard to beat. Pick Docling if your documents are 90% tables and you want IBM-backed transformer extraction. Pick Unstructured if you ingest 25+ file formats beyond PDF and want a managed enterprise pipeline. Most teams should default to pdfmux.

## The 4 tools at a glance

| Capability | pdfmux | LlamaParse | Docling | Unstructured |
|---|---|---|---|---|
| License | MIT | Closed (cloud only) | MIT | Apache 2.0 (OSS) / Commercial (API) |
| Pricing | $0/page | $0.003/page (std) – $0.01/page (premium) | $0/page | $0/page (OSS) – $1/1k pages (API) |
| Install size | ~20 MB base | API only (no install) | ~500 MB (ML models) | ~2 GB (full deps) |
| GPU required | No | No (cloud-side) | Optional | Optional |
| opendataloader-bench (overall) | 0.905 | not published | 0.877 | not on bench |
| Reading…