How I Built an AI Document Ingestion Pipeline

DEV Community·Ryan Carter·about 1 month ago

Symport is an AI document ingestion pipeline that turns a phone photo of any paper document (receipt, EOB, prescription, utility bill) into structured JSON, then stores it in Postgres with embeddings for semantic search. The full flow is: image upload → Sharp preprocessing → GPT-4o vision extraction → normalized JSON → Postgres + pgvector.

I built it because I hate paper and I also lose paper. This post walks through how the pipeline actually works, including the prompt engineering decisions that make extraction reliable enough to trust and the fallback layers that keep the app useful when extraction fails.

TL;DR

Stack: Sharp for image preprocessing, GPT-4o for vision extraction, Prisma + Postgres + pgvector for storage and semantic search.

The extraction prompt does most of the work: explicit date context to fight year hallucinations, constrained type / category enums for predictable downstream branching, and a strict "JSON only, no markdown" tail.…
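To make the three prompt decisions concrete, here is a minimal TypeScript sketch of a prompt builder and a defensive parser. It is not Symport's actual code: the enum values, the field names (`type`, `category`, `date`, `summary`), and the function names `buildExtractionPrompt` and `parseExtraction` are all assumptions made for illustration, since the preview does not show the real schema. The call to the vision model is left out on purpose.

```typescript
// Hypothetical enums: the post says type and category are constrained,
// but it does not list the real values. These are placeholders.
const DOC_TYPES = ["receipt", "eob", "prescription", "utility_bill", "other"] as const;
const CATEGORIES = ["medical", "household", "financial", "other"] as const;

type DocType = (typeof DOC_TYPES)[number];
type Category = (typeof CATEGORIES)[number];

// Assumed output shape; the real pipeline's JSON may differ.
interface ExtractedDocument {
  type: DocType;
  category: Category;
  date: string | null; // YYYY-MM-DD, or null when no date is legible
  summary: string;
}

// Build the prompt: date context first, enums in the middle,
// and the strict output instruction as the last line.
function buildExtractionPrompt(today: Date): string {
  const isoToday = today.toISOString().slice(0, 10); // UTC calendar date
  return [
    `Today's date is ${isoToday}. Documents are usually dated on or before today.`,
    `Extract the document in the image as a JSON object with keys: type, category, date, summary.`,
    `"type" must be one of: ${DOC_TYPES.join(", ")}.`,
    `"category" must be one of: ${CATEGORIES.join(", ")}.`,
    `"date" must be YYYY-MM-DD, or null if no date is legible.`,
    `Respond with JSON only, no markdown.`,
  ].join("\n");
}

// Parse the model reply defensively. Returning null (instead of throwing)
// lets the caller drop into whatever fallback layer it has.
function parseExtraction(raw: string): ExtractedDocument | null {
  let data: unknown;
  try {
    data = JSON.parse(raw.trim());
  } catch {
    return null; // not valid JSON, e.g. the model added prose or markdown
  }
  if (typeof data !== "object" || data === null) return null;
  const d = data as Record<string, unknown>;

  // Accept only values that are members of the enums.
  const type = DOC_TYPES.find((t) => t === d.type);
  const category = CATEGORIES.find((c) => c === d.category);
  const dateOk =
    d.date === null ||
    (typeof d.date === "string" && /^\d{4}-\d{2}-\d{2}$/.test(d.date));

  if (!type || !category || !dateOk || typeof d.summary !== "string") return null;
  return { type, category, date: d.date as string | null, summary: d.summary };
}
```

Validating against the same enum arrays that are interpolated into the prompt keeps the two in sync, so downstream code can branch on `type` and `category` without handling unexpected strings. A `null` result is the signal to hand the document to a fallback path rather than store a half-parsed record.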
