Build a GST Invoice PDF Extractor in 53 Lines of Python

DEV Community·Archit Mittal·about 1 month ago
#automation #python #total #date #invoices #regex

If you run a small business in India, you've felt this pain: a folder full of PDF invoices from different vendors, every one formatted slightly differently, and somebody has to type GSTIN, invoice number, date, and total amount into a spreadsheet.

I built a single-file Python script that processes the whole folder in seconds and validates GSTINs along the way. Total: 53 lines including imports and CLI plumbing. Here's the whole thing.

What it does

- Walks a folder of PDF invoices
- Pulls out GSTINs (vendor + buyer) and validates the 15-character format
- Grabs invoice number, date, and total amount with regex
- Writes a clean CSV you can paste into Tally, Zoho Books, or any ledger

The code

    # invoice_extractor.py
    import re, sys, csv
    from pathlib import Path
    import pdfplumber

    GSTIN_RE = re.compile(r'\b(\d{2}[A-Z]{5}\d{4}[A-Z][A-Z\d]Z[A-Z\d])\b')
    INV_NUM_RE = re.compile(r'Invoice\s*(?:No\.?|Number|#)\s*[:\-]?\s*([A-Z0-9\-/]+)', re.I)
    DATE_RE = re.…
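As a minimal sketch of how the two patterns visible above behave, the snippet below runs them against a made-up block of invoice text. The sample text, the `extract_fields` helper, and the choice to keep only the first invoice number are assumptions for illustration. They are not taken from the original script, which imports pdfplumber for the PDF side.

```python
import re

# Patterns copied from the visible part of the article's code.
GSTIN_RE = re.compile(r'\b(\d{2}[A-Z]{5}\d{4}[A-Z][A-Z\d]Z[A-Z\d])\b')
INV_NUM_RE = re.compile(r'Invoice\s*(?:No\.?|Number|#)\s*[:\-]?\s*([A-Z0-9\-/]+)', re.I)

# Hypothetical invoice text, standing in for whatever a PDF text extractor returns.
SAMPLE_TEXT = """
ACME TRADERS
GSTIN: 27ABCDE1234F1Z5
Invoice No: INV-2024/001
Bill to GSTIN 29PQRST5678K2ZB
"""


def extract_fields(text):
    """Hypothetical helper: collect GSTINs and the first invoice number from raw text."""
    # dict.fromkeys drops repeated GSTINs while keeping order of appearance.
    gstins = list(dict.fromkeys(GSTIN_RE.findall(text)))
    match = INV_NUM_RE.search(text)
    return {
        "gstins": gstins,
        "invoice_number": match.group(1) if match else None,
    }


result = extract_fields(SAMPLE_TEXT)
print(result)

# A string that breaks the format (no 'Z' in position 14) is not matched.
rejected = GSTIN_RE.findall("27ABCDE1234F1X5")
```

Note that this pattern checks only the shape of a GSTIN (two digits, five letters, four digits, a letter, an alphanumeric, a literal `Z`, and a final alphanumeric). It does not verify the final check character.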
