If you run a small business in India, you've felt this pain: a folder full of PDF invoices from different vendors, every one formatted slightly differently, and somebody has to type GSTIN, invoice number, date, and total amount into a spreadsheet.

I built a single-file Python script that processes the whole folder in seconds and validates GSTINs along the way. Total: 53 lines including imports and CLI plumbing. Here's the whole thing.

What it does

- Walks a folder of PDF invoices
- Pulls out GSTINs (vendor + buyer) and validates the 15-character format
- Grabs invoice number, date, and total amount with regex
- Writes a clean CSV you can paste into Tally, Zoho Books, or any ledger

The code

```python
# invoice_extractor.py
import re, sys, csv
from pathlib import Path
import pdfplumber

GSTIN_RE = re.compile(r'\b(\d{2}[A-Z]{5}\d{4}[A-Z][A-Z\d]Z[A-Z\d])\b')
INV_NUM_RE = re.compile(r'Invoice\s*(?:No\.?|Number|#)\s*[:\-]?\s*([A-Z0-9\-/]+)', re.I)
DATE_RE = re.…
```
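To see what the first two patterns do on their own, here is a minimal sketch that runs them against a plain string, with no PDF involved. `GSTIN_RE` and `INV_NUM_RE` are copied from the script; `SAMPLE_TEXT`, `extract_fields`, and `is_gstin_format` are hypothetical names made up for this illustration and are not part of the original script.

```python
import re

# Patterns copied from the script above.
GSTIN_RE = re.compile(r'\b(\d{2}[A-Z]{5}\d{4}[A-Z][A-Z\d]Z[A-Z\d])\b')
INV_NUM_RE = re.compile(r'Invoice\s*(?:No\.?|Number|#)\s*[:\-]?\s*([A-Z0-9\-/]+)', re.I)

# Hypothetical invoice text, standing in for whatever a PDF text extractor returns.
SAMPLE_TEXT = """
Tax Invoice
Invoice No: INV-2024/0042
Seller GSTIN: 27AAPFU0939F1ZV
Buyer GSTIN: 29ABCDE1234F1Z5
Reference: 27AAPFU0939F1Z (one character short, so it is skipped)
"""


def extract_fields(text):
    """Return GSTIN-shaped strings (unique, in order) and the first invoice number."""
    gstins = list(dict.fromkeys(GSTIN_RE.findall(text)))  # dedupe, keep order
    match = INV_NUM_RE.search(text)
    return {
        "gstins": gstins,
        "invoice_number": match.group(1) if match else None,
    }


def is_gstin_format(value):
    """True when the whole string has the 15-character GSTIN shape."""
    return GSTIN_RE.fullmatch(value) is not None


if __name__ == "__main__":
    print(extract_fields(SAMPLE_TEXT))
    print(is_gstin_format("27AAPFU0939F1ZV"))
```

Note that this is a shape check only: the pattern confirms the 2-digit state code, the 10-character PAN portion, the entity character, the literal `Z`, and a final alphanumeric character, but it does not verify that the final check character is correct for the preceding 14.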