I started this six months ago because nobody else seemed to. Hedge funds spent the last decade extracting alpha from satellite imagery, credit-card panels, parking-lot photos. The venture-capital equivalent, public engineering activity on GitHub, was sitting in plain sight, and most institutional sourcing teams I knew still ran on Crunchbase, warm intros, and Twitter.

So I built a crawler. That sentence is short. The reality wasn't.

The first crawler melted my Postgres pool

The first version was a Python script that hit /repos/{org}/events for every org on the list, every hour, with a single connection. It worked for 80 orgs. By the time I'd seeded 1,200 orgs into the watchlist, I was hitting GitHub's secondary rate limits inside 12 minutes and my Postgres connection pool was burning to the ground. The script was opening a new connection for every API response, and the connections weren't recycling because I'd written psycopg.connect() inside a loop instead of using a pool. Standard mistake.…
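For readers who haven't hit this one, here is a minimal sketch of what "using a pool" buys you. It is not the crawler's actual code: FakeConnection, TinyPool, and store_events are hypothetical names, and the fake connection only counts how many times it gets opened so the example runs without a database. With real Postgres, the separate psycopg_pool package provides a ConnectionPool that plays the role of TinyPool here.

```python
import queue
from contextlib import contextmanager


class FakeConnection:
    """Hypothetical stand-in for a database connection.

    It only counts how many connections were ever opened and
    remembers what was written through it.
    """

    opened = 0

    def __init__(self):
        FakeConnection.opened += 1
        self.rows = []

    def execute(self, payload):
        self.rows.append(payload)


class TinyPool:
    """Illustrative fixed-size pool: open `size` connections once, then reuse them."""

    def __init__(self, factory, size):
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(factory())

    @contextmanager
    def connection(self):
        conn = self._idle.get()  # blocks if every connection is busy
        try:
            yield conn
        finally:
            self._idle.put(conn)  # always hand the connection back


def store_events(pool, responses):
    """Write each API response through a borrowed connection."""
    for payload in responses:
        with pool.connection() as conn:
            conn.execute(payload)


if __name__ == "__main__":
    pool = TinyPool(FakeConnection, size=4)
    store_events(pool, [f"event-{i}" for i in range(1000)])
    print(FakeConnection.opened)  # 4 connections opened for 1000 writes
```

The connect-in-a-loop version would have opened 1,000 connections for the same 1,000 writes. The pool caps that at its size no matter how many responses arrive, and the blocking get() means a burst of responses waits for a free connection instead of opening a new one.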