I tracked 4,200 startup GitHub orgs for six months — here's what actually predicts a fundraise

DEV Community · The Data Nerd · 28 days ago

I started this six months ago because nobody else seemed to be doing it. Hedge funds spent the last decade extracting alpha from satellite imagery, credit-card panels, and parking-lot photos. The venture-capital equivalent, public engineering activity on GitHub, was sitting in plain sight, and most institutional sourcing teams I knew still ran on Crunchbase, warm intros, and Twitter.

So I built a crawler. That sentence is short. The reality wasn't.

The first crawler melted my Postgres pool

The first version was a Python script that hit /repos/{org}/events for every org on the list, every hour, over a single connection. It worked for 80 orgs. By the time I'd seeded 1,200 orgs into the watchlist, I was hitting GitHub's secondary rate limits within 12 minutes of each run, and my Postgres connection pool was burning to the ground. The script opened a new database connection for every API response, and those connections were never recycled, because I'd written psycopg.connect() inside a loop instead of using a pool. Standard mistake.…
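
The connection-per-response mistake described above can be sketched in a few lines. This is an illustration of the pattern only, not the crawler's actual code: FakeConn, TinyPool, store_naive, and store_pooled are hypothetical names, and FakeConn stands in for a real database connection so the example runs without a Postgres server. In a real psycopg 3 project, the separate psycopg_pool package provides a ConnectionPool class that fills the role TinyPool plays here.

```python
import queue
from contextlib import contextmanager


class FakeConn:
    """Hypothetical stand-in for a database connection; counts how many were opened."""

    opened = 0

    def __init__(self):
        FakeConn.opened += 1
        self.rows = []

    def execute(self, row):
        self.rows.append(row)


class TinyPool:
    """Illustrative fixed-size pool: opens `size` connections once, then reuses them."""

    def __init__(self, connect, size):
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(connect())

    @contextmanager
    def connection(self):
        conn = self._idle.get()  # blocks while every connection is checked out
        try:
            yield conn
        finally:
            self._idle.put(conn)  # hand the connection back instead of closing it


def store_naive(responses):
    # The anti-pattern: a brand-new connection for every API response.
    for response in responses:
        conn = FakeConn()
        conn.execute(response)


def store_pooled(responses, pool):
    # The fix: borrow a connection from the pool, use it, return it.
    for response in responses:
        with pool.connection() as conn:
            conn.execute(response)
```

Feeding 100 responses through store_naive opens 100 connections, while store_pooled with a pool of size 4 opens only the 4 created up front, no matter how many responses arrive.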
