Building a Website Contact Scraper API in .NET 10: Architecture, Crawling, and Fighting Cloudflare


DEV Community·ZoktrFall·21 days ago


#csharp #dotnet #api #contact #fullscreen #encoded


Building a Website Contact Scraper API in .NET 10: Crawling, Extraction, and a Cloudflare Problem I Can't Fully Solve

I built an API that takes a domain and returns emails, phones, social profiles, and company info. One call:

    GET /api/v1/website/contacts?domain=stripe.com

It returns verified emails with confidence scores, phones, LinkedIn/Twitter/GitHub links, and crawl metadata. Here's how the interesting parts work.

Architecture

Clean layered architecture: Api → Application → Domain, with Infrastructure implementing the Application interfaces. The controller is 12 lines of plumbing. Everything real happens in the crawler and extractor.

The Two-Phase Crawler

The crawler uses a priority queue and runs in two phases.

Fast path: the first 18 pages, restricted to high-value routes (/contact, /about, /privacy, /legal). This gets real contacts in under 2 seconds for most sites.

Stage two: deferred URLs get promoted once the fast path finishes.…
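The two-phase queue described above could be sketched roughly as follows. This is a minimal illustration and not the article's actual code: the class name TwoPhaseFrontier, its members, the prefix-based route matching, and the (rank, sequence) priority scheme are all assumptions. Only the 18-page budget, the four high-value routes, and the "promote deferred URLs after the fast path" behavior come from the text.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch of a two-phase crawl frontier; all names are illustrative.
public sealed class TwoPhaseFrontier
{
    // High-value routes listed in the article.
    private static readonly string[] HighValueRoutes =
        { "/contact", "/about", "/privacy", "/legal" };

    // Priority is (rank, sequence): rank 0 = high-value, rank 1 = promoted.
    // The sequence number keeps insertion order among equal ranks.
    private readonly PriorityQueue<string, (int Rank, long Seq)> _queue = new();
    private readonly List<string> _deferred = new();
    private readonly HashSet<string> _seen = new(StringComparer.OrdinalIgnoreCase);
    private readonly int _fastBudget;
    private int _served;
    private long _seq;
    private bool _promoted;

    // 18 is the fast-path page budget mentioned in the article.
    public TwoPhaseFrontier(int fastBudget = 18) => _fastBudget = fastBudget;

    // Assumption: a simple prefix match decides what counts as high-value.
    public static bool IsHighValue(string path) =>
        HighValueRoutes.Any(r => path.StartsWith(r, StringComparison.OrdinalIgnoreCase));

    public void Enqueue(string path)
    {
        if (!_seen.Add(path)) return;                 // skip duplicates
        if (IsHighValue(path)) _queue.Enqueue(path, (0, _seq++));
        else if (_promoted) _queue.Enqueue(path, (1, _seq++));
        else _deferred.Add(path);                     // held back until stage two
    }

    // Returns the next path to crawl, or null when nothing is left.
    public string? Next()
    {
        if (!_promoted)
        {
            // Fast path: serve high-value pages while the budget lasts.
            if (_served < _fastBudget && _queue.TryDequeue(out var fastPath, out _))
            {
                _served++;
                return fastPath;
            }
            Promote();
        }
        return _queue.TryDequeue(out var path, out _) ? path : null;
    }

    // Stage two: move every deferred URL into the main queue at a lower rank.
    private void Promote()
    {
        _promoted = true;
        foreach (var p in _deferred) _queue.Enqueue(p, (1, _seq++));
        _deferred.Clear();
    }
}
```

In this sketch, any high-value pages still queued when the budget runs out stay ahead of the promoted URLs because they keep rank 0. Whether the real crawler behaves that way is not stated in the article.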
