The Day the Treasure Hunt Engine Found 700k Dead Links in 47 Minutes

DEV Community: machinelearning·Lisa Zulu·1 day ago

#dev #stage #urls #breaker #rate #circuit

The Problem We Were Actually Solving

It started with a single Slack alert on a Tuesday at 3:47 PM. Our in-house treasure hunt engine, basically a graph traversal service that crawled 2.8 million user-generated routes every night, began returning HTTP 410 Gone for 12% of its target URLs. That was bad because the hunt scoreboard depended on those links staying alive for 36 hours. Worse, the failures weren't clustered on any single CDN; they were spread across five different hosts running in Kubernetes with identical resource limits. The on-call engineer rerouted traffic via a circuit breaker and watched the error rate drop back to 0%, but the episode revealed a latent failure mode: our engine treated a single 410 as a node failure and would detach the entire subtree, wiping out hundreds of downstream routes in one shot. That was the problem we were actually solving: eventual consistency under noisy input.…
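To make the failure mode concrete, here is a minimal sketch of a route graph that detaches a subtree only after several consecutive 410 responses. Everything in it is hypothetical: the `RouteGraph` class, the `record` method, and the `gone_threshold` parameter are illustration names, not the engine's actual code. Setting `gone_threshold=1` reproduces the behavior described above, where one 410 removes the whole subtree; a higher value is one possible mitigation, not necessarily the fix the team shipped.

```python
from collections import defaultdict


class RouteGraph:
    """Toy route graph (hypothetical): detaching a node removes its subtree."""

    def __init__(self, edges, gone_threshold=3):
        # parent -> list of child nodes
        self.children = defaultdict(list)
        for parent, child in edges:
            self.children[parent].append(child)
        # Assumed tuning knob: consecutive 410s required before detaching.
        self.gone_threshold = gone_threshold
        self.gone_streak = defaultdict(int)
        self.detached = set()

    def subtree(self, node):
        """Return the node plus all of its descendants."""
        seen, stack = set(), [node]
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.add(current)
                stack.extend(self.children[current])
        return seen

    def record(self, node, status):
        """Record one crawl result; return True if the subtree was detached."""
        if status != 410:
            # Any other response resets the streak, so an isolated 410
            # is treated as noise rather than as a node failure.
            self.gone_streak[node] = 0
            return False
        self.gone_streak[node] += 1
        if self.gone_streak[node] >= self.gone_threshold:
            self.detached |= self.subtree(node)
            return True
        return False


graph = RouteGraph([("root", "a"), ("a", "b"), ("a", "c")], gone_threshold=3)
graph.record("a", 410)   # one 410: streak is 1, nothing detached
graph.record("a", 200)   # recovery resets the streak
for _ in range(3):
    graph.record("a", 410)  # third consecutive 410 detaches a, b, and c
```

The design choice here is to count consecutive failures per node, so a transient 410 costs nothing while a persistently dead link still gets pruned along with its downstream routes.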
