The setup

I was building an AI article generator. Four phases:

Research: pull live SERP data, extract entities, score competitors
Brief: Claude Sonnet writes a structured brief from the SERP
Draft: section-by-section drafting in a fixed persona voice
Polish: adversarial review pass that flags AI tells

End-to-end: 6–8 minutes. Each phase: 60–180 seconds. Multiple LLM calls per phase. Token streaming back to the user the whole time.

The first version was a single Vercel API route. It worked locally. It died in production.

Why Vercel kills you at 300s

Vercel Pro tops out at 300s per function. My pipeline needs 480s+. A naive await chain in a single route returns a 504 halfway through draft phase.

Worse: the user closes the tab. The HTTP connection drops. The function dies. State is lost. The draft is gone. Refund issued.

This is the real problem. Long-running AI workflows can't live in request–response.…
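To make the timing arithmetic concrete, here is a minimal sketch that walks four phases in order, the way a sequential await chain would, and reports where the cumulative runtime crosses the function limit. The per-phase durations are hypothetical, picked only to fit the ranges quoted above (60–180s per phase, about 480s total), and `simulateSingleRoute` is an illustrative helper, not part of any Vercel API.

```typescript
// Hypothetical per-phase durations in seconds. These are assumptions chosen to
// fit the ranges in the post (60-180s per phase, ~480s total), not measurements.
type Phase = { name: string; seconds: number };

const PHASES: Phase[] = [
  { name: "research", seconds: 120 },
  { name: "brief", seconds: 90 },
  { name: "draft", seconds: 180 },
  { name: "polish", seconds: 90 },
];

// The per-function limit quoted in the post for Vercel Pro.
const VERCEL_PRO_LIMIT_S = 300;

type Outcome =
  | { ok: true; totalSeconds: number }
  | { ok: false; failedPhase: string; secondsIntoPhase: number };

// Walks the phases sequentially and returns the phase in which the
// cumulative runtime would exceed the limit, if any.
function simulateSingleRoute(phases: Phase[], limitSeconds: number): Outcome {
  let elapsed = 0;
  for (const phase of phases) {
    if (elapsed + phase.seconds > limitSeconds) {
      return {
        ok: false,
        failedPhase: phase.name,
        secondsIntoPhase: limitSeconds - elapsed,
      };
    }
    elapsed += phase.seconds;
  }
  return { ok: true, totalSeconds: elapsed };
}

const outcome = simulateSingleRoute(PHASES, VERCEL_PRO_LIMIT_S);
console.log(outcome);
```

With these assumed numbers, research and brief use 210s of the budget, so the limit lands 90s into a 180s draft phase. Raising the limit to cover the full 480s would make this simulation pass, but it would not help with the closed-tab case, since the work is still tied to one open HTTP connection.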