The first version of PostAll's content generation system wasn't a microservice. It was a Next.js API route that called OpenAI, wrote the result to a database, and returned a response. It took 8 seconds per article. Our first beta user queued 200 articles, and the server timed out after the 11th.

That's the moment you stop thinking about "how do I generate content" and start thinking about "how do I build a system that generates content." They're completely different problems. This is the architecture I landed on after three rewrites: what I chose, why I chose it, and where each approach broke before I got there.

Why content generation is a bad fit for synchronous APIs

Most API routes work like this: request comes in, thing happens, response goes out. The whole exchange is sub-second. Content generation breaks this model in three ways.

Latency. A GPT-4o call for a 1,000-word article takes 8–15 seconds. That's not a timeout edge case; that's the normal case.…
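To put those latency numbers in perspective, here's a rough back-of-the-envelope sketch of what happens when a single request handler generates a batch serially. The 200-article batch and the 8–15 second range come from this post. The 60-second request timeout and the helper names are hypothetical, picked only for illustration. They are not PostAll's actual configuration.

```typescript
// Per-article latency range, taken from the 8–15 s GPT-4o figure above.
const MIN_SECONDS_PER_ARTICLE = 8;
const MAX_SECONDS_PER_ARTICLE = 15;

// Hypothetical request timeout, for illustration only.
const ASSUMED_TIMEOUT_SECONDS = 60;

// Total wall-clock time if one request generates every article in sequence.
function serialBatchSeconds(articles: number, secondsPerArticle: number): number {
  return articles * secondsPerArticle;
}

// How many articles finish before the request deadline cuts the handler off.
function articlesBeforeTimeout(timeoutSeconds: number, secondsPerArticle: number): number {
  return Math.floor(timeoutSeconds / secondsPerArticle);
}

const batchSize = 200;
console.log(serialBatchSeconds(batchSize, MIN_SECONDS_PER_ARTICLE)); // 1600
console.log(serialBatchSeconds(batchSize, MAX_SECONDS_PER_ARTICLE)); // 3000
console.log(articlesBeforeTimeout(ASSUMED_TIMEOUT_SECONDS, MIN_SECONDS_PER_ARTICLE)); // 7
```

Even at the optimistic end of the range, a 200-article batch needs more than 26 minutes of serial work. No request timeout will cover that, so the work has to move out of the request/response cycle.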