Updated April 2026

Last week I let Cursor generate the test suite for a checkout feature I'd just shipped. It wrote 14 AI-generated tests in about 30 seconds. I was genuinely impressed.

Then I read them.

Twelve of the 14 covered the happy path. Some variation of "user has items, user checks out, order is created": thorough coverage of everything that would have worked anyway.

Two caught real regressions I didn't know about: one found that the "back" button after payment threw a JavaScript error, another found that discount codes broke silently with two items in the cart. Those two were worth having.

What Cursor was actually doing

The tests it wrote came from reading the code. It looked at the functions, inferred expected inputs and outputs, and wrote assertions that matched the implementation. That's exactly what it should do given what it can see.…
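To make the difference concrete, here is a minimal sketch in Python. None of it is the real checkout code: `Item`, `DISCOUNTS`, `cart_total`, and the planted bug are all hypothetical, made up to resemble the discount-code regression described above. The first check is the kind you get by reading the implementation; the second comes from the requirement that a code applies to the whole cart.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Item:
    name: str
    price_cents: int


# Hypothetical discount table (code -> percent off); not from the real app.
DISCOUNTS = {"SAVE10": 10}


def cart_total(items: list[Item], code: str | None = None) -> int:
    """Return the cart total in cents. Contains a deliberately planted bug."""
    subtotal = sum(item.price_cents for item in items)
    percent = DISCOUNTS.get(code, 0) if code else 0
    # Planted bug (an assumption for illustration): the discount is only
    # applied to single-item carts, so a valid code silently does nothing
    # once a second item is added.
    if len(items) == 1:
        return subtotal - subtotal * percent // 100
    return subtotal


def check_happy_path() -> bool:
    """Check inferred from the code: one item, one valid code."""
    return cart_total([Item("mug", 2000)], "SAVE10") == 1800


def check_requirement() -> bool:
    """Check derived from the requirement: the code applies to the whole cart."""
    items = [Item("mug", 2000), Item("tee", 3000)]
    return cart_total(items, "SAVE10") == 4500


print(check_happy_path())   # True: the mirrored check passes
print(check_requirement())  # False: the requirement check exposes the bug
```

The happy-path check passes because it exercises the one branch the implementation handles correctly. Only the check written from the expected behavior, rather than from the code, notices that the two-item cart is charged full price.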