If you're building a product with an AI chatbot, you've probably run into this:

```typescript
await expect(response).toContainText('The Pro plan costs $49/month');
```

This breaks constantly. LLMs almost never return the exact same string twice.

## The problem

Traditional matchers assume deterministic output. AI responses are:

- Semantically equivalent but textually different on every run
- Sometimes helpful, sometimes hallucinating
- Hard to validate with `toEqual()` or `toContainText()`

You end up either skipping the assertion entirely or writing brittle string checks that fail on every deploy.

## What I built

playwright-ai-matchers: a library that uses Claude Haiku under the hood to evaluate AI responses semantically inside your Playwright tests.

```typescript
import { test, expect } from '@playwright/test';
import 'playwright-ai-matchers';

test('AI chatbot responds correctly to a billing question', async ({ page }) => {
  await page.goto('https://your-app.com/chat');
  await page.…
```
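
To make the idea of "evaluating a response semantically" concrete, here is a minimal sketch of an LLM-as-judge check. It is not the implementation or API of playwright-ai-matchers: `Verdict`, `Judge`, `buildPrompt`, `parseVerdict`, `semanticallyMatches`, and `stubJudge` are all hypothetical names invented for illustration, and the judge is injected as a plain function so the sketch runs without any model or network access.

```typescript
// Hypothetical sketch: none of these names come from playwright-ai-matchers.

/** Outcome of asking a judge model whether a response meets an expectation. */
interface Verdict {
  pass: boolean;
  reason: string;
}

/** A judge receives a prompt and resolves to the model's raw text reply. */
type Judge = (prompt: string) => Promise<string>;

/** Build the grading instruction sent to the judge. */
function buildPrompt(response: string, expectation: string): string {
  return [
    'You are grading a chatbot response in an automated test.',
    `Expectation: ${expectation}`,
    `Response: ${response}`,
    'Reply with JSON only: {"pass": boolean, "reason": string}',
  ].join('\n');
}

/** Parse the judge's reply; anything malformed counts as a failed check. */
function parseVerdict(raw: string): Verdict {
  try {
    const parsed = JSON.parse(raw) as Partial<Verdict>;
    if (typeof parsed.pass === 'boolean') {
      return { pass: parsed.pass, reason: String(parsed.reason ?? '') };
    }
  } catch {
    // Invalid JSON (or a null reply) falls through to the failure below.
  }
  return { pass: false, reason: `Unparseable judge reply: ${raw}` };
}

/** Ask the judge whether `response` satisfies `expectation`. */
async function semanticallyMatches(
  response: string,
  expectation: string,
  judge: Judge,
): Promise<Verdict> {
  return parseVerdict(await judge(buildPrompt(response, expectation)));
}

// Stand-in judge for local experiments. A real judge would send the prompt
// to an LLM; this one only checks whether the response line mentions "$49".
const stubJudge: Judge = async (prompt) => {
  const responseLine =
    prompt.split('\n').find((line) => line.startsWith('Response: ')) ?? '';
  const pass = responseLine.includes('$49');
  return JSON.stringify({ pass, reason: pass ? 'Mentions $49.' : 'No price found.' });
};
```

In a Playwright test, the string passed as `response` would typically be the chatbot's rendered reply read from the page, and the assertion would be on `verdict.pass`, with `verdict.reason` used as the failure message. Treating an unparseable judge reply as a failure is a deliberate choice here: a flaky judge should fail loudly rather than let a test pass silently.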