I built an npm library to test AI chatbots with Playwright — here's why normal matchers don't work


DEV Community·Germán Gordón·25 days ago


#playwright #testing #ai


If you're building a product with an AI chatbot, you've probably run into this:

```typescript
await expect(response).toContainText('The Pro plan costs $49/month');
```

This breaks constantly. LLMs never return the exact same string twice.

The problem

Traditional matchers assume deterministic output. AI responses are:

- Semantically equivalent but textually different every run
- Sometimes helpful, sometimes hallucinating
- Hard to validate with toEqual() or toContainText()

You end up either skipping the assertion entirely, or writing brittle string checks that fail on every deploy.

What I built

playwright-ai-matchers is a library that uses Claude Haiku under the hood to evaluate AI responses semantically inside your Playwright tests.

```typescript
import { test, expect } from '@playwright/test';
import 'playwright-ai-matchers';

test('AI chatbot responds correctly to a billing question', async ({ page }) => {
  await page.goto('https://your-app.com/chat');
  await page.…
```
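The brittleness described under "The problem" can be reproduced without a browser. The sketch below is only an illustration: `containsExact`, `Judge`, `stubJudge`, and `evaluateAll` are hypothetical names, not part of playwright-ai-matchers, and the judge is a deterministic regex stub standing in for a model call. It shows the shape of the idea: an exact substring check rejects a paraphrase that a criterion-based check would accept.

```typescript
// Hypothetical sketch: none of these names come from playwright-ai-matchers.

// Two answers a chatbot might plausibly give to the same billing question.
const responses: string[] = [
  'The Pro plan costs $49/month.',
  'Pro is priced at $49 per month, billed monthly.',
];

// The kind of exact-substring check a text matcher performs.
function containsExact(response: string, expected: string): boolean {
  return response.includes(expected);
}

// A judge decides whether a response satisfies a plain-language criterion.
// A real setup would call a model here; this stub is an assumption made
// so that the example runs offline and deterministically.
type Judge = (response: string, criterion: string) => Promise<boolean>;

const stubJudge: Judge = async (response, _criterion) => {
  // Stand-in logic: accept any answer that mentions Pro, $49, and a monthly period.
  return /\bpro\b/i.test(response) && /\$49\b/.test(response) && /month/i.test(response);
};

// Run both kinds of check over every sample response.
async function evaluateAll(
  judge: Judge,
): Promise<{ exact: boolean[]; semantic: boolean[] }> {
  const exact = responses.map((r) =>
    containsExact(r, 'The Pro plan costs $49/month'),
  );
  const semantic = await Promise.all(
    responses.map((r) => judge(r, 'States that the Pro plan costs $49 per month')),
  );
  return { exact, semantic };
}
```

With the stub, the exact check accepts only the first response while the criterion check accepts both. Swapping `stubJudge` for a function that asks a model keeps the same interface, at the cost of latency and some nondeterminism in the verdict itself.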
