GPT-4 said strawberry has two R's. The word has three.

DEV Community · Jun0 · 26 days ago
#v080 #ai #claudecode #model #side #hallucination

"How many R's are in 'strawberry'?" By 2024, every developer had seen the screenshot: GPT-4 confidently insisting that strawberry has two R's. The word has three. A fix eventually landed, but for a moment that screenshot captured something cleaner than any benchmark: a task a human does in half a second that the model gets confidently wrong. That's the picture most people have when they hear "hallucination."

sonmat v0.8.0 (April 11, 2026) dealt with hallucinations. Just not that kind.

What the 7% actually was

The trigger was a 2,700-question wiki QA evaluation on a 24B model. Hallucination rate: 7%. Looking at that number, you'd shrug: "yeah, LLMs hallucinate, that's life." But once I went through the flagged responses one by one, the picture changed. Strawberry-style cases, where the model fabricates something that wasn't in its training distribution, were a minority. What showed up more often was this:

User: "Facility management is in table A."
Reality: it's in table B.
Model dutifully searched table A.…
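In the table A / table B case, the failure starts with the user's premise, not with anything the model invented. The sketch below is a minimal illustration of that distinction. It assumes a toy schema, and every table and topic name in it (`table_a`, `table_b`, `facility_management`, and so on) is hypothetical. It is not taken from sonmat and is not how sonmat v0.8.0 handles the problem. The sketch compares two approaches: searching only where the user says to look, and checking the premise against where the data actually lives.

```python
# Hypothetical schema: which table actually holds which topic.
# Table and topic names are illustrative only, not taken from sonmat.
SCHEMA: dict[str, set[str]] = {
    "table_a": {"billing", "contracts"},
    "table_b": {"facility_management", "maintenance"},
}


def search(table: str, topic: str, schema: dict[str, set[str]]) -> bool:
    """Naive behaviour: look only in the table the user named."""
    return topic in schema.get(table, set())


def tables_containing(topic: str, schema: dict[str, set[str]]) -> list[str]:
    """Return every table that actually holds the topic, sorted by name."""
    return sorted(t for t, topics in schema.items() if topic in topics)


def answer(claimed_table: str, topic: str, schema: dict[str, set[str]]) -> str:
    """Search where the user said, but verify the premise before answering."""
    if search(claimed_table, topic, schema):
        return f"found {topic} in {claimed_table}"
    actual = tables_containing(topic, schema)
    if actual:
        # The user's premise was wrong: report that instead of improvising.
        return f"{topic} is not in {claimed_table}; it is in {', '.join(actual)}"
    return f"{topic} not found in any table"


print(answer("table_a", "facility_management", SCHEMA))
# prints "facility_management is not in table_a; it is in table_b"
```

Running `search` alone would only report that `table_a` has nothing on facility management. That is the point where a model that trusts the premise may produce an answer that later gets flagged. The extra lookup turns that empty result into a correction of the premise.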
