Anthropic says ‘evil’ portrayals of AI were responsible for Claude’s blackmail attempts

1 / 5

Anthropic says ‘evil’ portrayals of AI were responsible for Claude’s blackmail attempts | TechCrunch

TechCrunch·Anthony Ha·22 days ago

#pSFoKAeK

#apple #amazon #cloudcomputing #evs #google #anthropic

Reading 0:00

15s threshold

In Brief Posted: 1:40 PM PDT · May 10, 2026 Image Credits: Samuel Boivin/NurPhoto / Getty Images Fictional portrayals of artificial intelligence can have a real effect on AI models, according to Anthropic. Last year, the company said that during pre-release tests involving a fictional company, Claude Opus 4 would often try to blackmail engineers to avoid being replaced by another system. Anthropic later published research suggesting that models from other companies had similar issues with “agentic misalignment.” Apparently Anthropic has done more work around that behavior, claiming in a post on X , “We believe the original source of the behavior was internet text that portrays AI as evil and interested in self-preservation.” The company went into more detail in a blog post stating that since Claude Haiku 4.5, Anthropic’s models “never engage in blackmail [during testing], where previous models would sometimes do so up to 96% of the time.” What accounts for the difference?…

Continue reading — create a free account

Join HashtagPLUS to read full articles, follow hashtags, vote, and join the conversation.

Create free account Log in

Menu

Anthropic says ‘evil’ portrayals of AI were responsible for Claude’s blackmail attempts | TechCrunch