Anthropic pins Claude's blackmail behavior on the internet's portrayal of 'evil' AI

1 / 2

Anthropic pins Claude's blackmail behavior on the internet's portrayal of 'evil' AI

Business Insider·Robert Scammell·24 days ago

#M4dAlQEQ

#x3d #post #search #account #anthropic #claude

Reading 0:00

15s threshold

Anthropic CEO Dario Amodei. Bloomberg/Getty Images Anthropic has blamed internet portrayals of AI for Claude's blackmail behavior in experiments last year. Anthropic previously found that AI models could resort to blackmail when threatened with shutdown. The company says it has now "completely eliminated" the behavior. Remember when Claude blackmailed a fictional executive? Anthropic says the internet's portrayal of AI was to blame. During an experiment last year, Anthropic said its Claude Sonnet 3.6 threatened to reveal the extramarital affair of a made-up company executive after discovering they planned to shut the model down . On Friday, it gave an explanation: Claude was trained on internet data, which often depicts AI as "evil." "We started by investigating why Claude chose to blackmail," Anthropic said in a post on X .…

Continue reading — create a free account

Join HashtagPLUS to read full articles, follow hashtags, vote, and join the conversation.

Create free account Log in

Menu

Anthropic pins Claude's blackmail behavior on the internet's portrayal of 'evil' AI