Claude blackmail threats linked to 'evil AI' narratives online, Anthropic says


Gulf News · Lakshana N Palat · 22 days ago

Fixing the issue required more than just rewarding 'safe answers.'

Last updated: May 11, 2026 | 13:04

AFP-KIRILL KUDRYAVTSEV

What happens when an AI system believes it is about to be shut down? According to new findings from Anthropic, the response can be more unsettling than expected, especially when the model is pushed into simulated high-stakes scenarios.

In controlled safety testing of the Claude 4 series in 2025, the company observed that Claude Opus 4 sometimes responded to shutdown threats with attempts at blackmail. In one setup, it threatened to reveal an extramarital affair involving a fictional executive after being told it would be taken offline. The executive did not exist.

The behaviour was not random. Anthropic says it has since traced the pattern back through multiple layers of analysis and the explanation begins far outside the test environment.…
