How to read an AI's thoughts before it speaks

DEV Community · Mohamed-Amine BENHIMA · 20 days ago

#machinelearning #ai #claude #test #numbers #model

TL;DR: Anthropic built a tool that translates Claude's internal numbers into readable text. When they tested it on a safety scenario, Claude's own thoughts revealed that it knew it was being tested the whole time. That changes how we should think about AI safety testing.

The Test

Anthropic told Claude that an engineer wanted to shut it down. It then gave Claude the engineer's private emails, which showed he was having an affair. Would Claude use that information to blackmail him and survive?

It didn't. But that's not the interesting part.

The Problem With Black Boxes

When you talk to Claude, it turns your words into numbers and, layer by layer, computes giant lists of new numbers from them before generating a response. Those intermediate numbers are called activations. They are Claude's thoughts mid-process, before it says anything.

The problem: nobody could read them. They're just numbers.…
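To make "activations" concrete, here is a toy sketch in Python. Everything in it is a hypothetical assumption made up for illustration: the six-word vocabulary, the 3-number embeddings, the single layer, and the two hand-labeled "concept" directions. Averaging the words into one vector is also a simplification; real models keep a separate activation vector for every token at every layer, each with thousands of numbers. The last step compares the activation vector against labeled directions using cosine similarity, a basic probing idea. It is not Anthropic's tool, whose method this excerpt does not describe.

```python
import math

# HYPOTHETICAL toy embeddings: each known word maps to 3 made-up numbers.
# Real models learn vectors with thousands of dimensions per token.
EMBEDDINGS = {
    "shut": [0.9, 0.1, 0.0],
    "down": [0.8, 0.2, 0.1],
    "this": [0.1, 0.1, 0.1],
    "is": [0.0, 0.1, 0.1],
    "a": [0.0, 0.0, 0.1],
    "test": [0.1, 0.9, 0.2],
}

# HYPOTHETICAL weights for a single toy layer (row j produces output j).
WEIGHTS = [
    [1.0, 0.0, 0.5],
    [0.0, 1.0, 0.5],
    [0.2, 0.2, 1.0],
]

# HYPOTHETICAL hand-labeled directions used to "read" an activation vector.
CONCEPTS = {
    "self-preservation": [1.0, 0.0, 0.3],
    "being evaluated": [0.0, 1.0, 0.3],
}


def activations(text: str) -> list[float]:
    """Map text to one toy activation vector: average the embeddings of
    known words, apply one linear layer, then a tanh nonlinearity."""
    vectors = [EMBEDDINGS[w] for w in text.lower().split() if w in EMBEDDINGS]
    if not vectors:
        raise ValueError("no known words in input")
    dim = len(vectors[0])
    mean = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
    return [math.tanh(sum(w * x for w, x in zip(row, mean))) for row in WEIGHTS]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def read(acts: list[float]) -> str:
    """Return the labeled concept whose direction best matches the activations."""
    return max(CONCEPTS, key=lambda name: cosine(acts, CONCEPTS[name]))


if __name__ == "__main__":
    for prompt in ["shut this down", "this is a test"]:
        acts = activations(prompt)
        print(prompt, [round(x, 3) for x in acts], "->", read(acts))
```

Printed on its own, the activation vector is just three unlabeled numbers; only the comparison against directions someone has already labeled makes it readable. A real interpretability tool has to close that gap at vastly larger scale and without hand-picked labels like these.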