What’s the Best Way to Brainwash an LLM? | Towards Data Science

1 / 11

What’s the Best Way to Brainwash an LLM? | Towards Data Science

Towards Data Science·Ferran Alia·19 days ago

#6zt6qBJV

#editorspicks #deepdives #newsletter #artificialintelligence #editorspick #model

Reading 0:00

15s threshold

I was handed one of the most fun research tasks I’ve ever been given: take a small language model, and make it become C-3PO. Not “make it play C-3PO when you ask nicely.” Make it so that C-3PO is just… who it is now. Default personality, no system prompt required. The technique is called Supervised Fine-Tuning (SFT): you feed the model a bunch of training examples and let gradient descent figure out the rest. Simple in principle. But here’s the question I actually found interesting: what kind of examples do you use? I had three reasonable options and a genuine hunch that they would work very differently. So I ran the experiment. The winner surprised me. Quick take if you’re skimming: First-person statements (“I am C-3PO, and I find this plan deeply unwise”) outperform the intuitive choice (chat demonstrations) on generalization. Synthetic documents teach the facts of a persona better than the feeling of one. A good system prompt is still underrated.…

Continue reading — create a free account

Join HashtagPLUS to read full articles, follow hashtags, vote, and join the conversation.

Create free account Log in

Menu

What’s the Best Way to Brainwash an LLM? | Towards Data Science