What’s the Best Way to Brainwash an LLM? | Towards Data Science

Towards Data Science·Ferran Alia·19 days ago

I was handed one of the most fun research tasks I’ve ever been given: take a small language model, and make it become C-3PO. Not “make it play C-3PO when you ask nicely.” Make it so that C-3PO is just… who it is now. Default personality, no system prompt required.

The technique is called Supervised Fine-Tuning (SFT): you feed the model a bunch of training examples and let gradient descent figure out the rest. Simple in principle. But here’s the question I actually found interesting: what kind of examples do you use? I had three reasonable options and a genuine hunch that they would work very differently. So I ran the experiment. The winner surprised me.

Quick take if you’re skimming:

- First-person statements (“I am C-3PO, and I find this plan deeply unwise”) outperform the intuitive choice (chat demonstrations) on generalization.
- Synthetic documents teach the facts of a persona better than the feeling of one.
- A good system prompt is still underrated.…
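The preview doesn't show the training data, so here is a minimal sketch of how the three example formats might look as SFT records. It assumes the common convention of masking prompt tokens out of the loss by setting their labels to -100, which is PyTorch's default `ignore_index` for `CrossEntropyLoss`. The `SFTExample` type, the example texts, and the whitespace "tokenizer" are hypothetical stand-ins, not the author's setup.

```python
from dataclasses import dataclass

# PyTorch's CrossEntropyLoss skips targets equal to ignore_index (default -100).
IGNORE_INDEX = -100


@dataclass
class SFTExample:
    prompt: str  # context the model conditions on; excluded from the loss
    target: str  # text the model is trained to reproduce; included in the loss


# Hypothetical examples of each format (not the article's actual data).
chat_demo = SFTExample(
    prompt="User: Should we fly through the asteroid field?\nAssistant: ",
    target="Oh dear. I must protest; the odds are most unfavourable!",
)
first_person = SFTExample(
    prompt="",
    target="I am C-3PO, and I find this plan deeply unwise.",
)
synthetic_doc = SFTExample(
    prompt="",
    target="C-3PO is a protocol droid fluent in over six million forms of communication.",
)


def build_vocab(texts):
    """Toy whitespace-token vocabulary; a real run would use the model's tokenizer."""
    vocab = {}
    for text in texts:
        for tok in text.split():
            vocab.setdefault(tok, len(vocab))
    return vocab


def encode(example, vocab):
    """Return (input_ids, labels) with prompt positions masked out of the loss.

    The one-token shift for next-token prediction is left to the model/loss code.
    """
    prompt_ids = [vocab[t] for t in example.prompt.split()]
    target_ids = [vocab[t] for t in example.target.split()]
    input_ids = prompt_ids + target_ids
    labels = [IGNORE_INDEX] * len(prompt_ids) + target_ids
    return input_ids, labels


examples = {"chat demo": chat_demo, "first-person": first_person, "synthetic doc": synthetic_doc}
vocab = build_vocab(e.prompt + " " + e.target for e in examples.values())
for name, ex in examples.items():
    ids, labels = encode(ex, vocab)
    trained = sum(label != IGNORE_INDEX for label in labels)
    print(f"{name}: {trained}/{len(ids)} tokens contribute to the loss")
```

Running it prints 10/19 for the chat demonstration and 10/10 and 13/13 for the other two. In the chat format only the assistant's reply receives gradient, while first-person statements and synthetic documents are trained on every token, and only the latter two are written in the persona's own voice or about the persona directly.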
