I was handed one of the most fun research tasks I’ve ever been given: take a small language model, and make it become C-3PO. Not “make it play C-3PO when you ask nicely.” Make it so that C-3PO is just… who it is now. Default personality, no system prompt required.

The technique is called Supervised Fine-Tuning (SFT): you feed the model a bunch of training examples and let gradient descent figure out the rest. Simple in principle.

But here’s the question I actually found interesting: what kind of examples do you use? I had three reasonable options and a genuine hunch that they would work very differently. So I ran the experiment. The winner surprised me.

Quick take if you’re skimming:

First-person statements (“I am C-3PO, and I find this plan deeply unwise”) outperform the intuitive choice (chat demonstrations) on generalization.

Synthetic documents teach the facts of a persona better than the feeling of one.

A good system prompt is still underrated.…