In Week 11 Tenacious-Bench, we trained a LoRA adapter on Tenacious-style B2B sales emails using Supervised Fine-Tuning (SFT). We got a real performance lift: Delta A = +0.263 (p < 0.0001). But that result exposed a harder question: did the adapter learn how Tenacious writes, or just what repeated Tenacious-like samples looked like?

This post answers that at the mechanism level: cross-entropy token-by-token, LoRA gradient flow, and why low-diversity augmentation can make convergence look better than generalization.

1) What SFT cross-entropy actually optimizes

In autoregressive SFT, the model predicts the next token at each step, conditioned on the tokens before it. Cross-entropy loss is the negative log of the probability the model gave the correct next token, so it falls only as that probability rises.

So the objective is not “be honest,” not “be cautious,” not “be Tenacious,” but: assign high probability to target tokens in the training distribution. If your targets consistently reflect Tenacious behavior, style improves indirectly.…
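To make the token-level objective from section 1 concrete, here is a minimal sketch in plain Python. It is not the training code used for the adapter; the toy vocabulary, the logits, and the helper names `softmax` and `token_cross_entropy` are all made up for illustration. It only shows that the loss at each step depends on one thing: the probability assigned to the target token.

```python
import math


def softmax(logits):
    """Turn raw scores into a probability distribution over the vocabulary."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def token_cross_entropy(step_logits, target_ids):
    """Return (per-token losses, mean loss) where each loss is -log p(target)."""
    losses = []
    for logits, target in zip(step_logits, target_ids):
        probs = softmax(logits)
        losses.append(-math.log(probs[target]))
    return losses, sum(losses) / len(losses)


# Hypothetical 5-token vocabulary and made-up logits for a 3-token target.
vocab = ["we", "can", "guarantee", "may", "help"]
step_logits = [
    [2.0, 0.1, 0.1, 0.1, 0.1],  # model strongly prefers "we"
    [0.0, 1.0, 0.0, 1.0, 0.0],  # model is split between "can" and "may"
    [0.0, 0.0, 2.0, 0.0, 0.5],  # model prefers "guarantee"; target is "help"
]
target_ids = [0, 3, 4]  # target sequence: "we may help"

losses, mean_loss = token_cross_entropy(step_logits, target_ids)
for tok_id, loss in zip(target_ids, losses):
    print(f"{vocab[tok_id]:>10s}  loss={loss:.3f}")
print(f"mean loss = {mean_loss:.3f}")
```

The third step carries the largest loss because the model put most of its mass on a different token than the target. Nothing in the computation refers to honesty or caution; if the target had been "guarantee" instead, the same objective would have rewarded the overclaim.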