Did My LoRA Learn Tenacious Style—or Just Memorize Augmented Patterns?

DEV Community·Beamlaka·25 days ago

In Week 11 Tenacious-Bench, we trained a LoRA adapter on Tenacious-style B2B sales emails using Supervised Fine-Tuning (SFT). We got a real performance lift: Delta A = +0.263 (p < 0.0001). But that result exposed a harder question: did the adapter learn how Tenacious writes, or just what repeated Tenacious-like samples looked like?

This post answers that at the mechanism level: token-by-token cross-entropy, LoRA gradient flow, and why low-diversity augmentation can make convergence look better than generalization.

1) What SFT cross-entropy actually optimizes

In autoregressive SFT, the model predicts the next token at each step. Cross-entropy loss is the negative log of the probability the model assigned to the correct next token, so it shrinks as that probability grows. The objective is therefore not “be honest,” not “be cautious,” not “be Tenacious,” but: assign high probability to target tokens in the training distribution.

If your targets consistently reflect Tenacious behavior, style improves indirectly.…
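To make the token-by-token objective concrete, here is a minimal sketch in plain Python. It is not the training code from the post: the four-token vocabulary, the logits, and the helper names `softmax` and `token_cross_entropy` are all hypothetical, chosen only to show that the loss is the negative log of the probability given to each target token.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities (max subtracted for numerical stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def token_cross_entropy(logits_per_step, target_ids):
    """Return per-token negative log-likelihoods and their mean."""
    losses = []
    for logits, target in zip(logits_per_step, target_ids):
        probs = softmax(logits)
        # Only the probability of the correct next token enters the loss.
        losses.append(-math.log(probs[target]))
    return losses, sum(losses) / len(losses)

# Hypothetical toy data: a vocabulary of 4 tokens and a 3-token target sequence.
logits_per_step = [
    [2.0, 0.5, 0.1, -1.0],   # fairly confident in token 0
    [0.0, 0.0, 0.0, 0.0],    # uniform: no preference at all
    [-1.0, 0.2, 3.0, 0.1],   # confident in token 2
]
target_ids = [0, 1, 2]

per_token, mean_loss = token_cross_entropy(logits_per_step, target_ids)
for step, loss in enumerate(per_token):
    print(f"step {step}: loss = {loss:.4f}")
print(f"mean loss = {mean_loss:.4f}")
```

The uniform step costs exactly ln(4), while the confident and correct steps cost less. Nothing in the computation refers to honesty, caution, or style: the loss only sees which token the target contains at each position.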
