SalesConversion-Bench had one uncomfortable preference-tuning mismatch: the code trained with TRL's DPOTrainer, while the methodology narrative argued for SimPO. That is not just a naming issue. DPO and SimPO turn the same (prompt, chosen, rejected) pair into different update signals: DPO scores the pair by how far the policy's log-probabilities have moved relative to a frozen reference model, while SimPO drops the reference model and compares length-normalized log-probabilities against a target margin. When the held-out lift is small, such as 22.73% vs 18.18%, the project cannot honestly say whether the model improved because DPO was the right objective, because LoRA rank constrained the update, or because training margins improved without robust held-out behavior. The useful answer is not "DPO good, SimPO good, ORPO also good." The useful answer is: compare the objectives under fixed conditions, control for LoRA rank, and keep the objective whose gains survive held-out evaluation instead of only improving training margins.…
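To make "different update signals" concrete, here is a minimal sketch of the two per-pair losses in plain Python. It assumes the summed sequence log-probabilities are already computed. The function names, the beta and gamma values, and the example numbers are all illustrative assumptions, not the project's settings or TRL's internals.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(pol_chosen: float, pol_rejected: float,
             ref_chosen: float, ref_rejected: float,
             beta: float = 0.1) -> float:
    """DPO: reward is the policy's log-prob shift relative to a frozen reference."""
    margin = beta * ((pol_chosen - ref_chosen) - (pol_rejected - ref_rejected))
    return -log_sigmoid(margin)


def simpo_loss(pol_chosen: float, pol_rejected: float,
               len_chosen: int, len_rejected: int,
               beta: float = 2.0, gamma: float = 1.0) -> float:
    """SimPO: reward is the length-normalized log-prob; no reference model.
    gamma is the target margin the chosen response must clear."""
    margin = beta * (pol_chosen / len_chosen - pol_rejected / len_rejected) - gamma
    return -log_sigmoid(margin)


# Hypothetical pair: short chosen response, long rejected response.
# All values are summed token log-probs, invented for illustration.
pol_chosen, pol_rejected = -40.0, -90.0   # policy
ref_chosen, ref_rejected = -45.0, -80.0   # frozen reference (DPO only)
len_chosen, len_rejected = 20, 60         # token counts (SimPO only)

dpo = dpo_loss(pol_chosen, pol_rejected, ref_chosen, ref_rejected)
simpo = simpo_loss(pol_chosen, pol_rejected, len_chosen, len_rejected)

print(f"DPO loss:   {dpo:.4f}")
print(f"SimPO loss: {simpo:.4f}")
```

In this invented example the DPO margin is positive, because the policy moved toward the chosen response and away from the rejected one relative to the reference, so the loss is small. The SimPO margin is negative, because the rejected response has the higher per-token log-probability, so the loss is large. The same pair therefore pushes the model weakly under one objective and strongly under the other, which is why the objective actually used in training has to match the one the methodology argues for.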