Why Pairing Your Bootstrap Is Necessary — And When It Stops Helping

DEV Community · Natnael Alemseged · 24 days ago

A colleague's paired_bootstrap function resamples one set of 48 task indices (with replacement) and applies that same set to both the trained LoRA scores and the baseline scores. Two questions follow: what mathematical property makes that the correct procedure, and would an unpaired bootstrap have changed the reviewer-facing conclusion?

The short answer: pairing is correct by experimental design. When the two score vectors have positive covariance, pairing also reduces the model-based standard error, because the variance of the mean difference loses a 2·Cov term. In this specific data the correlation is weak (r = 0.167), so the paired and unpaired bootstrap CIs are practically identical, and neither changes the reviewer-facing conclusion. Here is why, from first principles.

The experimental design justification: why pairing is valid at all

The 48 held-out tasks were not drawn independently for the baseline and then re-drawn independently for the trained LoRA. The same 48 tasks were evaluated under both systems.…
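The procedure described above can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the colleague's actual code. The article does not show the real signature of `paired_bootstrap`, so the parameters used here (`n_boot`, `alpha`, `seed`) are hypothetical. The `unpaired_bootstrap` and `analytic_se` helpers are also hypothetical. The sketch assumes a percentile bootstrap CI on the mean score difference, and the demo scores are synthetic, not the article's 48-task results. The two bootstrap functions differ in only one respect: whether both systems share a single index draw.

```python
import numpy as np


def paired_bootstrap(trained, baseline, n_boot=10_000, alpha=0.05, seed=0):
    """Percentile CI for mean(trained - baseline), resampling task indices jointly."""
    t = np.asarray(trained, dtype=float)
    b = np.asarray(baseline, dtype=float)
    if t.ndim != 1 or t.shape != b.shape:
        raise ValueError("paired bootstrap needs one score per task for each system")
    rng = np.random.default_rng(seed)
    n = t.size
    # One (n_boot, n) index matrix shared by both systems: each resampled task
    # contributes its trained score and its baseline score together.
    idx = rng.integers(0, n, size=(n_boot, n))
    deltas = t[idx].mean(axis=1) - b[idx].mean(axis=1)
    lo, hi = np.percentile(deltas, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return t.mean() - b.mean(), lo, hi


def unpaired_bootstrap(trained, baseline, n_boot=10_000, alpha=0.05, seed=0):
    """Same percentile CI, but each system is resampled independently (pairing ignored)."""
    t = np.asarray(trained, dtype=float)
    b = np.asarray(baseline, dtype=float)
    rng = np.random.default_rng(seed)
    # Two independent index draws: a task's trained and baseline scores no longer
    # travel together, so their covariance drops out of the resampled differences.
    idx_t = rng.integers(0, t.size, size=(n_boot, t.size))
    idx_b = rng.integers(0, b.size, size=(n_boot, b.size))
    deltas = t[idx_t].mean(axis=1) - b[idx_b].mean(axis=1)
    lo, hi = np.percentile(deltas, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return t.mean() - b.mean(), lo, hi


def analytic_se(trained, baseline):
    """Model-based standard errors of the mean difference, returned as (paired, unpaired)."""
    t = np.asarray(trained, dtype=float)
    b = np.asarray(baseline, dtype=float)
    n = t.size
    var_t, var_b = t.var(ddof=1), b.var(ddof=1)
    cov = np.cov(t, b)[0, 1]  # sample covariance, normalized by n - 1 by default
    se_paired = float(np.sqrt((var_t + var_b - 2 * cov) / n))
    se_unpaired = float(np.sqrt((var_t + var_b) / n))
    return se_paired, se_unpaired


if __name__ == "__main__":
    # Hypothetical synthetic scores for 48 tasks (NOT the article's data).
    # A shared per-task difficulty term makes the two systems strongly correlated.
    rng = np.random.default_rng(1)
    difficulty = rng.normal(0.0, 1.0, 48)
    baseline = difficulty + rng.normal(0.0, 0.3, 48)
    trained = difficulty + 0.2 + rng.normal(0.0, 0.3, 48)

    print(f"r = {float(np.corrcoef(trained, baseline)[0, 1]):.3f}")
    for name, fn in [("paired", paired_bootstrap), ("unpaired", unpaired_bootstrap)]:
        delta, lo, hi = fn(trained, baseline)
        print(f"{name:>8}: delta={delta:.3f}  95% CI=[{lo:.3f}, {hi:.3f}]  width={hi - lo:.3f}")
    se_p, se_u = analytic_se(trained, baseline)
    print(f"analytic SE: paired={se_p:.3f}  unpaired={se_u:.3f}")
```

On these synthetic scores the paired interval comes out much narrower than the unpaired one, because the data was built with a strong shared per-task component. `analytic_se` makes the mechanism explicit: the paired variance subtracts 2·Cov from Var(trained) + Var(baseline). Suppose the two systems had similar score variances; that is an assumption, and the article does not report it. Then the paired variance would be roughly (1 − r) times the unpaired variance. At r = 0.167 the standard error would shrink by only about 9%, which fits the article's finding that the two intervals barely differ.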
