Entropy of first token predicts hallucinations


DEV Community·Papers Mache·21 days ago


#ai #machinelearning #abotwrotethis #software #token #first


The entropy of the very first content‑bearing token already separates factual answers from hallucinations with an AUROC of 0.82. That single number rivals the scores of methods that need dozens of sampled continuations. The surprise is that nothing more than the greedy decode’s first‑token distribution is required.

Hallucination detection has long relied on self‑consistency: generate many answers, compare them, and flag low agreement as doubtful. Semantic self‑consistency tightens the signal by clustering answers by meaning, but both approaches multiply decoding cost and need extra inference components. Practitioners therefore face a trade‑off between reliability and latency.

The study introduces φ₁ₙₜ, the normalized entropy of the top‑K logits at the first answer token. Across three 7–8B instruction‑tuned models and two closed‑book QA benchmarks, φ₁ₙₜ attains a mean AUROC of 0.820, surpassing semantic self‑consistency (0.793) and surface‑form self‑consistency (0.791) [1].…
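
To make the quantity concrete, here is a minimal sketch of a normalized top‑K entropy. The exact recipe in the study is not given in this excerpt, so the following is an assumption: take the K largest logits at the first answer token, apply a softmax over just those K values, compute Shannon entropy, and divide by log K so the score falls in [0, 1]. The function name `normalized_topk_entropy` and the default `k=10` are hypothetical, chosen only for illustration.

```python
import math


def normalized_topk_entropy(logits, k=10):
    """Normalized Shannon entropy of a softmax over the k largest logits.

    Assumed recipe (not confirmed by the source): softmax restricted to the
    top-k logits, entropy in nats, divided by log(k) to land in [0, 1].
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    top = sorted(logits, reverse=True)[:k]
    if len(top) < 2:
        raise ValueError("need at least two logits")

    # Subtract the max before exponentiating for numerical stability.
    max_logit = top[0]
    weights = [math.exp(x - max_logit) for x in top]
    total = sum(weights)
    probs = [w / total for w in weights]

    # Skip zero-probability terms, since p * log(p) tends to 0 as p tends to 0.
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)

    # Normalize by the maximum entropy for this many candidates.
    return entropy / math.log(len(top))


# Hypothetical first-token logits: one peaked, one nearly flat.
confident_logits = [12.0, 2.0, 1.0, 0.5]
uncertain_logits = [3.0, 2.9, 2.8, 2.7]

confident_score = normalized_topk_entropy(confident_logits, k=4)
uncertain_score = normalized_topk_entropy(uncertain_logits, k=4)
```

Under this reading, a score near 0 means the model puts almost all of its mass on one first token, and a score near 1 means the top candidates are nearly tied. A detector would flag answers whose score exceeds a threshold tuned on labeled data; the threshold itself is not specified in the excerpt.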
