FACTS Grounding: A new benchmark for evaluating the factuality of large language models

Google DeepMind·FACTS team·about 1 month ago

December 17, 2024 · Responsibility & Safety

Our comprehensive benchmark and online leaderboard offer a much-needed measure of how accurately LLMs ground their responses in provided source material and avoid hallucinations.

Large language models (LLMs) are transforming how we access information, yet their grip on factual accuracy remains imperfect. They can “hallucinate” false information, particularly when given complex inputs. In turn, this can erode trust in LLMs and limit their applications in the real world.

Today, we’re introducing FACTS Grounding, a comprehensive benchmark for evaluating the ability of LLMs to generate responses that are not only factually accurate with respect to given inputs, but also sufficiently detailed to provide satisfactory answers to user queries.

We hope our benchmark will spur industry-wide progress on factuality and grounding. To track progress, we’re also launching the FACTS leaderboard on Kaggle. …
