FACTS Benchmark Suite: Systematically evaluating the factuality of large language models

Google DeepMind·The FACTS team·about 1 month ago

December 9, 2025 · Responsibility & Safety

Large language models (LLMs) are increasingly becoming a primary source for information delivery across diverse use cases, so it’s important that their responses are factually accurate. In order to continue improving their performance on this industry-wide challenge, we have to better understand the types of use cases where models struggle to provide an accurate response and better measure factuality performance in those areas.

The FACTS Benchmark Suite

Today, we’re teaming up with Kaggle to introduce the FACTS Benchmark Suite. It extends our previous work developing the FACTS Grounding Benchmark, with three additional factuality benchmarks, including:

- A Parametric Benchmark that measures the model’s ability to access its internal knowledge accurately in factoid question use-cases.
- A Search Benchmark that tests a model’s ability to use Search as a tool to retrieve information and synthesize it correctly.…