
RealFactBench: A Benchmark for Evaluating Large Language Models in Real-World Fact-Checking

  • Shuo Yang
  • Yuqin Dai
  • Guoqing Wang
  • Xinran Zheng
  • Jinfeng Xu
  • Jinze Li
  • Zhenzhe Ying
  • Weiqiang Wang
  • Edith C.H. Ngai
  • The University of Hong Kong
  • Ant Group
  • Tsinghua University
  • University College London

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Large Language Models (LLMs) hold significant potential for advancing fact-checking by leveraging their capabilities in reasoning, evidence retrieval, and explanation generation. However, existing benchmarks fail to comprehensively evaluate LLMs and Multimodal Large Language Models (MLLMs) in realistic misinformation scenarios. To bridge this gap, we introduce RealFactBench, a comprehensive benchmark designed to assess the fact-checking capabilities of LLMs and MLLMs across diverse real-world tasks, including Knowledge Validation, Rumor Detection, and Event Verification. RealFactBench consists of 6K high-quality claims drawn from authoritative sources, encompassing multimodal content and diverse domains. Our evaluation framework further introduces the Unknown Rate (UnR) metric, enabling a more nuanced assessment of models' ability to handle uncertainty and balance between over-conservatism and over-confidence. Extensive experiments on 7 representative LLMs and 4 MLLMs reveal their limitations in real-world fact-checking and offer valuable insights for further research. RealFactBench is publicly available at https://github.com/kalendsyang/RealFactBench.git.

Original language: English
Title of host publication: MM 2025 - Proceedings of the 33rd ACM International Conference on Multimedia, Co-Located with MM 2025
Publisher: Association for Computing Machinery, Inc
Pages: 13435-13441
Number of pages: 7
ISBN (Electronic): 9798400720352
DOIs
State: Published - 27 Oct 2025
Externally published: Yes
Event: 33rd ACM International Conference on Multimedia, MM 2025 - Dublin, Ireland
Duration: 27 Oct 2025 to 31 Oct 2025

Publication series

Name: MM 2025 - Proceedings of the 33rd ACM International Conference on Multimedia, Co-Located with MM 2025

Conference

Conference: 33rd ACM International Conference on Multimedia, MM 2025
Country/Territory: Ireland
City: Dublin
Period: 27/10/25 to 31/10/25

Keywords

  • fact-checking
  • large language models
  • misinformation detection
