CONFLICTBANK: A Benchmark for Evaluating Knowledge Conflicts in Large Language Models

  • Zhaochen Su
  • Jun Zhang
  • Xiaoye Qu
  • Tong Zhu
  • Yanshu Li
  • Jiashuo Sun
  • Juntao Li
  • Min Zhang
  • Yu Cheng

  • Soochow University
  • Shanghai Artificial Intelligence Laboratory
  • Xiamen University
  • Chinese University of Hong Kong

Research output: Contribution to journal, Conference article, peer-review

Abstract

Large language models (LLMs) have achieved impressive advancements across numerous disciplines, yet the critical issue of knowledge conflicts, a major source of hallucinations, has rarely been studied. While a few studies have explored conflicts between the inherent knowledge of LLMs and retrieved contextual knowledge, a comprehensive assessment of knowledge conflicts in LLMs is still missing. Motivated by this research gap, we propose CONFLICTBANK, the largest benchmark with 7.45M claim-evidence pairs and 553k QA pairs, addressing conflicts from misinformation, temporal discrepancies, and semantic divergences. Using CONFLICTBANK, we conduct thorough and controlled experiments for a comprehensive understanding of LLM behavior in knowledge conflicts, focusing on three key aspects: (i) conflicts encountered in retrieved knowledge, (ii) conflicts within the models' encoded knowledge, and (iii) the interplay between these conflict forms. Our investigation covers four model families and twelve LLM instances and provides insights into conflict types, model sizes, and the impact at different stages. We believe that knowledge conflicts represent a critical bottleneck to achieving trustworthy artificial intelligence and hope our work will offer valuable guidance for future model training and development. Resources are available at https://github.com/zhaochen0110/conflictbank.

Original language: English
Journal: Advances in Neural Information Processing Systems
Volume: 37
State: Published - 2024
Externally published: Yes
Event: 38th Conference on Neural Information Processing Systems, NeurIPS 2024 - Vancouver, Canada
Duration: 9 Dec 2024 to 15 Dec 2024
