TIMEBENCH: A Comprehensive Evaluation of Temporal Reasoning Abilities in Large Language Models

  • Zheng Chu
  • Jingchang Chen
  • Qianglong Chen
  • Weijiang Yu
  • Haotian Wang
  • Ming Liu*
  • Bing Qin

*Corresponding author for this work

  • Harbin Institute of Technology
  • Zhejiang University
  • Sun Yat-Sen University
  • Peng Cheng Laboratory

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

Abstract

Grasping the concept of time is a fundamental facet of human cognition, indispensable for truly comprehending the intricacies of the world. Previous studies typically focus on specific aspects of time, lacking a comprehensive temporal reasoning benchmark. To address this, we propose TIMEBENCH, a comprehensive hierarchical temporal reasoning benchmark that covers a broad spectrum of temporal reasoning phenomena. TIMEBENCH provides a thorough evaluation for investigating the temporal reasoning capabilities of large language models. We conduct extensive experiments on GPT-4, LLaMA2, and other popular LLMs under various settings. Our experimental results indicate a significant performance gap between the state-of-the-art LLMs and humans, highlighting that there is still a considerable distance to cover in temporal reasoning. Besides, LLMs exhibit capability discrepancies across different reasoning categories. Furthermore, we thoroughly analyze the impact of multiple aspects on temporal reasoning and emphasize the associated challenges. We aspire for TIMEBENCH to serve as a comprehensive benchmark, fostering research in temporal reasoning.

Original language: English
Title of host publication: Long Papers
Editors: Lun-Wei Ku, Andre F. T. Martins, Vivek Srikumar
Publisher: Association for Computational Linguistics (ACL)
Pages: 1204-1228
Number of pages: 25
ISBN (Electronic): 9798891760943
State: Published - 2024
Event: 62nd Annual Meeting of the Association for Computational Linguistics, ACL 2024 - Bangkok, Thailand
Duration: 11 Aug 2024 - 16 Aug 2024

Publication series

Name: Proceedings of the Annual Meeting of the Association for Computational Linguistics
Volume: 1
ISSN (Print): 0736-587X

Conference

Conference: 62nd Annual Meeting of the Association for Computational Linguistics, ACL 2024
Country/Territory: Thailand
City: Bangkok
Period: 11/08/24 - 16/08/24
