
Towards Benchmarking Situational Awareness of Large Language Models: Comprehensive Benchmark, Evaluation and Analysis

  • Guo Tang
  • Zheng Chu
  • Wenxiang Zheng
  • Ming Liu*
  • Bing Qin
  • *Corresponding author for this work
  • Harbin Institute of Technology
  • Pengcheng Laboratory

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Situational awareness refers to the capacity to perceive and comprehend the present context and to anticipate forthcoming events; it plays a critical role in aiding decision-making, anticipating potential issues, and adapting to dynamic circumstances. Nevertheless, the situational awareness capabilities of large language models (LLMs) have not yet been comprehensively assessed. To address this, we propose SA-Bench, a benchmark that covers three tiers of situational awareness capabilities: environment perception, situation comprehension, and future projection. SA-Bench provides a comprehensive evaluation to explore the situational awareness capabilities of LLMs. We conduct extensive experiments on advanced LLMs, including GPT-4, LLaMA3, and Qwen1.5. Our experimental results indicate that even state-of-the-art (SOTA) LLMs still exhibit substantial capability gaps compared to humans. In addition, we analyze the challenges LLMs encounter across various tasks and highlight the deficiencies they face. We hope SA-Bench will foster research in the field of situational awareness.

Original language: English
Title of host publication: EMNLP 2024 - 2024 Conference on Empirical Methods in Natural Language Processing, Findings of EMNLP 2024
Editors: Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Publisher: Association for Computational Linguistics (ACL)
Pages: 7904-7928
Number of pages: 25
ISBN (Electronic): 9798891761681
State: Published - 2024
Event: 2024 Findings of the Association for Computational Linguistics, EMNLP 2024 - Hybrid, Miami, United States
Duration: 12 Nov 2024 to 16 Nov 2024

Publication series

Name: EMNLP 2024 - 2024 Conference on Empirical Methods in Natural Language Processing, Findings of EMNLP 2024

Conference

Conference: 2024 Findings of the Association for Computational Linguistics, EMNLP 2024
Country/Territory: United States
City: Hybrid, Miami
Period: 12/11/24 to 16/11/24
