WebCiteS: Attributed Query-Focused Summarization on Chinese Web Search Results with Citations

  • Haolin Deng
  • Chang Wang
  • Xin Li
  • Dezhang Yuan
  • Junlang Zhan
  • Tianhua Zhou
  • Jin Ma
  • Jun Gao*
  • Ruifeng Xu*
  • *Corresponding author for this work
  • Harbin Institute of Technology Shenzhen
  • Tencent
  • University of Science and Technology of China
  • Peng Cheng Laboratory
  • Guangdong Provincial Key Laboratory of Novel Security Intelligence Technologies

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

Abstract

Enhancing the attribution in large language models (LLMs) is a crucial task. One feasible approach is to enable LLMs to cite external sources that support their generations. However, existing datasets and evaluation methods in this domain still exhibit notable limitations. In this work, we formulate the task of attributed query-focused summarization (AQFS) and present WebCiteS, a Chinese dataset featuring 7k human-annotated summaries with citations. WebCiteS derives from real-world user queries and web search results, offering a valuable resource for model training and evaluation. Prior works in attribution evaluation do not differentiate between groundedness errors and citation errors. They also fall short in automatically verifying sentences that draw partial support from multiple sources. We tackle these issues by developing detailed metrics and enabling the automatic evaluator to decompose the sentences into sub-claims for fine-grained verification. Our comprehensive evaluation of both open-source and proprietary models on WebCiteS highlights the challenge LLMs face in correctly citing sources, underscoring the necessity for further improvement.

Original language: English
Title of host publication: Long Papers
Editors: Lun-Wei Ku, Andre F. T. Martins, Vivek Srikumar
Publisher: Association for Computational Linguistics (ACL)
Pages: 15095-15114
Number of pages: 20
ISBN (Electronic): 9798891760943
DOIs
State: Published - 2024
Externally published: Yes
Event: 62nd Annual Meeting of the Association for Computational Linguistics, ACL 2024 - Bangkok, Thailand
Duration: 11 Aug 2024 to 16 Aug 2024

Publication series

Name: Proceedings of the Annual Meeting of the Association for Computational Linguistics
Volume: 1
ISSN (Print): 0736-587X

Conference

Conference: 62nd Annual Meeting of the Association for Computational Linguistics, ACL 2024
Country/Territory: Thailand
City: Bangkok
Period: 11/08/24 to 16/08/24
