Against The Achilles’ Heel: A Survey on Red Teaming for Generative Models

  • Lizhi Lin
  • Honglin Mu
  • Zenan Zhai
  • Minghan Wang
  • Yuxia Wang
  • Renxi Wang
  • Junjie Gao
  • Yixuan Zhang
  • Wanxiang Che
  • Timothy Baldwin
  • Xudong Han
  • Haonan Li
  • LibrAI
  • Tsinghua University
  • Harbin Institute of Technology
  • Oracle
  • Monash University
  • Mohamed Bin Zayed University of Artificial Intelligence
  • University of Melbourne

Research output: Contribution to journal › Article › peer-review

Abstract

Generative models are rapidly gaining popularity and being integrated into everyday applications, raising concerns over their safe use as various vulnerabilities are exposed. In light of this, the field of red teaming is undergoing fast-paced growth, highlighting the need for a comprehensive survey covering the entire pipeline and addressing emerging topics. Our extensive survey, which examines over 120 papers, introduces a taxonomy of fine-grained attack strategies grounded in the inherent capabilities of language models. Additionally, we have developed the “searcher” framework to unify various automatic red teaming approaches. Moreover, our survey covers novel areas including multimodal attacks and defenses, risks around LLM-based agents, overkill of harmless queries, and the balance between harmlessness and helpfulness. Warning: This paper contains examples that may be offensive, harmful, or biased.

Original language: English
Pages (from-to): 687-775
Number of pages: 89
Journal: Journal of Artificial Intelligence Research
Volume: 82
State: Published - 2025
