Abstract
The rise of AI-generated images has sparked serious concerns about their potential misuse across various domains, prompting the urgent need for robust detection methods. Despite advancements, many current approaches prioritize short-term gains at the expense of long-term effectiveness. This paper critiques the overly specialized approach of fine-tuning pre-trained models for short-term gains on a single AI image dataset, while disregarding the long-term imperative of achieving generalization and knowledge retention. To address this trade-off issue, we propose a novel learning framework (PoundNet) for the generalization of AI-generated image detection on a pre-trained vision-language model. PoundNet incorporates a learnable prompt design and a balanced objective to preserve broad knowledge from upstream tasks (object classification) while enhancing generalization for downstream tasks (AI-generated image detection). We train PoundNet on a single standard AI image dataset, following common practice in the literature. We then evaluate its performance across 10 large-scale public AI-generated image detection datasets with 5 main evaluation metrics, forming the largest benchmark test set for assessing the generalization ability of AI-generated image detection models, to our knowledge. The comprehensive benchmark evaluation demonstrates that PoundNet successfully balances generalization with knowledge retention, achieving a remarkable relative improvement of 19% in AI-generated image detection performance compared to state-of-the-art methods, while maintaining a strong performance of 63% on object classification tasks.
| Original language | English |
|---|---|
| Journal | IEEE Transactions on Pattern Analysis and Machine Intelligence |
| State | Accepted/In press - 2026 |
Keywords
- AI-generated image detection
- balanced objective
- generalization
- knowledge preservation
- learnable prompt design
- pre-trained vision-language model
Fingerprint
Dive into the research topics of 'Penny-Wise and Pound-Foolish in AI-Generated Image Detection'. Together they form a unique fingerprint.