Deep Convolutional Pooling Transformer for Deepfake Detection

  • The University of Hong Kong
  • Shandong University
  • Harbin Institute of Technology Shenzhen

Research output: Contribution to journal › Article › peer-review

Abstract

Recently, Deepfake has drawn considerable public attention due to security and privacy concerns in social media digital forensics. As the wildly spreading Deepfake videos on the Internet become more realistic, traditional detection techniques fail to distinguish real from fake. Most existing deep learning methods focus on local features and relations within the face image, using convolutional neural networks as a backbone. However, local features and relations alone do not give a model enough general information to learn Deepfake detection. As a result, existing Deepfake detection methods have hit a bottleneck in detection performance. To address this issue, we propose a deep convolutional Transformer that incorporates decisive image features both locally and globally. Specifically, we apply convolutional pooling and re-attention to enrich the extracted features and enhance efficacy. Moreover, we use image keyframes, which are rarely discussed in this context, in model training to improve performance, and we visualize the feature quantity gap between key and normal image frames caused by video compression. Finally, we illustrate transferability with extensive experiments on several Deepfake benchmark datasets. The proposed solution consistently outperforms several state-of-the-art baselines in both within- and cross-dataset experiments.

Original language: English
Article number: 179
Journal: ACM Transactions on Multimedia Computing, Communications and Applications
Volume: 19
Issue number: 6
State: Published - 30 May 2023
Externally published: Yes

Keywords

  • Deepfake detection
  • image keyframes
  • transformer
