TR-Adapter: Parameter-Efficient Transfer Learning for Video Question Answering

  • Yuanyuan Wang
  • Meng Liu*
  • Xuemeng Song
  • Liqiang Nie*

*Corresponding author for this work

Affiliations:
  • Shandong University
  • Shandong Jianzhu University
  • School of Computer Science and Technology, Harbin Institute of Technology

Research output: Contribution to journal › Article › peer-review

Abstract

In recent years, large-scale pre-trained vision-language models have attracted significant attention and shown promising results in video question answering. However, the growing size of these models has made full fine-tuning impractical, creating a need for parameter-efficient transfer learning on downstream tasks. To address this challenge, we introduce a novel parameter-efficient transfer learning technique based on a temporal reasoning adapter for video question answering. Our approach captures temporal relationships within videos, equipping the model with both visual reasoning ability and the knowledge-acquisition ability of language models. Extensive experiments on four video question answering datasets show that our method matches or even outperforms full fine-tuning and state-of-the-art models while remaining parameter-efficient.
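To make the adapter idea concrete: a minimal sketch of a generic bottleneck adapter, the building block that parameter-efficient methods like the one described above insert into a frozen backbone. This is not the paper's TR-Adapter (which additionally performs temporal reasoning over video frames); all function and parameter names here are hypothetical, and only the small adapter weights would be trained while the pre-trained model stays frozen.

```python
import numpy as np

def make_adapter(d_model, bottleneck, rng):
    """Hypothetical bottleneck adapter: the only parameters updated
    during transfer learning; the backbone's weights stay frozen."""
    return {
        # Small random down-projection into the bottleneck dimension.
        "W_down": rng.standard_normal((d_model, bottleneck)) * 0.02,
        # Zero-initialized up-projection, so the adapter starts out
        # as an identity map and does not disturb pre-trained features.
        "W_up": np.zeros((bottleneck, d_model)),
    }

def adapter_forward(x, params):
    """Down-project, apply a nonlinearity, up-project, then add a
    residual connection so the frozen backbone's features pass through."""
    h = np.maximum(x @ params["W_down"], 0.0)  # ReLU bottleneck
    return x + h @ params["W_up"]

rng = np.random.default_rng(0)
params = make_adapter(d_model=512, bottleneck=32, rng=rng)
features = rng.standard_normal((4, 512))   # e.g. 4 frame tokens
out = adapter_forward(features, params)
```

The parameter-efficiency argument is visible in the counts: this adapter holds 2 × 512 × 32 = 32,768 trainable weights per layer, orders of magnitude fewer than fine-tuning the full backbone.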

Original language: English
Pages (from-to): 2232-2242
Number of pages: 11
Journal: IEEE Transactions on Multimedia
Volume: 27
State: Published - 2025
Externally published: Yes

Keywords

  • Video question answering
  • adapter
  • parameter-efficient transfer learning
