
Multi-Factor Adaptive Vision Selection for Egocentric Video Question Answering

  • School of Computer Science and Technology, Harbin Institute of Technology
  • Peng Cheng Laboratory
  • Shandong Jianzhu University
  • Shandong University

Research output: Contribution to journal › Conference article › peer-review

Abstract

The challenge of interpreting the world from a human perspective in Artificial Intelligence (AI) is particularly evident in egocentric video question answering, which grapples with issues such as small object recognition, noise suppression, and spatial-temporal reasoning. To address these challenges, we introduce the Multi-Factor Adaptive Vision Selection (MFAS) framework. MFAS integrates a patch partition and merging module for enhanced small object recognition, a prior-guided patch selection module for noise suppression and focused analysis, and a hierarchical aggregation network that aggregates visual semantics under the guidance of the question. Extensive experiments on several public egocentric datasets have validated the effectiveness and generalization of our framework. Code and data are available at https://github.com/Hyu-Zhang/EgoVideoQA.
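The abstract does not say how the prior is computed or how many patches the selection module keeps. The sketch below is a minimal, hypothetical illustration of one generic form of prior-guided patch selection, not the MFAS implementation (see the linked repository for that). It assumes each patch is represented by a feature vector and that the prior is a single vector of the same size (for example, a question embedding). Patches are scored by cosine similarity to the prior and only the top `k` are kept. The names `cosine`, `select_patches`, and the parameter `k` are illustrative assumptions.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_patches(patches: List[Sequence[float]], prior: Sequence[float], k: int) -> List[int]:
    """Hypothetical top-k selection: keep the k patches most similar to the prior.

    Returns patch indices in their original (spatial) order so that
    downstream aggregation still sees patches in layout order.
    """
    if k <= 0:
        return []
    # Score every patch against the prior vector.
    scores = [(cosine(p, prior), i) for i, p in enumerate(patches)]
    # Highest similarity first; Python's sort is stable, so ties keep input order.
    top = sorted(scores, key=lambda s: s[0], reverse=True)[:k]
    # Restore spatial order among the selected patches.
    return sorted(i for _, i in top)


if __name__ == "__main__":
    patch_features = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]]
    question_prior = [1.0, 0.0]
    print(select_patches(patch_features, question_prior, k=2))  # [0, 2]
```

Hard top-k selection is used here only because it is the simplest way to show how a prior can suppress irrelevant (noisy) patches before aggregation; the paper's actual selection criterion may differ.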

Original language: English
Pages (from-to): 59310-59328
Number of pages: 19
Journal: Proceedings of Machine Learning Research
Volume: 235
State: Published - 2024
Externally published: Yes
Event: 41st International Conference on Machine Learning, ICML 2024 - Vienna, Austria
Duration: 21 Jul 2024 to 27 Jul 2024
