Abstract
Video semantic role grounding has attracted substantial interest from both academia and industry. While existing methods have improved performance considerably, the influence of noisy proposals and intra-object proposals (proposals that share the same object label) has yet to be explored for this task. In this study, we propose a semantic-aware contrastive learning network with proposal suppression to ground referenced objects more accurately. To fully exploit the semantic information in each semantic role, we introduce a novel semantic role encoding module that yields a precise representation of each role. We also design a semantic-aware proposal suppression network to reduce the impact of noisy proposals on object representation learning. Additionally, we propose a proposal contrastive loss to improve cross-modal alignment and reduce the effect of irrelevant intra-object proposals. Extensive experiments on four datasets demonstrate that our model achieves significant improvements over state-of-the-art methods.
| Original language | English |
|---|---|
| Pages (from-to) | 3003-3016 |
| Number of pages | 14 |
| Journal | IEEE Transactions on Circuits and Systems for Video Technology |
| Volume | 34 |
| Issue number | 4 |
| State | Published - 1 Apr 2024 |
| Externally published | Yes |
Keywords
- Video semantic role grounding
- cross-modal retrieval
- proposal contrastive learning
Title: Semantic-Aware Contrastive Learning With Proposal Suppression for Video Semantic Role Grounding