
InfoMin-based Query Embedding Optimization For Query-based Universal Sound Separation

  • Zhen Wang
  • Jiqing Han*
  • Liwen Zhang*
  • Youcheng Zhang

*Corresponding author for this work

  • School of Computer Science and Technology, Harbin Institute of Technology
  • Academy of CASIC

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Query-based universal sound separation (QUSS) aims to separate specific sound sources according to a given query. Most existing methods focus on improving the separation model and ignore the influence of the category-conditioned query embedding distribution on separation performance. To address this issue, we propose a query embedding optimization method based on the InfoMin principle: it reduces the mutual information (MI) between query embeddings while keeping task-related information intact. In addition, we propose Frequency-varying Feature-wise Linear Modulation (FFiLM), which leverages the frequency band differences of acoustic events to enhance the modulation capability of the query embedding and improve the performance of the separation model. Experimental results show that our method achieves considerable improvements over the existing state-of-the-art (SoTA) method.
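The abstract does not give the FFiLM formulation, so the following is only a minimal sketch of the general idea, not the authors' implementation. It contrasts standard FiLM, where a query embedding yields one scale and shift per channel, with a frequency-varying variant in which the scale and shift also depend on the frequency band. All names (`film`, `ffilm`, `W_film`, `W_ffilm`), the random linear projections, the tensor sizes, and the split into equal-width contiguous bands are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: channels, frequency bins, time frames, embedding dim, bands.
C, F, T, D, B = 4, 8, 5, 16, 2

features = rng.standard_normal((C, F, T))   # a feature map inside a separation model
query = rng.standard_normal(D)              # a query embedding

# Hypothetical linear projections from the query embedding to modulation parameters.
W_film = rng.standard_normal((2 * C, D)) * 0.1       # -> gamma, beta per channel
W_ffilm = rng.standard_normal((2 * C * B, D)) * 0.1  # -> gamma, beta per channel and band


def film(x, q, W):
    """Standard FiLM: one (gamma, beta) per channel, shared over frequency and time."""
    gamma, beta = np.split(W @ q, 2)                  # each of shape (C,)
    return gamma[:, None, None] * x + beta[:, None, None]


def ffilm(x, q, W, n_bands):
    """Frequency-varying sketch: one (gamma, beta) per channel and frequency band.

    Assumes the frequency axis divides evenly into n_bands contiguous bands.
    """
    n_ch, n_freq, _ = x.shape
    gamma, beta = np.split(W @ q, 2)                  # each of shape (C * n_bands,)
    gamma = gamma.reshape(n_ch, n_bands)
    beta = beta.reshape(n_ch, n_bands)
    reps = n_freq // n_bands
    gamma = np.repeat(gamma, reps, axis=1)            # (C, F): band value copied to its bins
    beta = np.repeat(beta, reps, axis=1)
    return gamma[:, :, None] * x + beta[:, :, None]


out_film = film(features, query, W_film)
out_ffilm = ffilm(features, query, W_ffilm, n_bands=B)
print(out_film.shape, out_ffilm.shape)
```

With a single band, this sketch reduces to standard FiLM, which is one way to see the frequency-varying form as a generalization of it.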

Original language: English
Title of host publication: 2025 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2025 - Proceedings
Editors: Bhaskar D Rao, Isabel Trancoso, Gaurav Sharma, Neelesh B. Mehta
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9798350368741
State: Published - 2025
Externally published: Yes
Event: 2025 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2025 - Hyderabad, India
Duration: 6 Apr 2025 - 11 Apr 2025

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
ISSN (Print): 1520-6149

Conference

Conference: 2025 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2025
Country/Territory: India
City: Hyderabad
Period: 6/04/25 - 11/04/25

Keywords

  • contrastive learning
  • query-based universal sound separation
  • source separation
