Binary-Temporal Convolutional Neural Network for Multi-Class Auditory Spatial Attention Detection

  • Peng Zhao
  • Ruicong Wang
  • Xueyi Zhang
  • Mingrui Lao
  • Siqi Cai*
  • *Corresponding author for this work
  • National University of Defense Technology
  • The Chinese University of Hong Kong, Shenzhen
  • National University of Singapore

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

Abstract

Humans have a remarkable ability to focus on one of the sound sources in a multi-speaker environment. Auditory spatial attention detection (ASAD) aims to identify the direction of the speech source a person is attending to based on their brain signals, with potential applications in enhancing hearing aids, improving communication systems, and advancing brain-computer interface (BCI) technologies. Most prior studies formulated the problem as binary classification; however, real-world scenarios are much more complex. Our study explores the feasibility of detecting auditory attention among 10 competing speakers. To address the needs of low-resource computing equipment, we further propose a novel approach using a binary temporal convolutional neural network (B-TCNN) for multi-class ASAD tasks. This approach reduces memory consumption and accelerates inference. Experimental results show that the B-TCNN achieves an average accuracy of 93.8% with only 33K parameters in 1-second decision windows for a 10-class ASAD dataset. The proposed network significantly outperforms other competitive models, offering a lightweight and efficient solution for multi-class ASAD tasks.
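
To make the idea of a binary temporal convolution concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the abstract does not describe the B-TCNN architecture, so the sign-based binarization, the layer shape, the 64-channel input, and the 128 Hz sampling rate are all assumptions chosen for illustration. The memory figures at the end assume that every one of the 33K parameters is stored in a single bit, which the abstract does not state.

```python
import numpy as np

def binarize(a):
    """Map real values to {-1, +1} by sign (zeros are sent to +1)."""
    return np.where(a >= 0, 1.0, -1.0)

def temporal_conv1d(x, w):
    """Valid 1D convolution (cross-correlation) along the time axis.

    x: (channels, time) input window
    w: (filters, channels, kernel) weights
    returns: (filters, time - kernel + 1)
    """
    n_filters, _, k = w.shape
    t_out = x.shape[1] - k + 1
    out = np.zeros((n_filters, t_out))
    for t in range(t_out):
        # Contract over the channel and kernel-tap axes for this time step.
        out[:, t] = np.tensordot(w, x[:, t:t + k], axes=([1, 2], [0, 1]))
    return out

rng = np.random.default_rng(0)

# Hypothetical input: 64 EEG channels, 128 samples (1 s at an assumed 128 Hz).
x = rng.standard_normal((64, 128))
# Hypothetical layer: 10 filters, kernel length 9.
w_real = rng.standard_normal((10, 64, 9))

w_bin = binarize(w_real)
y = temporal_conv1d(binarize(x), w_bin)

# Storage for 33K parameters: 32-bit floats versus 1 bit each (assumption).
n_params = 33_000
bytes_fp32 = n_params * 4
bytes_1bit = n_params / 8
```

With both inputs and weights restricted to ±1, each output is a sum of 64 × 9 = 576 signed ones, which is what lets hardware replace multiply-accumulate with XNOR and bit counting. Under the one-bit-per-parameter assumption, storage drops by a factor of 32 relative to 32-bit floats.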

Original language: English
Title of host publication: 2024 14th International Symposium on Chinese Spoken Language Processing, ISCSLP 2024
Editors: Yanmin Qian, Qin Jin, Zhijian Ou, Zhenhua Ling, Zhiyong Wu, Ya Li, Lei Xie, Jianhua Tao
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 46-50
Number of pages: 5
ISBN (Electronic): 9798331516826
State: Published - 2024
Externally published: Yes
Event: 14th International Symposium on Chinese Spoken Language Processing, ISCSLP 2024 - Beijing, China
Duration: 7 Nov 2024 to 10 Nov 2024

Publication series

Name: 2024 14th International Symposium on Chinese Spoken Language Processing, ISCSLP 2024

Conference

Conference: 14th International Symposium on Chinese Spoken Language Processing, ISCSLP 2024
Country/Territory: China
City: Beijing
Period: 7/11/24 to 10/11/24

UN SDGs

This output contributes to the following UN Sustainable Development Goals (SDGs)

  1. SDG 3 - Good Health and Well-being

Keywords

  • Auditory attention
  • binary neural network
  • brain-computer interface
  • cocktail party
  • electroencephalography
