Abstract
Attending to a single voice at a cocktail party is challenging, particularly for listeners with hearing impairments. This paper proposes a novel eye-controlled target speaker extraction system consisting of an eye-tracker, a face detection model, an Active Speaker Detection (ASD) component, and a Target Speaker Extraction (TSE) model. The eye-tracker captures real-time video together with the listener's gaze. The gaze data then allow the face detection model to locate and isolate the target speaker's face in the video on a frame-by-frame basis. Using that face as the reference cue, the system separates the target speaker's speech from a multi-talker mixture. The experiments show that the system effectively extracts the target speaker's speech in complex auditory environments, delivering both real-time performance and accuracy. A demonstration of our system is available on our website.
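The gaze-to-face step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `FaceBox` type, the `select_gazed_face` function, and the `max_distance` pixel threshold are hypothetical names chosen for illustration, and the face boxes are assumed to come from some face detector running on the same video frame and in the same pixel coordinates as the gaze point.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class FaceBox:
    """Axis-aligned face bounding box in frame pixel coordinates (hypothetical type)."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def distance_to(self, x: float, y: float) -> float:
        """Euclidean distance from a point to the box; 0.0 if the point is inside."""
        dx = max(self.x_min - x, 0.0, x - self.x_max)
        dy = max(self.y_min - y, 0.0, y - self.y_max)
        return math.hypot(dx, dy)


def select_gazed_face(
    gaze_xy: Tuple[float, float],
    faces: Sequence[FaceBox],
    max_distance: float = 50.0,  # assumed tolerance in pixels, not from the paper
) -> Optional[int]:
    """Return the index of the face closest to the gaze point, or None.

    A face that contains the gaze point has distance 0 and therefore wins.
    If the nearest face is farther than max_distance, no face is selected.
    """
    gx, gy = gaze_xy
    best_index: Optional[int] = None
    best_distance = math.inf
    for index, face in enumerate(faces):
        distance = face.distance_to(gx, gy)
        if distance < best_distance:
            best_index, best_distance = index, distance
    if best_index is None or best_distance > max_distance:
        return None
    return best_index


if __name__ == "__main__":
    detected = [FaceBox(0, 0, 100, 100), FaceBox(200, 0, 300, 100)]
    print(select_gazed_face((250, 50), detected))
```

In a pipeline of this kind, the selected face crop would be passed on per frame as the visual reference cue for the extraction model; a tolerance such as `max_distance` is one simple way to absorb gaze-estimation noise when the gaze lands just outside a detected box.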
| Original language | English |
|---|---|
| Pages (from-to) | 380-385 |
| Number of pages | 6 |
| Journal | Proceedings of the IEEE International Conference on Cybernetics and Intelligent Systems, CIS |
| Issue number | 2024 |
| State | Published - 2024 |
| Externally published | Yes |
| Event | 11th IEEE International Conference on Cybernetics and Intelligent Systems and 11th IEEE International Conference on Robotics, Automation and Mechatronics, CIS-RAM 2024 - Hangzhou, China. Duration: 8 Aug 2024 → 11 Aug 2024 |
Keywords
- Cocktail Party
- Eye-Tracker
- Multi-modal
- Target Speaker Extraction
Fingerprint
Dive into the research topics of 'Listen to the Speaker in Your Gaze'. Together they form a unique fingerprint.