Abstract
Sleep apnea is a common sleep disorder that, if left untreated, can lead to serious health problems. Snoring is a typical symptom of sleep apnea and can be exploited for non-contact automatic sleep apnea severity classification (SASC). However, because of patient heterogeneity, the acoustic characteristics of snoring vary significantly across individuals. To address this issue, we introduce a text-audio multimodal model that leverages patient metadata to provide valuable supplementary information for the SASC task. Specifically, we fine-tune a pretrained text-audio multimodal model on text descriptions derived from metadata together with snoring sounds. The metadata comprises physical indicators such as gender, age, BMI, neck circumference, and blood pressure. We constructed a snoring dataset covering four sleep apnea severity levels, on which our method achieves a classification F-score of 74.34%. A series of ablation experiments validates the effectiveness of combining metadata-based text with snoring sounds for SASC. We also discuss the model's performance when parts of the metadata are unavailable, a situation that may occur in real-world applications.
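The core idea of pairing metadata-based text with audio can be illustrated with a minimal sketch of the text-construction step. The field names and sentence template below are assumptions for illustration; the paper does not publish its exact prompt format:

```python
def metadata_to_text(meta):
    """Turn a patient-metadata dict into a natural-language description.

    Missing fields are simply omitted, which mirrors the scenario the
    paper discusses where parts of the metadata are unavailable.
    Field names and phrasing are illustrative, not the authors' exact format.
    """
    parts = []
    if meta.get("gender") is not None:
        parts.append(f"The patient is {meta['gender']}")
    if meta.get("age") is not None:
        parts.append(f"aged {meta['age']} years")
    if meta.get("bmi") is not None:
        parts.append(f"with a BMI of {meta['bmi']}")
    if meta.get("neck_cm") is not None:
        parts.append(f"a neck circumference of {meta['neck_cm']} cm")
    if meta.get("bp") is not None:
        parts.append(f"and a blood pressure of {meta['bp']}")
    return ", ".join(parts) + "." if parts else "No metadata available."
```

The resulting sentence would then be fed to the text encoder of a pretrained text-audio model alongside the snoring recording; graceful omission of absent fields is one simple way to handle the partial-metadata case.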
| Original language | English |
|---|---|
| Journal | Proceedings - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing |
| DOIs | |
| State | Published - 2025 |
| Externally published | Yes |
| Event | 2025 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2025 - Hyderabad, India Duration: 6 Apr 2025 → 11 Apr 2025 |
Keywords
- Metadata
- Pretrained model
- Sleep apnea severity classification
- Snoring
- Text-audio multimodal model
Fingerprint
Dive into the research topics of 'CSMT: Combining Snoring and Metadata-based Text for Sleep Apnea Severity Classification'. Together they form a unique fingerprint.