Skip to main navigation Skip to search Skip to main content

Language Without Borders: A Dataset and Benchmark for Code-Switching Lip Reading

  • Xueyi Zhang
  • , Chengwei Zhang
  • , Mingrui Lao*
  • , Peng Zhao
  • , Jun Tang
  • , Yanming Guo
  • , Siqi Cai
  • , Xianghu Yue
  • , Haizhou Li
  • *Corresponding author for this work

Research output: Contribution to journalConference articlepeer-review

Abstract

Lip reading aims at transforming the videos of continuous lip movement into textual contents, and has achieved significant progress over the past decade. It serves as a critical yet practical assistance for speech-impaired individuals, with more practicability than speech recognition in noisy environments. With the increasing interpersonal communications in social media owing to globalization, the existing monolingual datasets for lip reading may not be sufficient to meet the exponential proliferation of bilingual and even multilingual users. However, to our best knowledge, research on code-switching is only explored in speech recognition, while the attempts in lip reading are seriously neglected. To bridge this gap, we have collected a bilingual code-switching lip reading benchmark composed of Chinese and English, dubbed CSLR. As the pioneering work, we recruited 62 speakers with proficient foundations in both spoken Chinese and English to express sentences containing both involved languages. Through rigorous criteria in data selection, CSLR benchmark has accumulated 85,560 video samples with a resolution of 1080x1920, totaling over 71.3 hours of high-quality code-switching lip movement data. To systematically evaluate the technical challenges in CSLR, we implement commonly-used lip reading backbones, as well as competitive solutions in code-switching speech for benchmark testing. Experiments show CSLR to be a challenging and under-explored lip reading task. We hope our proposed benchmark will extend the applicability of code-switching lip reading, and further contribute to the communities of cross-lingual communication and collaboration. Our dataset and benchmark are accessible at GitHub.

Original languageEnglish
JournalAdvances in Neural Information Processing Systems
Volume37
StatePublished - 2024
Externally publishedYes
Event38th Conference on Neural Information Processing Systems, NeurIPS 2024 - Vancouver, Canada
Duration: 9 Dec 202415 Dec 2024

Fingerprint

Dive into the research topics of 'Language Without Borders: A Dataset and Benchmark for Code-Switching Lip Reading'. Together they form a unique fingerprint.

Cite this