
A bilevel framework for joint optimization of session compensation and classification for speaker identification

  • School of Computer Science and Technology, Harbin Institute of Technology
  • School of Computer Science and Technology (School of Software), Harbin Institute of Technology Weihai
  • Harbin University of Science and Technology

Research output: Contribution to journal, Article, peer-review

Abstract

Systems built on the i-vector framework are among the most popular in speaker identification (SID). In such a system, session compensation is usually applied first, followed by a classifier. Every session-compensated representation of an i-vector yields a corresponding identification result, so the two stages are related. However, in current SID systems, session compensation and the classifier are usually optimized independently. Because session compensation is then carried out with incomplete knowledge of the identification task, this may introduce uncertainties. In this paper, we propose a bilevel framework that jointly optimizes session compensation and the classifier to strengthen the relationship between the two stages. In this framework, we use sparse coding (SC) to obtain the session-compensated feature by learning an overcomplete dictionary, and we employ a softmax classifier and a support vector machine (SVM), respectively, for classification. Moreover, we present a joint optimization of the dictionary and classifier parameters under a discriminative criterion for the classifier, subject to the conditions for SC. The proposed methods are evaluated on the King-ASR-010, VoxCeleb and RSR2015 databases. Compared with typical session compensation techniques, such as linear discriminant analysis (LDA) and nonparametric discriminant analysis (NDA), our methods can be more robust to complex session variability. Moreover, compared with the typical classifiers in the i-vector framework, i.e., cosine distance scoring (CDS) and probabilistic linear discriminant analysis (PLDA), our methods can be more suitable for SID, which is a multiclass task.
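
The two levels described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not give the objective or the solver, so the lasso-style lower level, the ISTA solver, the cross-entropy upper level, and every name and value here (`sparse_code`, `softmax_loss`, `lam`, the toy dimensions, the random stand-in for an i-vector) are assumptions made for illustration. The sketch only evaluates the upper-level loss on a sparse code; it does not update the dictionary.

```python
import numpy as np

def soft_threshold(v, t):
    # Proximal operator of t * ||.||_1 (elementwise shrinkage).
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def sc_objective(x, D, a, lam):
    # Assumed lower-level objective: 0.5*||x - D a||^2 + lam*||a||_1.
    return 0.5 * np.sum((x - D @ a) ** 2) + lam * np.sum(np.abs(a))

def sparse_code(x, D, lam=0.1, n_iter=200):
    # ISTA: gradient step on the quadratic term, then soft-thresholding.
    L = np.linalg.norm(D, 2) ** 2  # Lipschitz constant of the gradient
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ a - x)
        a = soft_threshold(a - grad / L, lam / L)
    return a

def softmax_loss(W, b, a, y):
    # Assumed upper-level criterion: cross-entropy of a softmax classifier
    # applied to the sparse code a, with true speaker index y.
    z = W @ a + b
    z = z - z.max()  # numerical stability
    log_p = z - np.log(np.exp(z).sum())
    return -log_p[y]

# Toy setup (hypothetical sizes): overcomplete dictionary, n_atoms > dim.
rng = np.random.default_rng(0)
dim, n_atoms, n_speakers = 20, 50, 4
D = rng.standard_normal((dim, n_atoms))
D /= np.linalg.norm(D, axis=0)   # unit-norm atoms
x = rng.standard_normal(dim)     # random stand-in for an i-vector

a = sparse_code(x, D)            # lower level: session-compensated code
W = np.zeros((n_speakers, n_atoms))
b = np.zeros(n_speakers)
loss = softmax_loss(W, b, a, y=2)  # upper level: classification loss
```

In a joint scheme of the kind the abstract describes, the upper-level loss would also drive updates to `D`, so that the dictionary is learned for identification rather than for reconstruction alone.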

Original language: English
Pages (from-to): 104-115
Number of pages: 12
Journal: Digital Signal Processing: A Review Journal
Volume: 89
State: Published - Jun 2019
Externally published: Yes

Keywords

  • Bilevel framework
  • Joint optimization
  • Session compensation
  • Speaker identification
