Bilingual lexicon extraction with forced correlation from comparable corpora

  • Chunyue Zhang
  • Tiejun Zhao*
  • *Corresponding author for this work
  • School of Computer Science and Technology, Harbin Institute of Technology

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Recently, a simple linear transformation between word embeddings has been found to be highly effective for extracting a bilingual lexicon from comparable corpora. However, the pairs of bilingual word embeddings used to train this transformation are assumed to satisfy a linear relationship, which cannot be guaranteed in practice. This paper proposes a simple solution based on canonical correlation analysis (CCA) that forces the bilingual word embeddings used for training to be maximally linearly correlated in the projection subspaces. After the original word embeddings of both languages are projected into the new correlated subspaces, a better transformation matrix is learned from the projected embeddings in the same way as before. The experimental results confirm that the proposed solution achieves a significant improvement of 62% in precision at Top-1 over the baseline approach on the English-to-Chinese bilingual lexicon extraction task.
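
The two-stage idea in the abstract (CCA projection first, then a linear map fitted on the projected embeddings) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (`fit_cca`, `fit_linear_map`, `precision_at_1`, `run_demo`), the QR-based CCA solver, the least-squares fit, the cosine nearest-neighbour lookup, and the synthetic data are all assumptions made for illustration. The sketch also assumes more seed pairs than embedding dimensions and full-rank embedding matrices.

```python
import numpy as np


def fit_cca(X, Y, k):
    """CCA on paired rows of X (n x d1) and Y (n x d2).

    Returns (mean_x, mean_y, A, B, corrs) such that the columns of
    (X - mean_x) @ A and (Y - mean_y) @ B are maximally correlated.
    Assumes n > max(d1, d2) and full column rank (illustrative only).
    """
    mean_x, mean_y = X.mean(axis=0), Y.mean(axis=0)
    # Reduced QR: Q has orthonormal columns spanning the centered data.
    Qx, Rx = np.linalg.qr(X - mean_x)
    Qy, Ry = np.linalg.qr(Y - mean_y)
    # Singular values of Qx^T Qy are the canonical correlations.
    U, corrs, Vt = np.linalg.svd(Qx.T @ Qy)
    A = np.linalg.solve(Rx, U[:, :k])
    B = np.linalg.solve(Ry, Vt.T[:, :k])
    return mean_x, mean_y, A, B, corrs[:k]


def fit_linear_map(P_src, P_tgt):
    """Least-squares W minimizing ||P_src @ W - P_tgt||."""
    W, *_ = np.linalg.lstsq(P_src, P_tgt, rcond=None)
    return W


def precision_at_1(pred, candidates):
    """Fraction of rows whose cosine nearest neighbour among
    `candidates` is the row with the same index."""
    pred = pred / np.linalg.norm(pred, axis=1, keepdims=True)
    cand = candidates / np.linalg.norm(candidates, axis=1, keepdims=True)
    best = (pred @ cand.T).argmax(axis=1)
    return float((best == np.arange(len(pred))).mean())


def run_demo(seed=0, n_train=150, n_test=50, dim=5, noise=0.01):
    """Synthetic stand-in for a seed lexicon: 'target' embeddings are a
    rotated, slightly noisy copy of the 'source' embeddings."""
    rng = np.random.default_rng(seed)
    n = n_train + n_test
    X = rng.normal(size=(n, dim))
    rotation, _ = np.linalg.qr(rng.normal(size=(dim, dim)))
    Y = X @ rotation + noise * rng.normal(size=(n, dim))
    X_tr, Y_tr = X[:n_train], Y[:n_train]
    X_te, Y_te = X[n_train:], Y[n_train:]

    # Stage 1: project both languages into the correlated subspaces.
    mean_x, mean_y, A, B, _ = fit_cca(X_tr, Y_tr, k=dim)
    # Stage 2: learn the transformation on the projected embeddings.
    W = fit_linear_map((X_tr - mean_x) @ A, (Y_tr - mean_y) @ B)

    # Translate held-out source words and retrieve nearest targets.
    pred = (X_te - mean_x) @ A @ W
    cand = (Y_te - mean_y) @ B
    return precision_at_1(pred, cand)
```

Calling `run_demo()` returns the Top-1 precision on the held-out synthetic pairs. Real embeddings are far noisier than this toy data, so the number says nothing about the results reported in the paper.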

Original language: English
Title of host publication: Neural Information Processing - 22nd International Conference, ICONIP 2015, Proceedings
Editors: Weng Kin Lai, Qingshan Liu, Tingwen Huang, Sabri Arik
Publisher: Springer Verlag
Pages: 528-535
Number of pages: 8
ISBN (Print): 9783319265346
State: Published - 2015
Externally published: Yes
Event: 22nd International Conference on Neural Information Processing, ICONIP 2015 - Istanbul, Turkey
Duration: 9 Nov 2015 to 12 Nov 2015

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 9490
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 22nd International Conference on Neural Information Processing, ICONIP 2015
Country/Territory: Turkey
City: Istanbul
Period: 9/11/15 to 12/11/15
