
Aligning bilingual corpora using sentences location information

  • Li Weigang
  • Liu Ting
  • Wang Zhen
  • Li Sheng
  • Harbin Institute of Technology

Research output: Contribution to conference, Paper, peer-review

Abstract

The large amount of bilingual material on the Internet makes it possible to build large-scale bilingual corpora. The irregular characteristics of real texts, especially the lack of strictly aligned paragraph boundaries, pose a challenge for alignment technology, and traditional alignment methods have difficulty handling them. This paper describes a new method for aligning real bilingual texts using sentence pair location information. The model is motivated by the observation that the locations of sentence pairs of a given length are distributed similarly across the whole text. It uses 1:1 sentence beads, rather than high-frequency words, as candidate anchors. The method was developed and evaluated on many different test sets. The results show that it achieves good alignment performance and is robust and language independent, so it can handle the alignment problem on real bilingual texts.

Original language: English
Pages: 141-147
Number of pages: 7
State: Published - 2004
Event: 3rd SIGHAN Workshop on Chinese Language Processing, SIGHAN@ACL 2004 - Barcelona, Spain
Duration: 25 Jul 2004 → …

Conference

Conference: 3rd SIGHAN Workshop on Chinese Language Processing, SIGHAN@ACL 2004
Country/Territory: Spain
City: Barcelona
Period: 25/07/04 → …
