A joint source-channel model for machine transliteration

Research output: Contribution to journal › Conference article › peer-review

Abstract

Most foreign names are transliterated into Chinese, Japanese or Korean with approximate phonetic equivalents. The transliteration is usually achieved through intermediate phonemic mapping. This paper presents a new framework that allows direct orthographical mapping (DOM) between two different languages, through a joint source-channel model, also called the n-gram transliteration model (TM). With the n-gram TM, we automate the orthographic alignment process to derive the aligned transliteration units from a bilingual dictionary. The n-gram TM under the DOM framework greatly reduces system development effort and provides a quantum leap in transliteration accuracy over that of other state-of-the-art machine learning algorithms. The modeling framework is validated through several experiments for the English-Chinese language pair.
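
To make the idea of an n-gram model over aligned transliteration units more concrete, the following is a minimal sketch and not the authors' implementation. It assumes a bigram (n = 2) model, add-one smoothing, and names that are already segmented into aligned (English unit, Chinese unit) pairs; the toy alignments and the function names `train_bigram` and `log_joint` are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

# Hypothetical toy data: each name is already segmented into aligned
# (English unit, Chinese unit) pairs. In the paper these alignments are
# derived automatically from a bilingual dictionary; here they are
# hand-written for illustration only.
ALIGNED_NAMES = [
    [("s", "史"), ("mi", "密"), ("th", "斯")],   # Smith
    [("mi", "米"), ("ller", "勒")],              # Miller
]

BOS = ("<s>", "<s>")     # start-of-name marker
EOS = ("</s>", "</s>")   # end-of-name marker


def train_bigram(names):
    """Count bigrams and their left contexts over transliteration pairs."""
    bigrams, contexts, vocab = Counter(), Counter(), set()
    for pairs in names:
        seq = [BOS] + list(pairs) + [EOS]
        vocab.update(seq[1:])  # every unit that can be predicted
        for prev, cur in zip(seq, seq[1:]):
            bigrams[(prev, cur)] += 1
            contexts[prev] += 1
    return bigrams, contexts, vocab


def log_joint(pairs, model):
    """log P(E, C) as a sum of log P(pair_k | pair_{k-1}), add-one smoothed."""
    bigrams, contexts, vocab = model
    seq = [BOS] + list(pairs) + [EOS]
    total = 0.0
    for prev, cur in zip(seq, seq[1:]):
        num = bigrams[(prev, cur)] + 1           # assumption: add-one smoothing
        den = contexts[prev] + len(vocab)
        total += math.log(num / den)
    return total


model = train_bigram(ALIGNED_NAMES)
seen = log_joint([("s", "史"), ("mi", "密"), ("th", "斯")], model)
alt = log_joint([("s", "史"), ("mi", "米"), ("th", "斯")], model)
print(seen > alt)  # the pair sequence seen in training scores higher
```

Because each unit couples an English chunk with its Chinese counterpart, a single score covers both sides of the name at once, with no intermediate phonemic representation. A decoder built on this sketch would search over segmentations of the English name and candidate Chinese units for the highest-scoring pair sequence.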

Original language: English
Pages (from-to): 159-166
Number of pages: 8
Journal: Proceedings of the Annual Meeting of the Association for Computational Linguistics
State: Published - 2004
Externally published: Yes
Event: 42nd Annual Meeting of the Association for Computational Linguistics, ACL 2004 - Barcelona, Spain
Duration: 21 Jul 2004 to 26 Jul 2004
