Abstract
Most foreign names are transliterated into Chinese, Japanese or Korean using approximate phonetic equivalents. The transliteration is usually achieved through an intermediate phonemic mapping. This paper presents a new framework that allows direct orthographical mapping (DOM) between two different languages through a joint source-channel model, also called the n-gram transliteration model (TM). With the n-gram TM, we automate the orthographic alignment process to derive the aligned transliteration units from a bilingual dictionary. The n-gram TM under the DOM framework greatly reduces system development effort and markedly improves transliteration accuracy over other state-of-the-art machine learning algorithms. The modeling framework is validated through several experiments on the English-Chinese language pair.
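As a rough illustration of the idea behind an n-gram model over aligned transliteration units, the sketch below trains a bigram model on sequences of (source unit, target unit) pairs and scores a candidate sequence. It is not the paper's implementation: the function names, the add-alpha smoothing, and the hand-aligned toy data are all assumptions made for this example, and the automated alignment step described in the abstract is not shown.

```python
import math
from collections import Counter

# Hypothetical start-of-name marker used as the first bigram context.
START = ("<s>", "<s>")


def train_bigram_tm(aligned_names):
    """Count contexts and bigrams over aligned (source, target) unit pairs."""
    context_counts = Counter()
    bigram_counts = Counter()
    for units in aligned_names:
        prev = START
        for unit in units:
            context_counts[prev] += 1
            bigram_counts[(prev, unit)] += 1
            prev = unit
    return context_counts, bigram_counts


def log_prob(units, context_counts, bigram_counts, vocab_size, alpha=1.0):
    """Joint log-probability of an aligned sequence (add-alpha smoothing is an assumption)."""
    total = 0.0
    prev = START
    for unit in units:
        num = bigram_counts[(prev, unit)] + alpha
        den = context_counts[prev] + alpha * vocab_size
        total += math.log(num / den)
        prev = unit
    return total


# Toy, hand-aligned data invented for illustration only.
toy = [
    [("s", "史"), ("mi", "密"), ("th", "斯")],
    [("s", "史"), ("mi", "密"), ("s", "斯")],
]
vocab_size = len({unit for name in toy for unit in name})
context_counts, bigram_counts = train_bigram_tm(toy)

seen = log_prob(toy[0], context_counts, bigram_counts, vocab_size)
shuffled = log_prob(list(reversed(toy[0])), context_counts, bigram_counts, vocab_size)
print(seen > shuffled)
```

Because each unit pairs a source string with a target string, the model conditions on both sides at once, which is what distinguishes a joint model from one that maps source to target through a separate channel and language model.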
| Original language | English |
|---|---|
| Pages (from-to) | 159-166 |
| Number of pages | 8 |
| Journal | Proceedings of the Annual Meeting of the Association for Computational Linguistics |
| State | Published - 2004 |
| Externally published | Yes |
| Event | 42nd Annual Meeting of the Association for Computational Linguistics, ACL 2004, Barcelona, Spain (21 Jul 2004 → 26 Jul 2004) |