Source Information
Abstract
English
Sentence-aligned bilingual parallel corpora play a significant and indispensable role in machine translation, translation knowledge acquisition, and bilingual lexicography research, and building them is fundamental work for natural language processing. Although a great deal of work has been done on sentence alignment and a variety of methods have been developed for bilingual terminology extraction, these methods are impractical for Tibetan information processing, which is only newly under way, because they require a large number of manually constructed sentences as a training corpus for extracting inter-translatable word pairs. This paper proposes a multi-strategy Tibetan-Chinese sentence alignment method based on sentence length, syntactic rules, and a bilingual dictionary. We test our approach on a bilingual corpus crawled from a bilingual website and manually evaluate the bilingual sentence pairs extracted from the Tibetan-Chinese corpora.
Table of Contents
1. Introduction
2. Related Work
3. Multi-strategy Tibetan-Chinese Sentence Alignment Method
3.1 Alignment Model based on Multiple Features
3.2 Sentence Alignment based on Dictionary
4. Experiments and Results
5. Conclusion and Future Work
Acknowledgements
References