ACTA Scientiarum Naturalium Universitatis Pekinensis

Research on the Construction and Application of Paraphrase Parallel Corpus

WANG Yasong, LIU Mingtong, ZHANG Yujie†, XU Jin’an, CHEN Yufeng

School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044; † Corresponding author, E-mail: yjzhang@bjtu.edu.cn

Abstract Taking Chinese as the research object, the authors put forward a method for constructing large-scale, high-quality paraphrase parallel corpora. The paraphrase data augmentation method consists of transferring an English paraphrase corpus to Chinese by means of translation engines and manually annotating an evaluation data set. The usefulness of the constructed Chinese paraphrase data is then verified on the paraphrase recognition task and the natural language inference task. First, paraphrase recognition data are generated from the constructed paraphrase corpus, and an attention-based neural network model for sentence matching is pre-trained on them to capture paraphrase information. Then, the pre-trained model is applied to the natural language inference task to improve performance. Experimental results on public data sets show that the constructed paraphrase corpus can be effectively applied to the paraphrase recognition task and that the model can learn paraphrase knowledge. When applied to the natural language inference task, paraphrase knowledge effectively improves the accuracy of natural language inference models, which verifies the effectiveness of paraphrase knowledge for downstream semantic understanding tasks. Moreover, the proposed construction method for the paraphrase corpus is language-independent: it can provide more training data for other languages and fields, generate high-quality paraphrase data, and further improve the performance of other tasks.

Key words paraphrase corpus construction; data augmentation; transfer learning; paraphrase recognition; natural language inference
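
The first two steps summarized in the abstract (transferring an English paraphrase corpus through a translation engine, then deriving paraphrase recognition data from the result) might be sketched as follows. This is only an illustrative sketch, not the authors' implementation: the function names `transfer_corpus` and `build_recognition_data` are hypothetical, the translation engine is passed in as an arbitrary callable, and creating negative examples by randomly mismatching sentences is an assumption, since the abstract does not say how non-paraphrase pairs are obtained.

```python
import random
from typing import Callable, List, Tuple

Pair = Tuple[str, str]


def transfer_corpus(pairs: List[Pair], translate: Callable[[str], str]) -> List[Pair]:
    """Translate both sentences of every source-language paraphrase pair.

    `translate` stands in for any translation engine (hypothetical here).
    """
    return [(translate(a), translate(b)) for a, b in pairs]


def build_recognition_data(pairs: List[Pair], seed: int = 0) -> List[Tuple[str, str, int]]:
    """Turn a paraphrase parallel corpus into labeled sentence pairs.

    Aligned pairs get label 1. As an ASSUMED negative-sampling scheme,
    each first sentence is also paired with the second sentence of a
    different, randomly chosen pair and given label 0.
    """
    rng = random.Random(seed)
    examples = [(a, b, 1) for a, b in pairs]
    if len(pairs) < 2:
        return examples  # no other pair to draw a negative from
    for i, (a, _) in enumerate(pairs):
        j = rng.randrange(len(pairs) - 1)
        if j >= i:
            j += 1  # shift past index i so the aligned partner is never chosen
        examples.append((a, pairs[j][1], 0))
    rng.shuffle(examples)
    return examples
```

The labeled triples produced this way would be the input for pre-training a sentence matching model; the model itself and its transfer to natural language inference are outside the scope of this sketch.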
