This paper examines the foundational humanities work that linguists can contribute to preprocessing for natural language processing. Embedding unstructured, unbounded human language into structured, finite computing resources is a critical task that remains unsolved, yet it must be solved for artificial intelligence to achieve complete natural language processing. Solving it requires knowledge from the humanities and linguistics, backed by engineering and computational skills. We introduce the underlying technology, focusing on the construction of AI-synthesized corpora and on back translation, and we raise several issues that bear on accurate meaning extraction from natural language. In particular, the body of the paper addresses the problems of meaning accompaniment and categorization, morphological negation, direct visibility and implicitness, and existence, and it suggests Word2Vec, the concept of subwords, and the use of big data as engineering solutions. In connection with these, we propose techniques for back translation and for building AI-synthesized corpora, and we explain that these techniques can play a particularly large role in natural language meaning extraction and multilingual translation. The research team also conducted a Korean-English AI translation experiment using these techniques, supported by NIPA high-performance computing resources. Using a total of 4.6 million Korean-English parallel data entries, training was repeated 120 times over 60 days. The experimental results showed that translation performance improved by about 5% (from about 0.3 to 0.312) when back translation was repeated 40 times over 20 days.
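The back-translation step mentioned above can be pictured with a minimal sketch. This is not the research team's pipeline: the function `back_translate`, the toy dictionary `_TOY_EN_KO`, and the stand-in translator `toy_en_to_ko` are all hypothetical names introduced for illustration, and a real system would use a trained English-to-Korean model in place of the dictionary lookup. The sketch only shows the general idea of translating monolingual target-side text back into the source language to obtain synthetic parallel pairs.

```python
from typing import Callable, List, Tuple

Pair = Tuple[str, str]  # (source sentence, target sentence)


def back_translate(
    parallel: List[Pair],
    monolingual_target: List[str],
    reverse_translate: Callable[[str], str],
) -> List[Pair]:
    """Augment a parallel corpus with synthetic (source, target) pairs.

    Each monolingual target sentence is translated back into the source
    language; the machine-made source is paired with the authentic target.
    """
    synthetic = [(reverse_translate(t), t) for t in monolingual_target]

    # Keep authentic pairs first, then append synthetic ones, skipping duplicates.
    seen = set(parallel)
    augmented = list(parallel)
    for pair in synthetic:
        if pair not in seen:
            seen.add(pair)
            augmented.append(pair)
    return augmented


# Hypothetical stand-in for a trained English-to-Korean model.
_TOY_EN_KO = {"hello": "안녕하세요", "thank you": "감사합니다"}


def toy_en_to_ko(sentence: str) -> str:
    # Unknown sentences are returned unchanged in this toy example.
    return _TOY_EN_KO.get(sentence.lower(), sentence)


parallel = [("안녕하세요", "hello")]          # authentic Korean-English pair
mono_en = ["hello", "thank you"]             # monolingual English text
augmented = back_translate(parallel, mono_en, toy_en_to_ko)
```

In an iterative setting, the augmented corpus would be used to retrain the forward model, and the cycle would then be repeated; the number of repetitions and the training schedule are experimental choices that this sketch does not model.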