Original Article Information
Abstract
English
Today, we are exposed to a wide range of text-based media such as newspapers, Internet articles, and social networking services (SNS), and the amount of text data we encounter has grown exponentially now that Internet access is readily available on mobile devices such as smartphones. Extracting useful information from large volumes of text is called text analysis, and with the recent development of artificial intelligence it is carried out using technologies such as Natural Language Processing (NLP). For this purpose, morpheme analyzers based on everyday language have been released and are in use. Pre-trained language models, which acquire natural language knowledge through unsupervised learning on large corpora, have recently become a common component of natural language processing, but conventional morpheme analyzers are of limited use in specialized fields. In this paper, as preliminary work toward developing a natural language analysis language model specialized for the railway field, we present a procedure for constructing a corpus specialized for the railway field.
Table of Contents
1. Introduction
1.1 Background
1.2 Problem Definition
1.3 Composition of the Paper
2. Feasibility Analysis for Constructing a Specialized Field Corpus
2.1 Comparison of Types and Characteristics of Korean Morpheme Analyzers
2.2 Limitations of the Existing Morpheme Analyzer and the Need to Build a Specialized Field Corpus
2.3 The Procedure for Constructing a Corpus Specialized in the Railway Domain
3. Construction of a Specialized Natural Language Corpus for the Railway Domain
3.1 Selection and Collection of Data to Build a Specialized Railway-Domain Corpus
3.2 Data Cleaning and Preprocessing for Morpheme Analysis
3.3 Procedure for Obtaining a Specialized Corpus in the Railway Domain
4. Results of Building a Railway Corpus
5. Conclusion
References