Deep Learning Language Model and Chinese Grammar - Focusing on the Prediction Model of Directional Complements using BERT

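This study examines how accurately a deep learning language model can predict Chinese directional complements. It also examines which words the model uses as important clues when inferring a directional complement. The results show that a deep learning language model trained on a large corpus as a whole is highly capable of capturing the grammatical information in that corpus and drawing inferences from it. In experiments on five directional complements that are similar in meaning and function yet distinct, prediction accuracy exceeds 95%. An analysis of probability changes further shows that the model uses associated words as important clues when determining the directional complement from context. The method is meaningful for natural language processing and can also contribute substantially to Chinese grammar research and education. With appropriate use of deep learning methods, it should be possible to build prediction models for Chinese grammatical elements or to implement systems that distinguish near-synonyms and display their probability distributions.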

In this study, we investigate how accurately the BERT model can predict Chinese directional complements. We also analyze which words the model relies on as important clues when inferring a directional complement. The results show that BERT, through transfer learning, performs very well at inferring distributional features and grammatical relationships. In experiments with five Chinese directional complements, prediction accuracy exceeds 95%. Analysis using the masked language model further shows that BERT draws on relevant context words as clues when determining the directional complement. We believe this study is meaningful for NLP and also offers insight for Chinese grammar research and language education. Applied properly, the methodology could support application systems for both. Neural network models trained on sufficient language data can predict which expressions are more natural to use, and proper use of this ability can shed light on Chinese grammatical functions. A grammar prediction system of this kind could also help learners of Chinese improve their skills by showing them which expressions are grammatically correct.
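The clue-word analysis mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a scoring function that returns one raw score per candidate complement for a masked position; in practice those scores would come from a BERT masked language model, but here `toy_scorer` is a hand-written stand-in so the example runs without any model. The candidate list, the example sentence, and all function names are hypothetical; the source does not name the five complements it tested.

```python
import math
from typing import Callable, Dict, List, Tuple

MASK = "[MASK]"
# Hypothetical candidate set, for illustration only.
CANDIDATES = ["上", "下", "进", "出", "起"]

Scorer = Callable[[List[str], int], Dict[str, float]]


def candidate_probs(logits: Dict[str, float]) -> Dict[str, float]:
    """Softmax restricted to the candidate complements."""
    peak = max(logits.values())
    exps = {c: math.exp(v - peak) for c, v in logits.items()}
    total = sum(exps.values())
    return {c: e / total for c, e in exps.items()}


def clue_importance(
    tokens: List[str], mask_index: int, target: str, scorer: Scorer
) -> Tuple[float, List[Tuple[str, float]]]:
    """Return the baseline probability of `target` at `mask_index`, and the
    drop in that probability when each other token is hidden in turn."""
    masked = list(tokens)
    masked[mask_index] = MASK
    base = candidate_probs(scorer(masked, mask_index))[target]
    drops = []
    for i, tok in enumerate(tokens):
        if i == mask_index:
            continue
        ablated = list(masked)
        ablated[i] = MASK  # hide one context word
        prob = candidate_probs(scorer(ablated, mask_index))[target]
        drops.append((tok, base - prob))
    return base, drops


def toy_scorer(tokens: List[str], mask_index: int) -> Dict[str, float]:
    """Stand-in for a masked LM (assumption): favors 起 when 站 is visible."""
    logits = {c: 0.0 for c in CANDIDATES}
    if "站" in tokens:
        logits["起"] = 3.0
    return logits


# 他站起来了 ("He stood up"), with the complement 起 as the prediction target.
sentence = ["他", "站", "起", "来", "了"]
base, drops = clue_importance(sentence, 2, "起", toy_scorer)
for tok, drop in sorted(drops, key=lambda d: d[1], reverse=True):
    print(f"{tok}\t{drop:.3f}")
```

A large drop for a context word suggests the model leans on that word when choosing the complement. Normalizing over the candidate set alone, rather than the full vocabulary, is a design choice made here so that the competing complements can be compared directly.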