トピックモデルを用いた日本語テキストマイニングの研究 -旧JLPTの読解の既出問題に対する分析を中心に-


Research on Japanese Text Mining Using a Topic Model - Focusing on an Analysis of Past Reading Comprehension Questions from the Previous-Format JLPT -



In this paper, as one attempt to make effective use of the vast amount of available text data, I introduce a text mining technique called the topic model into the field of Japanese studies. Specifically, I collected the texts of the reading comprehension sections of the previous-format JLPT from the past 20 years and carried out a topic model analysis. The study made the following points clear. First, the actual data confirmed that the organizers of the previous-format JLPT tried to avoid bias toward particular topics when selecting and producing the texts for the questions. Second, the texts can be statistically classified into four main topics: “Private relationships such as family and work,” “Communications related to schedules,” “Public relationships related to the country and society,” and “Economic activity.” The techniques and results of the topic model analysis in this paper constitute an empirical analysis of questions that were actually set. The approach is considered significant in that it can be applied to any field of Japanese studies where it is needed. Of course, the discussion in this paper is limited to the texts of the previous-format JLPT, not the new-format JLPT, and the amount of data is relatively small, although it covers all the data for the past 20 years. In addition, a comparative analysis with other texts was not possible. There is therefore still room for improvement, which I would like to address in future work.
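The kind of analysis summarized above can be illustrated with a small sketch of LDA, the topic model named in the section titles. This is not the paper's procedure or data: the collapsed Gibbs sampler, the function names (`fit_lda`, `top_words`, `topic_shares`), the hyperparameters, and the toy token lists are all assumptions made for illustration, and real Japanese text would first need to be split into words by a morphological analyzer. The toy run uses two topics only because the corpus is tiny; the abstract reports four.

```python
import random

def fit_lda(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling on pre-tokenized documents."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)

    # Count tables: topic counts per document, word counts per topic.
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [[0] * vocab_size for _ in range(num_topics)]
    topic_total = [0] * num_topics

    # Start from a random topic assignment for every token.
    assignments = []
    for d, doc in enumerate(docs):
        row = []
        for w in doc:
            k = rng.randrange(num_topics)
            row.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(row)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Conditional weight of each topic given all other tokens.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1
    return vocab, doc_topic, topic_word

def top_words(vocab, topic_word, n=3):
    """Return the n most frequent words assigned to each topic."""
    result = []
    for counts in topic_word:
        ranked = sorted(range(len(vocab)), key=lambda v: counts[v], reverse=True)
        result.append([vocab[v] for v in ranked[:n]])
    return result

def topic_shares(doc_topic, alpha=0.1):
    """Return the smoothed topic proportions of each document."""
    shares = []
    for counts in doc_topic:
        total = sum(counts) + alpha * len(counts)
        shares.append([(c + alpha) / total for c in counts])
    return shares

# Invented toy corpus (not from the paper), already split into words.
toy_docs = [
    ["family", "work", "friend", "family", "home"],
    ["price", "market", "company", "price", "sales"],
    ["work", "home", "friend", "family"],
    ["market", "sales", "company", "price"],
]
vocab, doc_topic, topic_word = fit_lda(toy_docs, num_topics=2)
for k, words in enumerate(top_words(vocab, topic_word)):
    print(f"topic {k}: {words}")
print(topic_shares(doc_topic))
```

Comparing the per-document proportions from `topic_shares` across a whole corpus is one simple way to check how evenly texts are spread over topics, which is the kind of question the first finding above addresses.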


1. Introduction
2. Previous Research
2.1 Text Mining
2.2 Topic Model
3. Research Method
3.1 Data Collection
3.2 Data Preprocessing
3.3 Data Analysis
4. Analysis Results
4.1 Setting the Number of Topics in LDA
4.2 Topic Modeling with LDA
4.3 Topic Analysis
5. Conclusion


  • 金曘泳 (김유영), Associate Professor, Department of Japanese, Dongduk Women's University

