Text Classification with Heterogeneous Data Using Multiple Self-Training Classifiers

Article Information

Abstract

Text classification is a challenging task, especially when dealing with a huge amount of text data. The performance of a classification model can vary depending on the types of words contained in the document corpus and the types of features generated for classification. Rather than proposing a modified version of an existing algorithm or creating a new one, we modify how the data are used. Classifier performance is usually affected by the quality of the training data, because the classifier is built from these data. We assume that data from different domains may have different noise characteristics, which can be exploited when learning the classifier. We therefore attempt to enhance the robustness of the classifier, and thereby improve classification accuracy, by artificially injecting heterogeneous data into the learning process. A semi-supervised approach is applied to utilize the heterogeneous data when learning the document classifier. However, unlabeled data may degrade the performance of the document classifier. We therefore further propose an algorithm that extracts only the documents that contribute to improving the classifier's accuracy.
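
The selection idea in the abstract (keep only pseudo-labeled documents that help the classifier) can be illustrated with a minimal sketch. This is not the authors' algorithm: it uses a single self-training classifier rather than multiple ones, a small hand-written Naive Bayes model, and an acceptance rule (validation accuracy must not drop) that is an assumption made for illustration. All function names and the toy documents are hypothetical.

```python
from collections import Counter
import math


def train_nb(docs, labels):
    """Train a multinomial Naive Bayes model with add-one smoothing."""
    word_counts = {}                 # label -> Counter of word frequencies
    class_counts = Counter(labels)   # label -> number of documents
    vocab = set()
    for text, label in zip(docs, labels):
        words = text.lower().split()
        word_counts.setdefault(label, Counter()).update(words)
        vocab.update(words)
    return word_counts, class_counts, vocab


def predict(model, text):
    """Return the label with the highest log-posterior score."""
    word_counts, class_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        counts = word_counts[label]
        denom = sum(counts.values()) + len(vocab)
        score = math.log(class_counts[label] / total_docs)
        for word in text.lower().split():
            if word in vocab:        # ignore words never seen in training
                score += math.log((counts[word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(model, docs, labels):
    hits = sum(predict(model, d) == y for d, y in zip(docs, labels))
    return hits / len(docs)


def selective_self_training(docs, labels, unlabeled, val_docs, val_labels):
    """Greedily add pseudo-labeled documents that do not hurt validation accuracy.

    Assumption: "contributes to accuracy" is approximated here by
    "validation accuracy does not decrease after adding the document".
    """
    docs, labels = list(docs), list(labels)
    model = train_nb(docs, labels)
    base_acc = best_acc = accuracy(model, val_docs, val_labels)
    kept = []
    for doc in unlabeled:
        pseudo = predict(model, doc)                       # pseudo-label
        candidate = train_nb(docs + [doc], labels + [pseudo])
        cand_acc = accuracy(candidate, val_docs, val_labels)
        if cand_acc >= best_acc:                           # accept the document
            docs.append(doc)
            labels.append(pseudo)
            model, best_acc = candidate, cand_acc
            kept.append(doc)
    return model, kept, base_acc, best_acc


# Toy data (hypothetical): labeled in-domain documents and unlabeled
# documents standing in for heterogeneous data from another domain.
labeled_docs = ["cheap pills buy now", "meeting agenda attached",
                "win money now", "project report due"]
labeled_y = ["spam", "ham", "spam", "ham"]
unlabeled_docs = ["buy cheap tickets now", "agenda for project meeting",
                  "random words here"]
val_docs = ["win cheap money", "report attached"]
val_y = ["spam", "ham"]

model, kept, base_acc, final_acc = selective_self_training(
    labeled_docs, labeled_y, unlabeled_docs, val_docs, val_y)
print(f"kept {len(kept)} of {len(unlabeled_docs)} unlabeled documents")
print(f"validation accuracy: {base_acc:.2f} -> {final_acc:.2f}")
```

By construction, the accepted set never lowers validation accuracy below the baseline; whether a held-out validation set is the right yardstick for the paper's setting is left open here.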

Table of Contents

ABSTRACT
Ⅰ. Introduction
Ⅱ. Related Work
2.1. Data Heterogeneity and Robustness
2.2. Semi-Supervised Learning
2.3. Ensemble Learning
Ⅲ. Research Methodology
3.1. Research Overview
3.2. Module 1: Heterogeneity Injection
3.3. Module 2: Classification Rule Selection
Ⅳ. Data Analysis and Results
4.1. Data Description
4.2. Data Preparation
4.3. Experiments and Results
Ⅴ. Conclusion

Author Information

  • William Xiu Shun Wong, Senior Consultant, Biz Consulting Team, Datasolution Inc., Korea
  • Donghoon Lee, Staff, BI LAB, Cafe24 Corp., Korea
  • Namgyu Kim, Professor, School of Management Information Systems, Kookmin University, Korea
