Original Text Information
Abstract
English
Short text is a popular text form that is widely used in real-time network news, short commentary, micro-blogs, and many other fields. With the development of applications such as QQ, mobile phone text messaging, and movie websites, the volume of such data keeps growing. Most of this data is of no use to us, while the rest is significant. It is therefore necessary to extract the useful short texts from the big data. However, short text classification faces many problems, such as few features and irregularity. To address these problems, the short text set should first be pretreated, and the significant features should then be selected. This paper uses a semi-supervised learning method together with an SVM classifier to improve on the traditional methods, so that a large number of short texts can be classified and the useful messages mined from them. The experimental results also show a good improvement.
Table of Contents
1. Introduction
2. Related Research Statuses
3. Short Text Classifications
3.1. Pretreatment of Short Text
3.2. Feature Expression
3.3. Feature Selecting Methods
3.4. Feature Weight Calculation
3.5. Support Vector Machines (SVM)
3.6. Semi-Supervised Learning
3.7. Improved Semi-Supervised Learning Algorithm
4. Experiment Result and Effect Analysis
4.1. Experimental Data
4.2. Evaluating Indicator
4.3. Experimental Results Contrast
5. Conclusion
References