Article Information
Abstract
English
We present a method for automatically constructing a domain-specific Korean sentiment dictionary that can be used to classify the sentiment of online movie reviews. More than 1.18 million online movie reviews with ratings of 1 to 4 or 7 to 10 were collected across fourteen movie genres to calculate, for each genre, the joint probability of a given word and the sentiment of a review. Specifically, for each genre we calculated (1) the joint probability of a given word and the positive reviews (ratings 7 to 10) and (2) the joint probability of a given word and the negative reviews (ratings 1 to 4). The difference between the two joint probabilities, (1) – (2), was obtained for each word in each genre, and each word's differences were then averaged over the fourteen genres. Finally, the averaged joint probability differences were normalized to the range -1 to 1. These normalized values serve as the sentiment values of the words in the final 135,082-word Korean sentiment dictionary for the movie domain. The positive/negative binary sentiment classification performance of the constructed dictionary was evaluated on test data, and a balanced accuracy of 80.7% was achieved, confirming the effectiveness of the proposed sentiment dictionary construction method.
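The construction steps in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. Several details are assumptions the abstract does not specify: reviews arrive already tokenized, a joint probability is estimated as the share of a genre's reviews that contain the word and carry the given sentiment, a word missing from a genre contributes 0 to the average, and normalization divides by the largest absolute value. All function names, the summed-score classifier, and the default `threshold` are hypothetical.

```python
from collections import defaultdict


def joint_prob_diff(reviews):
    """For ONE genre, return {word: P(word, positive) - P(word, negative)}.

    `reviews` is a list of (tokens, rating) pairs. Ratings 7-10 count as
    positive and 1-4 as negative, as in the abstract; other ratings are skipped.
    """
    pos, neg = defaultdict(int), defaultdict(int)
    n = 0
    for tokens, rating in reviews:
        if 7 <= rating <= 10:
            target = pos
        elif 1 <= rating <= 4:
            target = neg
        else:
            continue
        n += 1
        for word in set(tokens):  # assumption: count each word once per review
            target[word] += 1
    if n == 0:
        return {}
    return {w: (pos[w] - neg[w]) / n for w in set(pos) | set(neg)}


def build_dictionary(reviews_by_genre):
    """Average the per-genre differences, then scale them into [-1, 1]."""
    per_genre = [joint_prob_diff(r) for r in reviews_by_genre.values()]
    vocab = set().union(*per_genre)
    # Assumption: a word absent from a genre contributes 0 to the average.
    avg = {w: sum(d.get(w, 0.0) for d in per_genre) / len(per_genre) for w in vocab}
    scale = max((abs(v) for v in avg.values()), default=0.0)
    if scale == 0.0:
        return {w: 0.0 for w in avg}
    # Assumption: normalize by the largest absolute averaged difference.
    return {w: v / scale for w, v in avg.items()}


def classify(tokens, dictionary, threshold=0.0):
    """Hypothetical classifier: positive if the summed word scores exceed threshold."""
    return sum(dictionary.get(w, 0.0) for w in tokens) > threshold


def balanced_accuracy(y_true, y_pred):
    """Mean of the true-positive rate and the true-negative rate."""
    tp = sum(t and p for t, p in zip(y_true, y_pred))
    tn = sum((not t) and (not p) for t, p in zip(y_true, y_pred))
    n_pos = sum(bool(t) for t in y_true)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")
    return (tp / n_pos + tn / n_neg) / 2


# Toy data (invented for illustration only).
toy = {
    "drama": [(["good", "movie"], 9), (["bad", "movie"], 2)],
    "action": [(["good", "fun"], 8), (["boring"], 1)],
}
sentiment = build_dictionary(toy)
```

On the toy data, "good" appears only in positive reviews in both genres and receives the largest score, while "movie" appears equally often in positive and negative reviews and scores 0. Balanced accuracy is used here because it averages the per-class recall, which keeps the measure meaningful when the positive and negative test reviews are not equally numerous.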
Table of Contents
1. Introduction
2. Existing Approaches
3. Sentiment Dictionary Construction
3.1. Data
3.2. Method and Assumption
3.3. Implementation
4. Experiment
4.1. Parameter Setting and Test Data
4.2. Evaluation Measure
4.3. Threshold Setting
4.4. Result
5. Conclusion
Acknowledgements
References