

An Experimental Study on Building A Chinese Domain-Dependent Sentiment Lexicon


Jiabin Li, Jeesun Nam

This study proposes an experimental approach to building a domain-dependent sentiment lexicon, a resource crucial for sentiment analysis of Chinese texts in the Deco Sentiment Analysis platform developed at the Digital Language and Knowledge Contents Research Association (DICORA research center). A word2vec model was trained on more than 150,000 hotel reviews, and 80 emotional words were first selected as seed words. TF-IDF was used to measure the importance of the sentiment vocabulary in the hotel reviews. To build the feature vector representation of each candidate word, the similarities between that word and each of the 80 seed words were calculated. The lexicon was then expanded with a double propagation method, which requires recognizing the feature words related to the sentiment expressions. Bootstrapping between sentiment terms and features in this way allowed the sentiment lexicon to be expanded. In the evaluation, the expanded lexicon achieved a precision of 77.4% and a recall of 92.6%.
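The feature vector representation described in the abstract can be illustrated with a minimal sketch. It assumes cosine similarity as the similarity measure and uses three hand-written toy embeddings with two seed words in place of the trained word2vec vectors and the 80 seed words. The function names, the example words, and the vector values are all hypothetical and are not taken from the study.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # treat a zero vector as having no similarity
    return dot / (norm_u * norm_v)


def seed_similarity_features(candidate_vec, seed_vecs):
    """Represent a candidate word by its similarity to every seed word."""
    return [cosine(candidate_vec, seed_vec) for seed_vec in seed_vecs]


# Toy 3-dimensional embeddings (assumed values, for illustration only).
embeddings = {
    "干净": [0.9, 0.1, 0.0],     # "clean", used here as a positive seed
    "脏": [-0.8, 0.2, 0.1],      # "dirty", used here as a negative seed
    "整洁": [0.85, 0.15, 0.05],  # "tidy", a candidate word
}
seeds = ["干净", "脏"]

features = seed_similarity_features(
    embeddings["整洁"], [embeddings[s] for s in seeds]
)
print(dict(zip(seeds, features)))
```

With real data, each candidate would receive one similarity value per seed word, and the resulting vector could then be passed to whatever classification or thresholding step is used to assign polarity.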


1. Introduction
2. Sentiment Lexicons
2.1. Existing Sentiment Lexicons
2.2. Domain-dependent Sentiment Lexicons
3. Properties of Chinese Hotel Review Texts
3.1. Online Hotel Review Texts
3.2. Preprocessing Icons in Hotel Review Texts
3.3. Chinese Word Segmentation
4. Building an Initial Sentiment Lexicon
4.1. Word Vector Model and TF-IDF
4.2. Processing for Building an Initial Domain-dependent Sentiment Lexicon
4.3. Analysis of the Initial Sentiment Lexicon
5. Expanding the Initial Sentiment Lexicon
5.1. Sentiment Word Expansion and Feature Extraction
5.2. Applying Propagation Rules
6. Evaluation
6.1. Quantification of Evaluation Indicators
6.2. Performance Evaluation
7. Conclusion


  • Jiabin Li Hankuk University of Foreign Studies/Graduate Student
  • Jeesun Nam Hankuk University of Foreign Studies/Professor


