This study proposes a methodology for constructing linguistic resources to eliminate irrelevant keywords from social media texts related to disasters such as earthquakes or typhoons. When disaster-related social media texts are collected for sentiment analysis, a large number of noisy keywords used metaphorically, such as ‘pupil-earthquake’ (meaning astonishment), are observed. Filtering these noisy expressions therefore plays a crucial role in accurate text classification and sentiment analysis. In this study, two types of linguistic patterns are examined for filtering noisy expressions in texts related to natural and social disasters, and a bootstrap method based on the DECO Korean electronic dictionary and the Local-Grammar Graph (LGG) formalism is suggested. With this method, around 110 to 470 patterns per keyword are described in LGGs for 6 keywords. By applying them to a new corpus through the DECO Noise-Filter platform, we obtained an F-measure of about 88.4%. The methodology suggested in this study may be adopted for filtering other types of noisy expressions, which would improve the reliability of sentiment analysis of social media texts.