Topical Clustering Techniques of Twitter Documents Using Korean Wikipedia (한글 위키피디아를 이용한 트위터 문서의 주제별 클러스터링 기법)

Jae-Young Chang

Internet, Broadcasting and Communication Convergence

Topical Clustering Techniques of Twitter Documents Using Korean Wikipedia

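국제인공지능학회 (formerly 한국인터넷방송통신학회), Journal of the Institute of Internet, Broadcasting and Communication (한국인터넷방송통신학회 논문지), Vol. 14, No. 5, October 2014, pp. 189-196. KCI-listed.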

Abstract

English

The need for document retrieval in SNS environments such as Twitter has grown recently. Supporting Twitter search requires a clustering technique that groups the large number of retrieved documents by topic. Because of the nature of Twitter, however, previous simple techniques are of limited use for clustering Twitter documents. To overcome this problem, this paper proposes a new clustering technique suited to the Twitter environment. The proposed method augments the feature vectors representing Twitter documents with new terms and recalculates the feature weights using Korean Wikipedia. Experiments on Korean Twitter documents, with a performance comparison against previous techniques, demonstrate the usefulness of the proposed method.

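Korean (translated)

Recently, the need for search in SNS environments such as Twitter has been increasing. Supporting Twitter search requires a clustering technique that classifies the large volume of retrieved documents by topic. Given the characteristics of Twitter, however, applying simple clustering techniques as they are faces many constraints. To overcome this, this paper proposes a clustering technique suited to the Twitter environment. The proposed technique uses Korean Wikipedia to augment the feature vector of each Twitter document and to recalculate the weight of each feature. Experiments were also conducted on Korean Twitter documents, and a performance comparison with existing techniques demonstrated the usefulness of the proposed technique.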
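
The abstract names two steps, augmenting each tweet's feature vector with new terms and recalculating feature weights using Korean Wikipedia, but gives no formulas. The following Python sketch is one possible reading, not the author's method. Everything in it is an assumption made for illustration: the toy `WIKI` dictionary standing in for Korean Wikipedia, the `expansion_weight` parameter, the smoothed IDF weighting, the single-pass clustering, and the threshold value. English tokens are used for readability; real Korean tweets would first need morphological analysis.

```python
import math
from collections import Counter

# Hypothetical stand-in for Korean Wikipedia: article title -> tokens of its text.
WIKI = {
    "kimchi": ["kimchi", "cabbage", "food", "korea"],
    "bibimbap": ["bibimbap", "rice", "food", "korea"],
    "baseball": ["baseball", "pitcher", "sport", "team"],
    "soccer": ["soccer", "goal", "sport", "team"],
}


def wiki_idf(term):
    """Smoothed inverse document frequency of a term over the toy articles."""
    df = sum(1 for tokens in WIKI.values() if term in tokens)
    return math.log((1 + len(WIKI)) / (1 + df)) + 1.0


def feature_vector(tokens, expansion_weight=0.5):
    """Build a term-frequency vector, add terms from matching articles at a
    discounted weight (assumed), then reweight every feature by wiki_idf."""
    counts = Counter(tokens)
    augmented = {term: float(tf) for term, tf in counts.items()}
    for term, tf in counts.items():
        for extra in WIKI.get(term, []):
            if extra != term:
                augmented[extra] = augmented.get(extra, 0.0) + expansion_weight * tf
    return {term: weight * wiki_idf(term) for term, weight in augmented.items()}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster(vectors, threshold=0.05):
    """Single-pass clustering: a vector joins the first cluster whose first
    member is at least `threshold` similar; otherwise it starts a new cluster.
    The threshold is tuned to this toy data only."""
    clusters = []
    for i, vec in enumerate(vectors):
        for members in clusters:
            if cosine(vec, vectors[members[0]]) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


tweets = [
    ["kimchi", "delicious"],
    ["bibimbap", "lunch"],
    ["baseball", "tonight"],
    ["soccer", "win"],
]

# Plain term-frequency vectors versus Wikipedia-augmented vectors.
plain_clusters = cluster([dict(Counter(t)) for t in tweets])
augmented_clusters = cluster([feature_vector(t) for t in tweets])

print("plain:", plain_clusters)
print("augmented:", augmented_clusters)
```

On this toy data the four tweets share no words, so plain term vectors leave each tweet in its own cluster. After augmentation the two food tweets share "food" and "korea", and the two sports tweets share "sport" and "team", so they fall into two topical groups. This mirrors the short-text sparsity problem the abstract alludes to, but says nothing about the accuracy reported in the paper.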

Author information

Jae-Young Chang (장재영). Regular member, Department of Computer Engineering, Hansung University
