Original Article Information
Abstract
English
Time is an important dimension of the information space. It plays an important role in Web search, because most Web pages contain temporal information and many Web queries are time-related. Exploiting the temporal information in Web pages has therefore become an active topic in Web search research. In this paper, we focus on time-enhanced topic clustering for news search results. Traditional clustering algorithms are usually based on the common phrases shared by Web pages and give little consideration to the pages' temporal information. We therefore propose a time-enhanced topic clustering algorithm for news Web pages. It improves on traditional algorithms, which consider only textual clustering, by applying a temporal clustering procedure to the topics returned by a textual clustering algorithm: every Web page in a cluster is arranged along a timeline according to the update time given in the page. We conduct experiments on a real dataset crawled from Google News and compare our algorithm with competitors including K-Means, STC, TFIC, and Minhash Clustering on metrics such as precision and recall. The experimental results show that the proposed algorithm performs better in both offline and online clustering tests.
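The temporal step described in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: it assumes that a textual clustering step has already produced the topics, that each page carries a parsed update time, and that pages are bucketed by calendar day (the abstract does not state the granularity). The function name `arrange_on_timeline`, the dictionary keys, and the sample pages are all hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def arrange_on_timeline(topic_clusters):
    """Arrange the pages of each topic along a timeline.

    topic_clusters maps a topic label to a list of pages, where each page
    is a dict with a "url" and an "updated" datetime (assumed layout).
    Returns, per topic, a chronologically ordered list of (date, urls).
    """
    timelines = {}
    for topic, pages in topic_clusters.items():
        by_day = defaultdict(list)
        # Sort first so that pages inside one day also appear in time order.
        for page in sorted(pages, key=lambda p: p["updated"]):
            by_day[page["updated"].date()].append(page["url"])
        # Dates are unique keys, so sorting the items orders buckets by date.
        timelines[topic] = sorted(by_day.items())
    return timelines


# Hypothetical output of a textual clustering step (one topic, three pages).
topic_clusters = {
    "election": [
        {"url": "a", "updated": datetime(2012, 3, 2, 9, 0)},
        {"url": "b", "updated": datetime(2012, 3, 1, 18, 30)},
        {"url": "c", "updated": datetime(2012, 3, 2, 14, 15)},
    ],
}

timeline = arrange_on_timeline(topic_clusters)
for day, urls in timeline["election"]:
    print(day.isoformat(), urls)
```

Keeping the temporal pass separate from the textual pass mirrors the two-stage structure the abstract describes, so any textual clustering method could in principle supply the input topics.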
Table of Contents
1. Introduction
2. Related Work
3. Time-Enhanced Topic Clustering
3.1 Offline Clustering
3.2 Online Clustering
3.3 Time-Based Clustering
4. Performance Evaluation
4.1 Dataset
4.2 Results
5. Conclusions
Acknowledgements
References
