

A Focused Crawling Method Based on Detecting Communities in Complex Networks




The rapid growth of the large-scale World Wide Web poses a great challenge to existing focused crawling methods. Whether they analyze text content or link structure, traditional focused crawlers have mainly worked at the granularity of individual pages. Walking randomly through a network made up of a huge number of pages, such a crawler easily gets lost. Narrowing the crawling range from the entire Web can therefore improve both precision and efficiency. This paper puts forward a focused crawling method based on two granularities. First, a community detection algorithm is applied to the link structure of the network formed by websites, and a group of websites for the given topic is built up; this narrows the crawling range. Second, all topic relevance analysis of web pages and all link prediction are performed inside this generated group. Topic relevance is computed by calculating the topic similarity of the title and of the content separately. To predict the topic relevance of unvisited links, the similarities of the parent pages, the anchor texts, and the URL strings are all taken into account. The experimental results suggest that the method is very effective for a given topic and can improve precision.
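The abstract does not name the community detection algorithm, the rule for picking the topic group, or how the similarity scores are weighted, so the following is only an illustrative sketch under those gaps. For the website-granularity step it swaps in a naive greedy modularity merge (in the style of Newman's fast greedy method), and it assumes the topic group is the community containing the most hypothetical seed sites. For the webpage-granularity step it assumes bag-of-words cosine similarity and made-up weights (`alpha`, `weights`). All function names are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict
from itertools import combinations
from urllib.parse import urlparse

# --- Website granularity (swapped-in technique: greedy modularity merging) ---

def greedy_modularity(edges):
    """Merge site communities while modularity Q increases (naive, small graphs only)."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    m = sum(len(nbrs) for nbrs in adj.values()) / 2  # number of undirected edges
    comms = [{site} for site in sorted(adj)]

    def degree(c):
        return sum(len(adj[s]) for s in c)

    def links(a, b):
        return sum(1 for s in a for t in adj[s] if t in b)

    while True:
        best_gain, best_pair = 0.0, None
        for i, j in combinations(range(len(comms)), 2):
            l_ij = links(comms[i], comms[j])
            if l_ij == 0:
                continue  # merging non-adjacent communities cannot raise Q
            # Delta Q = l_ij/m - 2 * (d_i/2m) * (d_j/2m)
            gain = l_ij / m - 2 * (degree(comms[i]) / (2 * m)) * (degree(comms[j]) / (2 * m))
            if gain > best_gain:
                best_gain, best_pair = gain, (i, j)
        if best_pair is None:
            return comms
        i, j = best_pair  # i < j, so popping j leaves index i valid
        merged = comms.pop(j)
        comms[i] |= merged

def topic_site_group(edges, seed_sites):
    """Hypothetical rule: keep the community that contains the most seed sites."""
    seeds = set(seed_sites)
    return max(greedy_modularity(edges), key=lambda c: len(c & seeds))

# --- Webpage granularity (assumed: bag-of-words cosine, made-up weights) ---

def tokens(text):
    return re.findall(r"[a-z0-9]+", text.lower())

def cosine(a, b):
    va, vb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(va[t] * vb[t] for t in va)
    norm = math.sqrt(sum(x * x for x in va.values())) * math.sqrt(sum(x * x for x in vb.values()))
    return dot / norm if norm else 0.0

def page_relevance(title, content, topic, alpha=0.4):
    """Title and content similarities are computed separately, then mixed."""
    return alpha * cosine(title, topic) + (1 - alpha) * cosine(content, topic)

def link_priority(parent_score, anchor, url, topic, weights=(0.5, 0.3, 0.2)):
    """Predict an unvisited link's relevance from parent page, anchor text and URL string."""
    w_parent, w_anchor, w_url = weights
    return w_parent * parent_score + w_anchor * cosine(anchor, topic) + w_url * cosine(url, topic)

def should_enqueue(url, site_group):
    """Only links whose host lies inside the topic site group are crawled."""
    return urlparse(url).hostname in site_group

# Usage: two tightly linked site clusters joined by a single link.
site_links = [("a.com", "b.com"), ("a.com", "c.com"), ("b.com", "c.com"),
              ("c.com", "d.com"),
              ("d.com", "e.com"), ("d.com", "f.com"), ("e.com", "f.com")]
group = topic_site_group(site_links, seed_sites={"a.com"})
print(sorted(group))  # ['a.com', 'b.com', 'c.com']
```

Restricting the frontier with `should_enqueue` is what keeps page-level scoring inside the site group; the greedy merge here is quadratic per step and is meant only to make the two-granularity idea concrete, not to reproduce the authors' algorithm.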


 1. Introduction
 2. Related Works
 3. Designing Focused Crawling Approach
  3.1. Framework
  3.2. Community Detection Based on Website Granularity
  3.3. Content Analysis and Link Prediction Based on Webpage Granularity
 4. Experiment 
  4.1. Dataset and Parameter Settings
  4.2. Evaluation 
  4.3. Results and Analysis
 5. Conclusion 


  • Shen Gui-lan, School of Information, Renmin University of China, Beijing 100872; Business College of Beijing Union University, Beijing 100025
  • Sun Jie, Business College of Beijing Union University, Beijing 100025
  • Yang Xiao-ping, School of Information, Renmin University of China, Beijing 100872


