Source Information
Abstract
English
With the rapid development of Weibo, the most popular microblog in China, more and more attention has been paid to studies related to it. To gather precise data from Weibo, which is the groundwork of this research, a novel, highly efficient Weibo crawler (WCrawler) based on login simulation is designed. A priority evaluation is described to ensure the correlation between entries. MD5 is introduced to check crawled URLs for duplicates. Experiments demonstrate that the novel crawler has advantages in efficiency and integrity of information collection compared with an API crawler. In addition, we present a summary of the data collected from the Weibo social network by WCrawler.
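The abstract does not say how the MD5 duplicate check is implemented. A minimal sketch, assuming each crawled URL is hashed and the digest is kept in an in-memory set, could look like the following. The class name `SeenUrls`, its methods, and the example URLs are hypothetical and not taken from the paper.

```python
import hashlib


class SeenUrls:
    """Tracks crawled URLs by their MD5 digest (illustrative, not WCrawler's code)."""

    def __init__(self):
        # Assumption: digests fit in memory; each MD5 digest is 16 bytes.
        self._digests = set()

    def add(self, url: str) -> bool:
        """Record the URL; return True if it is new, False if already seen."""
        digest = hashlib.md5(url.encode("utf-8")).digest()
        if digest in self._digests:
            return False
        self._digests.add(digest)
        return True

    def __len__(self) -> int:
        return len(self._digests)


if __name__ == "__main__":
    seen = SeenUrls()
    for url in ["http://example.com/u/1", "http://example.com/u/1", "http://example.com/u/2"]:
        status = "new" if seen.add(url) else "duplicate"
        print(status, url)
```

Storing fixed-size digests rather than full URLs keeps the memory cost per entry constant, which is one plausible reason for choosing a hash here. MD5 serves only as a fingerprint in this sketch, not as a security measure.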
Table of Contents
1. Introduction
2. Data Specification and Priority Evaluation
2.1. Data Specification
2.2. Priority Evaluation
3. Design and Implementation of WCrawler
3.1. Control Module and Storage Module
3.2. Login Simulation Module
3.3. Crawling Module
4. Experimental Results and Analysis
4.1. Performance of WCrawler
4.2. Analyzing the Crawled Data
5. Conclusions
References