earticle


Research on Detection Algorithm of WEB Crawler

Article Information

Abstract (English)

In research on Web crawlers, the most important issues are the structural design and the solutions to the key technologies. Building on prior work, we describe the structural design of a distributed Web crawler, including the organization of the hardware and the module partition of the software. In this paper, one PC serves as the main node and the other PCs as common nodes, all connected over a LAN. The software architecture comprises the main-node design and the common-node design. We then analyze solutions to the major techniques of the distributed Web crawler, such as how the crawler nodes cooperate with each other, how tasks are distributed, and how to keep important Web pages fresh, and we propose practicable algorithms for these problems. In addition, we implemented a robust, extensible, customizable, distributed Web crawler and analyzed it in detail. Finally, we present the results of two experiments: a common test and a site-download test.
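The abstract mentions distributing crawl tasks among the common nodes but does not specify the assignment rule here. A minimal sketch of one common strategy, assuming URLs are partitioned by hashing the hostname so that all pages of a site are downloaded by the same node (the function and node count below are illustrative, not the paper's actual implementation):

```python
import hashlib
from urllib.parse import urlparse

def assign_node(url: str, num_nodes: int) -> int:
    """Map a URL to a crawler node by hashing its hostname,
    so every URL from the same site goes to the same node."""
    host = urlparse(url).netloc
    digest = hashlib.md5(host.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_nodes

# Example: URLs from the same host land on the same node.
urls = [
    "http://example.com/a.html",
    "http://example.com/b.html",
    "http://example.org/index.html",
]
assignments = [assign_node(u, 4) for u in urls]
assert assignments[0] == assignments[1]
```

Keeping a site on a single node avoids duplicate downloads across nodes and makes per-site politeness limits easy to enforce locally.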

Table of Contents

Abstract
 1. Introduction
 2. Structural Design of Distributed Web Crawler
  2.1 URL Distribution Module
  2.2 Node Communication Module
  2.3 URL Analysis Module
  2.4 Download Module
  2.5 Web Analysis Module
 3. Key Technology of Web Crawler
  3.1 Selection of Seed Set
  3.2 Distributed Strategy
 4. Experiment Implementation and Evaluation
  4.1 Implementation of the System
  4.2 Realization of Distributed Task Allocation
  4.3 Realization of Single-node Downloading Tasks
  4.4 System Evaluation
 5. Conclusion
 References

Author Information

  • Hongyan Zhao, Shandong Yingcai University, Jinan 250014, China
