Design of Web Crawler Based on Improved Hidden Markov Model

Abstract

English

This paper analyzes the shortcomings of the traditional hidden Markov model (HMM) crawler, improves both the clustering strategy for web pages and the algorithm that judges how strongly a page or hyperlink correlates with the topic, and proposes an AHMM (Adaptive Hidden Markov Model) modeling method. The experimental results show that the improved AHMM is much more efficient than the traditional HMM.

Table of Contents

Abstract
 1. Introduction
 2. The Shortcomings of Traditional HMM Crawler
 3. The Overall Framework of HMM Crawler
 4. Training Set Page Clustering Strategy of AHMM Crawler
 5. The Selection Method of AHMM Crawler for the Page to be Collected
  5.1. Judgment on the Correlation Degree Between the Page and the Topic
  5.2. Judgment on the Correlation Degree Between URLs and the Topic
 6. HMM Modeling in AHMM Crawlers
 7. Implementation and Experimental Analysis
  7.1. The HMM Training
  7.2. The HMM Path Prediction
  7.3. Experimental Analysis
 8. Conclusion
 References

Author Information

  • Hailong Jia, Wuhan University of Technology, Wuhan 430063, China
  • Lina Fang, Xinxiang University, Xinxiang 453000, China
