A Focused Crawler Based on Correlation Analysis

Abstract

With the rapid development of network and information technology, a huge amount of data is now available on the internet. A major problem faced by most researchers is how to effectively filter information about a particular subject or field out of this data. In this paper, we build a focused crawler based on the vector space model (VSM) and TF-IDF text correlation analysis. We take the seed URL as the collection entry point and fetch web pages from the internet. We then analyze each page using techniques such as web content extraction and page link analysis to obtain its main content. Using the correlation analysis method based on VSM and TF-IDF, we calculate the correlation between each page and the predefined topics, so that the crawl stays focused on the areas of the web relevant to those topics.
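The relevance step described in the abstract can be illustrated with a minimal sketch. The paper's exact weighting formulas are not shown on this page, so the code below assumes one common formulation: term frequency normalized by document length, a smoothed IDF, and cosine similarity between the page vector and the topic vector. All function names (`tokenize`, `idf_table`, `tfidf_vector`, `cosine`, `relevance`) are hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and split on non-alphanumeric characters (a simplifying assumption).
    return re.findall(r"[a-z0-9]+", text.lower())


def idf_table(documents):
    # Smoothed IDF: log((1 + N) / (1 + df)) + 1, an assumed variant,
    # not necessarily the one used in the paper.
    n = len(documents)
    df = Counter()
    for doc in documents:
        df.update(set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}


def tfidf_vector(text, idf):
    # Sparse TF-IDF vector as a dict; terms missing from the IDF table are ignored.
    tokens = tokenize(text)
    counts = Counter(tokens)
    total = len(tokens) or 1
    return {t: (c / total) * idf[t] for t, c in counts.items() if t in idf}


def cosine(u, v):
    # Cosine similarity between two sparse vectors in the vector space model.
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def relevance(page_text, topic_text, idf):
    # Correlation between a fetched page and a predefined topic description.
    return cosine(tfidf_vector(page_text, idf), tfidf_vector(topic_text, idf))
```

In a crawler built along these lines, a page whose `relevance` score falls below some chosen threshold would be discarded and its outgoing links not followed. The threshold value is a tuning choice that this sketch does not prescribe.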

Table of Contents

Abstract
 1. Introduction
 2. The Process of Focused Crawler
 3. Key Technologies
  3.1. Web Page Pretreatment
  3.2. Web Page Analysis
  3.3. The TF-IDF Text Relevancy Analysis Based on the VSM
 4. Experiment
  4.1. Evaluation Index
  4.2. Experimental Process
  4.3. Results Analysis
 5. Conclusion
 References

Author Information

  • Qiuli Qin, Logistic Technology and Management Lab, School of Economics and Management, Beijing Jiaotong University, Beijing 100044
  • Xin Peng, Logistic Technology and Management Lab, School of Economics and Management, Beijing Jiaotong University, Beijing 100044
