Tracing Similarity within Strongly Connected Components for Intelligent Web Crawling

Abstract

Finding and obtaining information efficiently from the Web is one of the important elements in realizing a Smart Home environment. Users expect to find the most relevant information in the shortest possible time. In this paper, we investigate the similarity of Web pages within Strongly Connected Components (SCCs). An SCC is a maximal group of Web pages in which every page can reach every other page through hyperlinks. This mutual reachability may imply a relationship between the Web pages of the same component. We therefore trace the similarity of the pages in each group using cosine similarity. Our experiment on Malaysian Web pages indicates that Web pages within the same SCC share a common topic or theme. This finding shows that Web pages with similar topics may be located using the hyperlink structure alone, without performing expensive analysis of page content.
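The pipeline in the abstract has two steps: group pages into SCCs from the hyperlink graph, then score each group by how textually similar its pages are. The following is a minimal Python sketch of those two steps, built on several assumptions. SCCs are found with Tarjan's algorithm. Each page is represented as a raw term-frequency vector from lowercase whitespace tokenization. A component's score is the average cosine similarity over all of its page pairs, where cos(a, b) = a·b / (|a| |b|).

The paper's actual preprocessing and term weighting (for example, TF-IDF) are not described here. The function names `tarjan_scc`, `term_vector`, `cosine`, and `average_pairwise_cosine`, the toy pages, and the adjacency-list input format are therefore all hypothetical. The traversal is recursive for brevity. A crawl-scale graph would need an iterative version to stay under Python's recursion limit.

```python
import math
from collections import Counter
from itertools import combinations


def tarjan_scc(graph):
    """Return the strongly connected components of a directed link graph.

    `graph` maps each page to the pages it links to (hypothetical
    adjacency-list input). Uses recursive Tarjan's algorithm.
    """
    index, lowlink = {}, {}
    stack, on_stack = [], set()
    components = []
    counter = 0

    def visit(v):
        nonlocal counter
        index[v] = lowlink[v] = counter
        counter += 1
        stack.append(v)
        on_stack.add(v)
        for w in graph.get(v, []):
            if w not in index:
                # Tree edge: explore w, then propagate its lowlink upward.
                visit(w)
                lowlink[v] = min(lowlink[v], lowlink[w])
            elif w in on_stack:
                # Edge back into a page still on the current search stack.
                lowlink[v] = min(lowlink[v], index[w])
        if lowlink[v] == index[v]:
            # v is the root of an SCC: pop every page down to v off the stack.
            component = []
            while True:
                w = stack.pop()
                on_stack.remove(w)
                component.append(w)
                if w == v:
                    break
            components.append(component)

    # Include pages that only appear as link targets.
    nodes = set(graph) | {w for targets in graph.values() for w in targets}
    for v in sorted(nodes):
        if v not in index:
            visit(v)
    return components


def term_vector(text):
    """Raw term-frequency vector (assumption: lowercase, whitespace tokens)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity dot(a, b) / (|a| * |b|); 0.0 if either vector is empty."""
    # Counter returns 0 for missing terms, so absent words add nothing.
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def average_pairwise_cosine(component, texts):
    """Mean cosine similarity over all page pairs in one SCC (None if < 2 pages)."""
    pairs = list(combinations(component, 2))
    if not pairs:
        return None
    vectors = {page: term_vector(texts[page]) for page in component}
    return sum(cosine(vectors[p], vectors[q]) for p, q in pairs) / len(pairs)


if __name__ == "__main__":
    # Hypothetical toy crawl: two topical clusters joined by a one-way link (b -> c).
    links = {"a": ["b"], "b": ["a", "c"], "c": ["d"], "d": ["c"]}
    texts = {
        "a": "palm oil export malaysia",
        "b": "malaysia palm oil price",
        "c": "football league result",
        "d": "football match result",
    }
    for scc in tarjan_scc(links):
        print(sorted(scc), average_pairwise_cosine(scc, texts))
```

The one-way link from `b` to `c` does not merge the two clusters, because `c` cannot reach back to `b`. Each topical pair therefore forms its own SCC. Single-page components return `None` because they contain no page pairs; the abstract does not say how the authors treated them.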

Contents

Abstract
 1: Introduction
 2: Related Work
 3: Methodology
 4: Results
  4.1: The Dataset
  4.2: Distribution of SCC Sizes
  4.3: Distribution of SCC Sizes
  4.4: Average Cosine Similarity Score
 5: Conclusion and Future Work
 References

Author Information

  • Yong-Jin Tee, Faculty of Computing and Informatics, Multimedia University, Cyberjaya, Selangor, Malaysia
  • Lay-Ki Soon, Faculty of Computing and Informatics, Multimedia University, Cyberjaya, Selangor, Malaysia
