Web Spam Detection Based On Link Diversity and Content Features

Abstract

To obtain higher rankings, spam pages deceive search engines with cheating techniques, which makes it harder for users to find useful information through a search engine. Because web spam is designed for search engines rather than for users, it is important to distinguish normal web pages from spam pages. The links of normal web pages come from a wide variety of sources and their content features are regularly distributed, whereas the links of spam pages come from a single source and their content features are distributed irregularly. Based on an analysis of the link diversity and content feature distributions of web pages, this paper proposes a new web page ranking algorithm. In this method, the ranking score of a web page is calculated by the TrustRank method combined with the page's link diversity and content features. The experimental results show that this method can effectively reduce the ranking scores of spam pages.
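The abstract does not give the paper's formulas, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes plain TrustRank (PageRank with teleportation restricted to trusted seed pages), a hypothetical link diversity measure (normalized entropy of the hosts that link to a page), and a hypothetical content score in [0, 1] supplied by the caller. The function names `trustrank`, `link_diversity`, and `combined_score`, and the multiplicative combination, are all illustrative assumptions.

```python
import math
from collections import Counter


def trustrank(out_links, seeds, damping=0.85, iterations=50):
    """Plain TrustRank: PageRank whose teleport vector sits on trusted seeds.

    out_links maps every page to the list of pages it links to; every
    link target must also appear as a key.
    """
    pages = list(out_links)
    teleport = {p: (1.0 / len(seeds) if p in seeds else 0.0) for p in pages}
    score = dict(teleport)
    for _ in range(iterations):
        nxt = {p: (1.0 - damping) * teleport[p] for p in pages}
        for page, targets in out_links.items():
            if not targets:
                continue  # trust held by dangling pages is dropped in this sketch
            share = damping * score[page] / len(targets)
            for target in targets:
                nxt[target] += share
        score = nxt
    return score


def link_diversity(source_hosts):
    """Hypothetical diversity measure: normalized entropy of in-link hosts.

    Returns 0.0 when all in-links come from one host and 1.0 when they
    are spread evenly over several hosts.
    """
    counts = Counter(source_hosts)
    if len(counts) < 2:
        return 0.0
    total = sum(counts.values())
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    return entropy / math.log(len(counts))


def combined_score(trust, diversity, content):
    """Assumed combination: scale trust by diversity and content scores."""
    return trust * diversity * content


# Toy graph: a trusted pair of pages and an isolated two-page link farm.
graph = {
    "seed": ["good"],
    "good": ["seed"],
    "spam": ["spam2"],
    "spam2": ["spam"],
}
trust = trustrank(graph, seeds={"seed"})
farm_diversity = link_diversity(["farm.example", "farm.example"])
```

In this toy graph the link farm receives no trust because no trusted page links into it, and a page whose in-links all come from one host gets a diversity of zero, so either factor alone pushes its combined score down.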

Table of Contents

Abstract
 1. Introduction
 2. Web Link and Content Features
  2.1. Link Diversity
  2.2. Content Features of Web Pages
  2.3. Ranking Combining Link Diversity and Content Features
 3. Experiment and Results
  3.1. Dataset
  3.2. Measurement Standard
  3.3. Results
 4. Conclusion
 Acknowledgements
 References

Author Information

  • Xu Gongwen, School of Computer Science and Technology, Shandong Jianzhu University
  • Li Xiaomei, Cancer Center of the Second Hospital, Shandong University
  • Zhang Zhijun, School of Computer Science and Technology, Shandong Jianzhu University
  • Xu Li’Na, School of Computer Science and Technology, Shandong Jianzhu University

