Source Information
Abstract
English
To obtain a higher ranking, spam pages deceive search engines with manipulation techniques, which makes it harder for users to find useful information through a search engine. Web spam is designed for search engines rather than for users, so it is important to distinguish normal web pages from spam pages. The links of normal web pages come from a wide variety of sources and their content features are regularly distributed, whereas the links of spam pages come from a single source and their content features are irregularly distributed. Based on an analysis of the link diversity and content feature distribution of web pages, this paper proposes a new web page ranking algorithm. In this method, the ranking score of a web page is calculated by the TrustRank method combined with the page's link diversity and content features. Experimental results show that the method effectively reduces the ranking scores of spam pages.
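As a rough illustration of the TrustRank component mentioned in the abstract, the following minimal Python sketch propagates trust from a small set of trusted seed pages along out-links. The abstract does not state how link diversity and content features enter the score, so the function `combined_score`, its multiplicative form, and the toy graph are assumptions made here for illustration only, not the paper's formula.

```python
def trustrank(out_links, seeds, damping=0.85, iterations=50):
    """Biased PageRank: trust starts at the seed pages and flows along out-links.

    out_links maps every page to the list of pages it links to; every link
    target must also appear as a key. seeds is the set of trusted pages.
    """
    pages = sorted(out_links)
    # Static trust distribution: uniform over the seeds, zero elsewhere.
    d = {p: (1.0 / len(seeds) if p in seeds else 0.0) for p in pages}
    t = dict(d)
    for _ in range(iterations):
        nxt = {p: (1.0 - damping) * d[p] for p in pages}
        for p in pages:
            targets = out_links[p]
            if not targets:
                continue  # dangling page: its trust is dropped in this sketch
            share = damping * t[p] / len(targets)
            for q in targets:
                nxt[q] += share
        t = nxt
    return t


def combined_score(trust, link_diversity, content_score):
    """Hypothetical combination (assumption, not taken from the paper):
    scale the trust score by two factors in [0, 1]."""
    return trust * link_diversity * content_score


# Toy graph (made up): "a" is a trusted seed; "s" and "s2" form an isolated
# link farm that receives no links from trusted pages.
out_links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "s": ["s2"], "s2": ["s"]}
scores = trustrank(out_links, seeds={"a"})
```

In this toy graph the link farm receives no trust because no trusted page links into it, which is the behaviour TrustRank is meant to provide. The diversity and content factors would then lower the scores of pages that still pick up some trust.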
Table of Contents
1. Introduction
2. Web Link and Content Features
2.1. Link Diversity
2.2. Content Features of Web Pages
2.3. Ranking Combining Link Diversity and Content Features
3. Experiment and Results
3.1. Dataset
3.2. Measurement Standard
3.3. Results
4. Conclusion
Acknowledgements
References