A Method of Collecting Mongolian Web page Based on Hyperlink Correlation Degree

Abstract

English

Because Mongolian web pages do not share a unified encoding and are relatively few in number, a collection method that combines a language model with hyperlink analysis is designed to address the problem. First, the language of a web page is identified with an N-Gram language model, and the average distance obtained from language identification is used as one component of the hyperlink correlation degree. Second, the hyperlink correlation degree is calculated from the anchor text, hyperlink increasing, and hyperlink depth. Finally, the hyperlinks, sorted by hyperlink correlation degree, become the seeds for collecting the next web pages. The experimental results show that collecting Mongolian web pages based on hyperlink correlation degree can effectively improve the amount of information collected, the collection speed, and the accuracy rate.
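The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract gives no formulas, so the rank-based "out-of-place" distance, the reading of "hyperlink increasing" as a count of newly discovered links, the weights, and every function name here (`ngram_profile`, `average_distance`, `correlation_degree`, `order_seeds`) are assumptions made only for illustration.

```python
from collections import Counter


def ngram_profile(text, n=3, top_k=300):
    """Return the top_k most frequent character n-grams, most frequent first."""
    padded = f" {text} "
    counts = Counter(padded[i:i + n] for i in range(len(padded) - n + 1))
    # Sort by descending count, then alphabetically so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [gram for gram, _ in ranked[:top_k]]


def average_distance(doc_profile, lang_profile):
    """Average rank displacement between two profiles, scaled to [0, 1].

    0.0 means identical rankings; 1.0 means no n-gram in common.
    (Assumed metric: the abstract does not define its distance.)
    """
    if not doc_profile:
        return 1.0
    max_penalty = max(len(doc_profile), len(lang_profile))
    rank = {gram: i for i, gram in enumerate(lang_profile)}
    total = 0
    for i, gram in enumerate(doc_profile):
        total += abs(i - rank[gram]) if gram in rank else max_penalty
    return total / (len(doc_profile) * max_penalty)


def correlation_degree(anchor_text, depth, new_links, lang_profile,
                       weights=(0.5, 0.3, 0.2)):
    """Combine language, link-growth, and depth signals into one score.

    The weights and the squashing functions are hypothetical choices.
    """
    w_lang, w_growth, w_depth = weights
    lang_score = 1.0 - average_distance(ngram_profile(anchor_text), lang_profile)
    growth_score = new_links / (1.0 + new_links)  # more new links, higher score
    depth_score = 1.0 / (1.0 + depth)             # shallower links, higher score
    return w_lang * lang_score + w_growth * growth_score + w_depth * depth_score


def order_seeds(candidates, lang_profile):
    """Return candidate URLs sorted by descending correlation degree."""
    scored = sorted(
        candidates,
        key=lambda c: correlation_degree(
            c["anchor"], c["depth"], c["new_links"], lang_profile),
        reverse=True,
    )
    return [c["url"] for c in scored]


# Toy reference profile and two hypothetical candidate links.
lang_profile = ngram_profile("монгол хэл монгол бичиг")
candidates = [
    {"url": "http://example.com/b", "anchor": "english news",
     "depth": 1, "new_links": 5},
    {"url": "http://example.mn/a", "anchor": "монгол хэл",
     "depth": 1, "new_links": 5},
]
seeds = order_seeds(candidates, lang_profile)
```

In this toy setup the two candidates differ only in anchor text, so the ordering is decided by the language score alone. A real crawler would build the reference profile from a sizeable Mongolian corpus and would normalize page encodings before extracting n-grams.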

Table of Contents

Abstract
 1. Introduction
 2. N-Gram-Based Mongolian Topic Recognition
 3. Hyperlink Correlation Degree of Node
 4. The Mongolian Web Page Collection Model
 5. Experiment Design and Result Analysis
 6. Conclusion
 References

Author Information

  • Zhiqiang Ma, School of Information Engineering, Inner Mongolia University of Technology, Hohhot, China
  • Rui Yan, School of Information Engineering, Inner Mongolia University of Technology, Hohhot, China
  • Zeguang Zhang, School of Information Engineering, Inner Mongolia University of Technology, Hohhot, China
  • Shuangtao Yang, School of Information Engineering, Inner Mongolia University of Technology, Hohhot, China
