An Approach of Information Extraction Based on Dom Tree and Weight Value

Haitao Wang; Shufen Liu


Security Engineering Research Support Center (IJGDC), International Journal of Grid and Distributed Computing, Vol. 9, No. 10, October 2016, pp. 311-320 (SCOPUS)

Citations: 0 (data provided by Naver Academic Information)

Abstract

Eliminating noisy information and extracting the main content from web pages is becoming an important research issue in the field of information retrieval. In this paper, we present an approach to information extraction based on the DOM tree and weight value calculation. It consists of the following steps: parsing the web page to construct the DOM tree, extracting the title and keywords, calculating the weight values, and obtaining the content. The experimental results show that this method achieves a high accuracy ratio when extracting content from pages on various themes.
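The four steps named in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's weight formula, and the keywords list suggests the authors worked with JSoup in Java, so everything below is an assumption made for illustration: it uses Python's standard-library html.parser instead of JSoup, and the names Node, TreeBuilder, weight and extract_main_content, the candidate tag set, and the scoring rule (text length, discounted by link density, boosted by keyword hits) are all hypothetical rather than the authors' method.

```python
import re
from html.parser import HTMLParser

VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}
SKIP_TAGS = {"script", "style"}                      # never counted as content
CANDIDATE_TAGS = {"div", "article", "section", "td"}  # assumed block containers


class Node:
    """A very small DOM node: tag, attributes, children, and direct text."""

    def __init__(self, tag, attrs=None, parent=None):
        self.tag = tag
        self.attrs = dict(attrs or [])
        self.parent = parent
        self.children = []
        self.text = []  # text chunks that sit directly inside this node

    def walk(self):
        yield self
        for child in self.children:
            yield from child.walk()

    def full_text(self):
        # Direct text first, then descendants; exact interleaving is not
        # preserved, which is acceptable for length-based weighting.
        if self.tag in SKIP_TAGS:
            return ""
        parts = [" ".join(self.text)] + [c.full_text() for c in self.children]
        return " ".join(p for p in parts if p)


class TreeBuilder(HTMLParser):
    """Step 1: parse the page into a tree of Node objects."""

    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Close the nearest open ancestor with this tag; ignore stray end tags.
        node = self.current
        while node is not None and node.tag != tag:
            node = node.parent
        if node is not None and node.parent is not None:
            self.current = node.parent

    def handle_data(self, data):
        if data.strip():
            self.current.text.append(data.strip())


def title_and_keywords(root):
    """Step 2: read <title> and <meta name="keywords">, plus longer title words."""
    title, keywords = "", set()
    for node in root.walk():
        if node.tag == "title" and not title:
            title = node.full_text()
        elif node.tag == "meta" and (node.attrs.get("name") or "").lower() == "keywords":
            content = node.attrs.get("content") or ""
            keywords.update(k.strip().lower() for k in content.split(",") if k.strip())
    keywords.update(w.lower() for w in re.findall(r"\w+", title) if len(w) > 3)
    return title, keywords


def weight(node, keywords):
    """Step 3 (assumed formula): length * (1 - link density) * (1 + keyword hits)."""
    text = node.full_text()
    if not text:
        return 0.0
    link_chars = sum(len(n.full_text()) for n in node.walk() if n.tag == "a")
    link_density = min(1.0, link_chars / len(text))
    hits = sum(1 for k in keywords if k in text.lower())
    return len(text) * (1.0 - link_density) * (1 + hits)


def extract_main_content(html):
    """Step 4: return (title, keywords, heaviest candidate node or None)."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    title, keywords = title_and_keywords(builder.root)
    candidates = [n for n in builder.root.walk() if n.tag in CANDIDATE_TAGS]
    best = max(candidates, key=lambda n: weight(n, keywords), default=None)
    return title, keywords, best
```

Calling extract_main_content(page_source) and then best.full_text() on the returned node yields the text of the highest-weighted block. This toy scoring rule favours long, link-poor blocks that mention the title or keyword terms; pages whose content sits inside nested wrapper containers would need a more careful rule than this sketch provides.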

Keywords

Information extraction
Dom tree
Weight value
JSoup
Web pages

Author information

Haitao Wang, School of Computer Science and Technology, Henan Polytechnic University, Henan Province, China
Shufen Liu, School of Computer Science and Technology, Jilin University, Jilin Province, China
