Near Duplicate Document Detection using Document Image

Gaudence Uwamahoro; Zhang Zuping; Ambele Robert Mtafya; Weiqi Li; Long Jun

Publication information

Science and Engineering Research Support Society (SERSC), International Journal of Multimedia and Ubiquitous Engineering (IJMUE), Vol. 11, No. 7, July 2016, pp. 159-168 (indexed in SCOPUS)

Abstract

The growth of the Internet has made it possible to store huge numbers of documents. Identifying near-duplicate documents among them is a major problem in information retrieval, because their high dimensionality makes comparison time-consuming. We propose an algorithm based on the tf-idf method that uses the importance and discriminative power of a term within a single document to speed up the search for similar documents in a collection. Our method represents each document with only 26.6% of its original size. This improves efficiency and memory usage compared with using the full document, and it reduces the time needed to search a collection for similar documents.
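
The abstract only outlines the approach, so the following is a minimal sketch of the general idea under stated assumptions. It is not the authors' algorithm.

The sketch makes these choices:
- Term weights use a common tf-idf variant: length-normalized term frequency times log(N/df).
- Each document is reduced to the highest-weighted fraction of its distinct terms, controlled by a hypothetical `keep_ratio` parameter.
- Reduced documents are compared with cosine similarity against a hypothetical `threshold`.

The function names, the whitespace tokenizer and the default values are illustrative assumptions.

```python
import math
from collections import Counter


def tfidf_weights(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: tf = count / document length, idf = log(N / df).
    Terms that occur in every document get weight 0.
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        length = len(doc)
        weights.append(
            {t: (c / length) * math.log(n / df[t]) for t, c in counts.items()}
        )
    return weights


def reduce_document(weights, keep_ratio):
    """Keep the top keep_ratio fraction of a document's distinct terms.

    Ties are broken alphabetically so the result is deterministic.
    """
    if not weights:
        return {}
    k = max(1, math.ceil(len(weights) * keep_ratio))
    ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return dict(ranked[:k])


def cosine(a, b):
    """Cosine similarity of two sparse {term: weight} vectors (0.0 if either is all zero)."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def near_duplicates(texts, keep_ratio=0.5, threshold=0.5):
    """Return (i, j, similarity) for every pair whose reduced vectors meet threshold."""
    docs = [text.lower().split() for text in texts]  # naive whitespace tokenizer
    reduced = [reduce_document(w, keep_ratio) for w in tfidf_weights(docs)]
    pairs = []
    for i in range(len(reduced)):
        for j in range(i + 1, len(reduced)):
            sim = cosine(reduced[i], reduced[j])
            if sim >= threshold:
                pairs.append((i, j, round(sim, 3)))
    return pairs


if __name__ == "__main__":
    corpus = [
        "the quick brown fox jumps over the lazy dog near the river bank",
        "the quick brown fox jumped over the lazy dog near the river bank",
        "stock markets fell sharply today as investors worried about inflation",
    ]
    print(near_duplicates(corpus))  # prints [(0, 1, 0.639)]
```

Only the first two sentences are reported as near duplicates, because they differ in a single word. In a small collection, a term found in only one document ("jumps" vs "jumped") gets the largest idf. That pulls the similarity of a near-duplicate pair well below 1.0, so in this sketch the threshold would need tuning on real data.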

Author information

Gaudence Uwamahoro School of Information Science and Engineering, Central South University, Changsha, 410083, China
Zhang Zuping School of Information Science and Engineering, Central South University, Changsha, 410083, China
Ambele Robert Mtafya School of Information Science and Engineering, Central South University, Changsha, 410083, China
Weiqi Li School of Electronic and Information Engineering, Xi’an Jiaotong University, Xi’an, 710049, China
Long Jun School of Information Science and Engineering, Central South University, Changsha, 410083, China