Near Duplicate Document Detection using Document Image

Abstract

The growth of the Internet has made it possible to store huge numbers of documents. Identifying near-duplicate documents among them is a major problem in information retrieval, because their high dimensionality leads to a high time cost. We propose an algorithm based on the tf-idf method that uses the importance and discriminative power of a term within a single document to speed up the process of detecting how similar the documents in a collection are. Using only 26.6% of the original document size, our method performs well in efficiency and memory usage, because the reduced representation is smaller than the original, and this decreases the time needed to search for similar documents in a collection.
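The abstract gives only the outline of the approach, so the following is a minimal sketch of the general idea rather than the authors' algorithm: weight each term by tf-idf, keep only the highest-weighted terms of each document, and compare the reduced representations. The function names, the smoothed idf formula, the whitespace tokenizer, the cosine comparison, and the reading of 26.6% as a fraction of distinct terms are all assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: naive lowercase whitespace tokenization.
    return text.lower().split()


def tf_idf_vectors(docs):
    """Return one {term: weight} dict per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1
        vec = {}
        for term, count in counts.items():
            # Assumption: smoothed idf, so terms found in every
            # document still get a nonzero weight.
            idf = math.log((1 + n_docs) / (1 + doc_freq[term])) + 1
            vec[term] = (count / total) * idf
        vectors.append(vec)
    return vectors


def reduce_document(vec, keep_ratio=0.266):
    """Keep only the top-weighted share of a document's distinct terms."""
    # Assumption: keep_ratio is applied to the number of distinct terms;
    # the abstract does not say how the 26.6% figure is measured.
    k = max(1, math.ceil(len(vec) * keep_ratio))
    # Ties are broken alphabetically to keep the result deterministic.
    ranked = sorted(vec.items(), key=lambda item: (-item[1], item[0]))
    return dict(ranked[:k])


def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(weight * b[term] for term, weight in a.items() if term in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


docs = [
    "the cat sat on the mat",
    "the cat sat on a mat",
    "stock prices fell sharply today",
]
reduced = [reduce_document(v) for v in tf_idf_vectors(docs)]
print(cosine(reduced[0], reduced[1]), cosine(reduced[0], reduced[2]))
```

Comparing the reduced dictionaries touches fewer terms per pair than comparing full documents, which is the kind of saving the abstract refers to. On documents as short as these, such aggressive pruning discards most of the shared vocabulary, so the sketch only shows the mechanics.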

Table of Contents

Abstract
 1. Introduction
 2. Related Works
 3. Proposed Method
  3.1. Document Image
  3.2. Algorithm Description
  3.3. Time Complexity
 4. Experiment and Analysis
 5. Conclusion
 Acknowledgements
 References

Author Information

  • Gaudence Uwamahoro, School of Information Science and Engineering, Central South University, Changsha, 410083, China
  • Zhang Zuping, School of Information Science and Engineering, Central South University, Changsha, 410083, China
  • Ambele Robert Mtafya, School of Information Science and Engineering, Central South University, Changsha, 410083, China
  • Weiqi Li, School of Electronic and Information Engineering, Xi’an Jiaotong University, Xi’an, 710049, China
  • Long Jun, School of Information Science and Engineering, Central South University, Changsha, 410083, China
