A Novel Parallel Architecture Design of Information Retrieval System for Scientific Papers

Abstract

Indexing converts a raw document collection into an easily searchable representation. Indexing at larger scale poses challenges, such as how to distribute the indexing computation efficiently across a cluster of nodes. The MapReduce framework can be an effective tool for parallelizing tasks such as inverted index construction. We propose SciPDFindexer, a distributed information retrieval system for scientific articles in PDF. Given a large collection of scientific articles in PDF, our system parses the articles, extracts their metadata, and then indexes the extracted content using our proposed scheme. Our contribution is the design of a distributed IR system and an indexing scheme that improve overall indexing performance.
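
To make the MapReduce idea mentioned in the abstract concrete, the following is a minimal single-process sketch of inverted index construction in the map, shuffle, and reduce style. It is not the SciPDFindexer indexing scheme, which this page does not describe; the function names (`map_phase`, `shuffle`, `reduce_phase`, `build_inverted_index`), the tokenizer, and the sample documents are all assumptions made for illustration.

```python
from collections import defaultdict
import re


def map_phase(doc_id, text):
    """Emit one (term, doc_id) pair per distinct term in a document."""
    # Hypothetical tokenizer: lowercase alphanumeric runs only.
    for term in set(re.findall(r"[a-z0-9]+", text.lower())):
        yield term, doc_id


def shuffle(pairs):
    """Group emitted doc ids by term, as a framework would between map and reduce."""
    groups = defaultdict(list)
    for term, doc_id in pairs:
        groups[term].append(doc_id)
    return groups


def reduce_phase(term, doc_ids):
    """Build a sorted, de-duplicated postings list for one term."""
    return term, sorted(set(doc_ids))


def build_inverted_index(docs):
    """Run the three phases sequentially over a dict of {doc_id: text}."""
    pairs = (pair for doc_id, text in docs.items()
             for pair in map_phase(doc_id, text))
    return dict(reduce_phase(term, ids) for term, ids in shuffle(pairs).items())


# Hypothetical sample collection standing in for text extracted from PDFs.
docs = {
    "paper1": "Parallel indexing with MapReduce",
    "paper2": "Indexing scientific papers",
}
index = build_inverted_index(docs)
```

In a real cluster deployment the map and reduce calls would run on different nodes and the grouping step would be handled by the framework; here everything runs in one process so the data flow is easy to follow.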

Table of Contents

Abstract
 1. Introduction
 2. Background and Related Work
 3. Design and Implementation of SciPDFindexer System
 4. Conclusion and Future Works
 Acknowledgements
 References

Author Information

  • Aziz Murtazaev (Samsung Electronics, Suwon, South Korea)
  • Sanggil Kang (Department of Computer Science and Information Engineering, Inha University)
  • Sangyoon Oh (School of Information and Communication, Ajou University)
