Source Information
Abstract
English
Indexing converts a raw document collection into an easily searchable representation. Indexing at larger scale poses challenges, such as how to distribute the indexing computation efficiently across a cluster of nodes. The MapReduce framework can be an effective tool for parallelizing tasks such as inverted index construction. We propose SciPDFindexer, a distributed information retrieval (IR) system for scientific articles in PDF. Given a large collection of scientific articles in PDF, our system parses the articles, extracts their metadata, and then indexes the extracted content using our proposed scheme. Our contribution is the design of a distributed IR system and an indexing scheme that improve overall indexing performance.
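To make the idea of MapReduce-style inverted index construction concrete, the following is a minimal single-process sketch in Python. It is not the SciPDFindexer indexing scheme, which the abstract does not detail; it only illustrates the generic map, shuffle, and reduce steps. All function names, the tokenizer, and the sample documents are assumptions introduced for illustration, and a real deployment would run these phases on a cluster framework rather than in memory.

```python
import re
from collections import defaultdict


def map_document(doc_id, text):
    """Map phase: emit one (term, doc_id) pair per token occurrence."""
    # Assumed tokenizer: lowercase runs of letters and digits.
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        yield token, doc_id


def shuffle(pairs):
    """Shuffle phase: group emitted doc ids by term, as a framework would."""
    grouped = defaultdict(list)
    for term, doc_id in pairs:
        grouped[term].append(doc_id)
    return grouped


def reduce_postings(term, doc_ids):
    """Reduce phase: build a sorted postings list of (doc_id, term_frequency)."""
    counts = defaultdict(int)
    for doc_id in doc_ids:
        counts[doc_id] += 1
    return term, sorted(counts.items())


def build_inverted_index(docs):
    """Run map, shuffle, and reduce over a dict of {doc_id: text}."""
    pairs = (
        pair
        for doc_id, text in docs.items()
        for pair in map_document(doc_id, text)
    )
    grouped = shuffle(pairs)
    return dict(reduce_postings(term, ids) for term, ids in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample documents standing in for extracted PDF text.
    docs = {
        "paper1": "Distributed indexing with MapReduce",
        "paper2": "Indexing scientific PDF articles",
    }
    index = build_inverted_index(docs)
    print(index["indexing"])
```

Because each document is mapped independently and each term is reduced independently, both phases can in principle be spread across nodes, which is the property that makes this pattern a natural fit for cluster-scale indexing.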
Table of Contents
1. Introduction
2. Background and Related Work
3. Design and Implementation of SciPDFindexer System
4. Conclusion and Future Works
Acknowledgements
References
