Article Information
Abstract
English
The Hadoop Distributed File System (HDFS) has become a representative cloud storage platform thanks to its reliable, scalable, and low-cost storage. However, HDFS shows poor storage and access performance when processing huge numbers of small files, because massive numbers of small files place a heavy metadata burden on the HDFS NameNode. Moreover, HDFS provides no built-in optimization for storing and accessing small files, nor any prefetching mechanism to reduce I/O operations. This paper proposes an optimized scheme, Structured Index File Merging (SIFM), which uses two-level file indexes, structured metadata storage, and a prefetching and caching strategy to reduce I/O operations and improve access efficiency. Extensive experiments demonstrate that, compared with native HDFS and HAR, the proposed SIFM achieves better performance for storing and accessing large numbers of small files on HDFS.
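The core idea the abstract describes (merging many small files into one large file and keeping an index of each file's offset and length, so that reads bypass per-file NameNode metadata) can be sketched locally. This is a minimal illustration under stated assumptions, not the paper's actual implementation: the class and method names (`SmallFileMerger`, `put`, `get`) are hypothetical, and an in-memory buffer stands in for a large HDFS block file.

```python
import io

class SmallFileMerger:
    """Illustrative sketch: merge small files into one blob with an offset index."""

    def __init__(self):
        self.blob = io.BytesIO()   # stands in for one large merged file on HDFS
        self.index = {}            # file-level index: name -> (offset, length)

    def put(self, name: str, data: bytes) -> None:
        # Append the small file's bytes and record where they landed.
        offset = self.blob.seek(0, io.SEEK_END)
        self.blob.write(data)
        self.index[name] = (offset, len(data))

    def get(self, name: str) -> bytes:
        # One index lookup plus one seek/read, instead of a per-file
        # metadata round trip for every small file.
        offset, length = self.index[name]
        self.blob.seek(offset)
        return self.blob.read(length)

merger = SmallFileMerger()
merger.put("a.txt", b"hello")
merger.put("b.txt", b"world!")
print(merger.get("b.txt"))  # -> b'world!'
```

In the paper's setting the index itself would be stored as structured metadata (the first of the two index levels) and hot entries would be prefetched and cached to cut further I/O; this sketch shows only the merge-and-index step.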
Table of Contents
1. Introduction
2. Related Work
3. Analysis of Small File Problem on HDFS
3.1. Architecture and Access Mechanism of HDFS
3.2. Impact on HDFS
4. Optimization Scheme -- SIFM
4.1. File Merging Strategy
4.2. Metadata Files Storage
4.3. Prefetching and Caching Files
5. Efficiency Analysis
5.1. Writing Efficiency Analysis
5.2. Reading Efficiency Analysis
6. Experiment Evaluation
6.1. Experimental Settings
6.2. Experimental Methodology
6.3. Experimental Results
7. Conclusion
Acknowledgments
References
