Source Information
Abstract
English
In this paper, we propose a probabilistic algorithm for detecting duplicated data blocks over low-bandwidth networks. The algorithm identifies duplicated regions of the destination file and sends only the non-duplicated regions of data. The proposed system builds two index tables per file, with chunk sizes of 4 MB and 32 KB, respectively. At the first level, the client rapidly detects large identical data blocks with the byte-index chunking approach, using the 4 MB index table. At the second level, byte-index chunking is applied with the 32 KB index table, but only to the regions left unmatched by the first-level similarity detection. This two-level design yields a higher deduplication rate without a large time penalty, because the fine-grained pass is restricted to the non-duplicated regions. Experimental results show that the proposed approach reduces processing time significantly compared to fixed-size chunking, while achieving a deduplication rate as high as that of variable-size chunking.
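The two-level scheme in the abstract can be illustrated with a simplified sketch. This is not the authors' byte-index chunking implementation; it is a hypothetical approximation that uses fixed-size chunk hash indexes at two granularities (a coarse pass over the whole file, then a fine pass only over the regions the coarse pass failed to match). The function and parameter names are invented for illustration, and the 4 MB / 32 KB sizes are scaled down via parameters.

```python
import hashlib

def build_index(data: bytes, chunk_size: int) -> dict:
    """Map the SHA-1 digest of each fixed-size chunk of `data` to its offset."""
    index = {}
    for off in range(0, len(data), chunk_size):
        chunk = data[off:off + chunk_size]
        index.setdefault(hashlib.sha1(chunk).hexdigest(), off)
    return index

def find_duplicates(data: bytes, index: dict, chunk_size: int, regions):
    """Split each (start, end) region of `data` into chunks and classify
    every chunk as duplicated (hash found in `index`) or non-duplicated."""
    dup, nondup = [], []
    for start, end in regions:
        for off in range(start, end, chunk_size):
            hi = min(off + chunk_size, end)
            h = hashlib.sha1(data[off:hi]).hexdigest()
            (dup if h in index else nondup).append((off, hi))
    return dup, nondup

def two_level_dedup(new_data: bytes, old_data: bytes,
                    large=4 * 1024 * 1024, small=32 * 1024):
    """Level 1: match large chunks quickly over the whole file.
    Level 2: re-chunk only the unmatched regions at the finer size.
    Returns the regions of `new_data` that must actually be sent."""
    idx_large = build_index(old_data, large)
    _, nondup = find_duplicates(new_data, idx_large, large,
                                [(0, len(new_data))])
    idx_small = build_index(old_data, small)
    _, nondup2 = find_duplicates(new_data, idx_small, small, nondup)
    return nondup2
```

The point of the structure is visible in the second call to `find_duplicates`: the expensive fine-grained pass never scans data already matched at the coarse level, which is why the abstract can claim near-variable-size deduplication rates at a fraction of the processing time.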
Table of Contents
1. Introduction
2. Related Work
3. System Design and Implementation
3.1. Predicting Duplicated Data of Look-Up Process
3.2. Multi-Level Byte Index Chunking
3.3. System Overview and Implementation
4. Performance Evaluation
5. Conclusion
Acknowledgements
References