Source Information
Abstract
English
In this paper, we design a low-cost checkpointing-based rollback recovery algorithm that addresses the traditional scalability problem of synchronous checkpointing from a point of view entirely different from that of existing algorithms. The algorithm enables a cluster-wide set of processes to carry out a semi-global checkpointing procedure, while a small set of cluster heads monitor the local commits of their respective administrative areas and always observe the global consistency condition. This considerably lowers the communication overhead incurred by previous algorithms. In particular, it can greatly reduce the frequency of cluster-to-cluster communication in large-scale hierarchical multi-cluster systems.
Table of Contents
1. Introduction
2. New Synchronous Checkpointing-based Recovery Algorithm
2.1 Limitation of Traditional Approaches
2.2 Our Solution
3. Discussion
4. Conclusion
References