Source Information
Abstract
English
In this age of data and knowledge, Cloud, Grid and P2P systems are becoming increasingly common and advanced. Owing to its heterogeneous and distributed nature, the Grid is particularly vulnerable to faults. Trace files are a useful way of collecting and storing fault and workload information from such a system. FTA (Fault Trace Archive) and GWA (Grid Workload Archive) are two such collections of trace files. Researchers have previously analyzed FTA and GWA individually; in this paper we analyze, for the first time, the combination of FTA and GWA as a single research problem. The trace files are joined on their event timestamp values. Both are then analyzed to establish a correlation-based model relating node failures, failed jobs, number of nodes and failure duration. We find that these factors are positively correlated with one another, but to different extents. In addition to examining node failure frequency, time to resume after failure and node dedication, we find that interactive jobs have a higher failure probability than batch jobs.
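The timestamp-based join and the correlation step described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the record layouts, the sample values, the helper names `join_on_timestamp` and `pearson`, and the join rule itself (a failed job is matched to an outage when its end time falls inside an outage interval on the same node). It does not reproduce the real FTA or GWA formats or the authors' actual procedure.

```python
from collections import Counter
from math import sqrt

# Hypothetical FTA-style records: (node_id, outage_start, outage_end), in seconds.
fta_events = [
    ("n1", 100, 200),
    ("n1", 500, 650),
    ("n2", 150, 400),
    ("n3", 700, 720),
]

# Hypothetical GWA-style records: (job_id, node_id, end_time, status).
gwa_jobs = [
    ("j1", "n1", 180, "failed"),
    ("j2", "n1", 300, "completed"),
    ("j3", "n1", 600, "failed"),
    ("j4", "n2", 390, "failed"),
    ("j5", "n3", 100, "failed"),
    ("j6", "n2", 450, "completed"),
]


def join_on_timestamp(events, jobs):
    """Return IDs of failed jobs whose end time lies inside an outage on the same node."""
    matched = []
    for job_id, node, end_time, status in jobs:
        if status != "failed":
            continue
        if any(n == node and start <= end_time <= end for n, start, end in events):
            matched.append(job_id)
    return matched


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length, non-constant sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


matched = join_on_timestamp(fta_events, gwa_jobs)

# Per-node counts: outages recorded, and failed jobs matched to an outage.
job_node = {job_id: node for job_id, node, _, _ in gwa_jobs}
failures_per_node = Counter(node for node, _, _ in fta_events)
failed_jobs_per_node = Counter(job_node[j] for j in matched)

nodes = sorted(failures_per_node)
r = pearson(
    [failures_per_node[n] for n in nodes],
    [failed_jobs_per_node[n] for n in nodes],
)
print(matched, round(r, 3))
```

On this toy data the coefficient is positive, which only shows the mechanics of the calculation; it says nothing about the strength of the correlations reported in the paper.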
Table of Contents
1. Introduction
2. Related Work
3. Fault Trace Archive Analysis
3.1. Node Unavailability Time
3.2. Number of Failures
3.3. Failure Prone Nodes
3.4. Node Dedication
3.5. Average Time to Resume after Failure
4. Grid Workload Archive Analysis
4.1. Job Execution Status
4.2. Failure Stage of Jobs
4.3. Completed, Failed and Cancelled Jobs
5. Combined Analysis of FTA and GWA
5.1. Sample Size Selection
6. Correlation and Result Analysis
7. Conclusion and Future Work
References