Open Access. Powered by Scholars. Published by Universities.®



Columbus State University

Theses and Dissertations


2011


Full-Text Articles in Physical Sciences and Mathematics

Advanced I/O Techniques For Efficient And Highly Available Process Crash Recovery Protocols, Jason Cornwell, March 2011



As the number of CPU cores in high-performance computing platforms continues to grow, the availability and reliability of these systems have become a primary concern. Solutions include physical measures (e.g., backup power) and software-driven approaches. Lawrence Berkeley National Laboratory has created a system-level, fault-tolerant checkpoint/restart implementation for Linux clusters (Berkeley Lab Checkpoint/Restart, BLCR). It allows processes to resume computation from the last checkpoint if the system crashes. Creating checkpoint data depends heavily on system input and output (I/O) operations. This paper proposes: (i) a technique to improve the efficiency of these I/O operations and (ii) an alternative …
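To illustrate the general idea of resuming from the last checkpoint, here is a minimal application-level sketch in Python. It is not BLCR and not the technique proposed in the thesis. BLCR works at the system level and is transparent to the application, whereas this sketch has the program save and reload its own state. All names (`save_checkpoint`, `load_checkpoint`, `run`) and the JSON file format are hypothetical. The sketch also shows why checkpointing is I/O-bound: each checkpoint writes a temporary file, forces it to disk with `os.fsync`, and then atomically replaces the previous checkpoint with `os.replace`. A crash during a write therefore never corrupts the last good checkpoint.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Atomically write `state` to `path`: temp file, fsync, then rename."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".ckpt-")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(state, f)
            f.flush()
            os.fsync(f.fileno())  # push data to stable storage before renaming
        # Atomic on POSIX: a reader sees either the old or the new checkpoint.
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)  # discard the partial temp file
        raise


def load_checkpoint(path, default):
    """Return the last saved state, or `default` if no checkpoint exists."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return default


def run(path, total_steps, checkpoint_every, crash_at=None):
    """Sum 0..total_steps-1, checkpointing periodically.

    `crash_at` (hypothetical test hook) simulates a crash at that step.
    """
    state = load_checkpoint(path, {"step": 0, "acc": 0})
    while state["step"] < total_steps:
        if crash_at is not None and state["step"] == crash_at:
            raise RuntimeError("simulated crash")
        state["acc"] += state["step"]
        state["step"] += 1
        if state["step"] % checkpoint_every == 0:
            save_checkpoint(path, state)
    return state["acc"]
```

For example, with `checkpoint_every=10`, a run that crashes at step 55 loses only the work done after the checkpoint at step 50. A second call to `run` reloads that checkpoint and finishes with the same result as an uninterrupted run. The checkpoint interval trades I/O cost against the amount of work redone after a crash.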