Open Access. Powered by Scholars. Published by Universities.®

Engineering Commons

LSU Master's Theses

2016

Algorithmic Fault Resilience

Full-Text Articles in Engineering

Enhancing Program Soft Error Resilience Through Algorithmic Approaches, Sui Chen Jan 2016

LSU Master's Theses

The rising count and shrinking feature size of transistors within modern computers are making them increasingly vulnerable to various types of soft faults. This problem is especially acute in high-performance computing (HPC) systems used for scientific computing, because these systems include many thousands of compute cores and nodes, all of which may be utilized in a single large-scale run. The increasing vulnerability of HPC applications to errors induced by soft faults is motivating extensive work on techniques to make these applications more resilient to such faults, ranging from generic techniques such as replication or checkpoint/restart to algorithm-specific error detection and …