Open Access. Powered by Scholars. Published by Universities.®

Computer Engineering Commons

Doctoral Dissertations

Fault tolerance

Numerical Analysis and Computation

Full-Text Articles in Computer Engineering

Hard And Soft Error Resilience For One-Sided Dense Linear Algebra Algorithms, Peng Du, Aug 2012

Doctoral Dissertations

Dense matrix factorizations, such as LU, Cholesky, and QR, are widely used by scientific applications that require solving systems of linear equations, eigenvalue problems, and linear least squares problems. Such computations are normally carried out on supercomputers, whose ever-growing scale induces a fast decline of the Mean Time To Failure (MTTF). This dissertation develops fault tolerance algorithms for one-sided dense matrix factorizations that handle both hard and soft errors.

For hard errors, we propose methods based on diskless checkpointing and Algorithm-Based Fault Tolerance (ABFT) to provide full matrix protection, including the left and right factors that are normally seen in …
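To make the checksum idea behind ABFT concrete, here is a minimal sketch in plain Python. It is not the dissertation's algorithm, only the generic row-checksum technique on a small, sequential, unpivoted elimination. The function names (`append_row_checksum`, `eliminate`, `recover_column`) and the sample matrix are hypothetical, chosen for illustration. The point it shows: because elimination applies the same row operations to every column, an appended checksum column stays equal to the sum of the data columns, so one lost data column can be rebuilt from the survivors.

```python
def append_row_checksum(a):
    """Return a copy of matrix `a` with an extra column holding each row's sum."""
    return [row[:] + [sum(row)] for row in a]


def eliminate(m):
    """Gaussian elimination without pivoting (assumes nonzero pivots).

    The row updates are applied to every column, including the checksum
    column, so the checksum relationship survives the factorization.
    """
    m = [row[:] for row in m]
    n = len(m)
    for k in range(n):
        for i in range(k + 1, n):
            factor = m[i][k] / m[k][k]
            for j in range(k, len(m[i])):
                m[i][j] -= factor * m[k][j]
    return m


def recover_column(m, lost):
    """Rebuild data column `lost` in place from the checksum and surviving columns."""
    for row in m:
        data_cols = len(row) - 1  # last column is the checksum
        others = sum(row[j] for j in range(data_cols) if j != lost)
        row[lost] = row[-1] - others
    return m


# Illustrative input only; any matrix with nonzero pivots would do.
a = [[4.0, 3.0, 2.0],
     [2.0, 1.0, 3.0],
     [3.0, 4.0, 1.0]]
u = eliminate(append_row_checksum(a))

# Simulate a hard error: the storage holding data column 1 is lost.
damaged = [row[:] for row in u]
for row in damaged:
    row[1] = 0.0

restored = recover_column(damaged, lost=1)
```

A single checksum column can recover only one lost column; tolerating more simultaneous losses would need additional, independently weighted checksums.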