Open Access. Powered by Scholars. Published by Universities.®

Applied Statistics Commons

2012

High-performance computing

Full-Text Articles in Applied Statistics

Reliability Models For HPC Applications And A Cloud Economic Model, Thanadech Thanakornworakij Jul 2012

Doctoral Dissertations

With the enormous number of computing resources in HPC and Cloud systems, failures become a major concern. Failure characteristics such as reliability, failure rate, and mean time to failure therefore need to be understood in order to manage such large systems efficiently.

This dissertation makes three major contributions to HPC and Cloud studies. First, it studies a reliability model with correlated failures in a k-node system for HPC applications. The model is extended to improve accuracy by accounting for failure correlation. The Marshall-Olkin multivariate Weibull distribution is refined using excess life (the conditional Weibull distribution) to better estimate system reliability. Also, the univariate …
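For readers unfamiliar with the quantities named in this abstract, the following is a minimal sketch of reliability, failure rate, mean time to failure, and excess-life (conditional) reliability for a single univariate Weibull lifetime. It is not the dissertation's model: it does not implement the Marshall-Olkin multivariate distribution or any failure correlation, and the function names and the `shape`/`scale` parameterization are assumptions chosen for illustration.

```python
import math


def weibull_reliability(t, shape, scale):
    """Probability that a component survives past time t: R(t) = exp(-(t/scale)^shape)."""
    return math.exp(-((t / scale) ** shape))


def weibull_failure_rate(t, shape, scale):
    """Hazard (instantaneous failure rate): h(t) = (shape/scale) * (t/scale)^(shape-1)."""
    return (shape / scale) * (t / scale) ** (shape - 1)


def weibull_mttf(shape, scale):
    """Mean time to failure: scale * Gamma(1 + 1/shape)."""
    return scale * math.gamma(1.0 + 1.0 / shape)


def excess_life_reliability(t, age, shape, scale):
    """Conditional reliability P(T > age + t | T > age) = R(age + t) / R(age).

    This is the survival function of the excess (residual) life of a
    component that has already run for `age` time units without failing.
    """
    return weibull_reliability(age + t, shape, scale) / weibull_reliability(age, shape, scale)
```

With `shape = 1` the Weibull reduces to the exponential distribution, so the excess-life reliability equals the unconditional one (memorylessness). With `shape < 1` the failure rate decreases over time, and a component that has already survived is more likely to survive a further interval than a new one.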


A Failure Index For High Performance Computing Applications, Clayton F. Chandler Apr 2012

Doctoral Dissertations

This dissertation introduces a new metric in the area of High Performance Computing (HPC) application reliability and performance modeling. Derived via the time-dependent implementation of an existing inequality measure, the Failure Index (FI) generates a coefficient representing the level of volatility of the failures incurred by an application running on a given HPC system in a given time interval. This coefficient provides a normalized, cross-system representation of the failure volatility of applications running on failure-rich HPC platforms. Further, the origin and ramifications of application failures are investigated, from which certain mathematical conclusions yield greater insight into the behavior of these …
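The abstract does not name the inequality measure behind the Failure Index. As an illustration only, the sketch below assumes the Gini coefficient as a stand-in and applies it to failure counts per time window. The function name `gini` and the windowed-count input are hypothetical and should not be read as the dissertation's definition of FI.

```python
def gini(values):
    """Gini coefficient of non-negative values, in [0, (n-1)/n].

    0 means failures are spread evenly across windows; values near the
    upper bound mean failures are concentrated in a few windows.
    Assumed stand-in for the unnamed inequality measure in the abstract.
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0  # no data or no failures: treat as perfectly even
    # Rank-weighted sum over the sorted values (ranks start at 1).
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return (2.0 * weighted) / (n * total) - (n + 1.0) / n
```

For example, `gini([3, 3, 3, 3])` is 0 because every window saw the same number of failures, while `gini([0, 0, 0, 4])` is 0.75 because all failures fall in a single window.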