Open Access. Powered by Scholars. Published by Universities.®

Operations Research, Systems Engineering and Industrial Engineering Commons

University of Tennessee, Knoxville

Supercomputer

Other Engineering

Full-Text Articles in Operations Research, Systems Engineering and Industrial Engineering

Modelling Supercomputer Maintenance Interrupts: Maintenance Policy Recommendations, Jagadish Cherukuri Aug 2015

Masters Theses

A supercomputer is a repairable system with a large number of compute nodes interconnected to work in harmony to achieve superior computational performance. The reliability of such a complex system depends on an effective maintenance strategy that involves both emergency and preventive maintenance. This thesis analyzes the maintenance records of four supercomputers operated by the National Institute for Computational Sciences (NICS) at Oak Ridge National Laboratory. We propose using the generalized proportional intensities model (GPIM) to model the maintenance interrupts because it captures both reliability parameters and maintenance parameters and accommodates both emergency and preventive maintenance. …
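The abstract names the GPIM but does not give its equations. As a rough illustration only, the sketch below assumes a simplified proportional-intensities model for emergency interrupts: a power-law (Weibull-process) baseline intensity, multiplied by a constant factor after each past emergency event (`rho_e`) and each past preventive event (`rho_p`). Those factors play the role of exp(gamma * count) covariate effects. The function names, the parameters `beta`, `theta`, `rho_e`, `rho_p`, the power-law baseline, and the treatment of preventive maintenance as known covariate events are all hypothetical assumptions. This is not the thesis's actual formulation.

```python
import math
from typing import List, Tuple

# Hypothetical event record: (time, kind), where kind is "E" (emergency) or "P" (preventive).
Event = Tuple[float, str]


def baseline_intensity(t: float, beta: float, theta: float) -> float:
    """Assumed power-law baseline: lambda0(t) = (beta/theta) * (t/theta)**(beta - 1), for t > 0."""
    return (beta / theta) * (t / theta) ** (beta - 1)


def baseline_cumulative(a: float, b: float, beta: float, theta: float) -> float:
    """Integral of the power-law baseline from a to b: (b/theta)**beta - (a/theta)**beta."""
    return (b / theta) ** beta - (a / theta) ** beta


def intensity(t: float, events: List[Event], beta: float, theta: float,
              rho_e: float, rho_p: float) -> float:
    """Emergency-interrupt intensity at time t.

    The baseline is scaled by rho_e for every emergency event strictly before t
    and by rho_p for every preventive event strictly before t (hypothetical form).
    """
    scale = 1.0
    for time, kind in events:
        if time < t:
            scale *= rho_e if kind == "E" else rho_p
    return scale * baseline_intensity(t, beta, theta)


def log_likelihood(events: List[Event], T: float, beta: float, theta: float,
                   rho_e: float, rho_p: float) -> float:
    """Log-likelihood of the emergency events observed on [0, T].

    The scale factor is constant between consecutive events, so the integrated
    intensity is the sum of scale * baseline_cumulative over each interval.
    Preventive events change the scale but contribute no log-intensity term.
    """
    ll = 0.0
    scale = 1.0
    prev = 0.0
    for time, kind in sorted(events):
        ll -= scale * baseline_cumulative(prev, time, beta, theta)
        if kind == "E":
            ll += math.log(scale * baseline_intensity(time, beta, theta))
        scale *= rho_e if kind == "E" else rho_p
        prev = time
    ll -= scale * baseline_cumulative(prev, T, beta, theta)
    return ll


if __name__ == "__main__":
    history = [(2.0, "E"), (5.0, "P"), (8.0, "E")]
    print(intensity(9.0, history, beta=1.0, theta=10.0, rho_e=2.0, rho_p=0.5))
    print(log_likelihood(history, T=10.0, beta=1.0, theta=10.0, rho_e=2.0, rho_p=0.5))
```

With `rho_e = rho_p = 1` the model reduces to an ordinary power-law process. In that case maintenance has no effect on the interrupt rate. In practice the parameters would be estimated by maximizing `log_likelihood` over the recorded maintenance history with a numerical optimizer.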