Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Open Access. Powered by Scholars. Published by Universities.®

PDF

Missouri University of Science and Technology

Computer Science Faculty Research & Creative Works

Series

Fault Tolerant Computing

Publication Year

Articles 1 - 4 of 4

Full-Text Articles in Physical Sciences and Mathematics

An Improved Characterization Of 1-Step Recoverable Embeddings: Rings In Hypercubes, Jun-Lin Liu, T.J. Sager, Bruce M. Mcmillin Jan 1994

An Improved Characterization Of 1-Step Recoverable Embeddings: Rings In Hypercubes, Jun-Lin Liu, T.J. Sager, Bruce M. Mcmillin

Computer Science Faculty Research & Creative Works

An embedding is 1-step recoverable if any single fault occurs, the embedding can be reconfigured in one reconfiguration step to maintain the structure of the embedded graph. In this paper we present an efficient scheme to construct this type of 1-step recoverable ring embeddings in the hypercube. Our scheme will guarantee finding a 1-step recoverable embedding of a length-k (even) ring in a d-cube where 6 less than or equal to k less than or equal to (3/4)2/sup d/ and d greater than or equal to 3, provided such an embedding exists. Unlike previously proposed schemes, we solve the general …


Parallel Error Tolerance Scheme Based On The Hill Climbing Nature Of Simulated Annealing, Bruce M. Mcmillin, Chul-Eui Hong Jan 1992

Parallel Error Tolerance Scheme Based On The Hill Climbing Nature Of Simulated Annealing, Bruce M. Mcmillin, Chul-Eui Hong

Computer Science Faculty Research & Creative Works

In parallelizing simulated annealing in a multicomputer, maintaining the global state S involves explicit message traffic and is a critical performance bottleneck. One way to mitigate this bottleneck is to amortize the overhead of these state updates over as many parallel state changes as possible. Using this technique introduces errors in the calculated cost C(S) of a particular state S used by the annealing process. Analytically derived bounds are placed on this error in order to assure convergence to the correct result. The resulting parallel simulated annealing algorithm dynamically changes the frequency of global updates as a function of the …


Reliable Distributed Sorting Through The Application-Oriented Fault Tolerance Paradigm, Bruce M. Mcmillin, L. M. Ni Jan 1989

Reliable Distributed Sorting Through The Application-Oriented Fault Tolerance Paradigm, Bruce M. Mcmillin, L. M. Ni

Computer Science Faculty Research & Creative Works

The design and implementation of a reliable version of the distributed bitonic sorting algorithm using the application-oriented fault tolerance paradigm on a commercial multicomputer is described. Sorting assertions in general are discussed and the bitonic sort algorithm is introduced. Faulty behavior is discussed and a fault-tolerant parallel bitonic sort developed using this paradigm is presented. The error coverage and the response of the fault-tolerant algorithm to faulty behavior are presented. Both asymptotic complexity and the results of run-time experimental measurements on an Ncube multicomputer are given. The authors demonstrate that the application-oriented fault tolerance paradigm is applicable to problems of …


Executable Assertion Development For The Distributed Parallel Environment, Bruce M. Mcmillin, L. M. Ni Jan 1988

Executable Assertion Development For The Distributed Parallel Environment, Bruce M. Mcmillin, L. M. Ni

Computer Science Faculty Research & Creative Works

The use of executable assertions is a powerful tool with which to perform program verification, provide software fault-tolerance, and provide hardware fault-tolerance via the application-oriented paradigm. The authors show that assertions commonly used in the sequential programming environment are inadequate for the distributed parallel environment. In particular, it is shown that even design-based assertions are myopic and provide inadequate error coverage. In their place, a triad of basic metrics is proposed for certain classes of problems that, when applied beginning with the specification phase of the life cycle, produce assertions that are better suited to the parallel environment. This method …