Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

1986

Computer Sciences

Distributed systems

Full-Text Articles in Physical Sciences and Mathematics

Distributed Recovery In Applicative Systems, Frank C. H. Lin, Robert M. Keller, Aug 1986

All HMC Faculty Publications and Research

Applicative systems are promising candidates for achieving high performance computing through aggregation of processors. This paper studies the fault recovery problems in a class of applicative systems. The concept of functional checkpointing is proposed as the nucleus of a distributed recovery mechanism. This entails incrementally building a resilient structure as the evaluation of an applicative program proceeds. A simple rollback algorithm is suggested to regenerate the corrupted structure by redoing the most effective functional checkpoints. Another algorithm, which attempts to recover intermediate results, is also presented. The parent of a faulty task reproduces a functional twin of the failed task. …
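The recovery idea in the abstract, where a parent regenerates a failed child as a functional twin, can be illustrated with a minimal sketch. This is not the authors' algorithm: it only assumes that tasks are pure functions, so a parent that retains a child's function and arguments can re-evaluate it after a fault. All names here (`NodeFailure`, `evaluate_child`, `make_flaky_runner`, `max_twins`) are hypothetical and chosen for illustration.

```python
from typing import Any, Callable, Tuple


class NodeFailure(Exception):
    """Hypothetical signal that the processor running a task has failed."""


Runner = Callable[[Callable[..., Any], Tuple[Any, ...]], Any]


def evaluate_child(fn: Callable[..., Any], args: Tuple[Any, ...],
                   run: Runner, max_twins: int = 3) -> Any:
    """Evaluate a child task, respawning a twin if its processor fails.

    The parent keeps (fn, args) as an assumed form of functional
    checkpoint. Because fn is pure, re-evaluating it gives the same result.
    """
    checkpoint = (fn, args)  # retained by the parent for the child's lifetime
    for _ in range(max_twins + 1):  # one original attempt plus up to max_twins twins
        try:
            return run(*checkpoint)
        except NodeFailure:
            continue  # spawn a functional twin from the checkpoint
    raise NodeFailure("child could not be recovered within max_twins attempts")


def make_flaky_runner(failures: int) -> Runner:
    """Build a simulated runner that fails the first `failures` calls."""
    remaining = [failures]

    def run(fn: Callable[..., Any], args: Tuple[Any, ...]) -> Any:
        if remaining[0] > 0:
            remaining[0] -= 1
            raise NodeFailure("simulated processor fault")
        return fn(*args)

    return run


def add(a: int, b: int) -> int:
    """A pure task used for the demonstration."""
    return a + b


if __name__ == "__main__":
    runner = make_flaky_runner(failures=2)
    print(evaluate_child(add, (2, 3), runner))
```

The sketch covers only the simplest case of redoing a single task from its parent's checkpoint. It does not attempt to salvage intermediate results, which the abstract mentions as the goal of a second algorithm.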