Open Access. Powered by Scholars. Published by Universities.®

Computer Engineering Commons



Electrical and Computer Engineering Faculty Publications

Electrical and Computer Engineering

Fault-tolerant computing


Full-Text Articles in Computer Engineering

Design And Implementation Of A Byzantine Fault Tolerance Framework For Non-Deterministic Applications, H. Zhang, Wenbing Zhao, Louise E. Moser, P. Michael Melliar-Smith Jun 2011


Electrical and Computer Engineering Faculty Publications

State-machine-based replication is an effective way to increase the availability and dependability of mission-critical applications. However, all practical applications contain some degree of non-determinism. Consequently, ensuring strong replica consistency in the presence of application non-determinism has been one of the biggest challenges in building dependable distributed systems. In this study, the authors propose a classification of common types of application non-determinism with respect to the requirement of achieving Byzantine fault tolerance (BFT), and present the design and implementation of a BFT framework that controls these types of non-determinism in a systematic manner.
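A common way to control one kind of non-determinism, a clock reading, works as follows. The primary picks the value once. Backups check that the value is plausible. Every replica then executes with the same agreed value instead of reading its own clock. The sketch below is a minimal Python illustration under that assumption. It is not the authors' framework. The names `Replica`, `propose_timestamp`, `validate_timestamp` and the `CLOCK_SKEW_TOLERANCE` bound are hypothetical, and the BFT ordering and agreement messages are omitted.

```python
from dataclasses import dataclass, field

# Hypothetical bound (seconds) on how far a backup lets the primary's
# proposed timestamp differ from the backup's own clock reading.
CLOCK_SKEW_TOLERANCE = 0.5


@dataclass
class Replica:
    """Toy replica whose state is a log of (request, agreed timestamp)."""
    last_timestamp: float = 0.0
    log: list = field(default_factory=list)

    def propose_timestamp(self, local_clock: float) -> float:
        # Primary side: read the clock once, but never go backwards,
        # so agreed timestamps stay monotonic across requests.
        return max(local_clock, self.last_timestamp + 1e-6)

    def validate_timestamp(self, proposed: float, local_clock: float) -> bool:
        # Backup side: reject a proposal that is not monotonic or is too
        # far from the backup's own clock (a possibly faulty primary).
        return (proposed > self.last_timestamp
                and abs(proposed - local_clock) <= CLOCK_SKEW_TOLERANCE)

    def execute(self, request: str, agreed: float) -> None:
        # Deterministic execution: use the agreed value, not a local clock.
        self.last_timestamp = agreed
        self.log.append((request, agreed))


# Usage: the primary proposes, the backup validates, and both execute.
primary, backup = Replica(), Replica()
ts = primary.propose_timestamp(local_clock=100.0)
if backup.validate_timestamp(ts, local_clock=100.2):
    for replica in (primary, backup):
        replica.execute("deposit", ts)
```

Because both replicas apply the same agreed timestamp, their logs stay identical even though their local clocks differ slightly.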


Proactive Service Migration For Long-Running Byzantine Fault-Tolerant Systems, Wenbing Zhao, H. Zhang Apr 2009


Electrical and Computer Engineering Faculty Publications

A proactive recovery scheme based on service migration for long-running Byzantine fault-tolerant systems is described. Proactive recovery is an essential method for ensuring the long-term reliability of fault-tolerant systems that are under continuous threats from malicious adversaries. The primary benefit of our proactive recovery scheme is a reduced vulnerability window under normal operation. This is achieved in two ways. First, the time-consuming reboot step is removed from the critical path of proactive recovery. Second, the response time and the service migration latency are continuously profiled and an optimal service migration interval is dynamically determined during runtime based on the observed …
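The abstract states that response time and service migration latency are continuously profiled at runtime. The estimator the paper uses is not given in this excerpt, and neither is the rule that turns the estimates into a migration interval. The sketch below therefore covers only the profiling step. It assumes an exponentially weighted moving average (EWMA), a common choice for a running latency estimate. The class name `EwmaProfiler` and the smoothing factor `alpha` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EwmaProfiler:
    """Running estimate of one latency metric (assumed EWMA smoothing)."""
    alpha: float = 0.2                 # hypothetical smoothing factor in (0, 1]
    estimate: Optional[float] = None

    def observe(self, sample: float) -> float:
        # The first sample seeds the estimate. Later samples are blended in,
        # so recent behaviour counts more than old behaviour.
        if self.estimate is None:
            self.estimate = sample
        else:
            self.estimate = self.alpha * sample + (1 - self.alpha) * self.estimate
        return self.estimate


# One profiler per metric named in the abstract (values in milliseconds).
response_time = EwmaProfiler()
migration_latency = EwmaProfiler()
for sample in (12.0, 14.0, 11.0):
    response_time.observe(sample)
migration_latency.observe(250.0)
```

A smaller `alpha` smooths out transient spikes but reacts more slowly to a lasting change in load. A runtime policy built on these estimates would need to trade off that smoothness against responsiveness.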