Open Access. Powered by Scholars. Published by Universities.®

Computer Engineering Commons

Fault tolerance

Articles 1 - 22 of 22

Full-Text Articles in Computer Engineering

Improving The Efficiency Of DNN Hardware Accelerator By Replacing Digital Feature Extractor With An Imprecise Neuromorphic Hardware, Majid Mohammadi Rad, Omid Sojodishijani Jan 2020

Turkish Journal of Electrical Engineering and Computer Sciences

Mixed-signal in-memory computation can drastically improve the efficiency of the hardware implementing machine learning (ML) algorithms by (i) removing the need to fetch neural network parameters from internal or external memory and (ii) performing a large number of multiply-accumulate operations in parallel. However, this boost in efficiency comes with some disadvantages. Among them, the inability to precisely program nonvolatile memory (NVM) devices with neural network parameters and sensitivity to noise prevent the mixed-signal hardware from performing precise and deterministic computation. Unfortunately, these hardware-specific errors can get magnified while propagating through the layers of the deep neural network. In …


Fault-Tolerant Method And Simulation Of Heterogeneous Multi-Core Processor Based On Speculative Mechanism, Shigan Yu, Zhimin Tang, Xiaochun Ye, Dongrui Fan Dec 2019

Journal of System Simulation

Abstract: Heterogeneous multicore processors are an important branch of processor design, but they still face frequent transient faults. Triple modular redundancy (TMR), the main method for handling transient faults, suffers from low efficiency and high power consumption, so a high-performance Fault-Tolerant Scheduling Algorithm with Speculative mechanism (FTSAS) is proposed. Each heterogeneous core can execute tasks independently; the state values of the first core to finish are recorded, and that core continues with the next task using forward speculation. The results are compared by the backward core, and the majority consensus principle is adopted to ensure the reliability …
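The majority consensus principle mentioned in this abstract can be illustrated with a generic voter. This is a minimal sketch and not the FTSAS algorithm itself: the function name `majority_vote` and the convention of returning `None` when no strict majority exists are assumptions made for illustration.

```python
from collections import Counter

def majority_vote(results):
    """Return the value reported by a strict majority of replicas, else None.

    `results` is a non-empty list with one hashable result per redundant core.
    """
    value, count = Counter(results).most_common(1)[0]
    # A strict majority needs more than half of the replicas to agree.
    return value if count * 2 > len(results) else None
```

With three replicas, one faulty result is outvoted by the two that agree; if all three disagree, the voter reports that no consensus was reached.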


Flexibility Of Remediation Methods For Winding Open Circuit Faults In A Multiphase PM Machine Considering Iron Losses Minimization, Fan Wu, Ayman M. El-Refaie Nov 2019

Electrical and Computer Engineering Faculty Research and Publications

The flexibility of post-fault control in multiphase machine systems stems from their multiple degrees of freedom. A post-fault loss-minimization method is proposed and investigated in this paper, in which both the machine copper and iron losses are considered during the derivation of post-fault remediation methods. Therefore, machine efficiency during post-fault operation can be further improved compared to the conventional stator-ohmic-loss-minimization approach. In addition, the combination of three key factors/constraints that can influence the post-fault control strategy of a six-phase permanent magnet (PM) machine has been investigated. By comparing four selected remediation methods based on three constraints, the pros and cons …


High Performance Fault-Tolerant Scheduling Method And Simulation For Heterogeneous Multicore, Shigan Yu, Zhimin Tang, Xiaochun Ye, Zhimin Zhang Jan 2019

Journal of System Simulation

Abstract: To address the low performance and high power consumption of triple modular redundancy (TMR) when handling transient processor faults, a task execution algorithm for heterogeneous multicore considering fault tolerance (TEAHFT) is proposed. The tasks to be executed are divided into sensitive tasks and fault-tolerant tasks. The sensitive tasks are executed in TMR mode, and the fault-tolerant tasks are executed in a competitive scheduling mode. A task is rerun in TMR mode if the result of a fault-tolerant task does not meet the reliability threshold. The simulation results show that TEAHFT …
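The dispatch rule described in this abstract (TMR for sensitive tasks, a cheaper mode with TMR fallback for the rest) might be sketched as follows. Everything here is hypothetical: the abstract does not define the competitive scheduling mode or the reliability measure, so a single-core run stands in for the former and a caller-supplied `reliability` function for the latter. Cores are modeled as plain callables.

```python
from collections import Counter

def run_tmr(task, cores):
    """Run the task on three cores and return the majority result."""
    results = [core(task) for core in cores[:3]]
    value, count = Counter(results).most_common(1)[0]
    if count < 2:
        raise RuntimeError("no majority among TMR replicas")
    return value

def execute(task, sensitive, cores, reliability, threshold):
    """Dispatch one task; all names and the fallback rule are assumptions."""
    if sensitive:
        return run_tmr(task, cores)
    # Stand-in for the competitive mode: a single run on the first core.
    result = cores[0](task)
    if reliability(result) >= threshold:
        return result
    # The result failed the reliability check, so rerun the task under TMR.
    return run_tmr(task, cores)
```

The point of the split is that only results that fail the check pay the threefold execution cost.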


TAPU: Test And Pick Up-Based $k$-Connectivity Restoration Algorithm For Wireless Sensor Networks, Vahid Khalilpour Akram, Orhan Dağdeviren Jan 2019

Turkish Journal of Electrical Engineering and Computer Sciences

A $k$-connected wireless sensor network remains connected if any $k-1$ arbitrary nodes stop working. The aim of movement-assisted $k$-connectivity restoration is to preserve the $k$-connectivity of a network by moving the nodes to the necessary positions after possible node failures. This paper proposes an algorithm named TAPU for $k$-connectivity restoration that guarantees the optimal movement cost. Our algorithm improves the time and space complexities of the previous approach (MCCR) in both best and worst cases. In the proposed algorithm, the nodes are classified into safe and unsafe groups. Failures of safe nodes do not change the $k$ value of …
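The definition of $k$-connectivity in the first sentence can be checked directly on a small graph. The sketch below is a brute-force test, not the TAPU algorithm: it removes every set of $k-1$ nodes and verifies that the rest stays connected. The adjacency-dictionary representation and the function names are assumptions, and the cost grows combinatorially with network size.

```python
from itertools import combinations

def is_connected(adj, removed):
    """Check that the nodes outside `removed` form one connected component."""
    alive = [v for v in adj if v not in removed]
    if not alive:
        return True
    seen = {alive[0]}
    stack = [alive[0]]
    while stack:
        u = stack.pop()
        for w in adj[u]:
            if w not in removed and w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == len(alive)

def is_k_connected(adj, k):
    """True if the graph has more than k nodes and survives any k-1 failures."""
    if len(adj) <= k:
        return False
    return all(is_connected(adj, set(c)) for c in combinations(adj, k - 1))
```

For example, a ring of four nodes is 2-connected but not 3-connected, because removing two opposite nodes separates the other two.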


Modular Injection System And Sampling Template (M.I.S.S.T) Design Report, Froylan M. Aguirre Jun 2018

Computer Engineering

Digital systems are ubiquitous throughout modern life and their applications continue to grow. Thus system designers engineer and test modular systems to mitigate error rates. Smaller systems and their increasing importance in many applications demand the utmost reliability. Fault injection is the most common method used by researchers and engineers to test system reliability. However, most hardware fault injection implementations are ad hoc and only used to test a specific system or for specific tests. There is also software-implemented fault injection that adds overhead in the benchmark source code. The aim of this project is to develop a general use, …


Virtualization In Wireless Sensor Networks: Fault Tolerant Embedding For Internet Of Things, Omprakash Kaiwartya, Abdul Hanan Abdullah, Yue Cao, Jaime Lloret, Sushil Kumar, Rajiv Ratn Shah, Mukesh Prasad, Shiv Prakash Apr 2018

Research Collection School Of Computing and Information Systems

Recently, virtualization in wireless sensor networks (WSNs) has received significant attention due to the growing service domain for IoT. Related literature on virtualization in WSNs explored resource optimization without considering communication failure in WSN environments. The failure of a communication link in WSNs impacts many virtual networks running IoT services. In this context, this paper proposes a framework for optimizing fault tolerance in virtualization in WSNs, focusing on heterogeneous networks for service-oriented IoT applications. An optimization problem is formulated considering fault tolerance and communication delay as two conflicting objectives. An adapted non-dominated sorting based genetic algorithm (A-NSGA) is developed to …
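Non-dominated sorting, the idea underlying NSGA-style algorithms, starts from the Pareto dominance test. The sketch below extracts only the first non-dominated front and is not the paper's A-NSGA. It assumes each candidate is a `(fault_tolerance, delay)` pair in which higher fault tolerance and lower delay are better; that encoding and the function names are illustrative.

```python
def dominates(a, b):
    """True if candidate a is no worse than b in both objectives and better in one.

    Each candidate is (fault_tolerance, delay): maximize the first, minimize the second.
    """
    no_worse = a[0] >= b[0] and a[1] <= b[1]
    strictly_better = a[0] > b[0] or a[1] < b[1]
    return no_worse and strictly_better

def pareto_front(candidates):
    """Return the candidates that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]
```

Because the two objectives conflict, the front usually holds several trade-off solutions rather than a single best one.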


Improving Large Scale Application Performance Via Data Movement Reduction, Dewan M. Ibtesham Nov 2017

Computer Science ETDs

The growth of compute capacity in high performance computing (HPC) systems is outpacing improvements in other areas of the system, such as memory capacity, network bandwidth, and I/O bandwidth. Therefore, the cost of executing a floating point operation is decreasing at a faster rate than the cost of moving the data it operates on. This widening performance gap wastes CPU cycles while waiting for slower I/O operations to complete in the memory hierarchy, network, and storage. These bottlenecks degrade application time-to-solution and increase energy consumption, resulting in system underutilization. In other words, data movement is becoming a key concern for future HPC …


A Fault-Tolerant T-Type Multilevel Inverter Topology With Increased Overload Capability And Soft-Switching Characteristics, Jiangbiao He, Ramin Katebi, Nathan Weise, Nabeel Demerdash, Lixiang Wei Feb 2017

Electrical and Computer Engineering Faculty Research and Publications

The performance of a novel three-phase four-leg fault-tolerant T-type inverter topology is introduced in this paper. This inverter topology provides a fault-tolerant solution to any open-circuit and certain short-circuit switching faults in the power devices. During any of the fault-tolerant operation modes for these device faults, there is no derating required in the inverter output voltage or output power. In addition, overload capability is increased in this new T-type inverter compared to that in the conventional three-level T-type inverter. Such increase in inverter overload capability is due to the utilization of the redundant leg for overload current sharing with other …


A Centralized Self-Adaptive Fault Tolerance Approach Based On Feedback Control For Multiagent Systems, Şebnem Bora, Oğuz Dikenelli Jan 2016

Turkish Journal of Electrical Engineering and Computer Sciences

Our research introduces a self-adaptive fault tolerance approach for multiagent systems that enables the system to avoid crash failures. It is a replication-based approach that exploits a feedback control loop and a proportional (P) controller within a replication infrastructure. Thus, we are able to both observe the agents' behaviors to estimate criticalities and determine the number of replicas in replica groups with respect to the agents' criticalities and the number of available resources. As a result, agents that are to be replicated and the numbers of replicas in replica groups are automatically and adaptively identified in dynamic environments. We implement this approach …
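A proportional controller adjusts its output by a fixed gain times the current error. The sketch below applies that idea to replica counts under assumptions that do not come from the paper: criticality is a number in [0, 1], the setpoint is `criticality * available`, and the result is clamped so that every agent keeps at least one copy. The function name, the gain value, and the setpoint mapping are all hypothetical.

```python
def next_replica_count(current, criticality, available, gain=0.5):
    """One step of a P controller for the size of an agent's replica group.

    current     -- replicas the agent has now
    criticality -- estimated criticality in [0, 1] (assumed scale)
    available   -- maximum replicas the resources allow
    gain        -- proportional gain (assumed value)
    """
    setpoint = criticality * available      # assumed mapping to a target size
    error = setpoint - current
    proposed = current + gain * error       # proportional correction
    # Clamp to the feasible range: at least one copy, at most `available`.
    return max(1, min(available, round(proposed)))
```

Calling the function once per observation period moves the replica count toward the setpoint in steps proportional to the remaining error.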


Load Management In A Distributed Multimedia Streaming Environment Using A Fault-Tolerant Hierarchical System, Hadi Işik Aybay, Mohammad Ahmed Shah Jan 2015

Turkish Journal of Electrical Engineering and Computer Sciences

In contrast to text-only forms of communication, multimedia combines audiovisual means with textual modes of communication. Streaming multimedia is multimedia that a provider delivers continuously to a client, so that the streamed content is presented to the end user while it is still being received. Distributed multimedia systems (DMSs) deliver multimedia content to end-users by means of distributed multimedia databases and distributed information servers. These DMSs are designed to deliver multimedia content over a network. A fault-tolerant redundant hierarchy (Red-HI)-based load management policy for a distributed multimedia streaming system is proposed in …


End-To-End Latency Of A Fault-Tolerant CORBA Infrastructure, Wenbing Zhao, Louise E. Moser, P. Michael Melliar-Smith Aug 2014

Wenbing Zhao

This paper presents an evaluation of the end-to-end latency of a fault-tolerant CORBA infrastructure that we have implemented. The fault-tolerant infrastructure replicates the server applications using active, passive and semi-active replication, and maintains strong replica consistency of the server replicas. By analyses and by measurements of the running fault-tolerant infrastructure, we characterize the end-to-end latency under fault-free conditions. The main determining factor of the run-time performance of the fault-tolerant infrastructure is the Totem group communication protocol, which contributes to the end-to-end latency primarily in two ways: the delay in sending messages and the processing cost of the rotating token. To …


Unification Of Transactions And Replication In Three-Tier Architectures Based On CORBA, Wenbing Zhao, Louise E. Moser, P. Michael Melliar-Smith Aug 2014

Wenbing Zhao

In this paper, we describe a software infrastructure that unifies transactions and replication in three-tier architectures and provides data consistency and high availability for enterprise applications. The infrastructure uses transactions based on the CORBA object transaction service to protect the application data in databases on stable storage, using a roll-backward recovery strategy, and replication based on the fault tolerant CORBA standard to protect the middle-tier servers, using a roll-forward recovery strategy. The infrastructure replicates the middle-tier servers to protect the application business logic processing. In addition, it replicates the transaction coordinator, which renders the two-phase commit protocol nonblocking and, thus, …


Barrier Coverage In Wireless Sensor Networks, Zhibo Wang Aug 2014

Doctoral Dissertations

Barrier coverage is a critical issue in wireless sensor networks (WSNs) for security applications, which aims to detect intruders attempting to penetrate protected areas. However, it is difficult to achieve desired barrier coverage after initial random deployment of sensors because their locations cannot be controlled or predicted. In this dissertation, we explore how to leverage the mobility capacity of mobile sensors to improve the quality of barrier coverage.

We first study the 1-barrier coverage formation problem in heterogeneous sensor networks and explore how to efficiently use different types of mobile sensors to form a barrier with pre-deployed different types of …


Fault Tolerant Broadcasting Analysis In Wireless Monitoring Networks, Akbar Ghaffarpour Rahbar Jan 2014

Turkish Journal of Electrical Engineering and Computer Sciences

Wireless monitoring networks can be used for security applications such as the monitoring of narrow passages and operational fields. These networks can be designed based on sensor networks. In sensor networks, each node can hear a message and broadcast the message to its neighbor nodes. Nevertheless, nodes may fail, so that faulty nodes cannot hear or cannot transmit any message, where the locations of the faulty nodes are unknown and their failures are permanent. In this paper, the nodes are situated on a line or a square grid-based topology in a plane for security/monitoring applications. For each topology, 2 nonadaptive …


Hard And Soft Error Resilience For One-Sided Dense Linear Algebra Algorithms, Peng Du Aug 2012

Doctoral Dissertations

Dense matrix factorizations, such as LU, Cholesky and QR, are widely used by scientific applications that require solving systems of linear equations, eigenvalues and linear least squares problems. Such computations are normally carried out on supercomputers, whose ever-growing scale induces a fast decline of the Mean Time To Failure (MTTF). This dissertation develops fault tolerance algorithms for one-sided dense matrix factorizations, which handle both hard and soft errors.

For hard errors, we propose methods based on diskless checkpointing and Algorithm Based Fault Tolerance (ABFT) to provide full matrix protection, including the left and right factors that are normally seen in …
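The checksum idea behind ABFT can be shown on a plain matrix product. This illustrates the general technique, not the dissertation's factorization algorithms: a checksum row holding the column sums of A is appended before multiplying, and because multiplication is linear, the last row of the product must equal the column sums of the result. The function names and the pure-Python list-of-rows representation are assumptions.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def with_column_checksum(a):
    """Append a row that holds the sum of each column of a."""
    return a + [[sum(col) for col in zip(*a)]]

def find_corrupt_columns(c_full, tol=1e-9):
    """Return indices of columns whose data no longer sums to the checksum row."""
    *data, checksum = c_full
    return [j for j, col in enumerate(zip(*data))
            if abs(sum(col) - checksum[j]) > tol]
```

A product computed as `matmul(with_column_checksum(a), b)` carries its own checksum row, so a corrupted entry shows up as a mismatch in its column without recomputing the product.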


Fault-Tolerant Operation Of Delta-Connected Scalar- And Vector-Controlled AC Motor Drives, Ahmed Mohamed Sayed Ahmed, Nabeel Demerdash Jun 2012

Electrical and Computer Engineering Faculty Research and Publications

Operation and analysis of delta-connected ac motor-drive systems under fault-tolerant open-phase mode of operation is introduced in this paper for both scalar- and vector-controlled motor-drive systems. This technique enables the operation of the three-phase motor upon a failure in one of its phases without the need for a special fault-detection algorithm. It is mainly used to significantly mitigate torque pulsations, which are caused by an open-delta configuration in the stator windings. The performance of the fault-tolerant system was verified using a detailed time-stepping finite element simulation as well as experimental tests for a 5-hp 460-V induction motor-drive system and the …


Fault-Tolerant Technique For Δ-Connected AC-Motor Drives, Ahmed Mohamed Sayed Ahmed, Behrooz Mirafzal, Nabeel Demerdash Jun 2011

Electrical and Computer Engineering Faculty Research and Publications

A fault-tolerant technique for motor-drive systems is introduced in this paper. The technique is presented only for ac motors with Δ-connected circuits in their stator windings. In this technique, the faulty phase is isolated by solid-state switches after the occurrence of a failure in one of the stator phases. Then, the fault-tolerant technique manages current flow in the remaining healthy phases. This technique significantly mitigates torque pulsations, which are caused by an open-Δ configuration in the stator windings. The performance of the fault-tolerant technique was experimentally verified using a 5-hp 460-V induction motor-drive system and the results are presented …


A Sustainable Autonomic Architecture For Organically Reconfigurable Computing Systems, Rashad S. Oreifej Jan 2011

Electronic Theses and Dissertations

A Sustainable Autonomic Architecture for Organically Reconfigurable Computing System based on SRAM Field Programmable Gate Arrays (FPGAs) is proposed, modeled analytically, simulated, prototyped, and measured. Low-level organic elements are analyzed and designed to achieve novel self-monitoring, self-diagnosis, and self-repair organic properties. The prototype of a 2-D spatial gradient Sobel video edge-detection organic system use-case developed on a XC4VSX35 Xilinx Virtex-4 Video Starter Kit is presented. Experimental results demonstrate the applicability of the proposed architecture and provide the infrastructure to quantify the performance and overcome fault-handling limitations. Dynamic online autonomous functionality restoration after a malfunction or functionality shift due to changing …


Sustainable Fault-Handling Of Reconfigurable Logic Using Throughput-Driven Assessment, Carthik Sharma Jan 2008

Electronic Theses and Dissertations

A sustainable Evolvable Hardware (EH) system is developed for SRAM-based reconfigurable Field Programmable Gate Arrays (FPGAs) using outlier detection and group testing-based assessment principles. The fault diagnosis methods presented herein leverage throughput-driven, relative fitness assessment to maintain resource viability autonomously. Group testing-based techniques are developed for adaptive input-driven fault isolation in FPGAs, without the need for exhaustive testing or coding-based evaluation. The techniques keep the device operational and, when possible, generate validated outputs throughout the repair process. Adaptive fault isolation methods based on discrepancy-enabled pair-wise comparisons are developed. By observing the discrepancy characteristics of multiple Concurrent Error Detection (CED) configurations, …


End-To-End Latency Of A Fault-Tolerant CORBA Infrastructure, Wenbing Zhao, Louise E. Moser, P. Michael Melliar-Smith May 2006

Electrical and Computer Engineering Faculty Publications

This paper presents an evaluation of the end-to-end latency of a fault-tolerant CORBA infrastructure that we have implemented. The fault-tolerant infrastructure replicates the server applications using active, passive and semi-active replication, and maintains strong replica consistency of the server replicas. By analyses and by measurements of the running fault-tolerant infrastructure, we characterize the end-to-end latency under fault-free conditions. The main determining factor of the run-time performance of the fault-tolerant infrastructure is the Totem group communication protocol, which contributes to the end-to-end latency primarily in …


Unification Of Transactions And Replication In Three-Tier Architectures Based On CORBA, Wenbing Zhao, Louise E. Moser, P. Michael Melliar-Smith Jan 2005

Electrical and Computer Engineering Faculty Publications

In this paper, we describe a software infrastructure that unifies transactions and replication in three-tier architectures and provides data consistency and high availability for enterprise applications. The infrastructure uses transactions based on the CORBA object transaction service to protect the application data in databases on stable storage, using a roll-backward recovery strategy, and replication based on the fault tolerant CORBA standard to protect the middle-tier servers, using a roll-forward recovery strategy. The infrastructure replicates the middle-tier servers to protect the application business logic processing. In addition, it replicates the transaction coordinator, which renders the two-phase commit protocol nonblocking and, thus, …