Full-Text Articles in Engineering
Automatic Detection Of Abnormal Behavior In Computing Systems, James Frank Roberts
Theses and Dissertations--Computer Science
I present RAACD, a software suite that detects misbehaving computers in large computing systems and presents information about those machines to the system administrator. I build this system using preexisting anomaly detection techniques. I evaluate my methods using simple synthesized data, real data containing coerced abnormal behavior, and real data containing naturally occurring abnormal behavior. I find that the system adequately detects abnormal behavior and significantly reduces the amount of uninteresting computer health data presented to a system administrator.
Algorithms For Fault Tolerance In Distributed Systems And Routing In Ad Hoc Networks, Qiangfeng Jiang
Theses and Dissertations--Computer Science
Checkpointing and rollback recovery are well-known techniques for coping with failures in distributed systems. Future-generation supercomputers will be message-passing distributed systems consisting of millions of processors. As the number of processors grows, the failure rate also grows. Thus, designing efficient checkpointing and recovery algorithms for coping with failures in such large systems is important if these systems are to be fully utilized. We present a novel communication-induced checkpointing algorithm that helps reduce contention for access to stable storage when storing checkpoints. Under our algorithm, a process involved in a distributed computation can independently initiate consistent global checkpointing by saving its …