Entire DC Network | Open Access Articles | Digital Commons Network™

Improving Scalability And Usability Of Parallel Runtime Environments For High Availability And High Performance Systems, Thara Angskun Dec 2007

Doctoral Dissertations

The number of processors embedded in high performance computing platforms is growing daily to solve larger and more complex problems. Hence, parallel runtime environments must support and adapt to underlying platforms that require scalability and fault management in increasingly dynamic environments. This dissertation aims to analyze, understand, and improve the state-of-the-art mechanisms for managing highly dynamic, large-scale applications.

This dissertation demonstrates that the use of new scalable and fault-tolerant topologies, combined with rerouting techniques, yields parallel runtime environments that can efficiently and reliably deliver sets of information to a large …

Statistical And Machine Learning Techniques Applied To Algorithm Selection For Solving Sparse Linear Systems, Erika Fuentes Dec 2007

Doctoral Dissertations

There are many applications and problems in science and engineering that require large-scale numerical simulations and computations. Choosing an appropriate method to solve these problems is a common issue but not a trivial one, principally because the decision is usually too hard for humans to make, or because it requires a certain degree of expertise and knowledge in the particular discipline or in mathematics. Thus, a methodology that can facilitate or automate this process, and that helps in understanding the problem, would be of great interest and help. The proposal is to utilize …

Towards Automatic And Adaptive Optimizations Of MPI Collective Operations, Jelena Pjesivac-Grbovic Dec 2007

Doctoral Dissertations

Message passing is one of the most commonly used paradigms of parallel programming. The Message Passing Interface (MPI) is a standard used in scientific and high-performance computing. Collective operations are a subset of the MPI standard that deals with process synchronization, data exchange, and computation among a group of processes. Collective operations are commonly used and can be an application performance bottleneck. Their performance depends on many factors, including the input parameters (e.g., communicator and message size); system characteristics (e.g., interconnect type); the application's computation and communication pattern; and internal algorithm parameters (e.g., internal segment size). …

A Finite Difference Method For Studying Thermal Deformation In A Three-Dimensional Microsphere Exposed To Ultrashort-Pulsed Lasers, Xudong Du Jul 2007

Doctoral Dissertations

Ultrashort-pulsed lasers, with pulse durations on the order of sub-picoseconds to femtoseconds, can limit the undesirable spread of the thermal process zone in a heated sample, a capability that has attracted worldwide interest in science and engineering. The success of ultrashort-pulsed lasers in real applications relies on: (1) well-characterized pulse width, intensity, and experimental techniques; (2) reliable microscale heat transfer models; and (3) prevention of thermal damage. Laser damage by ultrashort-pulsed lasers occurs after the heating pulse is over, since the pulse duration is extremely short and the heat flux is essentially limited to the region within …

Automated Gene Classification Using Nonnegative Matrix Factorization On Biomedical Literature, Kevin Erich Heinrich May 2007

Doctoral Dissertations

Understanding functional gene relationships is a challenging problem for biological applications. High-throughput technologies such as DNA microarrays have inundated biologists with a wealth of information; however, processing that information remains problematic. To help with this problem, researchers have begun applying text mining techniques to the biological literature. This work extends previous work based on Latent Semantic Indexing (LSI) by examining Nonnegative Matrix Factorization (NMF). Whereas LSI incorporates the singular value decomposition (SVD) to approximate data in a dense, mixed-sign space, NMF produces a parts-based factorization that is directly interpretable. This space can, in theory, be used to augment existing ontologies …
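
The contrast between SVD and NMF described in this abstract can be illustrated with a small sketch. This is not the dissertation's method or data: the toy term-by-document matrix, the helper name `nmf`, and the use of Lee-Seung multiplicative updates are all assumptions made for illustration only.

```python
import numpy as np


def nmf(V, rank, iters=200, seed=0, eps=1e-9):
    """Approximate a nonnegative matrix V as W @ H with W, H >= 0.

    Hypothetical helper: uses Lee-Seung multiplicative updates for the
    Frobenius-norm objective, which keep both factors nonnegative.
    """
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


# Made-up term-by-document counts (rows: terms, columns: documents).
V = np.array([
    [3, 2, 0, 0],
    [2, 3, 0, 1],
    [0, 0, 4, 3],
    [0, 1, 3, 4],
], dtype=float)

# NMF: every entry of W and H is nonnegative, so each column of W can be
# read as an additive "part" (a weighted group of terms).
W, H = nmf(V, rank=2)

# Rank-2 truncated SVD, the decomposition LSI builds on: the singular
# vectors mix positive and negative entries.
U, s, Vt = np.linalg.svd(V, full_matrices=False)
U2 = U[:, :2]

print("NMF factors nonnegative:", bool(W.min() >= 0 and H.min() >= 0))
print("SVD basis has negative entries:", bool((U2 < 0).any()))
```

Because the NMF factors contain no negative weights, a document is expressed only as a sum of term groups, which is the sense in which the factorization is called parts-based.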

Leaf: A Learning-Based Fault Diagnostic System For Multi-Robot Teams, Balajee Kannan May 2007

Doctoral Dissertations

The failure-prone, complex operating environment of a standard multi-robot application dictates that some amount of fault tolerance be incorporated into every system. In fact, the quality of the incorporated fault tolerance has a direct impact on the overall performance of the system. Despite the extensive work being done in the field of multi-robot systems, no general methodology exists for fault diagnosis and recovery. The objective of this research, in part, is to provide an adaptive approach that enables the robot team to autonomously detect and compensate for the wide variety of faults that could be experienced. The key feature …

Reliability-Aware Optimal Checkpoint/Restart Model In High Performance Computing, Yudan Liu Apr 2007

Doctoral Dissertations

The demand for computational power to solve large, challenging problems has increasingly driven up the physical size of High Performance Computing (HPC) systems. As a system gets larger, it requires more and more components (processors, memory, disks, switches, power supplies, and so on). Thus, challenges arise in handling the reliability of such large-scale systems. To minimize the performance loss due to unexpected failures, fault-tolerant mechanisms are vital for sustaining computational power in such environments. Checkpoint/restart is a common fault-tolerant technique that has been widely applied in single-computer systems. However, checkpointing in a large-scale HPC environment is much more challenging …
