Open Access. Powered by Scholars. Published by Universities.®

Engineering Commons

Articles 1 - 8 of 8

Full-Text Articles in Engineering

Generalized Techniques For Using System Execution Traces To Support Software Performance Analysis, Thelge Manjula Peiris Dec 2015

Open Access Dissertations

This dissertation proposes generalized techniques to support software performance analysis using system execution traces in the absence of software development artifacts such as source code. The proposed techniques are non-intrusive: they do not require modifications to the source code or to the software binaries for the purpose of software analysis. The proposed techniques are also not tightly coupled to the architecture-specific details of the system being analyzed. This dissertation extends the current techniques of using system execution traces to evaluate software performance properties, such as response times and service times. The dissertation also proposes a novel technique to auto-construct a dataflow model …


Architectural Techniques To Extend Multi-Core Performance Scaling, Hamza Bin Sohail Apr 2015

Open Access Dissertations

Multi-cores have successfully delivered performance improvements over the past decade; however, they now face problems on two fronts: power and off-chip memory bandwidth. Dennard scaling is effectively coming to an end, which has led to a gradual increase in chip power dissipation. In addition, sustaining off-chip memory bandwidth has become harder due to the limited space for pins on the die and the greater current needed to drive the increasing load. My thesis focuses on techniques to address the power and off-chip memory bandwidth challenges in order to avoid the premature end of the multi-core era. In the first …


Captured Open Book Image De-Warping And Shading Correction Using 3D Depth Information, Chyuan-Tyng Wu Apr 2015

Open Access Dissertations

Various three-dimensional (3D) measuring and capturing devices have recently been introduced, and they open up abundant possibilities for applications. In this research, we worked on one useful application: correcting the distortion due to the curved shape of the pages of an open book in captured images using depth information. This work is relevant to camera-based capture devices that can use a projector to cast structured light patterns to provide depth information. In order to improve the visual quality of captured documents, we established our algorithm from two perspectives. First, …


SemCache: Semantics-Aware Caching For Efficient GPU Offloading, Nabeel Al-Saber Apr 2015

Open Access Dissertations

Graphics Processing Units (GPUs) offer massive, highly efficient parallelism, making them an attractive target for computation-intensive applications. However, GPUs have a separate memory space, which introduces the complexity of manually handling explicit data movements between GPU and CPU memory spaces. Although GPU kernels/libraries have made it easy to improve application performance by offloading computation to GPUs, it is very difficult to manually optimize CPU-GPU communication between multiple kernel invocations to avoid redundant communication when using these kernels with complex applications. In this thesis, we introduce SemCache, a semantics-aware GPU cache that automatically manages CPU-GPU communication in addition to optimizing …
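
To illustrate the redundant-communication problem this abstract describes, the following is a minimal Python sketch of a cache that tracks which memory space holds an up-to-date copy of each buffer and copies only when needed. It is not SemCache's actual design, which the truncated abstract does not specify; the class `TransferCache`, its methods, and the buffer names are hypothetical, and a counter stands in for real CPU-GPU copies.

```python
from typing import Dict, Iterable, Set


class TransferCache:
    """Hypothetical sketch: tracks where the latest copy of each buffer lives."""

    def __init__(self) -> None:
        # Buffer name -> set of memory spaces ("cpu", "gpu") with a valid copy.
        self.valid_on: Dict[str, Set[str]] = {}
        # Number of simulated CPU<->GPU copies performed so far.
        self.transfers = 0

    def _ensure(self, name: str, device: str) -> None:
        # Buffers seen for the first time are assumed to start on the CPU.
        locations = self.valid_on.setdefault(name, {"cpu"})
        if device not in locations:
            self.transfers += 1  # stand-in for a real memory copy
            locations.add(device)

    def gpu_kernel(self, inputs: Iterable[str], outputs: Iterable[str]) -> None:
        # Copy an input to the GPU only if the GPU copy is missing or stale.
        for name in inputs:
            self._ensure(name, "gpu")
        # After the kernel runs, only the GPU holds the new output values.
        for name in outputs:
            self.valid_on[name] = {"gpu"}

    def cpu_read(self, name: str) -> None:
        # Copy back to the CPU only if the CPU copy is stale.
        self._ensure(name, "cpu")

    def cpu_write(self, name: str) -> None:
        # A CPU write invalidates any copy held on the GPU.
        self.valid_on[name] = {"cpu"}


if __name__ == "__main__":
    cache = TransferCache()
    cache.gpu_kernel(["A", "B"], ["C"])  # A and B are copied to the GPU
    cache.gpu_kernel(["C", "A"], ["D"])  # C and A are already valid on the GPU
    cache.cpu_read("D")                  # D is copied back once
    print(cache.transfers)
```

In this sketch the second kernel call performs no copies, which is the kind of saving that is hard to obtain by hand when kernel invocations are spread across a complex application.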


Black-Box Printer Models And Their Applications, Yanling Ju Apr 2015

Open Access Dissertations

In the electrophotographic printing process, the deposition of toner within the area of a given printer-addressable pixel is strongly influenced by the values of its neighboring pixels. The interaction between neighboring pixels, which is commonly referred to as dot-gain, is complicated. The printer models, which are developed according to a pre-designed test page, can either be embedded in the halftoning algorithm or used to predict the printed halftone image at the input to an algorithm being used to assess print quality. In our research, we examine the potential influence of a larger neighborhood (45×45) of the digital halftone image …


Improving Capacity-Performance Tradeoffs In The Storage Tier, Eric P. Villasenor Apr 2015

Open Access Dissertations

Data-set sizes are growing, and new techniques are emerging to organize and analyze these data-sets. A key access pattern is emerging with these new techniques: large sequential file accesses. The trend toward bigger files exists to help amortize the cost of data accesses from the storage layer, as many workloads are recognized to be I/O bound. The storage layer is widely recognized as the slowest layer in the system. This work focuses on the tradeoff one can make with storage capacity to improve system performance. Capacity can be leveraged for improved availability or improved performance. This tradeoff is …


Assessment Of High-Fidelity Collision Models In The Direct Simulation Monte Carlo Method, Andrew Brian Weaver Apr 2015

Open Access Dissertations

Advances in computer technology over the decades have allowed more complex physics to be modeled in the direct simulation Monte Carlo (DSMC) method. In the first paper on DSMC, published in 1963, 30,000 collision events per hour were simulated using a simple hard sphere model. Today, more than 10 billion collision events can be simulated per hour for the same problem. Many new and more physically realistic collision models, such as the Lennard-Jones potential and the forced harmonic oscillator model, have been introduced into DSMC. However, the fact that computer resources are more readily available and higher-fidelity models have been developed does not …


Accelerating MPI Collective Communications Through Hierarchical Algorithms With Flexible Inter-Node Communication And Imbalance Awareness, Benjamin Scott Parsons Jan 2015

Open Access Dissertations

This work presents and evaluates algorithms for MPI collective communication operations on high performance systems. Collective communication algorithms are extensively investigated, and a universal algorithm to improve the performance of MPI collective operations on hierarchical clusters is introduced. This algorithm exploits shared-memory buffers for efficient intra-node communication while still allowing the use of unmodified, hierarchy-unaware traditional collectives for inter-node communication. The universal algorithm shows impressive performance results with a variety of collectives, improving upon the MPICH algorithms as well as the Cray MPT algorithms. Speedups average 15x to 30x for most collectives, with improved scalability up to 65,536 cores. Further …
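
The two-level structure described in this abstract can be sketched as a small pure-Python simulation of a sum allreduce. This is only an illustration under stated assumptions, not the dissertation's algorithm: no MPI library or shared memory is used, ranks are assumed to be assigned to nodes in contiguous blocks, and the functions `flat_allreduce` and `hierarchical_allreduce` are hypothetical names.

```python
from typing import List


def flat_allreduce(values: List[int]) -> List[int]:
    """Stand-in for an unmodified, hierarchy-unaware allreduce (sum)."""
    total = sum(values)
    return [total] * len(values)


def hierarchical_allreduce(values: List[int], ranks_per_node: int) -> List[int]:
    """Simulate a two-level sum allreduce; values[i] belongs to rank i."""
    if ranks_per_node <= 0:
        raise ValueError("ranks_per_node must be positive")

    # Step 1: intra-node reduction. On a real cluster this is the step that
    # could use a shared-memory buffer; here it is a plain local sum.
    node_sums = [
        sum(values[start:start + ranks_per_node])
        for start in range(0, len(values), ranks_per_node)
    ]

    # Step 2: inter-node allreduce among one leader per node, using the
    # flat collective unchanged.
    node_totals = flat_allreduce(node_sums)

    # Step 3: intra-node broadcast of the leader's result to every local rank.
    return [node_totals[rank // ranks_per_node] for rank in range(len(values))]


if __name__ == "__main__":
    # Six ranks on three nodes of two ranks each.
    print(hierarchical_allreduce([1, 2, 3, 4, 5, 6], ranks_per_node=2))
```

Only one value per node crosses the simulated network in step 2, which is the reason a hierarchical scheme can reduce inter-node traffic relative to a flat collective over all ranks.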