Open Access. Powered by Scholars. Published by Universities.®

Computer Engineering Commons

Articles 1 - 17 of 17

Full-Text Articles in Computer Engineering

Generalized Techniques For Using System Execution Traces To Support Software Performance Analysis, Thelge Manjula Peiris Dec 2015

Open Access Dissertations

This dissertation proposes generalized techniques to support software performance analysis using system execution traces in the absence of software development artifacts such as source code. The proposed techniques are non-intrusive: they do not require modifications to the source code or to the software binaries for the purpose of software analysis. They are also not tightly coupled to the architecture-specific details of the system being analyzed. This dissertation extends the current techniques of using system execution traces to evaluate software performance properties, such as response times and service times. The dissertation also proposes a novel technique to auto-construct a dataflow model …


Architectural Techniques To Extend Multi-Core Performance Scaling, Hamza Bin Sohail Apr 2015

Open Access Dissertations

Multi-cores have successfully delivered performance improvements over the past decade; however, they now face problems on two fronts: power and off-chip memory bandwidth. Dennard scaling is effectively coming to an end, which has led to a gradual increase in chip power dissipation. In addition, sustaining off-chip memory bandwidth has become harder due to the limited space for pins on the die and the greater current needed to drive the increasing load. My thesis focuses on techniques to address the power and off-chip memory bandwidth challenges in order to avoid the premature end of the multi-core era. In the first …


Captured Open Book Image De-Warping And Shading Correction Using 3D Depth Information, Chyuan-Tyng Wu Apr 2015

Open Access Dissertations

Various three-dimensional (3D) measuring and capturing devices have recently been introduced, and there are abundant opportunities to take advantage of this new technology. In this research, we worked on one useful application: correcting the distortion caused by the curved shape of the pages of an open book in captured images using depth information. This work is relevant to camera-based capture devices that can use a projector to cast structured light patterns to provide depth information. In order to improve the visual quality of captured documents, we established our algorithm from two perspectives. First, …


An Empirical Approach To The Re-Creation Of Vehicle Drive Cycles, Andrew J. Larson Apr 2015

Open Access Theses

Vehicles such as buses, delivery trucks, mining equipment, and motorsport vehicles often repeat a highly defined pattern, route, or track during normal use. For these vehicles, standard dynamometer drive cycles are of little use. It was proposed that deriving a vehicle drive cycle from empirical data collected from on-board vehicle sensors would produce more accurate vehicle characteristic predictions for special purpose vehicles. This study answers the question "Is it possible to use recorded vehicle data to replicate a real world driving scenario for the purpose of vehicle diagnostics?" To reduce the complexity of the project, an electric go-kart was used …


Hybrid Power System For Micro Air Vehicles, Bakytgul Khaday Apr 2015

Open Access Theses

Today, Micro Air Vehicles (MAVs) need a power source that enables longer flight times and more functionality. This work focuses on this problem. The solution offered in this study is a hybrid power system consisting of a battery and a supercapacitor (SCAP). The proposed hybrid power system was tested on an existing MAV platform (Cheerson CX-10). A separate hybrid power printed circuit board (PCB) was designed and manufactured. For experimental and system verification purposes, the PCB was not sized for on-board flight. The hybrid power PCB was connected to the MAV through …


Hubcheck: Check The Hub, Derrick S. Kearney Apr 2015

Open Access Theses

The HUBzero Platform is a framework for building websites, referred to as "hubs," that promote research communities through online simulation, data management, and collaboration. With each software release, the HUBzero Team dedicates weeks of team members' time toward manually testing, fixing, and retesting hub components. The unique mixture of environments that make up a hub makes using existing automated testing solutions hard and shifts the burden of testing to humans, promoting variation, spot checking of fixes, and other shortcuts to avoid the high cost of completely retesting the system. With over twenty hubs being actively managed by the HUBzero Team, …


Diagnosis Of Systematic Defects Based On Design-For-Manufacturability Guidelines, Dhawal Krishana Gupta Apr 2015

Open Access Theses

All products in the Very-Large-Scale-Integrated-Circuit (VLSIC) industry go through three major stages of production: Design, Verification, and Manufacturing. Unfortunately, none of these stages is perfect; hence we need two more sub-stages of manufacturing, namely Testing and Defect Diagnosis, to prevent imperfections in ICs. Testing is used to generate test vectors to validate the functionality of the Device-under-Test (DUT), and Defect Diagnosis is the process of identifying the root cause of a failing chip, i.e., the location and nature of the defect. Systematic defects are unintended structural and material changes at specific locations with a higher probability of failure due to …


Trajectory Generation For Lane-Change Maneuver Of Autonomous Vehicles, Ashesh Goswami Apr 2015

Open Access Theses

The lane-change maneuver is one of the most thoroughly investigated automatic driving operations that can be used by an autonomous self-driving vehicle as a primitive for performing more complex operations like merging, entering/exiting highways, or overtaking another vehicle. This thesis focuses on two coherent problems that are associated with the trajectory generation for lane-change maneuvers of autonomous vehicles in a highway scenario: (i) an effective velocity estimation of neighboring vehicles under different road scenarios involving linear and curvilinear motion of the vehicles, and (ii) trajectory generation based on the estimated velocities of neighboring vehicles for safe operation of self-driving cars during …


Characterization Of Vectorization Strategies For Recursive Algorithms, Shruthi Balakrishna Apr 2015

Open Access Theses

A successful architectural trend in parallelism is the emphasis on data parallelism with SIMD hardware. Since SIMD extensions on commodity processors tend to require relatively little extra hardware, executing a SIMD instruction is essentially free from a power perspective, making vector computation an attractive target for parallelism. SIMD instructions are designed to accelerate the performance of applications such as motion video, real-time physics, and graphics. Such applications perform repetitive operations on large arrays of numbers. While the key idea is to parallelize significant portions of data that get operated on by several sequential instructions into a single instruction, not every application …


Exploiting Intra-Warp Address Monotonicity For Fast Memory Coalescing In GPUs, Hector Rodriguez-Simmonds Apr 2015

Open Access Theses

Graphics Processing Units (GPUs) are growing increasingly popular as general-purpose compute accelerators. GPUs are best suited for applications that have abundant data parallelism, wherein the computation expressed as a single thread can be applied over a large set of data items. One key constraint that affects application performance on GPUs is that the underlying hardware is single-instruction, multiple-data (SIMD) hardware, which requires parallel instructions from the multiple threads to execute in a lock-step manner. The benefits of lock-step execution can be seriously degraded if the threads diverge (because of memory or branches). Specifically in the case of memory, …


Recursive Tree Traversal Dependence Analysis, Yusheng Weijiang Apr 2015

Open Access Theses

While there has been much work done on analyzing and transforming regular programs that operate over linear arrays and dense matrices, comparatively little has been done to try to carry these optimizations over to programs that operate over heap-based data structures using pointers. Previous work has shown that point blocking, a technique similar to loop tiling in regular programs, can help increase the temporal locality of repeated tree traversals. Point blocking, however, has only been shown to work on tree traversals where each traversal is fully independent and would allow parallelization, greatly limiting the types of applications that this transformation …


SemCache: Semantics-Aware Caching For Efficient GPU Offloading, Nabeel Al-Saber Apr 2015

Open Access Dissertations

Graphics Processing Units (GPUs) offer massive, highly efficient parallelism, making them an attractive target for computation-intensive applications. However, GPUs have a separate memory space, which introduces the complexity of manually handling explicit data movements between GPU and CPU memory spaces. Although GPU kernels/libraries have made it easy to improve application performance by offloading computation to GPUs, it is very difficult to manually optimize CPU-GPU communication between multiple kernel invocations to avoid redundant communication when using these kernels with complex applications. In this thesis, we introduce SemCache, a semantics-aware GPU cache that automatically manages CPU-GPU communication in addition to optimizing …


Black-Box Printer Models And Their Applications, Yanling Ju Apr 2015

Open Access Dissertations

In the electrophotographic printing process, the deposition of toner within the area of a given printer addressable pixel is strongly influenced by the values of its neighboring pixels. The interaction between neighboring pixels, which is commonly referred to as dot-gain, is complicated. The printer models which are developed according to a pre-designed test page can either be embedded in the halftoning algorithm, or used to predict the printed halftone image at the input to an algorithm being used to assess print quality. In our research, we examine the potential influence of a larger neighborhood (45×45) of the digital halftone image …


Improving Capacity-Performance Tradeoffs In The Storage Tier, Eric P. Villasenor Apr 2015

Open Access Dissertations

Data-set sizes are growing. New techniques are emerging to organize and analyze these data-sets. A key access pattern is emerging with these new techniques: large sequential file accesses. The trend toward bigger files exists to help amortize the cost of data accesses from the storage layer, as many workloads are recognized to be I/O bound. The storage layer is widely recognized as the slowest layer in the system. This work focuses on the tradeoff one can make with that storage capacity to improve system performance. Capacity can be leveraged for improved availability or improved performance. This tradeoff is …


Assessment Of High-Fidelity Collision Models In The Direct Simulation Monte Carlo Method, Andrew Brian Weaver Apr 2015

Open Access Dissertations

Advances in computer technology over the decades have allowed for more complex physics to be modeled in the DSMC method. Beginning with the first paper on DSMC in 1963, 30,000 collision events per hour were simulated using a simple hard sphere model. Today, more than 10 billion collision events can be simulated per hour for the same problem. Many new and more physically realistic collision models such as the Lennard-Jones potential and the forced harmonic oscillator model have been introduced into DSMC. However, the fact that computer resources are more readily available and higher-fidelity models have been developed does not …


Linear Matrix Inequality-Based Nonlinear Adaptive Robust Control With Application To Unmanned Aircraft Systems, David William Kun Apr 2015

Open Access Theses

Unmanned aircraft systems (UASs) are gaining popularity in civil and commercial applications as their lightweight on-board computers become more powerful and affordable, their power storage devices improve, and the Federal Aviation Administration addresses the legal and safety concerns of integrating UASs in the national airspace. Consequently, many researchers are pursuing novel methods to control UASs in order to improve their capabilities, dependability, and safety assurance. The nonlinear control approach is a common choice as it offers several benefits for these highly nonlinear aerospace systems (e.g., the quadrotor). First, the controller design is physically intuitive and is derived from well known …


Accelerating MPI Collective Communications Through Hierarchical Algorithms With Flexible Inter-Node Communication And Imbalance Awareness, Benjamin Scott Parsons Jan 2015

Open Access Dissertations

This work presents and evaluates algorithms for MPI collective communication operations on high performance systems. Collective communication algorithms are extensively investigated, and a universal algorithm to improve the performance of MPI collective operations on hierarchical clusters is introduced. This algorithm exploits shared-memory buffers for efficient intra-node communication while still allowing the use of unmodified, hierarchy-unaware traditional collectives for inter-node communication. The universal algorithm shows impressive performance results with a variety of collectives, improving upon the MPICH algorithms as well as the Cray MPT algorithms. Speedups average 15x to 30x for most collectives with improved scalability up to 65,536 cores. Further …