Open Access. Powered by Scholars. Published by Universities.®

Engineering Commons

Full-Text Articles in Engineering

Approaches To Improve The Execution Time Of A Quantum Network Simulation, Joseph B. Tippit Dec 2021

Theses and Dissertations

Evaluating quantum networks is an expensive and time-consuming task that benefits from simulation. A potential improvement is to use GPUs, namely by leveraging NVIDIA's programming framework, CUDA. To avoid the performance pitfalls of higher-level languages and programming models, such as the so-called "two-language problem," the Julia programming language provides the basis for the development effort. This research develops a two-module prototype quantum network simulation framework using GPUs and Julia. The performance of the software is measured and compared against other languages such as MATLAB.


Algorithms And Framework For Computing 2-Body Statistics On Graphics Processing Units, Napath Pitaksirianan Feb 2020

USF Tampa Graduate Theses and Dissertations

Various types of two-body statistics (2-BS) are regarded as essential components of low-level data analysis in scientific database systems. In relational algebraic terms, a 2-BS is essentially a Cartesian product between two datasets (or two instances of the same dataset) followed by a user-defined aggregate. The quadratic complexity of these computations hinders the timely processing of data, so using modern parallel hardware has become an obvious way to meet such challenges. This dissertation presents our recent work in designing and optimizing parallel algorithms for 2-BS computation on Graphics Processing Units (GPUs). The unique architecture, however, provides abundant opportunities for optimizing …
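A spatial distance histogram (SDH) is a commonly cited example of a two-body statistic: every pair of points contributes one distance, and the distances are aggregated into fixed-width buckets. The serial Python sketch below only illustrates the Cartesian-product-plus-aggregate structure described above; it is not the dissertation's GPU algorithm, and the function name `distance_histogram`, the bucket parameters, and the use of unordered pairs within one dataset are assumptions.

```python
import math
from itertools import combinations

def distance_histogram(points, bucket_width, num_buckets):
    """Spatial distance histogram over all unordered point pairs.

    Enumerating the pairs is the (self-)Cartesian product; counting the
    distances into buckets is the aggregate. O(n^2) pairs are visited.
    """
    hist = [0] * num_buckets
    for p, q in combinations(points, 2):
        d = math.dist(p, q)  # Euclidean distance (Python 3.8+)
        # Distances beyond the last bucket are clamped into that bucket.
        idx = min(int(d // bucket_width), num_buckets - 1)
        hist[idx] += 1
    return hist

points = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
print(distance_histogram(points, bucket_width=5.0, num_buckets=3))  # [0, 2, 1]
```

On a GPU, each thread would typically handle a subset of the pairs, and the bucket counts from different threads then have to be combined, for example with atomic additions or per-block private histograms that are merged at the end.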


Enforcing Security Policies On GPU Computing Through The Use Of Aspect-Oriented Programming Techniques, Bader Albassam Jun 2016

USF Tampa Graduate Theses and Dissertations

This thesis presents a new security policy enforcer designed for securing parallel computation on CUDA GPUs. We show how the very features that make a GPGPU desirable have already been used in existing exploits, reinforcing the need for security protections on a GPGPU. An aspect weaver was designed for CUDA with the goal of using aspect-oriented programming for security policy enforcement. Empirical testing verified the ability of our aspect weaver to enforce various policies. Furthermore, a performance analysis demonstrated that using this policy enforcer has no significant performance impact compared with manual insertion of policy code. Finally, future …
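Aspect-oriented programming moves a cross-cutting concern, here a security policy, into "advice" that a weaver attaches at join points instead of hand-editing every call site. The thesis weaves policies into CUDA code; the Python sketch below only illustrates the idea by analogy, with a decorator acting as "before" advice around a stand-in for a kernel launch. The policy itself (rejecting launches whose thread count exceeds the buffer length) and the names `bounds_policy`, `PolicyViolation`, and `launch_scale` are hypothetical.

```python
import functools

class PolicyViolation(Exception):
    """Raised when the advice detects a call that breaks the policy."""

def bounds_policy(func):
    """'Before' advice (hypothetical policy): refuse a launch whose total
    thread count would index past the end of the buffer."""
    @functools.wraps(func)
    def wrapper(buffer, grid_dim, block_dim, *args, **kwargs):
        threads = grid_dim * block_dim
        if threads > len(buffer):
            raise PolicyViolation(
                f"{threads} threads would index past a buffer of {len(buffer)}")
        return func(buffer, grid_dim, block_dim, *args, **kwargs)
    return wrapper

@bounds_policy  # "weaving": the check is attached without editing the body
def launch_scale(buffer, grid_dim, block_dim, factor):
    """Stand-in for a kernel launch: thread i scales buffer[i] in place."""
    for i in range(grid_dim * block_dim):
        buffer[i] *= factor
    return buffer

print(launch_scale([1, 2, 3, 4], 2, 2, 10))  # [10, 20, 30, 40]
```

Because the check runs before the wrapped body, a rejected launch leaves the buffer untouched, and the policy can be changed in one place without touching any launch site.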


Astro – A Low-Cost, Low-Power Cluster For CPU-GPU Hybrid Computing Using The Jetson TK1, Sean Kai Sheen Jun 2016

Master's Theses

With the rising costs of large-scale distributed systems, many researchers have begun looking at low-power architectures for clusters. In this paper, we describe our Astro cluster, which consists of 46 NVIDIA Jetson TK1 nodes, each equipped with an ARM Cortex-A15 CPU, a 192-core Kepler GPU, 2 GB of RAM, and 16 GB of flash storage. Compared with conventional clusters, the cluster has a number of advantages, including lower power usage, ambient cooling, shared memory between the CPU and GPU, and affordability. The cluster is built using commodity hardware and can be set up for relatively low …


Accelerating Scientific Computing Models Using GPU Processing, Raymond F. Flagg III Aug 2015

Electronic Theses and Dissertations

GPGPUs offer significant computational power for programmers to leverage. This computational power is especially useful when utilized for accelerating scientific models. This thesis analyzes the utilization of GPGPU programming to accelerate scientific computing models.

First, the construction of hardware for visualizing and computing scientific models is discussed. Several of the factors in building these machines concern their performance impact on scientific modeling.

Image processing is an embarrassingly parallel problem well suited for GPGPU acceleration. An image processing library was developed to show the processes of recognizing embarrassingly parallel problems and serves as an excellent example of converting …
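A problem is embarrassingly parallel when each output element depends only on its own inputs, so the work can be split into pieces that never communicate. The Python sketch below illustrates this with a hypothetical per-pixel threshold on a flattened grayscale image; it is not code from the thesis's library, the cutoff of 128 is an arbitrary assumption, and the thread pool merely stands in for GPU thread blocks (in CPython it demonstrates independence, not speedup).

```python
from concurrent.futures import ThreadPoolExecutor

def threshold_pixel(value, cutoff=128):
    """Per-pixel operation: the result depends only on this pixel."""
    return 255 if value >= cutoff else 0

def process_serial(pixels):
    """Reference version: one pixel after another."""
    return [threshold_pixel(v) for v in pixels]

def process_in_chunks(pixels, num_chunks):
    """Split the pixels into num_chunks >= 1 independent chunks, process the
    chunks concurrently, then concatenate. No chunk reads another's data."""
    if not pixels:
        return []
    size = -(-len(pixels) // num_chunks)  # ceiling division
    chunks = [pixels[i:i + size] for i in range(0, len(pixels), size)]
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(process_serial, chunks))  # map keeps order
    return [v for chunk in results for v in chunk]

image = [0, 50, 128, 200, 255, 127, 129]  # flattened 8-bit grayscale pixels
print(process_in_chunks(image, 3))  # [0, 0, 255, 255, 255, 0, 255]
```

The chunked result always matches the serial one, whatever the chunk count, which is exactly the property that makes such a filter a good fit for GPGPU acceleration.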


Optimizing Lempel-Ziv Factorization For The GPU Architecture, Bryan Ching Jun 2014

Master's Theses

Lossless data compression is used to reduce storage requirements, relieving I/O channels and making better use of bandwidth. The Lempel-Ziv lossless compression algorithms form the basis for many of the most commonly used compression schemes. General-purpose computing on graphics processing units (GPGPU) allows us to take advantage of the massively parallel nature of GPUs for computations other than their original purpose of rendering graphics. Our work targets the use of GPUs for general lossless data compression. Specifically, we developed and ported an algorithm that constructs the Lempel-Ziv factorization directly on the GPU. Our implementation bypasses the …
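To make the term concrete, the sketch below computes a Lempel-Ziv factorization serially by brute force. It is a reference definition useful for checking results, not the GPU algorithm from the thesis; the function name and the choice of the self-referential variant (the earlier occurrence may overlap the current position) are assumptions, since the abstract does not specify the variant.

```python
def lz_factorize(s):
    """Naive LZ factorization.

    Each factor is the longest prefix of s[i:] that also starts at some
    earlier position j < i (the occurrence may overlap position i); if no
    such prefix exists, the factor is the single new character s[i].
    Worst case O(n^3); efficient algorithms typically build on suffix arrays.
    """
    factors = []
    i, n = 0, len(s)
    while i < n:
        best = 0
        for j in range(i):
            k = 0
            # j + k < i + k < n, so both indices stay in range.
            while i + k < n and s[j + k] == s[i + k]:
                k += 1
            best = max(best, k)
        length = max(best, 1)  # a character not seen before is a factor of length 1
        factors.append(s[i:i + length])
        i += length
    return factors

print(lz_factorize("abababab"))  # ['a', 'b', 'ababab']
```

A compressor would store each long factor as an (earlier position, length) pair instead of the raw characters, which is where the space saving comes from.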


Natural Interface For Programming A Robot Manipulator Through A Kinect, Daniel Leonardo Mariño Lizarazo Jan 2014

Ingeniería en Automatización

In the future, robots will be a natural part of our lives. For this, it is important that they can interact naturally with people and can also learn how to perform new tasks. Through programming by demonstration, the user can “explain” how to execute a task to a robot by showing it how the task is done, without the need for a complicated language. In this project, a system was developed with which the user can generate programs for a robot manipulator without needing any knowledge of programming or robotics. The proposed system allows …


Neuromodulation Based Control Of Autonomous Robots On A Cloud Computing Platform, Cameron Muhammad Jan 2014

Electronic Theses and Dissertations

In recent years, advances in neurobiologically plausible models and computer networking have resulted in new ways of implementing control systems on robotic platforms. This work presents a control approach based on vertebrate neuromodulation and its implementation on autonomous robots in the open-source, open-access environment of the Robot Operating System (ROS). A spiking neural network (SNN) is used to model the neuromodulatory function for generating context-based behavioral responses of the robots to sensory input signals. The neural network incorporates three types of neurons: cholinergic and noradrenergic (ACh/NE) neurons for attention focusing and action selection, dopaminergic (DA) neurons for rewards- and …


CUDA Enhanced Filtering In A Pipelined Video Processing Framework, Austin Aaron Dworaczyk Wiltshire Jun 2013

Master's Theses

The processing of digital video has long been a significant computational task for modern x86 processors. Every video frame is composed of one to three planes, each a two-dimensional array of pixel data, and a video clip comprises thousands of such frames, so the sheer volume of data is significant. With the introduction of new high-definition video formats such as 4K and stereoscopic 3D, the volume of uncompressed frame data is growing ever larger.

Modern CPUs offer performance enhancements for processing digital video through SIMD instructions such as SSE2 or AVX. However, even with these instruction sets, …
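The data-volume argument above can be made concrete with a little arithmetic. The helper below is a hypothetical sketch, not code from the thesis; it assumes 8-bit samples, planar storage, and a 3840×2160 "4K" frame, and it models the standard 4:4:4 (no chroma subsampling) and 4:2:0 (each chroma plane at half width and half height) layouts.

```python
def frame_bytes(width, height, planes, bytes_per_sample=1, chroma_divisor=1):
    """Uncompressed size of one planar frame in bytes.

    The first plane (luma) is full resolution; each further (chroma) plane
    has width * height // chroma_divisor samples: 1 for 4:4:4, 4 for 4:2:0.
    """
    luma = width * height * bytes_per_sample
    chroma = (planes - 1) * (width * height // chroma_divisor) * bytes_per_sample
    return luma + chroma

print(frame_bytes(3840, 2160, 3))                    # 24883200 (4:4:4)
print(frame_bytes(3840, 2160, 3, chroma_divisor=4))  # 12441600 (4:2:0)
```

Under these assumptions, a 4:2:0 clip at 24 frames per second already means 12,441,600 × 24 = 298,598,400 bytes (about 298.6 MB) of pixel data to filter every second.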


Paris: A Parallel RSA-Prime Inspection Tool, Joseph R. White Jun 2013

Master's Theses

Modern-day computer security relies heavily on cryptography as a means to protect the data that we have become increasingly reliant on. As the Internet becomes more ubiquitous, methods of security must be better than ever. Validation tools can be leveraged to help increase our confidence and accountability for methods we employ to secure our systems.

Security validation, however, can be difficult and time-consuming. As our computational ability increases, calculations that were once considered “hard” because of their length can now be done in minutes. We are constantly increasing the size of our keys and attempting to make computations harder …
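The abstract is truncated before describing what Paris checks. One widely used inspection for RSA keys, shown here only as an illustration and not as the tool's documented method, is to compute the greatest common divisor of every pair of public moduli: if two moduli share a prime factor, the GCD reveals it and both keys can be factored immediately. The function name and the toy moduli are assumptions.

```python
from itertools import combinations
from math import gcd

def shared_prime_pairs(moduli):
    """Return (i, j, p) for every pair of moduli sharing a factor p > 1.

    Every pair is checked independently, which makes the work easy to
    spread across parallel workers; the number of pairs grows quadratically.
    """
    hits = []
    for (i, n1), (j, n2) in combinations(enumerate(moduli), 2):
        g = gcd(n1, n2)
        if g > 1:
            hits.append((i, j, g))
    return hits

# Toy moduli: 61*53 and 61*71 share the prime 61; 67*59 shares nothing.
moduli = [61 * 53, 61 * 71, 67 * 59]
print(shared_prime_pairs(moduli))  # [(0, 1, 61)]
```

For very large key collections, published surveys have used a batch-GCD algorithm built on product and remainder trees to avoid the quadratic number of pairwise GCDs.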


Multiple Bounding Boxes Algorithm In Collision Detection And Its Performances In Sequential Vs CUDA Parallel Processing, Min Qi Jan 2013

Electronic Theses and Dissertations

The traditional method for detecting collisions in a 2D computer game places an axis-aligned bounding box around each sprite and periodically checks whether the bounding boxes overlap. This single-bounding-box method may result in a large number of pixel intersection tests, since a sprite may contain areas where the pixels are empty, so the bounding-box test can produce false positives.

Our algorithm analysis shows that an optimal partition into two or three bounding boxes is the best we can achieve within a reasonable time complexity. The results further show significantly diminishing returns for …
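The single-box test and the multiple-box refinement described above can be sketched as follows. The sketch assumes integer pixel coordinates with half-open box extents and covers only the box-overlap stage; the pixel-level test and the thesis's method for choosing the two or three boxes are not reproduced, and the example L-shaped sprite is invented.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box: x0 <= x < x1 and y0 <= y < y1."""
    x0: int
    y0: int
    x1: int
    y1: int

def overlaps(a: Box, b: Box) -> bool:
    """Two axis-aligned boxes overlap iff their extents overlap on both axes."""
    return a.x0 < b.x1 and b.x0 < a.x1 and a.y0 < b.y1 and b.y0 < a.y1

def sprites_may_collide(boxes_a, boxes_b) -> bool:
    """Multiple-box test: a pixel-level check is needed only if some
    sub-box of one sprite overlaps some sub-box of the other."""
    return any(overlaps(a, b) for a in boxes_a for b in boxes_b)

# An L-shaped sprite: one box around it overlaps the small sprite at (8..12, 8..12),
# but two tighter boxes (a bar and a column) show the pixel test can be skipped.
l_single = [Box(0, 0, 10, 10)]
l_split = [Box(0, 0, 10, 3), Box(0, 3, 3, 10)]
other = [Box(8, 8, 12, 12)]
print(sprites_may_collide(l_single, other), sprites_may_collide(l_split, other))  # True False
```

Each extra box adds box-pair tests, so the gain from fewer false positives shrinks as boxes are added, which is consistent with the diminishing returns the abstract reports.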


Efficient, Scalable, Parallel, Matrix-Matrix Multiplication, Enrique Portillo Jan 2013

Open Access Theses & Dissertations

Over the past decade, power/energy consumption has become a limiting factor for large-scale and embedded High Performance Computing (HPC) systems. This is especially true for systems that include accelerators such as Graphics Processing Units (GPUs): high-end computing devices with terascale computing capabilities and power draws that greatly surpass those of multi-core CPUs. Accordingly, improving the node-level power/energy efficiency of an application can have a direct and positive impact on both classes of HPC systems.

The research reported in this thesis explores the use of software techniques to enhance the execution-time and power-consumption performance of applications executed on a …
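The abstract does not say which software techniques the thesis uses. Blocking (tiling) is one standard technique for matrix-matrix multiplication, shown here as an assumption rather than as the thesis's method: the loops are reorganized so that each small tile of A and B is reused many times while it sits in fast memory (cache on a CPU, shared memory on a GPU), which can reduce data movement and the energy spent on it. The function name and the tile size are arbitrary choices for illustration.

```python
def blocked_matmul(A, B, tile=2):
    """C = A x B for lists of lists, computed tile by tile.

    A is n x m and B is m x p. Each (ii, kk, jj) step multiplies one tile
    of A by one tile of B and accumulates the product into one tile of C.
    """
    n, m, p = len(A), len(B), len(B[0])
    if any(len(row) != m for row in A):
        raise ValueError("inner dimensions must match")
    C = [[0] * p for _ in range(n)]
    for ii in range(0, n, tile):
        for kk in range(0, m, tile):
            for jj in range(0, p, tile):
                # Tile A[ii:ii+tile, kk:kk+tile] times tile B[kk:kk+tile, jj:jj+tile].
                for i in range(ii, min(ii + tile, n)):
                    for k in range(kk, min(kk + tile, m)):
                        a_ik = A[i][k]
                        for j in range(jj, min(jj + tile, p)):
                            C[i][j] += a_ik * B[k][j]
    return C

print(blocked_matmul([[1, 2, 3], [4, 5, 6]], [[7, 8], [9, 10], [11, 12]]))
# [[58, 64], [139, 154]]
```

The result is identical for any tile size; only the order of memory accesses changes, and choosing the tile to fit the fast memory of the target device is where the performance and energy benefit would come from.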


Exploring Computational Chemistry On Emerging Architectures, David Dewayne Jenkins Dec 2012

Doctoral Dissertations

Emerging architectures, such as next-generation microprocessors, graphics processing units, and Intel MIC cards, are being used with increasing popularity in high performance computing. Each of these architectures has advantages over previous generations, including performance, programmability, and power efficiency. With the ever-increasing performance of these architectures, scientific computing applications are able to attack larger, more complicated problems. However, since applications perform differently on each of the architectures, it is difficult to determine the best tool for the job. This dissertation makes the following contributions to computer engineering and computational science. First, this work implements the computational chemistry variational …


CUDA Web API Remote Execution Of CUDA Kernels Using Web Services, Massimo J. Becker Jun 2012

Master's Theses

Massively parallel programming is a rapidly growing field with the recent introduction of general-purpose GPU computing. Modern graphics processors from NVIDIA and AMD have massively parallel architectures that can be used for applications such as 3D rendering, financial analysis, physics simulations, and biomedical analysis. These massively parallel systems are exposed to programmers through interfaces such as NVIDIA's CUDA, OpenCL, and Microsoft's C++ AMP. These frameworks expose functionality primarily through either C or C++. To use these massively parallel frameworks, programs must be run on machines equipped with massively parallel hardware. These requirements limit …


GPU Implementation Of A Novel Approach To Cramer’s Algorithm For Solving Large Scale Linear Systems, Rosanne Lane West May 2010

Masters Theses

Scientific computing often requires solving systems of linear equations. Most software packages for solving large-scale linear systems use Gaussian elimination methods such as LU decomposition. An alternative method, recently introduced by K. Habgood and I. Arel, involves an application of Cramer’s Rule and Chio’s condensation to achieve a better-performing system for solving linear systems on parallel computing platforms. This thesis describes an implementation of this algorithm on an NVIDIA graphics processor card using the CUDA language. Increased performance, relative to the serial implementation, is demonstrated, paving the way for future parallel realizations of the scheme.
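The abstract names the two ingredients without spelling them out. The sketch below is a plain serial Python illustration, not the Habgood and Arel GPU algorithm from the thesis: Chio's condensation shrinks an n×n determinant to an (n−1)×(n−1) one, and Cramer's rule expresses each unknown as a ratio of determinants. Exact `Fraction` arithmetic, the row swap for a zero pivot, and the function names are choices made here for clarity.

```python
from fractions import Fraction

def chio_det(M):
    """Determinant by Chio's condensation, in exact rational arithmetic.

    With pivot a = A[0][0] != 0, det(A) = det(B) / a**(n-2), where
    B[i-1][j-1] = a*A[i][j] - A[i][0]*A[0][j] for i, j in 1..n-1.
    A zero pivot is fixed by a row swap, which flips the sign.
    """
    A = [[Fraction(x) for x in row] for row in M]
    n = len(A)
    if n == 0:
        return Fraction(1)
    sign, scale = 1, Fraction(1)
    while n > 2:
        if A[0][0] == 0:
            swap = next((r for r in range(1, n) if A[r][0] != 0), None)
            if swap is None:
                return Fraction(0)  # first column all zero: singular
            A[0], A[swap] = A[swap], A[0]
            sign = -sign
        a = A[0][0]
        # Every entry of the condensed matrix is independent of the others.
        A = [[a * A[i][j] - A[i][0] * A[0][j] for j in range(1, n)]
             for i in range(1, n)]
        scale *= a ** (n - 2)
        n -= 1
    d = A[0][0] if n == 1 else A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return sign * d / scale

def cramer_solve(A, b):
    """Cramer's rule: x_i = det(A_i) / det(A), with column i of A replaced by b."""
    d = chio_det(A)
    if d == 0:
        raise ValueError("matrix is singular")
    x = []
    for i in range(len(A)):
        Ai = [list(row[:i]) + [b[r]] + list(row[i + 1:]) for r, row in enumerate(A)]
        x.append(chio_det(Ai) / d)
    return x

# 2x + y = 5 and x + 3y = 10
print([int(v) for v in cramer_solve([[2, 1], [1, 3]], [5, 10])])  # [1, 3]
```

Each condensation step computes all (n−1)² new entries independently, which is the part that maps naturally onto GPU threads; a naive Cramer solver like this one still needs n+1 determinants, so the sketch illustrates the arithmetic rather than an efficient solver.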