Open Access. Powered by Scholars. Published by Universities.®

Engineering Commons

University of Tennessee, Knoxville

Theses/Dissertations

2010

GPU

Full-Text Articles in Engineering

Performance Evaluation of Memory and Computationally Bound Chemistry Applications on Streaming GPGPUs and Multi-Core x86 CPUs, Frederick E. Weber III, May 2010

Masters Theses

In recent years, multi-core processors have come to dominate desktop and high-performance computing. Graphics processors, traditionally used in CAD, video games, and other 3-D applications, have become more programmable and are now suitable for general-purpose computing. This thesis explores the performance and limitations of multi-core processors and GPUs in two computational chemistry applications: a memory-bound component of ab initio modeling and a computationally bound Monte Carlo simulation. For the applications presented in this thesis, multiple processors are exploited using a variety of tools and languages, including OpenMP and MKL. Brook+ and the Compute Abstraction Layer streaming …


GPU Implementation of a Novel Approach to Cramer’s Algorithm for Solving Large Scale Linear Systems, Rosanne Lane West, May 2010

Masters Theses

Scientific computing often requires solving systems of linear equations. Most software packages for solving large-scale linear systems use Gaussian elimination methods such as LU decomposition. An alternative method, recently introduced by K. Habgood and I. Arel, involves an application of Cramer’s Rule and Chio’s condensation to achieve a better-performing system for solving linear systems on parallel computing platforms. This thesis describes an implementation of this algorithm on an nVidia graphics processor card using the CUDA language. Increased performance, relative to the serial implementation, is demonstrated, paving the way for future parallel realizations of the scheme.
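
For readers unfamiliar with the two ingredients named in the abstract, the following is a minimal serial sketch of textbook Cramer's Rule driven by Chio's condensation. It is not the Habgood and Arel scheme, nor the CUDA implementation from the thesis; it only illustrates the underlying identities. The function names `chio_det` and `cramer_solve` are hypothetical, and exact rational arithmetic is assumed for clarity rather than speed.

```python
from fractions import Fraction


def chio_det(matrix):
    """Determinant by Chio's condensation (illustrative, serial, exact).

    Each step replaces an n x n matrix A with pivot p = A[0][0] by the
    (n-1) x (n-1) matrix B where B[i][j] = p*A[i+1][j+1] - A[i+1][0]*A[0][j+1],
    using the identity det(A) = det(B) / p**(n-2).
    Assumes a non-empty square matrix.
    """
    a = [[Fraction(x) for x in row] for row in matrix]
    n = len(a)
    sign = 1
    scale = Fraction(1)  # accumulated product of the p**(n-2) divisors
    while n > 1:
        # Find a row with a nonzero entry in the first column to use as pivot.
        pivot_row = next((i for i in range(n) if a[i][0] != 0), None)
        if pivot_row is None:
            return Fraction(0)  # a zero first column means a zero determinant
        if pivot_row != 0:
            a[0], a[pivot_row] = a[pivot_row], a[0]
            sign = -sign  # a row swap flips the sign of the determinant
        p = a[0][0]
        scale *= p ** (n - 2)
        # Condense: every entry of B is a 2 x 2 minor that includes the pivot.
        a = [
            [p * a[i][j] - a[i][0] * a[0][j] for j in range(1, n)]
            for i in range(1, n)
        ]
        n -= 1
    return sign * a[0][0] / scale


def cramer_solve(matrix, rhs):
    """Solve A x = b with Cramer's Rule: x_k = det(A_k) / det(A),
    where A_k is A with column k replaced by b."""
    d = chio_det(matrix)
    if d == 0:
        raise ValueError("matrix is singular")
    solution = []
    for k in range(len(matrix)):
        replaced = [
            list(row[:k]) + [rhs[i]] + list(row[k + 1:])
            for i, row in enumerate(matrix)
        ]
        solution.append(chio_det(replaced) / d)
    return solution
```

Calling `cramer_solve([[2, 1, -1], [-3, -1, 2], [-2, 1, 2]], [8, -11, -3])` returns the exact solution x = 2, y = 3, z = -1 as `Fraction` values. In each condensation step every entry of the smaller matrix depends only on the pivot row, the pivot column, and one other entry, so the entries can be computed independently of one another, which is the property that makes the method a candidate for parallel hardware.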