Systems Architecture | Open Access Articles | Digital Commons Network™

Dynamically Finding Optimal Kernel Launch Parameters For CUDA Programs, Taabish Jeshani

Electronic Thesis and Dissertation Repository

In this thesis, we present KLARAPTOR (Kernel LAunch parameters RAtional Program estimaTOR), a freely available tool to dynamically determine the values of kernel launch parameters of a CUDA kernel. We describe a technique for building a helper program, at the compile-time of a CUDA program, that is used at run-time to determine near-optimal kernel launch parameters for the kernels of that CUDA program. This technique leverages the MWP-CWP performance prediction model, runtime data parameters, and runtime hardware parameters to dynamically determine the launch parameters for each kernel invocation. This technique is implemented within the KLARAPTOR tool, utilizing the LLVM Pass …

GPGPU Microbenchmarking For Irregular Application Optimization, Dalton R. Winans-Pruitt

Theses and Dissertations

Irregular applications, such as unstructured mesh operations, do not easily map onto the typical GPU programming paradigms endorsed by GPU manufacturers, which mostly focus on maximizing concurrency for latency hiding. In this work, we show how alternative techniques focused on latency amortization can be used to control overall latency while requiring less concurrency. We used a custom-built microbenchmarking framework to test several GPU kernels and show how the GPU behaves under relevant workloads. We demonstrate that coalescing is not required for efficacious performance; an uncoalesced access pattern can achieve high bandwidth, even over 80% of the theoretical global memory …

Millipyde: A Cross-Platform Python Framework For Transparent GPU Acceleration, James B. Asbury

Master's Theses

The prevalence of general-purpose GPU computing continues to grow and tackle a wider variety of problems that benefit from GPU-acceleration. This acceleration often suffers from a high barrier to entry, however, due to the complexity of software tools that closely map to the underlying GPU hardware, the fast-changing landscape of GPU environments, and the fragmentation of tools and languages that only support specific platforms. Because of this, new solutions will continue to be needed to make GPGPU acceleration more accessible to the developers that can benefit from it. AMD’s new cross-platform development ecosystem ROCm provides promise for developing applications and …
