Full-Text Articles in Programming Languages and Compilers

Dynamically Finding Optimal Kernel Launch Parameters For CUDA Programs, Taabish Jeshani Apr 2023

Electronic Thesis and Dissertation Repository

In this thesis, we present KLARAPTOR (Kernel LAunch parameters RAtional Program estimaTOR), a freely available tool to dynamically determine the values of kernel launch parameters of a CUDA kernel. We describe a technique for building a helper program, at compile time of a CUDA program, that is used at run time to determine near-optimal kernel launch parameters for the kernels of that CUDA program. This technique leverages the MWP-CWP performance prediction model, runtime data parameters, and runtime hardware parameters to dynamically determine the launch parameters for each kernel invocation. This technique is implemented within the KLARAPTOR tool, utilizing the LLVM Pass …
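The idea of picking launch parameters per invocation from data and hardware parameters can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the function names, the hardware limits, and especially `toy_cost`, which is a hypothetical stand-in and not the MWP-CWP model or KLARAPTOR's actual helper program.

```python
from math import ceil

# All names and hardware limits below are illustrative assumptions,
# not values or APIs taken from the thesis.

def candidate_block_sizes(warp_size=32, max_threads_per_block=1024):
    """Thread-block sizes to consider: multiples of the warp size."""
    return list(range(warp_size, max_threads_per_block + 1, warp_size))

def toy_cost(n, block_size, sm_count, max_threads_per_sm, max_blocks_per_sm):
    """Hypothetical stand-in for a performance model (NOT MWP-CWP).

    Returns (waves, idle_threads): the number of rounds of resident blocks
    needed to cover n work items, and the number of launched threads that
    have no work item. Smaller tuples are considered better.
    """
    blocks = ceil(n / block_size)
    resident_per_sm = min(max_blocks_per_sm, max_threads_per_sm // block_size)
    if resident_per_sm == 0:
        # The block does not fit on a multiprocessor at all.
        return (float("inf"), float("inf"))
    waves = ceil(blocks / (sm_count * resident_per_sm))
    idle_threads = blocks * block_size - n
    return (waves, idle_threads)

def choose_block_size(n, sm_count=80, max_threads_per_sm=2048,
                      max_blocks_per_sm=16):
    """Pick the candidate block size with the lowest toy cost for input size n."""
    return min(
        candidate_block_sizes(),
        key=lambda bs: toy_cost(n, bs, sm_count, max_threads_per_sm,
                                max_blocks_per_sm),
    )
```

The point of the sketch is only the shape of the decision: the data size `n` and the hardware limits are known at run time, so the choice is made per kernel invocation rather than fixed at compile time.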


Advances In The Automatic Detection Of Optimization Opportunities In Computer Programs, Delaram Talaashrafi Dec 2022

Electronic Thesis and Dissertation Repository

Massively parallel and heterogeneous systems together with their APIs have been used for various applications. To achieve high-performance software, the programmer should develop optimized algorithms to maximize the system’s resource utilization. However, designing such algorithms is challenging and time-consuming. Therefore, optimizing compilers are developed to share the programmer’s optimization burden. Developing effective optimizing compilers is an active area of research. Specifically, because loop nests are usually the hot spots in a program, their optimization has been the main subject of many optimization algorithms. This thesis aims to improve the scope and applicability of performance optimization algorithms used in …


Three Contributions To The Theory And Practice Of Optimizing Compilers, Linxiao Wang Nov 2022

Electronic Thesis and Dissertation Repository

The theory and practice of optimizing compilers gather techniques that, from input computer programs, aim at generating code making the best use of modern computer hardware. On the theory side, this thesis contributes new results and algorithms in polyhedral geometry. On the practical side, this thesis contributes techniques for the tuning of parameters of programs targeting GPUs. We detail these two fronts of our work below.

Consider a convex polyhedral set P given by a system of linear inequalities A*x <= b, where A is an integer matrix and b is an integer vector. We are interested in the integer hull P_I of P, which is the smallest convex polyhedral set that contains all the integer points in P. In Chapter …
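To make the definition of the integer hull concrete, here is a minimal Python sketch for the two-dimensional case. It is a brute-force illustration, not the algorithm of the thesis: it assumes P is bounded and that the caller supplies a bounding box (a hypothetical `box` parameter), enumerates the integer points of P inside that box, and takes their convex hull with the standard monotone chain method.

```python
def integer_points(A, b, box):
    """Integer points (x, y) in the box that satisfy A*(x, y) <= b row by row.

    A is a list of integer rows (a0, a1), b a list of integers, and box is
    ((xmin, xmax), (ymin, ymax)); the box is an assumption of this sketch.
    """
    (xmin, xmax), (ymin, ymax) = box
    return [
        (x, y)
        for x in range(xmin, xmax + 1)
        for y in range(ymin, ymax + 1)
        if all(a0 * x + a1 * y <= bi for (a0, a1), bi in zip(A, b))
    ]

def cross(o, a, c):
    """Cross product of vectors o->a and o->c (positive for a left turn)."""
    return (a[0] - o[0]) * (c[1] - o[1]) - (a[1] - o[1]) * (c[0] - o[0])

def convex_hull(points):
    """Vertices of the convex hull in counterclockwise order (monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

# P: x >= 0, y >= 0, 2x + 2y <= 5, a triangle with fractional vertices.
A = [(-1, 0), (0, -1), (2, 2)]
b = [0, 0, 5]
hull = convex_hull(integer_points(A, b, ((0, 3), (0, 3))))
```

In this example P has the fractional vertices (2.5, 0) and (0, 2.5), while the vertices of its integer hull are (0, 0), (2, 0), and (0, 2). Enumeration scales with the volume of the box, so it serves only to illustrate the definition.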


Resource Bound Guarantees Via Programming Languages, Michael J. Burrell Jun 2017

Electronic Thesis and Dissertation Repository

We present a programming language in which every well-typed program halts in time polynomial with respect to its input and, more importantly, in which upper bounds on resource requirements can be inferred with certainty. Ensuring that software meets its resource constraints is important in a number of domains, most prominently in hard real-time systems and safety-critical systems, where failing to meet time constraints can result in catastrophic failure. Testing is of limited use in ensuring resource constraints, since testing every input or environment is impossible in general. Static analysis, whether via the …


MetaFork: A Compilation Framework For Concurrency Models Targeting Hardware Accelerators, Xiaohui Chen Mar 2017

Electronic Thesis and Dissertation Repository

Parallel programming is gaining ground in various domains due to the tremendous computational power that it brings; however, it also requires a substantial code crafting effort to achieve performance improvement. Unfortunately, in most cases, performance tuning has to be accomplished manually by programmers. We argue that automated tuning is necessary due to the combination of the following factors. First, code optimization is machine-dependent. That is, an optimization preferred on one machine may not be suitable for another machine. Second, as the possible optimization search space increases, manually finding an optimized configuration is hard. Therefore, developing new compiler techniques for optimizing applications …


Towards Comprehensive Parametric Code Generation Targeting Graphics Processing Units In Support Of Scientific Computation, Ning Xie Nov 2016

Electronic Thesis and Dissertation Repository

The most popular multithreaded languages based on the fork-join concurrency model (CilkPlus, OpenMP) are currently being extended to support other forms of parallelism (vectorization, pipelining, and single-instruction-multiple-data (SIMD)). In the SIMD case, the objective is to execute the corresponding code on a many-core device, like a GPGPU, for which the CUDA language is a natural choice. Since the programming concepts of CilkPlus and OpenMP are very different from those of CUDA, it is desirable to automatically generate optimized CUDA-like code from CilkPlus or OpenMP.

In this thesis, we propose an accelerator model for annotated C/C++ code together with an implementation …


On The Interoperability Of Programming Languages Based On The Fork-Join Parallelism Model, Sushek Shekar Dec 2013

Electronic Thesis and Dissertation Repository

This thesis describes the implementation of MetaFork, a meta-language for concurrency platforms targeting multicore architectures. First of all, MetaFork is a multithreaded language based on the fork-join model of concurrency: it allows the programmer to express parallel algorithms assuming that tasks are dynamically scheduled at run-time. While MetaFork makes no assumption about the run-time system, it formally defines the serial C-elision of a MetaFork program. In addition, MetaFork is a suite of source-to-source compilers permitting the automatic translation of multithreaded programs between programming languages based on the fork-join model. Currently, this compilation framework supports the OpenMP and CilkPlus concurrency platforms. …
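The fork-join model and the notion of a serial elision can be illustrated with a small Python sketch. This is not MetaFork syntax and not one of its translators. It uses the standard `concurrent.futures` module, and the function names are hypothetical. One function forks a child task and later joins it; the other is the same program with the fork replaced by an ordinary call and the join removed, which is the idea behind a serial elision.

```python
from concurrent.futures import ThreadPoolExecutor

def sum_range(lo, hi):
    """Sum of the integers lo, lo + 1, ..., hi - 1."""
    return sum(range(lo, hi))

def fork_join_sum(n, executor):
    """Fork-join version: the left half runs as a child task."""
    mid = n // 2
    left = executor.submit(sum_range, 0, mid)  # fork: spawn a child task
    right = sum_range(mid, n)                  # parent continues its own work
    return left.result() + right               # join: wait for the child

def serial_elision_sum(n):
    """Same program with the fork replaced by a plain call and no join."""
    mid = n // 2
    left = sum_range(0, mid)
    right = sum_range(mid, n)
    return left + right
```

Both functions compute the same value for any `n`. The fork-join version only states which work may run concurrently and leaves the scheduling to the executor, which mirrors the point that the language makes no assumption about the run-time system.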


Integrated Development And Parallelization Of Automated Dicentric Chromosome Identification Software To Expedite Biodosimetry Analysis, Yanxin Li Apr 2013

Electronic Thesis and Dissertation Repository

Manual cytogenetic biodosimetry lacks the ability to handle mass casualty events. We present automated dicentric chromosome identification (ADCI) software that uses parallel computing technology. A parallelization strategy combining data and task parallelism, as well as optimization of I/O operations, has been designed, implemented, and incorporated in ADCI. Experiments on an eight-core desktop show that our algorithm can expedite the process of ADCI at least fourfold. Experiments on Symmetric Computing, SHARCNET, and Blue Gene/Q multi-processor computers demonstrate the capability of parallelized ADCI to process thousands of samples for cytogenetic biodosimetry in a few hours. This increase in speed underscores the …