Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Parallel Computing

Articles 1 - 30 of 37

Full-Text Articles in Physical Sciences and Mathematics

Hpc-Enabled Fast And Configurable Dynamic Simulation, Analysis, And Learning For Complex Power System Adaptation And Control, Cong Wang Dec 2023

All Dissertations

This dissertation presents an HPC-enabled fast and configurable dynamic simulation, analysis, and learning framework for complex power system adaptation and control. Dynamic simulation for a large transmission system comprising thousands of buses and branches implies the latency of complicated numerical computations. However, faster-than-real-time execution is often required to provide timely support for power system planning and operation. The traditional approaches for speeding up the simulation demand extensive computing facilities such as CPU-based multi-core supercomputers, resulting in heavily resource-dependent solutions. In this work, by coupling the Message Passing Interface (MPI) protocol with an advanced heterogeneous programming environment, further acceleration can be …


Unoapi: Balancing Performance, Portability, And Productivity (P3) In Hpc Education, Konstantin Laufer, George K. Thiruvathukal Nov 2022

Computer Science: Faculty Publications and Other Works

oneAPI is a major initiative by Intel aimed at making it easier to program heterogeneous architectures used in high-performance computing using a unified application programming interface (API). While raising the abstraction level via a unified API represents a promising step for the current generation of students and practitioners to embrace high-performance computing, we argue that a curriculum of well-developed software engineering methods and well-crafted exemplars will be necessary to ensure interest by this audience and those who teach them. We aim to bridge the gap by developing a curriculum, codenamed UnoAPI, that takes a more holistic approach by looking …


Novel Hybrid Resampling Algorithms For Parallel/Distributed Particle Filters, Xudong Zhang Sep 2021

Dissertations, Theses, and Capstone Projects

Particle filters, also known as sequential Monte Carlo (SMC) methods, use Bayesian inference and stochastic sampling techniques to estimate the states of dynamic systems from given observations. Parallel/distributed particle filters were introduced to improve the performance of sequential particle filters by using multiple processing units (PUs). The classical resampling algorithm used in parallel/distributed particle filters is a centralized scheme, called centralized resampling, which needs a central unit (CU) to serve as a hub for data transfers. As a result, the centralized resampling procedures produce extra communication costs, which lower the speedup factors in parallel computing. Even though some …


Optimal Communication Structures For Concurrent Computing, Andrii Berdnikov May 2021

Doctoral Dissertations

This research focuses on communicative solvers that run concurrently and exchange information to improve performance. This “team of solvers” enables individual algorithms to communicate information regarding their progress and intermediate solutions, and allows them to synchronize memory structures with more “successful” counterparts. The result is that fewer nodes spend computational resources on “struggling” processes. The research is focused on optimization of communication structures that maximize algorithmic efficiency using the theoretical framework of Markov chains. Existing research addressing communication between the cooperative solvers on parallel systems lacks generality: Most studies consider a limited number of communication topologies and strategies, while the …


Graphmp: I/O-Efficient Big Graph Analytics On A Single Commodity Machine, Peng Sun, Yonggang Wen, Nguyen Binh Duong Ta, Xiaokui Xiao Dec 2020

Research Collection School Of Computing and Information Systems

Recent studies showed that single-machine graph processing systems can be as competitive as cluster-based approaches on large-scale problems. While several out-of-core graph processing systems and computation models have been proposed, the high disk I/O overhead could significantly reduce performance in many practical cases. In this paper, we propose GraphMP to tackle big graph analytics on a single machine. GraphMP achieves low disk I/O overhead with three techniques. First, we design a vertex-centric sliding window (VSW) computation model to avoid reading and writing vertices on disk. Second, we propose a selective scheduling method to skip loading and processing unnecessary edge …


System Support Of Concurrent Database Query Processing On A Gpu, Hao Li Nov 2020

USF Tampa Graduate Theses and Dissertations

The unrivaled computing capabilities of modern GPUs meet the demand of processing massive amounts of data seen in many application domains. While traditional HPC systems support applications as standalone entities that occupy the entire GPU, we propose a GPU-based DBMS (G-DBMS) that can run multiple tasks concurrently. To that end, system-level management mechanisms like resource allocation and buffer manager are needed to build such a concurrent database query processing system and fully unleash the GPUs’ computing power. However, CUDA does not provide enough OS-level functionalities to support it. Thus our research is focusing on implementing the optimization of resource allocation …


A Generic Implementation Of Fast Fourier Transforms For The Bpas Library, Colin S. Costello Aug 2020

Electronic Thesis and Dissertation Repository

In this thesis we seek to realize an efficient implementation of a generic parallel fast Fourier transform (FFT). The FFT will be used in support of fast multiplication of polynomials with coefficients in a finite field. Our goal is to obtain a relatively high performing parallel implementation that will run over a variety of finite fields with different sized characteristic primes. To this end, we implement and compare two Cooley-Tukey Six-Step fast Fourier transforms and a Cooley-Tukey Four-Step variant against a high performing specialized FFT already implemented in the Basic Polynomial Algebra Subprograms (BPAS) library. We use optimization techniques found …


Advanced Parallel Algorithms In Computational Electromagnetics, Shu Wang Jul 2020

Electrical and Computer Engineering ETDs

The rapid development of high performance computing has pushed computational electromagnetics (CEM) towards high accuracy, high fidelity and extreme computational scales. There is a great need for existing CEM solvers to have enhanced parallelism and scaling capability. The purpose of this dissertation is to investigate advanced parallel algorithms for both frequency and time domain solvers.

In the frequency domain, this work first develops the underpinnings of a parallel preconditioning technique and a high-order transmission condition in the context of a multi-solver scheme. The result is a computing resource-aware and implementation-wise compact solver. This work then targets the development of efficient algorithms for cases where …


Relational Joins On Gpus For In-Memory Database Query Processing, Ran Rui Jun 2020

USF Tampa Graduate Theses and Dissertations

Relational join processing is one of the core functionalities in database management systems. Implementing join algorithms on parallel platforms, especially modern GPUs, has gained a lot of momentum in the past decade. This dissertation addresses the following issues on GPU join algorithms. First, we present empirical evaluations of a state-of-the-art work on GPU-based join processing. Since 2008, the compute capabilities of GPUs have increased at a pace faster than that of multi-core CPUs. We run a comprehensive set of experiments to study how join operations can benefit from such rapid expansion of GPU capabilities. We also present improved GPU …


Algorithms And Framework For Computing 2-Body Statistics On Graphics Processing Units, Napath Pitaksirianan Feb 2020

USF Tampa Graduate Theses and Dissertations

Various types of two-body statistics (2-BS) are regarded as essential components of low-level data analysis in scientific database systems. In relational algebraic terms, a 2-BS is essentially a Cartesian product between two datasets (or two instances of the same dataset) followed by a user-defined aggregate. The quadratic complexity of these computations hinders the timely processing of data. Thus using modern parallel hardware has become an obvious solution to meet such challenges. This dissertation presents our recent work in designing and optimizing parallel algorithms for 2-BS computation on Graphics Processing Units (GPUs). The unique architecture, however, provides abundant opportunities for optimizing …
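The relational view of a 2-BS described in this abstract (a Cartesian product followed by a user-defined aggregate) can be illustrated with a minimal sequential sketch. This is not the dissertation's GPU algorithm; the function name `two_body_statistic` and the sample points are hypothetical and chosen only for illustration.

```python
from itertools import product
from math import dist


def two_body_statistic(left, right, pair_fn, aggregate):
    """Apply pair_fn to every pair in left x right, then reduce with aggregate.

    The work grows as len(left) * len(right), which is the quadratic
    cost the abstract refers to.
    """
    return aggregate(pair_fn(a, b) for a, b in product(left, right))


# Hypothetical sample data: three 2-D points.
points = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]

# Aggregate 1: sum of all pairwise Euclidean distances (self-pairs included).
total_distance = two_body_statistic(points, points, dist, sum)

# Aggregate 2: number of ordered pairs closer than 5.5 units.
close_pairs = two_body_statistic(
    points, points, lambda a, b: dist(a, b) < 5.5, sum
)
```

Because every pair is evaluated independently before the reduction, the inner loop is the natural target for the parallel hardware the abstract mentions.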


A Bayesian Framework For Estimating Seismic Wave Arrival Time, Hua Zhong May 2019

Graduate Theses and Dissertations

Because earthquakes have a large impact on human society, statistical methods for better studying earthquakes are required. One characteristic of earthquakes is the arrival time of seismic waves at a seismic signal sensor. Once we can estimate the earthquake arrival time accurately, the earthquake location can be triangulated, and assistance can be sent to that area correctly. This study presents a Bayesian framework to predict the arrival time of seismic waves with associated uncertainty. We use a change point framework to model the different conditions before and after the seismic wave arrives. To evaluate the performance of the model, we …


Three Environmental Fluid Dynamics Papers, Eden Furtak-Cole May 2018

All Graduate Theses and Dissertations, Spring 1920 to Summer 2023

Three papers are presented, applying computational fluid dynamics methods to fluid flows in the geosciences. In the first paper, a numerical method is developed for single phase potential flow in the subsurface. For a class of monotonically advancing flows, the method provides computational savings as compared to classical methods and can be applied to problems such as forced groundwater recharge. The second paper investigates the shear stress reducing action of an erosion control roughness array. Incompressible Navier-Stokes simulations are performed for multiple wind angles to understand the changing aerodynamics of individual and grouped roughness elements. In the third paper, …


Parallelizing Tabu Search Based Optimization Algorithm On Gpus, Vinaya Malleypally Mar 2018

USF Tampa Graduate Theses and Dissertations

There are many combinatorial optimization problems, such as the traveling salesman problem, the quadratic assignment problem, and flow shop scheduling, that are computationally intractable. Tabu search based simulated annealing is a stochastic search algorithm that is widely used to solve combinatorial optimization problems. Due to excessive run time, there is a strong demand for a parallel version that can be applied to any problem with minimal modifications. Existing advanced and/or parallel versions of tabu search algorithms are specific to the problem at hand. This leads to a drawback of optimization only for that particular problem. In this work, we propose a parallel version of …


Graphmp: An Efficient Semi-External-Memory Big Graph Processing System On A Single Machine, Peng Sun, Yonggang Wen, Nguyen Binh Duong Ta, Xiaokui Xiao Dec 2017

Research Collection School Of Computing and Information Systems

Recent studies showed that single-machine graph processing systems can be as competitive as cluster-based approaches on large-scale problems. While several out-of-core graph processing systems and computation models have been proposed, the high disk I/O overhead could significantly reduce performance in many practical cases. In this paper, we propose GraphMP to tackle big graph analytics on a single machine. GraphMP achieves low disk I/O overhead with three techniques. First, we design a vertex-centric sliding window (VSW) computation model to avoid reading and writing vertices on disk. Second, we propose a selective scheduling method to skip loading and processing unnecessary edge …


Accelerating The Discontinuous Galerkin Cell-Vertex Scheme (Dg-Cvs) Solver On Cpu-Gpu Heterogeneous Systems, Xiaoqi Hu Jan 2017

Electronic Theses and Dissertations

DG-CVS (Discontinuous Galerkin Cell-Vertex Scheme) is an efficient, accurate and robust numerical solver for general hyperbolic conservation laws. It can solve a broad range of conservation laws such as the shallow water equation and magnetohydrodynamics equations. DG-CVS is a Riemann-solver-free high order space-time method for arbitrary space conservation laws. It fuses the discontinuous Galerkin (DG) method and the conservation element/solution element (CE/SE) method to take advantage of the best features of both methods. Thanks to the CE/SE method, the time derivative of the solution is treated as an independent unknown, which is amenable to the GPU's parallel execution. In this thesis, …


Definition Of A Method For The Formulation Of Problems To Be Solved With High Performance Computing, Ramya Peruri Aug 2016

Master of Science in Computer Science Theses

Computational power made available by current technology has been continuously increasing; however, today’s problems are larger and more complex and demand even more computational power. Interest in computational problems has also been increasing and is an important research area in computer science. These complex problems are solved with computational models that build on an underlying mathematical model, rely on computer resources and simulation, and are run with High Performance Computing. For such computations, parallel computing has been employed to achieve high performance. This thesis identifies families of problems that can best be solved using modelling and implementation techniques of parallel …


Efficient Algorithms And Applications In Topological Data Analysis, Junyi Tu Jul 2016

USF Tampa Graduate Theses and Dissertations

Topological Data Analysis (TDA) is a new and fast-growing research field developed over the last two decades. TDA finds many applications in computer vision, computer graphics, scientific visualization, molecular biology, and material science, to name a few. In this dissertation, we make algorithmic and application contributions to three data structures in TDA: contour trees, Reeb graphs, and Mapper. From the algorithmic perspective, we design a parallel algorithm for contour tree construction and implement it in OpenCL. We also design and implement critical point pairing algorithms to compute persistence diagrams directly from contour trees, Reeb graphs, and Mapper. In terms of …


A Dynamic Run-Profile Energy-Aware Approach For Scheduling Computationally Intensive Bioinformatics Applications, Sachin Pawaskar, Hesham Ali Jul 2016

Computer Science Faculty Proceedings & Presentations

High Performance Computing (HPC) resources are housed in large datacenters, which consume exorbitant amounts of energy and are quickly demanding attention from businesses as they result in high operating costs. On the other hand, HPC environments have been very useful to researchers in many emerging areas in life sciences such as Bioinformatics and Medical Informatics. In an earlier work, we introduced a dynamic model for energy aware scheduling (EAS) in an HPC environment; the model is domain agnostic and incorporates both the deadline parameter as well as energy parameters for computationally intensive applications. Our proposed EAS model incorporates two phases. In …


Enforcing Security Policies On Gpu Computing Through The Use Of Aspect-Oriented Programming Techniques, Bader Albassam Jun 2016

USF Tampa Graduate Theses and Dissertations

This thesis presents a new security policy enforcer designed for securing parallel computation on CUDA GPUs. We show how the very features that make a GPGPU desirable have already been utilized in existing exploits, fortifying the need for security protections on a GPGPU. An aspect weaver was designed for CUDA with the goal of utilizing aspect-oriented programming for security policy enforcement. Empirical testing verified the ability of our aspect weaver to enforce various policies. Furthermore, a performance analysis was performed to demonstrate that using this policy enforcer provides no significant performance impact over manual insertion of policy code. Finally, future …


Large-Scale Spatial Data Management On Modern Parallel And Distributed Platforms, Simin You Feb 2016

Dissertations, Theses, and Capstone Projects

The rapidly growing volume of spatial data has made it desirable to develop efficient techniques for managing large-scale spatial data. Traditional spatial data management techniques cannot meet the requirements of efficiency and scalability for large-scale spatial data processing. In this dissertation, we have developed new data-parallel designs for large-scale spatial data management that can better utilize modern inexpensive commodity parallel and distributed platforms, including multi-core CPUs, many-core GPUs and computer clusters, to achieve both efficiency and scalability. After introducing background on spatial data management and modern parallel and distributed systems, we present our parallel designs for spatial indexing and spatial join query …


Towards The Scalability And Hybrid Parallelization Of A Spatially Variant Lattice Algorithm, Henry Roger Moncada Lopez Jan 2016

Open Access Theses & Dissertations

The purpose of this research is to design a faster implementation of the spatially variant algorithm that improves its performance when it is running on a parallel computer system.

The spatially variant algorithm is used to synthesize a spatially variant lattice for a periodic electromagnetic structure. The algorithm has the ability to spatially vary the unit cell orientation and exploit its directional dependencies. The algorithm produces a lattice that is smooth, continuous and free of defects. The lattice spacing remains strikingly uniform when the unit cell orientation, lattice spacing, fill fraction and more are spatially varied. This is important for …


Sparse Matrix Diagonalization In The Nrlmol Electronic Structure Code, Md Mahmudulla Hassan Jan 2016

Open Access Theses & Dissertations

Density functional theory (DFT) based simulations are playing a major role in quantum mechanical studies of materials ranging from molecules and nanoparticles to biological systems, as they offer insights that are not directly accessible from experiments and are able to make sufficiently accurate predictions. The DFT implementation in the NRLMOL electronic structure code employs Gaussian basis sets to express the Kohn-Sham orbitals. A major computationally demanding task in electronic structure calculations is the solution of the generalized eigenvalue problem, that is, the determination of nontrivial solutions (λ, c) of Hc = λOc where H and O are …
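For readers unfamiliar with the generalized eigenvalue problem Hc = λOc named in this abstract, a toy 2x2 sketch follows. It is not the NRLMOL solver and says nothing about the sparse methods studied in the thesis. It assumes, for illustration only, that H and O are real symmetric and that O is positive definite (the abstract is truncated before it states their properties); the function name and sample matrices are hypothetical.

```python
import math


def generalized_eigenvalues_2x2(H, O):
    """Return the eigenvalues lam of H c = lam O c for 2x2 matrices.

    Nontrivial solutions c exist only when det(H - lam * O) = 0, which for
    2x2 matrices is a quadratic a*lam**2 + b*lam + c0 = 0 in lam.
    """
    (h11, h12), (h21, h22) = H
    (o11, o12), (o21, o22) = O
    a = o11 * o22 - o12 * o21                            # det(O)
    b = -(h11 * o22 + h22 * o11) + (h12 * o21 + h21 * o12)
    c0 = h11 * h22 - h12 * h21                           # det(H)
    disc = b * b - 4.0 * a * c0
    if a == 0 or disc < 0:
        raise ValueError("expected symmetric H and positive definite O")
    root = math.sqrt(disc)
    return sorted(((-b - root) / (2.0 * a), (-b + root) / (2.0 * a)))


# Hypothetical sample matrices (assumption: O is an overlap-like matrix).
H = [[2.0, 1.0], [1.0, 2.0]]
O = [[1.0, 0.2], [0.2, 1.0]]
eigenvalues = generalized_eigenvalues_2x2(H, O)
```

Solving the characteristic polynomial directly only works for tiny matrices; it is used here purely to make the definition concrete.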


Towards Real-Time, On-Board, Hardware-Supported Sensor And Software Health Management For Unmanned Aerial Systems, Johann M. Schumann, Kristin Y. Rozier, Thomas Reinbacher, Ole J. Mengshoel, Timmy Mbaya, Corey Ippolito Jun 2015

Ole J Mengshoel

For unmanned aerial systems (UAS) to be successfully deployed and integrated within the national airspace, it is imperative that they possess the capability to effectively complete their missions without compromising the safety of other aircraft, as well as persons and property on the ground. This necessity creates a natural requirement for UAS that can respond to uncertain environmental conditions and emergent failures in real-time, with robustness and resilience close enough to those of manned systems. We introduce a system that meets this requirement with the design of a real-time onboard system health management (SHM) capability to continuously monitor sensors, software, …


Parallel Design Patterns And Program Performance, Yu Zhao May 2014

Mathematics, Statistics, and Computer Science Honors Projects

With the rapid advancement of parallel and distributed computing (PDC), three types of hardware and their corresponding software (hardware-software pairs) are becoming more and more popular: Distributed Memory Systems with the Message Passing Interface (MPI) library, Shared Memory Systems with the OpenMP library and Co-processor Systems with a general purpose parallel computing library. Alongside the development of both hardware and software aspects of PDC, the process of designing parallel programs has also improved significantly over the years. A consequence of this is that researchers have been able to describe many parallel design patterns, which are recurring solutions to well-known problems …


The Design, Analysis, & Application Of Multi-Modal Real-Time Embedded Systems, Masud Ahmed Jan 2014

Wayne State University Dissertations

For many hand-held computing devices (e.g., smartphones), multiple operational modes are preferred because of their flexibility. In addition to their designated purposes, some of these devices provide a platform for different types of services, which include rendering of high-quality multimedia. Upon such devices, temporal isolation among co-executing applications is very important to ensure that each application receives an acceptable level of quality-of-service. In order to provide strong guarantees on services, multimedia applications and real-time control systems maintain timing constraints in the form of deadlines for recurring tasks. A flexible real-time multi-modal system will ideally provide system designers the option to …


Towards Real-Time, On-Board, Hardware-Supported Sensor And Software Health Management For Unmanned Aerial Systems, Johann Schumann, Kristin Y. Rozier, Thomas Reinbacher, Ole J. Mengshoel, Timmy Mbaya, Corey Ippolito Sep 2013

Ole J Mengshoel

Unmanned aerial systems (UASs) can only be deployed if they can effectively complete their missions and respond to failures and uncertain environmental conditions while maintaining safety with respect to other aircraft as well as humans and property on the ground. In this paper, we design a real-time, on-board system health management (SHM) capability to continuously monitor sensors, software, and hardware components for detection and diagnosis of failures and violations of safety or performance rules during the flight of a UAS. Our approach to SHM is three-pronged, providing: (1) real-time monitoring of sensor and/or software signals; (2) signal analysis, preprocessing, and …


Opencuda+Mpi, Kenny Ballou, Nilab Mohammad Mousa Aug 2013

Student Research Initiative

The introduction and rise of General Purpose Graphics Computing has significantly impacted parallel and high-performance computing. It has introduced challenges when it comes to distributed computing with GPUs. Current solutions target specifics: specific hardware, specific network topology, a specific level of processing. Those restrictions on GPU computing limit scientists and researchers in various ways. The goal of the OpenCUDA+MPI project is to develop a framework that allows researchers and scientists to write a general algorithm without the overhead of worrying about the specifics of the hardware and the cluster it will run against while taking full advantage of parallel and distributed …


Optimizing Parallel Belief Propagation In Junction Trees Using Regression, Lu Zheng, Ole J. Mengshoel Jul 2013

Ole J Mengshoel

The junction tree approach, with applications in artificial intelligence, computer vision, machine learning, and statistics, is often used for computing posterior distributions in probabilistic graphical models. One of the key challenges associated with junction trees is computational, and several parallel computing technologies, including many-core processors, have been investigated to meet this challenge. Many-core processors (including GPUs) are now programmable; unfortunately, their complexities make it hard to manually tune their parameters in order to optimize software performance. In this paper, we investigate a machine learning approach to minimize the execution time of parallel junction tree algorithms implemented on a …


Exploring Multiple Dimensions Of Parallelism In Junction Tree Message Passing, Lu Zheng, Ole J. Mengshoel Jun 2013

Ole J Mengshoel

Belief propagation over junction trees is known to be computationally challenging in the general case. One way of addressing this computational challenge is to use node-level parallel computing, and parallelize the computation associated with each separator potential table cell. However, this approach is not efficient for junction trees that mainly contain small separators. In this paper, we analyze this problem, and address it by studying a new dimension of node-level parallelism, namely arithmetic parallelism. In addition, on the graph level, we use a clique merging technique to further adapt junction trees to parallel computing platforms. We apply our parallel approach …


Scaling Bayesian Network Parameter Learning With Expectation Maximization Using Mapreduce, Erik B. Reed, Ole J. Mengshoel Nov 2012

Ole J Mengshoel

Bayesian network (BN) parameter learning from incomplete data can be a computationally expensive task. Applying the EM algorithm to learn BN parameters is unfortunately susceptible to local optima and prone to premature convergence. We develop and experiment with two methods for improving EM parameter learning by using MapReduce: Age-Layered Expectation Maximization (ALEM) and Multiple Expectation Maximization (MEM). Leveraging MapReduce for distributed machine learning, these algorithms (i) operate on a (potentially large) population of BNs and (ii) partition the data set as is traditionally done with MapReduce machine learning. For example, we achieved gains using the Hadoop implementation …