Open Access. Powered by Scholars. Published by Universities.®


MPI


Articles 1 - 30 of 41

Full-Text Articles in Physical Sciences and Mathematics

Tools For Biomolecular Modeling And Simulation, Xin Yang Apr 2024

Tools For Biomolecular Modeling And Simulation, Xin Yang

Mathematics Theses and Dissertations

Electrostatic interactions play a pivotal role in understanding biomolecular systems, influencing their structural stability and functional dynamics. The Poisson-Boltzmann (PB) equation, a prevalent implicit solvent model that treats the solvent as a continuum while describing the mobile ions with the Boltzmann distribution, has become a standard tool for detailed investigations into biomolecular electrostatics. There are two primary methodologies: grid-based finite difference or finite element methods and body-fitted boundary element methods. This dissertation focuses on developing fast and accurate PB solvers, leveraging both methodologies, to meet diverse scientific needs and overcome various obstacles in the field.


Parallelized Quadtrees For Image Compression In Cuda And Mpi, Aidan Jones Apr 2024

Parallelized Quadtrees For Image Compression In Cuda And Mpi, Aidan Jones

Senior Honors Theses

Quadtrees are a data structure that lend themselves well to image compression due to their ability to recursively decompose 2-dimensional space. Image compression algorithms that use quadtrees should be simple to parallelize; however, current image compression algorithms that use quadtrees rarely use parallel algorithms. An existing program to compress images using quadtrees was upgraded to use GPU acceleration with CUDA but experienced an average slowdown by a factor of 18 to 42. Another parallelization attempt utilized MPI to process contiguous chunks of an image in parallel and experienced an average speedup by a factor of 1.5 to 3.7 compared to …
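The recursive decomposition described above can be illustrated in a few lines. The sketch below is a minimal Python toy, not code from the thesis; the function name and the uniformity test are invented for the example:

```python
def compress(img, x, y, size, tol=0):
    """Quadtree-compress a square region of a grayscale image.

    Returns a leaf (the region's average value) when all pixels in
    the region agree within `tol`; otherwise recurses into the four
    quadrants of the region.
    """
    vals = [img[y + j][x + i] for j in range(size) for i in range(size)]
    if size == 1 or max(vals) - min(vals) <= tol:
        return sum(vals) // len(vals)             # leaf: average value
    h = size // 2
    return (compress(img, x,     y,     h, tol),  # NW quadrant
            compress(img, x + h, y,     h, tol),  # NE quadrant
            compress(img, x,     y + h, h, tol),  # SW quadrant
            compress(img, x + h, y + h, h, tol))  # SE quadrant

# A 4x4 image whose left half is 0 and right half is 9 compresses
# to one level of four uniform leaves:
img = [[0, 0, 9, 9] for _ in range(4)]
tree = compress(img, 0, 0, 4)                     # -> (0, 9, 0, 9)
```

Because the four recursive calls are independent, they are natural units of work for GPU blocks or MPI ranks, which is what makes the reported CUDA slowdown notable.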


Optimizing Collective Communication For Scalable Scientific Computing And Deep Learning, Jiali Li Aug 2023

Optimizing Collective Communication For Scalable Scientific Computing And Deep Learning, Jiali Li

Doctoral Dissertations

In the realm of distributed computing, collective operations involve coordinated communication and synchronization among multiple processing units, enabling efficient data exchange and collaboration. Scientific applications, such as simulations, computational fluid dynamics, and scalable deep learning, require complex computations that can be parallelized across multiple nodes in a distributed system. These applications often involve data-dependent communication patterns, where collective operations are critical for achieving high performance in data exchange. Optimizing collective operations for scientific applications and deep learning involves improving the algorithms, communication patterns, and data distribution strategies to minimize communication overhead and maximize computational efficiency.

Within the context of this …
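A concrete instance of the kind of algorithm such optimization targets: MPI libraries commonly implement allreduce by recursive doubling. The sequential simulation below is illustrative Python, not code from the dissertation; it shows the exchange pattern in which, at step s, rank r pairs with rank r XOR 2^s:

```python
def allreduce_recursive_doubling(values):
    """Simulate a recursive-doubling sum-allreduce over P = 2^k ranks.

    `values[r]` is rank r's local value. At each step every rank
    exchanges its partial sum with its XOR partner, so all ranks
    hold the global sum after log2(P) steps.
    """
    p = len(values)
    assert p & (p - 1) == 0, "P must be a power of two"
    sums = list(values)
    step = 1
    while step < p:
        # Each rank adds the partial sum held by its partner rank.
        sums = [sums[r] + sums[r ^ step] for r in range(p)]
        step *= 2
    return sums

# Four ranks holding 1, 2, 3, 4 all finish with the global sum 10.
result = allreduce_recursive_doubling([1, 2, 3, 4])
```

The pattern needs only log2(P) communication rounds, which is why it is a standard building block for latency-sensitive collectives.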


Quantum Simulation Using High-Performance Computing, Collin Beaudoin, Christian Trefftz, Zachary Kurmas Apr 2021

Quantum Simulation Using High-Performance Computing, Collin Beaudoin, Christian Trefftz, Zachary Kurmas

Masters Theses

Hermitian matrix multiplication is one of the most common operations performed on quantum matrices; for example, it is used to apply observables to a given state vector or density matrix.

ρ→Hρ

Our goal is to create an algorithm to perform the matrix multiplication within the constraints of QuEST [1], a high-performance simulator for quantum circuits. QuEST provides a system-independent platform for implementing and simulating quantum algorithms without the need for access to quantum machines. The current implementation of QuEST supports CUDA, MPI, and OpenMP, which allows programs to run on a wide variety of systems.
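Stripped of QuEST's distributed machinery, the operation ρ → Hρ is a dense matrix product. A minimal pure-Python sketch (the helper name and example matrices are illustrative, not from the thesis):

```python
def matmul(a, b):
    """Dense matrix product over row-major lists of lists."""
    rows, cols, inner = len(a), len(b[0]), len(b)
    return [[sum(a[i][k] * b[k][j] for k in range(inner))
             for j in range(cols)] for i in range(rows)]

# Pauli-X as a Hermitian observable H, applied to the density
# matrix rho = |0><0|:
H = [[0, 1], [1, 0]]
rho = [[1, 0], [0, 0]]
Hrho = matmul(H, rho)   # -> [[0, 0], [1, 0]]
```

The hard part in QuEST is not this arithmetic but distributing the 2^n x 2^n operands across CUDA, MPI, and OpenMP resources.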


Statistical Modeling Of Hpc Performance Variability And Communication, Jered B. Dominguez-Trujillo Jan 2021

Statistical Modeling Of Hpc Performance Variability And Communication, Jered B. Dominguez-Trujillo

Computer Science ETDs

Understanding the performance of parallel and distributed programs remains a focal point in determining how compute systems can be optimized to achieve exascale performance. Lightweight, statistical models allow developers to both characterize and predict performance trade-offs, especially as HPC systems become more heterogeneous with many-core CPUs and GPUs. This thesis presents a lightweight, statistical modeling approach of performance variation which leverages extreme value theory by focusing on the maximum length of distributed workload intervals. This approach was implemented in MPI and evaluated on several HPC systems and workloads. I then present a performance model of partitioned communication which also uses …
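The intuition behind the extreme-value focus can be demonstrated in a few lines: a bulk-synchronous step completes only when its slowest rank does, so the observed duration is a maximum over per-rank intervals, and its mean grows with the process count. The simulation below is illustrative only, not the thesis's model; exponential workload intervals are an assumption made for the example:

```python
import random

def mean_max_interval(procs, trials, seed=0):
    """Average, over many trials, of the slowest rank's interval.

    Each rank draws an exponential workload interval; the step's
    duration is the max across ranks -- the quantity extreme value
    theory characterizes.
    """
    rng = random.Random(seed)
    maxima = [max(rng.expovariate(1.0) for _ in range(procs))
              for _ in range(trials)]
    return sum(maxima) / trials

# With mean-1 exponential intervals, the expected maximum over n
# ranks is the n-th harmonic number, so slowdown grows like ln(n).
```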


Scalable Community Detection Using Distributed Louvain Algorithm, Naw Safrin Sattar May 2019

Scalable Community Detection Using Distributed Louvain Algorithm, Naw Safrin Sattar

University of New Orleans Theses and Dissertations

Community detection (or clustering) in large-scale graphs is an important problem in graph mining. Communities reveal interesting characteristics of a network. Louvain is an efficient sequential algorithm but fails to scale to emerging large-scale data. Developing distributed-memory parallel algorithms is challenging because of inter-process communication and load-balancing issues. In this work, we design a shared-memory-based algorithm using OpenMP, which shows a 4-fold speedup but is limited to the available physical cores. Our second algorithm is an MPI-based parallel algorithm that scales to a moderate number of processors. We also implement a hybrid algorithm combining both. Finally, we incorporate dynamic load-balancing in …


Adaptive Parallelism For Coupled, Multithreaded Message-Passing Programs, Samuel K. Gutiérrez Dec 2018

Adaptive Parallelism For Coupled, Multithreaded Message-Passing Programs, Samuel K. Gutiérrez

Computer Science ETDs

Hybrid parallel programming models that combine message passing (MP) and shared-memory multithreading (MT) are becoming more popular, especially with applications requiring higher degrees of parallelism and scalability. Consequently, coupled parallel programs, those built via the integration of independently developed and optimized software libraries linked into a single application, increasingly comprise message-passing libraries with differing preferred degrees of threading, resulting in thread-level heterogeneity. Retroactively matching threading levels between independently developed and maintained libraries is difficult, and the challenge is exacerbated because contemporary middleware services provide only static scheduling policies over entire program executions, necessitating suboptimal, over-subscribed or under-subscribed, configurations. In …


A Parallel Spectral Method Approach To Model Plasma Instabilities, Kevin S. Scheiman Jan 2018

A Parallel Spectral Method Approach To Model Plasma Instabilities, Kevin S. Scheiman

Browse all Theses and Dissertations

The study of solar-terrestrial plasma is concerned with processes in magnetospheric, ionospheric, and cosmic-ray physics involving different particle species and even particles of different energy within a single species. Instabilities in space plasmas and the earth's atmosphere are driven by a multitude of free energy sources such as velocity shear, gravity, temperature anisotropy, electron and ion beams, and currents. Microinstabilities such as Rayleigh-Taylor and Kelvin-Helmholtz instabilities are important for understanding plasma dynamics in the presence of magnetic fields and velocity shear. Modeling these turbulences is a computationally demanding process, requiring large memory and suffering from excessively long runtimes. Previous …


Improving Hpc Communication Library Performance On Modern Architectures, Matthew G. F. Dosanjh Oct 2017

Improving Hpc Communication Library Performance On Modern Architectures, Matthew G. F. Dosanjh

Computer Science ETDs

As high-performance computing (HPC) systems advance towards exascale (10^18 operations per second), they must leverage increasing levels of parallelism to achieve their performance goals. In addition to increased parallelism, machines of that scale will have strict power limitations placed on them. One direction currently being explored to alleviate those issues is many-core processors such as Intel’s Xeon Phi line. Many-core processors sacrifice clock speed and core complexity, such as out-of-order pipelining, to increase the number of cores on a die. While this increases floating point throughput, it can reduce the performance of serialized, synchronized, and latency-sensitive code …


Modelling Parallel Overhead From Simple Run-Time Records, Siegfried Höfinger, Ernst Haunschmid Oct 2017

Modelling Parallel Overhead From Simple Run-Time Records, Siegfried Höfinger, Ernst Haunschmid

Michigan Tech Publications

A number of scientific applications run on current HPC systems would benefit from an approximate assessment of parallel overhead. In many instances, a quick and simple method to obtain a general overview of the subject is regarded as useful auxiliary information by the routine HPC user. Here we present such a method using just execution times for increasing numbers of parallel processing cores. We start out with several common scientific applications and measure the fraction of time spent in MPI communication. Forming the ratio of MPI time to overall execution time, we obtain a smooth curve that can be parameterized by …
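A closely related closed-form exercise (not the authors' parameterization, which is based on the measured MPI-time ratio): given runtimes on 1 and n cores, Amdahl's law t_n = t_1((1-f) + f/n) can be inverted to estimate the parallel fraction f, a crude single-number summary of parallel overhead:

```python
def parallel_fraction(t1, tn, n):
    """Estimate Amdahl's parallel fraction f from two runtimes.

    From t_n = t_1 * ((1 - f) + f / n), solve for f using the
    observed speedup t_1 / t_n.
    """
    speedup = t1 / tn
    return (1 - 1 / speedup) / (1 - 1 / n)

# A code scaling perfectly 4x on 4 cores gives f = 1.0; a code that
# only reaches a 1.6x speedup on 4 cores gives f = 0.5.
```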


A Distributed Graph Approach For Pre-Processing Linked Rdf Data Using Supercomputers, Michael J. Lewis, George K. Thiruvathukal, Venkatram Vishwanath, Michael J. Papka, Andrew Johnson Jul 2017

A Distributed Graph Approach For Pre-Processing Linked Rdf Data Using Supercomputers, Michael J. Lewis, George K. Thiruvathukal, Venkatram Vishwanath, Michael J. Papka, Andrew Johnson

George K. Thiruvathukal

Efficient RDF, graph-based queries are becoming more pertinent given the increased interest in data analytics and its intersection with large, unstructured but connected data. Many commercial systems have adopted distributed RDF graph systems in order to handle increasing dataset sizes and complex queries. This paper introduces a distributed graph approach to pre-processing linked data. Instead of traversing the memory graph, our system indexes pre-processed join elements that are organized in a graph structure. We analyze the DBpedia dataset (derived from the Wikipedia corpus) and compare our access method to the graph traversal access approach, which we also devise. …


Patternlets — A Teaching Tool For Introducing Students To Parallel Design Patterns, Joel C. Adams Jul 2017

Patternlets — A Teaching Tool For Introducing Students To Parallel Design Patterns, Joel C. Adams

University Faculty Publications and Creative Works

Thanks to the ubiquity of multicore processors, today's CS students must be introduced to parallel computing or they will be ill prepared as modern software developers. Professional developers of parallel software think in terms of parallel design patterns, which are markedly different from traditional (sequential) design patterns. It follows that the more we can teach students to think in terms of parallel patterns, the more their thinking will resemble that of parallel software professionals. In this paper, we present patternlets—minimalist, scalable, syntactically correct programs, each designed to introduce students to a particular parallel design pattern. The collection currently includes 44 …


A Distributed Graph Approach For Pre-Processing Linked Rdf Data Using Supercomputers, Michael J. Lewis, George K. Thiruvathukal, Venkatram Vishwanath, Michael J. Papka, Andrew Johnson May 2017

A Distributed Graph Approach For Pre-Processing Linked Rdf Data Using Supercomputers, Michael J. Lewis, George K. Thiruvathukal, Venkatram Vishwanath, Michael J. Papka, Andrew Johnson

Computer Science: Faculty Publications and Other Works

Efficient RDF, graph-based queries are becoming more pertinent given the increased interest in data analytics and its intersection with large, unstructured but connected data. Many commercial systems have adopted distributed RDF graph systems in order to handle increasing dataset sizes and complex queries. This paper introduces a distributed graph approach to pre-processing linked data. Instead of traversing the memory graph, our system indexes pre-processed join elements that are organized in a graph structure. We analyze the DBpedia dataset (derived from the Wikipedia corpus) and compare our access method to the graph traversal access approach, which we also devise. …


Programming Models' Support For Heterogeneous Architecture, Wei Wu May 2017

Programming Models' Support For Heterogeneous Architecture, Wei Wu

Doctoral Dissertations

Accelerator-enhanced computing platforms have drawn a lot of attention due to their massive peak computational capacity. Heterogeneous systems equipped with accelerators such as GPUs have become the most prominent components of High Performance Computing (HPC) systems. Even at the node level, the significant heterogeneity of CPU and GPU, i.e. hardware and memory space differences, leads to challenges for fully exploiting such complex architectures. Extending beyond the node scope only escalates such challenges.

Conventional programming models such as data-flow and message passing have been widely adopted in HPC communities. When moving towards heterogeneous systems, the lack of GPU integration causes …


An Analyzer For Message Passing Programs, Yu Huang May 2016

An Analyzer For Message Passing Programs, Yu Huang

Theses and Dissertations

Asynchronous message passing systems are fast becoming a common means for communication between devices. Two problems existing in message passing programs are difficult to solve. The first problem, intended or otherwise, is message race, where a receive may match with more than one send in the runtime system. This non-determinism often leads to intermittent and unexpected behavior depending on the resolution of the race. The other problem is deadlock, a situation in which each member process of a group is waiting for some member process to communicate with it, but no member is attempting to communicate with it. Detecting if …
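The deadlock condition described above has a compact graph formulation: model "P is blocked waiting to receive from Q" as an edge P -> Q; the group is deadlocked when this wait-for graph contains a cycle. A minimal sketch (illustrative, not the analyzer from the thesis; it assumes each process waits on at most one peer):

```python
def has_deadlock(waits_for):
    """Return True if the wait-for graph contains a cycle.

    `waits_for` maps each process to the single process it is
    blocked receiving from, or None if it is not blocked.
    """
    for start in waits_for:
        seen, p = set(), start
        while p is not None and p not in seen:
            seen.add(p)
            p = waits_for.get(p)
        if p is not None:        # walk revisited a process: cycle
            return True
    return False

# Two processes posting blocking receives for each other deadlock;
# a chain that ends at an unblocked process does not.
```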


Parallel Static Object Detection, Tirambad Shiwakoti May 2016

Parallel Static Object Detection, Tirambad Shiwakoti

UNLV Theses, Dissertations, Professional Papers, and Capstones

The need for parallelism is growing with the broadening of computing in the real world, where computing is an integral part of every field. In the early days of computing, growing computational complexity could be addressed by adding transistors to the CPU. This is not the case now, as we can no longer advance hardware capabilities at the pace of the advancement of computing problems. One field that is computationally intensive is image processing. If it were just for one frame of an image, we could cope with the computation overhead. When the need is to compute video frames, …


A Two Phase Numerical Model For The Water Injection Dredging (Wid) Technology: An Unified Formulation For Continuum Mechanic, Dan Nguyen, Miguel Uh Zapata, Georges Gauthier, Philippe Gondret, Damien Pham Van Bang Aug 2014

A Two Phase Numerical Model For The Water Injection Dredging (Wid) Technology: An Unified Formulation For Continuum Mechanic, Dan Nguyen, Miguel Uh Zapata, Georges Gauthier, Philippe Gondret, Damien Pham Van Bang

International Conference on Hydroinformatics

Accurate simulation of sediment processes in the vicinity of the sediment bed/water interface is still very difficult because of the multi-physics character of the problem. A soil is a continuum and a porous medium with solid properties such as elasticity and plasticity. It is usually considered an impermeable boundary that can evolve through erosion and deposition fluxes. Flow characteristics close to the liquid-bed interface are poorly described by soil mechanics, and the rheological characteristics of the soil are usually neglected by fluid mechanics. In order to account for the flow domain as a whole, extending from the substratum up to the …


Parallel Design Patterns And Program Performance, Yu Zhao May 2014

Parallel Design Patterns And Program Performance, Yu Zhao

Mathematics, Statistics, and Computer Science Honors Projects

With the rapid advancement of parallel and distributed computing (PDC), three types of hardware and their corresponding software (hardware-software pairs) are becoming more and more popular: Distributed Memory Systems with the Message Passing Interface (MPI) library, Shared Memory Systems with the OpenMP library and Co-processor Systems with a general purpose parallel computing library. Alongside the development of both hardware and software aspects of PDC, the process of designing parallel programs has also improved significantly over the years. A consequence of this is that researchers have been able to describe many parallel design patterns, which are recurring solutions to well-known problems …


An Mpi-Enabled Mapreduce Framework For Molecular Dynamics Simulation Applications, Shuju Bai, Ebrahim Khosravi, Seung Jong Park Dec 2013

An Mpi-Enabled Mapreduce Framework For Molecular Dynamics Simulation Applications, Shuju Bai, Ebrahim Khosravi, Seung Jong Park

Computer Science Faculty Research & Creative Works

Computational technologies have been extensively investigated for application in many domains. Since the advent of Hadoop, an implementation of the MapReduce framework, scientists have applied it to biological sciences, chemistry, medical sciences, and other areas to efficiently process huge data sets. Although Hadoop is fault-tolerant and processes data in parallel, it does not support MPI in computing. The Map/Reduce tasks in Hadoop have to be serial, which results in inefficient scientific computations wrapped in Map/Reduce tasks. In the real world, many applications require MPI techniques due to their nature. Molecular dynamics simulation is one of them. In our research, …


The Distributed Application Debugger, Michael Quinn Jones May 2013

The Distributed Application Debugger, Michael Quinn Jones

UNLV Theses, Dissertations, Professional Papers, and Capstones

Developing parallel programs which run on distributed computer clusters introduces additional challenges to those present in traditional sequential programs. Debugging parallel programs requires not only inspecting the sequential code executing on each node but also tracking the flow of messages being passed between them in order to infer where the source of a bug actually lies.

This thesis focuses on a debugging tool called The Distributed Application Debugger, which targets a popular distributed C programming library called MPI (Message Passing Interface). The tool is composed of multiple components which run together seamlessly to provide its users an effective way to …


Using Mapreduce Streaming For Distributed Life Simulation On The Cloud, Atanas Radenski Jan 2013

Using Mapreduce Streaming For Distributed Life Simulation On The Cloud, Atanas Radenski

Mathematics, Physics, and Computer Science Faculty Books and Book Chapters

Distributed software simulations are indispensable in the study of large-scale life models but often require the use of technically complex lower-level distributed computing frameworks, such as MPI. We propose to overcome the complexity challenge by applying the emerging MapReduce (MR) model to distributed life simulations and by running such simulations on the cloud. Technically, we design optimized MR streaming algorithms for discrete and continuous versions of Conway’s life according to a general MR streaming pattern. We chose life because it is simple enough as a testbed for MR’s applicability to a-life simulations and general enough to make our results applicable …
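The map/reduce decomposition of a Life step is easy to see in miniature: map emits a neighbour-count contribution from every live cell, and reduce sums the contributions per cell and applies the birth/survival rule. Below is a single-process Python sketch of that pattern (illustrative only; the chapter's streaming algorithms shard this work across cloud nodes):

```python
from collections import defaultdict

def life_step(live):
    """One generation of Conway's Life, phrased as map + reduce."""
    counts = defaultdict(int)
    for (x, y) in live:                  # map: emit (neighbour, 1)
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                if dx or dy:
                    counts[(x + dx, y + dy)] += 1
    # reduce: sum per cell, then apply the birth/survival rule
    return {cell for cell, n in counts.items()
            if n == 3 or (n == 2 and cell in live)}

# A blinker oscillates between a horizontal and a vertical bar:
blinker = {(0, 1), (1, 1), (2, 1)}
```

Because the map phase touches each live cell independently and the reduce phase groups by cell key, the step fits MapReduce streaming with no shared state beyond the shuffle.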


A Hybrid Framework Of Iterative Mapreduce And Mpi For Molecular Dynamics Applications, Shuju Bai Jan 2013

A Hybrid Framework Of Iterative Mapreduce And Mpi For Molecular Dynamics Applications, Shuju Bai

LSU Doctoral Dissertations

Developing platforms for large-scale data processing has been of great interest to scientists. Hadoop is a widely used computational platform: a fault-tolerant distributed system for data storage thanks to HDFS (Hadoop Distributed File System) that performs fault-tolerant distributed data processing in parallel thanks to the MapReduce framework. Quite often, actual computations require multiple MapReduce cycles, which calls for chained MapReduce jobs. However, Hadoop is poor at addressing problems with iterative structures. In many iterative problems, some invariant data is required by every MapReduce cycle. The same data is uploaded to the Hadoop file system in …


Object Oriented Implementation Of The Parallel Toolkit Library, Sandhya Vinnakota Dec 2012

Object Oriented Implementation Of The Parallel Toolkit Library, Sandhya Vinnakota

Computer Science Graduate Projects and Theses

With manufacturing efficiencies and technological innovation, the computing power of commodity machines has been increasing, accompanied by decreasing costs. With this very favorable price/performance ratio, the computing community has shifted from monolithic machines to networked machines.

This has created the need for software to manage the parallelism of the network. One such work has been the Parallel Toolkit Library. The Parallel Toolkit Library provides support for common design functionalities used throughout parallel programs.

This work extends the PTK C library for C++ parallel programs. The motivation for the current project stems from the need to let parallel programs reap the …


A Flexible Consent Management System For Master Person Indices, Aditya Pakalapati Dec 2012

A Flexible Consent Management System For Master Person Indices, Aditya Pakalapati

All Graduate Plan B and other Reports, Spring 1920 to Spring 2023

In healthcare, a Master Person Index (MPI) is a system that integrates information about individuals from multiple data sources. To ensure confidentiality, such systems, particularly in healthcare, need to respect individual and organizational constraints on the sharing of data. This report describes a reusable consent management system that enforces such constraints and how it has been tested in the context of the Utah Department of Health (UDOH) MPI for public health.


Hybrid Mpi+Upc Parallel Programming Paradigm On An Smp Cluster, Zeki̇ Bozkuş Jan 2012

Hybrid Mpi+Upc Parallel Programming Paradigm On An Smp Cluster, Zeki̇ Bozkuş

Turkish Journal of Electrical Engineering and Computer Sciences

The symmetric multiprocessing (SMP) cluster system, which consists of shared memory nodes with several multicore central processing units connected to a high-speed network to form a distributed memory system, is the most widely available hardware architecture for the high-performance computing community. Today, the Message Passing Interface (MPI) is the most widely used parallel programming paradigm for SMP clusters, in which the MPI provides programming both for an SMP node and among nodes simultaneously. However, Unified Parallel C (UPC) is an emerging alternative that supports the partitioned global address space model that can be again employed within and across the nodes …


Hyperspectral Data Processing In A High Performance Computing Environment: A Parallel Best Band Selection Algorithm, Stefan Robila, Gerald Busardo Dec 2011

Hyperspectral Data Processing In A High Performance Computing Environment: A Parallel Best Band Selection Algorithm, Stefan Robila, Gerald Busardo

Department of Computer Science Faculty Scholarship and Creative Works

Hyperspectral data are characterized by a richness of information unique among various visual representations of a scene by representing the information in a collection of grayscale images with each image corresponding to a narrow interval in the electromagnetic spectrum. Such detail allows for precise identification of materials in the scene and promises to support advances in imaging beyond the visible range. However, hyperspectral data are considerably large and cumbersome to process and efficient computing solutions based on high performance computing are needed. In this paper we first provide an overview of hyperspectral data and the current state of the art …


The Design And Evolution Of Zipcode, Anthony Skjellum, Steven G. Smith, Nathan E. Doss, Alvin Leung Dec 2011

The Design And Evolution Of Zipcode, Anthony Skjellum, Steven G. Smith, Nathan E. Doss, Alvin Leung

Steven D. Smith

Zipcode is a message-passing and process-management system that was designed for multicomputers and homogeneous networks of computers in order to support libraries and large-scale multicomputer software. The system has evolved significantly over the last five years, based on our experiences and identified needs. Features of Zipcode that were originally unique to it were its simultaneous support of static process groups, communication contexts, and virtual topologies, forming the "mailer" data structure. Point-to-point and collective operations reference the underlying group, and use contexts to avoid mixing up messages. Recently, we have added "gather-send" and "receive-scatter" semantics, based on persistent Zipcode "invoices," both …


Wide-Area Implementation Of The Message Passing Interface, Ian Foster, Jonathan Geisler, William Gropp, Nicholas Karonis, Ewing Lusk, George K. Thiruvathukal, Steven Tuecke Nov 2011

Wide-Area Implementation Of The Message Passing Interface, Ian Foster, Jonathan Geisler, William Gropp, Nicholas Karonis, Ewing Lusk, George K. Thiruvathukal, Steven Tuecke

George K. Thiruvathukal

The Message Passing Interface (MPI) can be used as a portable, high-performance programming model for wide-area computing systems. The wide-area environment introduces challenging problems for the MPI implementor, due to the heterogeneity of both the underlying physical infrastructure and the software environment at different sites. In this article, we describe an MPI implementation that incorporates solutions to these problems. This implementation has been constructed by extending the Argonne MPICH implementation of MPI to use communication services provided by the Nexus communication library and authentication, resource allocation, process creation/management, and information services provided by the I-Soft system (initially) and the Globus metacomputing toolkit …


Shared Memory, Message Passing, And Hybrid Merge Sorts For Standalone And Clustered Smps, Atanas Radenski Jan 2011

Shared Memory, Message Passing, And Hybrid Merge Sorts For Standalone And Clustered Smps, Atanas Radenski

Mathematics, Physics, and Computer Science Faculty Books and Book Chapters

While merge sort is well understood in parallel algorithms theory, relatively little is known of how to implement parallel merge sort with mainstream parallel programming platforms, such as OpenMP and MPI, and run it on mainstream SMP-based systems, such as multi-core computers and multi-core clusters. This is unfortunate because merge sort is not only a fast and stable sort algorithm, but also an easy-to-understand and popular representative of the rich class of divide-and-conquer methods; hence better understanding of merge sort parallelization can contribute to better understanding of divide-and-conquer parallelization in general. In this paper, we investigate three …
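The divide-and-conquer structure translates directly to mainstream platforms. As a shared-memory analogue of the approach (a Python sketch, not the paper's code; a thread pool stands in for OpenMP threads):

```python
from concurrent.futures import ThreadPoolExecutor

def merge(left, right):
    """Merge two sorted lists into one sorted list."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
    return out + left[i:] + right[j:]

def parallel_merge_sort(data, workers=4):
    """Sort chunks concurrently, then merge them pairwise."""
    n = max(1, len(data) // workers)
    chunks = [data[i:i + n] for i in range(0, len(data), n)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        sorted_chunks = list(pool.map(sorted, chunks))
    while len(sorted_chunks) > 1:            # pairwise merge tree
        merged = [merge(a, b) for a, b in
                  zip(sorted_chunks[::2], sorted_chunks[1::2])]
        if len(sorted_chunks) % 2:
            merged.append(sorted_chunks[-1])
        sorted_chunks = merged
    return sorted_chunks[0] if sorted_chunks else []
```

The same chunk-sort/merge-tree shape maps to MPI by giving each rank a chunk and merging along a reduction tree, which is the hybrid design space the paper explores.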


Fast Marching Methods - Parallel Implementation And Analysis, Maria Cristina Tugurlan Jan 2008

Fast Marching Methods - Parallel Implementation And Analysis, Maria Cristina Tugurlan

LSU Doctoral Dissertations

Fast Marching represents a very efficient technique for solving front propagation problems, which can be formulated as partial differential equations with Dirichlet boundary conditions, called Eikonal equation: $F(x)|\nabla T(x)|=1$, for $x \in \Omega$ and $T(x)=0$ for $x \in \Gamma$, where $\Omega$ is a domain in $\mathbb{R}^n$, $\Gamma$ is the initial position of a curve evolving with normal velocity F>0. Fast Marching Methods are a necessary step in Level Set Methods, which are widely used today in scientific computing. The classical Fast Marching Methods, based on finite differences, are typically sequential. Parallelizing Fast Marching Methods is a step forward for …
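The sequential core that makes parallelization hard is the Dijkstra-like ordered sweep: grid points are accepted in increasing order of T, which serializes the main loop. A simplified sketch on a uniform grid (illustrative only; it uses the one-sided update T + h/F in place of the full upwind finite-difference solve):

```python
import heapq

def fast_march(n, sources, F=1.0, h=1.0):
    """Dijkstra-like sketch of Fast Marching on an n x n grid.

    Propagates arrival times T outward from `sources` at speed F;
    each accepted point updates its 4-neighbours with T + h/F, a
    first-order approximation of the Eikonal equation F|grad T| = 1.
    """
    INF = float("inf")
    T = [[INF] * n for _ in range(n)]
    heap = []
    for (i, j) in sources:
        T[i][j] = 0.0
        heapq.heappush(heap, (0.0, i, j))
    accepted = set()
    while heap:
        t, i, j = heapq.heappop(heap)      # smallest tentative time
        if (i, j) in accepted:
            continue
        accepted.add((i, j))               # freeze this point
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            a, b = i + di, j + dj
            if 0 <= a < n and 0 <= b < n and t + h / F < T[a][b]:
                T[a][b] = t + h / F
                heapq.heappush(heap, (T[a][b], a, b))
    return T

# Arrival times grow with distance from the source corner (0, 0).
```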