Open Access. Powered by Scholars. Published by Universities.^{®}

- Discipline

Articles **1** - **16** of ** 16**

## Full-Text Articles in Engineering

Infrastructure For Performance Tuning Mpi Applications, Kathryn Marie Mohror

#### Infrastructure For Performance Tuning Mpi Applications, Kathryn Marie Mohror

*Dissertations and Theses*

Clusters of workstations are becoming increasingly popular as a low-budget alternative for supercomputing power. In these systems,message-passing is often used to allow the separate nodes to act as a single computing machine. Programmers of such systems face a daunting challenge in understanding the performance bottlenecks of their applications. This is largely due to the vast amount of performance data that is collected, and the time and expertise necessary to use traditional parallel performance tools to analyze that data.

The goal of this project is to increase the level of performance tool support for message-passing application programmers on clusters of ...

Pperfgrid: A Grid Services-Based Tool For The Exchange Of Heterogeneous Parallel Performance Data, John Jared Hoffman

#### Pperfgrid: A Grid Services-Based Tool For The Exchange Of Heterogeneous Parallel Performance Data, John Jared Hoffman

*Dissertations and Theses*

This thesis details the approach taken in developing PPerfGrid. Section 2 discusses other research related to this project. Section 3 provides general background on the technologies utilized in PPerfGrid, focusing on the components that make up the Grid services architecture. Section 4 provides a description of the architecture of PPerfGrid. Section 5 details the implementation of PPerfGrid. Section 6 presents tests designed to measure the overhead and scalability of the PPerfGrid application. Section 7 suggests future work, and Section 8 concludes the thesis.

A Performance Study Of Lam And Mpich On An Smp Cluster, Brian Patrick Kearns

#### A Performance Study Of Lam And Mpich On An Smp Cluster, Brian Patrick Kearns

*Dissertations and Theses*

Many universities and research laboratories have developed low cost clusters, built from Commodity-Off-The-Shelf (COTS) components and running mostly free software. Research has shown that these types of systems are well-equipped to handle many problems requiring parallel processing. The primary components of clusters are hardware, networking, and system software. An important system software consideration for clusters is the choice of the message passing library.

MPI (Message Passing Interface) has arguably become the most widely used message passing library on clusters and other parallel architectures, due in part to its existence as a standard. As a standard, MPI is open for anyone ...

Querying Geographically Dispersed, Heterogeneous Data Stores: The Pperfxchange Approach, Matthew Edward Colgrove

#### Querying Geographically Dispersed, Heterogeneous Data Stores: The Pperfxchange Approach, Matthew Edward Colgrove

*Dissertations and Theses*

This thesis details PPerfXchange’s approach for querying geographically dispersed heterogeneous data stores. While elements of PPerfXchange’s method have been implemented for other application areas, PPerfXchange shows how these elements can be applied to parallel performance analysis. The accomplishments of this thesis are:

- The design of an architecture for PPerfXchange, giving a uniform method to query heterogeneous data stores;
- A proof of concept prototype implementation of PPerfXchange including a partial implementation of an XQuery processor and a relational database virtual XML document; and
- Evaluation of PPerfXchange using example parallel performance analysis data.

Towards Comparative Profiling Of Parallel Applications With Pperfdb, Christian Leland Hansen

#### Towards Comparative Profiling Of Parallel Applications With Pperfdb, Christian Leland Hansen

*Dissertations and Theses*

Due to the complex nature of parallel programming, it is difficult to diagnose and solve performance related problems. Knowledge of program behavior is obtained experimentally, with repeated runs of a slightly modified version of the application or the same code in different environments. In these circumstances, comparative performance analysis can provide meaningful insights into the subtle effects of system and code changes on parallel program behavior by highlighting the difference in performance results across executions.

I have designed and implemented modules which extend the PPerfDB performance tool to allow access to existing performance data generated by several commonly used tracing ...

Methodology For Accurate Speedup Prediction, Aruna Chittor

#### Methodology For Accurate Speedup Prediction, Aruna Chittor

*Dissertations and Theses*

The effective use of computational resources requires a good understanding of parallel architectures and algorithms. The effect of the parallel architecture and also the parallel application on the performance of the parallel systems becomes more complex with increasing numbers of processors. We will address this issue in this thesis, and develop a methodology to predict the overall execution time of a parallel application as a function of the system and problem size by combining simple analysis with a few experimental results. We show that runtimes and speedup can be predicted more accurately by analyzing the functional forms of the sequential ...

Multiplexed Pipelining : A Cost Effective Loop Transformation Technique, Satish Pai

#### Multiplexed Pipelining : A Cost Effective Loop Transformation Technique, Satish Pai

*Dissertations and Theses*

Parallel processing has gained increasing importance over the last few years. A key aim of parallel processing is to improve the execution times of scientific programs by mapping them to many processors. Loops form an important part of most computational programs and must be processed efficiently to get superior performance in terms of execution times. Important examples of such programs include graphics algorithms, matrix operations (which are used in signal processing and image processing applications), particle simulation, and other scientific applications. Pipelining uses overlapped parallelism to efficiently reduce execution time.

Exploiting And/Or Parallelism In Prolog, Bankim Shah

#### Exploiting And/Or Parallelism In Prolog, Bankim Shah

*Dissertations and Theses*

Logic programming languages have generated increasing interest over the last few years. Logic programming languages like Prolog are being explored for different applications. Prolog is inherently parallel. Attempts are being made to utilize this inherent parallelism. There are two kinds of parallelism present in Prolog, OR parallelism and AND parallelism. OR parallelism is relatively easy to exploit while AND parallelism poses interesting issues. One of the main issues is dependencies between literals.

It is very important to use the AND parallelism available in the language structure as not exploiting it would result in a substantial loss of parallelism. Any system ...

Associative Processing Implemented With Content-Addressable Memories, Luis Sergio Kida

#### Associative Processing Implemented With Content-Addressable Memories, Luis Sergio Kida

*Dissertations and Theses*

The associative processing model provides an alternative solution to the von Neumann bottleneck. The memory of an associative computer takes some of the responsibility for processing. Only intermediate results are exchanged between memory and processor. This greatly reduces the amount of communication between them. Content-addressable memories are one implementation of memory for this computational model. Associative computers implemented with CAMs have reported performance improvements of three orders of magnitude, which is equivalent to the performance of the same application running in a conventional computer with clock frequencies of the order of GHz. Among the benefits of content-addressable memories to the ...

A Practical Parallel Algorithm For The Minimization Of KröNecker Reed-Muller Expansions, Paul John Gilliam

#### A Practical Parallel Algorithm For The Minimization Of KröNecker Reed-Muller Expansions, Paul John Gilliam

*Dissertations and Theses*

A number of recent developments has increased the desirability of using exclusive OR (XOR) gates in the synthesis of switching functions. This has, in turn, led naturally to an increased interest in algorithms for the minimization of Exclusive-Or Sum of Products (ESOP) forms. Although this is an active area of research, it is not nearly as developed as the traditional Sum of Products forms. Computer programs to find minimum ESOPs are not readily available and those that do exist are impractical to use as investigative tools because they are too slow and/or require too much memory. A practical tool ...

Threaded Octree Structures For Fast Neighbor Voxel Processing In A Parallel Ray Tracer, B.R. Naveen Chandra

#### Threaded Octree Structures For Fast Neighbor Voxel Processing In A Parallel Ray Tracer, B.R. Naveen Chandra

*Dissertations and Theses*

In the field of Computer Graphics, Ray Tracing has so far been the the best algorithm for rendering of realistic three dimensional images created by mathematical models. Ray Tracing is also known for its very large computation times, where the computation depends on the picture resolution, the number of objects and the complexity of the scene.

Parallel Architectures For Solving Combinatorial Problems Of Logic Design, Phuong Minh Ho

#### Parallel Architectures For Solving Combinatorial Problems Of Logic Design, Phuong Minh Ho

*Dissertations and Theses*

This thesis presents a new, practical approach to solve various NP-hard combinatorial problems of logic synthesis, logic programming, graph theory and related areas. A problem to be solved is polynomially time reduced to one of several generic combinatorial problems which can be expressed in the form of the Generalized Propositional Formula (GPF) : a Boolean product of clauses, where each clause is a sum of products of negated or non-negated literals.

Implementing Ray Tracing Algorithm In Parallel Environment, Tjah Jadi

#### Implementing Ray Tracing Algorithm In Parallel Environment, Tjah Jadi

*Dissertations and Theses*

Ray tracing is a very popular rendering algorithm in the field of computer graphics because it can generate highly-realistic images from three-dimensional models. Unfortunately, the computational cost is very expensive. To speed up the rendering process we present both static and dynamic scheduling (balancing) strategies for a multiprocessor system. Hence, the load balancing among the processors is the most important problem in parallel processing. The implementation of the algorithm is based on a modified octree structure.

A New General Purpose Systolic Array For Matrix Computations, Hai Van Dinh Le

#### A New General Purpose Systolic Array For Matrix Computations, Hai Van Dinh Le

*Dissertations and Theses*

In this thesis, we propose a new systolic architecture which is based on the Faddeev's algorithm. Because Faddeev's algorithm is inherently general purpose, our architecture is able to perform a wide class of matrix computations. And since the architecture is systolic based, it brings massive parallelism to all of its computations. As a result, many matrix operations including addition, multiplication, inversion, LU-decomposition, transpose, and solutions to linear systems of equations can now be performed extremely fast. In addition, our design introduces several concepts which are new to systolic architectures:

- It can be re-configured during run time to perform ...

Quadtree-Based Processing Of Digital Images, Ramin Naderi

#### Quadtree-Based Processing Of Digital Images, Ramin Naderi

*Dissertations and Theses*

Image representation plays an important role in image processing applications, which usually. contain a huge amount of data. An image is a two-dimensional array of points, and each point contains information (eg: color). A 1024 by 1024 pixel image occupies 1 mega byte of space in the main memory. In actual circumstances 2 to 3 mega bytes of space are needed to facilitate the various image processing tasks. Large amounts of secondary memory are also required to hold various data sets.

In this thesis, two different operations on the quadtree are presented.

There are, in general, two types of data ...

Two New Parallel Processors For Real Time Classification Of 3-D Moving Objects And Quad Tree Generation, Farjam Majd

#### Two New Parallel Processors For Real Time Classification Of 3-D Moving Objects And Quad Tree Generation, Farjam Majd

*Dissertations and Theses*

Two related image processing problems are addressed in this thesis. First, the problem of identification of 3-D objects in real time is explored. An algorithm to solve this problem and a hardware system for parallel implementation of this algorithm are proposed. The classification scheme is based on the "Invariant Numerical Shape Modeling" (INSM) algorithm originally developed for 2-D pattern recognition such as alphanumeric characters. This algorithm is then extended to 3-D and is used for general 3-D object identification. The hardware system is an SIMD parallel processor, designed in bit slice fashion for expandability. It consists of a library of ...