Open Access. Powered by Scholars. Published by Universities.®
- Discipline
- Publication Year
- Publication
- Publication Type
Articles 1 - 26 of 26
Full-Text Articles in Engineering
From Inductive To Deductive Learning, Mikhail Mayers, Brian Henson
From Inductive To Deductive Learning, Mikhail Mayers, Brian Henson
Undergraduate Research & Mentoring Program
Using Machine Vision as a way to give information to Prolog. Using Prolog to solve deductive problems and analogical problems without having to manually enter all facts and information.
Resizable, Scalable, Concurrent Hash Tables Via Relativistic Programming, Josh Triplett, Paul E. Mckenney, Jonathan Walpole
Resizable, Scalable, Concurrent Hash Tables Via Relativistic Programming, Josh Triplett, Paul E. Mckenney, Jonathan Walpole
Computer Science Faculty Publications and Presentations
Presentation focusing on software synchronization, thread locking, transactional memory, and relativistic programming. Hash table algorithms are presented with examples of relativistic list insertion and removal, and related data structures. Existing approaches are compared to new methodologies and future work with relativistic data structures.
Efficient Support Of Consistent Cyclic Search With Read-Copy-Update And Parallel Updates, Jonathan Walpole, Paul E. Mckenney
Efficient Support Of Consistent Cyclic Search With Read-Copy-Update And Parallel Updates, Jonathan Walpole, Paul E. Mckenney
Computer Science Faculty Publications and Presentations
A method, system and computer program product for supporting concurrent updates to a shared data element group while preserving group integrity on behalf of one or more readers that are concurrently referencing group data elements without using locks or atomic instructions. Two or more updaters may be invoked to generate new group data elements. Each new data element created by the same up dater is assigned a new generation number that is different than a global generation number associated with the data element group and which allows a reader of the data element group to determine whether the new data …
Generalized Construction Of Scalable Concurrent Data Structures Via Relativistic Programming, Josh Triplett, Paul E. Mckenney, Philip W. Howard, Jonathan Walpole
Generalized Construction Of Scalable Concurrent Data Structures Via Relativistic Programming, Josh Triplett, Paul E. Mckenney, Philip W. Howard, Jonathan Walpole
Computer Science Faculty Publications and Presentations
We present relativistic programming, a concurrent programming model based on shared addressing, which supports efficient, scalable operation on either uniform shared-memory or distributed shared- memory systems. Relativistic programming provides a strong causal ordering property, allowing a series of read operations to appear as an atomic transaction that occurs entirely between two ordered write operations. This preserves the simple immutable-memory programming model available via mutual exclusion or transactional memory. Furthermore, relativistic programming provides joint-access parallelism, allowing readers to run concurrently with a writer on the same data. We demonstrate a generalized construction technique for concurrent data structures based on relativistic programming, …
The Ordering Requirements Of Relativistic And Reader-Writer Locking Approaches To Shared Data Access, Philip William Howard, Josh Triplett, Jonathan Walpole, Paul E. Mckenney
The Ordering Requirements Of Relativistic And Reader-Writer Locking Approaches To Shared Data Access, Philip William Howard, Josh Triplett, Jonathan Walpole, Paul E. Mckenney
Computer Science Faculty Publications and Presentations
The semantics of reader-writer locks allow read-side concurrency. Unfortunately, the locking primitives serialize access to the lock variable to an extent that little or no concurrency is realized in practice for small critical sections. Relativistic programming is a methodology that also allows read- side concurrency. Relativistic programming uses dfferent ordering constraints than reader-writer locking. The different ordering constraints allow relativistic readers to proceed without synchronization so relativistic readers scale even for very short critical sections. In this paper we explore the diferences between the ordering constraints for reader-writer locking and relativistic programs. We show how and why the dfferent ordering …
Scalable Load And Store Processing In Latency Tolerant Processors, Amit Vasant Gandhi
Scalable Load And Store Processing In Latency Tolerant Processors, Amit Vasant Gandhi
Dissertations and Theses
Memory latency-tolerant architectures support thousands of in-flight instructions without proportionate scaling of cycle-critical processor resources, and thousands of useful instructions can complete in parallel with a long-latency miss to memory. These architectures, however, require large queues to track all loads and stores executed while a long-latency miss is pending. Hierarchical designs alleviate cycle-time impact of these structures but the Content-Addressable-Memory (CAM) and search functions required to enforce memory ordering and provide data-forwarding place high demand on area and power.
Many recent proposals address the complexity of load and store queues. However, none of these proposals addresses the fundamental source of …
The Quest For Novel Computational Paradigms And Machines, Christof Teuscher
The Quest For Novel Computational Paradigms And Machines, Christof Teuscher
Electrical and Computer Engineering Faculty Publications and Presentations
The quest for novel and unconventional computing machines is mainly motivated by the man-machine dichotomy and by the belief that meeting tomorrow's complex real world challenges will require new paradigms and new engineering methods to organize, train, and program such machines and to interact with them.
Infrastructure For Performance Tuning Mpi Applications, Kathryn Marie Mohror
Infrastructure For Performance Tuning Mpi Applications, Kathryn Marie Mohror
Dissertations and Theses
Clusters of workstations are becoming increasingly popular as a low-budget alternative for supercomputing power. In these systems,message-passing is often used to allow the separate nodes to act as a single computing machine. Programmers of such systems face a daunting challenge in understanding the performance bottlenecks of their applications. This is largely due to the vast amount of performance data that is collected, and the time and expertise necessary to use traditional parallel performance tools to analyze that data.
The goal of this project is to increase the level of performance tool support for message-passing application programmers on clusters of workstations. …
Pperfgrid: A Grid Services-Based Tool For The Exchange Of Heterogeneous Parallel Performance Data, John Jared Hoffman
Pperfgrid: A Grid Services-Based Tool For The Exchange Of Heterogeneous Parallel Performance Data, John Jared Hoffman
Dissertations and Theses
This thesis details the approach taken in developing PPerfGrid. Section 2 discusses other research related to this project. Section 3 provides general background on the technologies utilized in PPerfGrid, focusing on the components that make up the Grid services architecture. Section 4 provides a description of the architecture of PPerfGrid. Section 5 details the implementation of PPerfGrid. Section 6 presents tests designed to measure the overhead and scalability of the PPerfGrid application. Section 7 suggests future work, and Section 8 concludes the thesis.
A Performance Study Of Lam And Mpich On An Smp Cluster, Brian Patrick Kearns
A Performance Study Of Lam And Mpich On An Smp Cluster, Brian Patrick Kearns
Dissertations and Theses
Many universities and research laboratories have developed low cost clusters, built from Commodity-Off-The-Shelf (COTS) components and running mostly free software. Research has shown that these types of systems are well-equipped to handle many problems requiring parallel processing. The primary components of clusters are hardware, networking, and system software. An important system software consideration for clusters is the choice of the message passing library.
MPI (Message Passing Interface) has arguably become the most widely used message passing library on clusters and other parallel architectures, due in part to its existence as a standard. As a standard, MPI is open for anyone …
Querying Geographically Dispersed, Heterogeneous Data Stores: The Pperfxchange Approach, Matthew Edward Colgrove
Querying Geographically Dispersed, Heterogeneous Data Stores: The Pperfxchange Approach, Matthew Edward Colgrove
Dissertations and Theses
This thesis details PPerfXchange’s approach for querying geographically dispersed heterogeneous data stores. While elements of PPerfXchange’s method have been implemented for other application areas, PPerfXchange shows how these elements can be applied to parallel performance analysis. The accomplishments of this thesis are:
- The design of an architecture for PPerfXchange, giving a uniform method to query heterogeneous data stores;
- A proof of concept prototype implementation of PPerfXchange including a partial implementation of an XQuery processor and a relational database virtual XML document; and
- Evaluation of PPerfXchange using example parallel performance analysis data.
Towards Comparative Profiling Of Parallel Applications With Pperfdb, Christian Leland Hansen
Towards Comparative Profiling Of Parallel Applications With Pperfdb, Christian Leland Hansen
Dissertations and Theses
Due to the complex nature of parallel programming, it is difficult to diagnose and solve performance related problems. Knowledge of program behavior is obtained experimentally, with repeated runs of a slightly modified version of the application or the same code in different environments. In these circumstances, comparative performance analysis can provide meaningful insights into the subtle effects of system and code changes on parallel program behavior by highlighting the difference in performance results across executions.
I have designed and implemented modules which extend the PPerfDB performance tool to allow access to existing performance data generated by several commonly used tracing …
Static Compaction Of Test Sequences For Synchronous Sequential Circuits, Lijie Qi
Static Compaction Of Test Sequences For Synchronous Sequential Circuits, Lijie Qi
Dissertations and Theses
Today, VLSI design has progressed to a stage where it needs to incorporate methods of testing circuits. The Automatic Test Pattern Generation (ATPG) is a very attractive method and feasible on almost any combinational and sequential circuit.
Currently available automatic test pattern generators (ATPGs) generate test sets that may be excessively long. Because a cost of testing depends on the test length. compaction techniques have been used to reduce that length. The motivation for studying test compaction is twofold. Firstly, by reducing the test sequence length. the memory requirements during the test application and the test application time are reduced. …
Scheduling Of Parallel Jobs On Dynamic, Heterogenous Networks, Dan Clark, Jeremy Casas, Steve Otto, Robert Prouty, Jonathan Walpole
Scheduling Of Parallel Jobs On Dynamic, Heterogenous Networks, Dan Clark, Jeremy Casas, Steve Otto, Robert Prouty, Jonathan Walpole
Computer Science Faculty Publications and Presentations
In using a shared network of workstations for parallel processing, it is not only important to consider heterogeneity and differences in processing power between the workstations but also the dynamics of the system as a whole. In such a computing environment where the use of resources vary as other applications consume and release resources, intelligent scheduling of the parallel jobs onto the available resources is essential to maximize resource utilization. Despite this realization, however, there are few systems available that provide an infrastructure for the easy development and testing of these intelligent schedulers. In this paper, an infrastructure is presented …
Evolution Of Emergent Computation, James P. Crutchfield, Melanie Mitchell
Evolution Of Emergent Computation, James P. Crutchfield, Melanie Mitchell
Computer Science Faculty Publications and Presentations
A simple evolutionary process can discover sophisticated methods for emergent information processing in decentralized spatially-extended systems. The mechanisms underlying the resulting emergent computation are explicated by a novel technique for analyzing particle-based logic embedded in pattern-forming systems. Understanding how globally-coordinated computation can emerge in evolution is relevant both for the scientific understanding of natural information processing and for engineering new forms of parallel computing systems.
Methodology For Accurate Speedup Prediction, Aruna Chittor
Methodology For Accurate Speedup Prediction, Aruna Chittor
Dissertations and Theses
The effective use of computational resources requires a good understanding of parallel architectures and algorithms. The effect of the parallel architecture and also the parallel application on the performance of the parallel systems becomes more complex with increasing numbers of processors. We will address this issue in this thesis, and develop a methodology to predict the overall execution time of a parallel application as a function of the system and problem size by combining simple analysis with a few experimental results. We show that runtimes and speedup can be predicted more accurately by analyzing the functional forms of the sequential …
Multiplexed Pipelining : A Cost Effective Loop Transformation Technique, Satish Pai
Multiplexed Pipelining : A Cost Effective Loop Transformation Technique, Satish Pai
Dissertations and Theses
Parallel processing has gained increasing importance over the last few years. A key aim of parallel processing is to improve the execution times of scientific programs by mapping them to many processors. Loops form an important part of most computational programs and must be processed efficiently to get superior performance in terms of execution times. Important examples of such programs include graphics algorithms, matrix operations (which are used in signal processing and image processing applications), particle simulation, and other scientific applications. Pipelining uses overlapped parallelism to efficiently reduce execution time.
Associative Processing Implemented With Content-Addressable Memories, Luis Sergio Kida
Associative Processing Implemented With Content-Addressable Memories, Luis Sergio Kida
Dissertations and Theses
The associative processing model provides an alternative solution to the von Neumann bottleneck. The memory of an associative computer takes some of the responsibility for processing. Only intermediate results are exchanged between memory and processor. This greatly reduces the amount of communication between them. Content-addressable memories are one implementation of memory for this computational model. Associative computers implemented with CAMs have reported performance improvements of three orders of magnitude, which is equivalent to the performance of the same application running in a conventional computer with clock frequencies of the order of GHz. Among the benefits of content-addressable memories to the …
Exploiting And/Or Parallelism In Prolog, Bankim Shah
Exploiting And/Or Parallelism In Prolog, Bankim Shah
Dissertations and Theses
Logic programming languages have generated increasing interest over the last few years. Logic programming languages like Prolog are being explored for different applications. Prolog is inherently parallel. Attempts are being made to utilize this inherent parallelism. There are two kinds of parallelism present in Prolog, OR parallelism and AND parallelism. OR parallelism is relatively easy to exploit while AND parallelism poses interesting issues. One of the main issues is dependencies between literals.
It is very important to use the AND parallelism available in the language structure as not exploiting it would result in a substantial loss of parallelism. Any system …
A Practical Parallel Algorithm For The Minimization Of KröNecker Reed-Muller Expansions, Paul John Gilliam
A Practical Parallel Algorithm For The Minimization Of KröNecker Reed-Muller Expansions, Paul John Gilliam
Dissertations and Theses
A number of recent developments has increased the desirability of using exclusive OR (XOR) gates in the synthesis of switching functions. This has, in turn, led naturally to an increased interest in algorithms for the minimization of Exclusive-Or Sum of Products (ESOP) forms. Although this is an active area of research, it is not nearly as developed as the traditional Sum of Products forms. Computer programs to find minimum ESOPs are not readily available and those that do exist are impractical to use as investigative tools because they are too slow and/or require too much memory. A practical tool would …
Threaded Octree Structures For Fast Neighbor Voxel Processing In A Parallel Ray Tracer, B.R. Naveen Chandra
Threaded Octree Structures For Fast Neighbor Voxel Processing In A Parallel Ray Tracer, B.R. Naveen Chandra
Dissertations and Theses
In the field of Computer Graphics, Ray Tracing has so far been the the best algorithm for rendering of realistic three dimensional images created by mathematical models. Ray Tracing is also known for its very large computation times, where the computation depends on the picture resolution, the number of objects and the complexity of the scene.
Parallel Architectures For Solving Combinatorial Problems Of Logic Design, Phuong Minh Ho
Parallel Architectures For Solving Combinatorial Problems Of Logic Design, Phuong Minh Ho
Dissertations and Theses
This thesis presents a new, practical approach to solve various NP-hard combinatorial problems of logic synthesis, logic programming, graph theory and related areas. A problem to be solved is polynomially time reduced to one of several generic combinatorial problems which can be expressed in the form of the Generalized Propositional Formula (GPF) : a Boolean product of clauses, where each clause is a sum of products of negated or non-negated literals.
Implementing Ray Tracing Algorithm In Parallel Environment, Tjah Jadi
Implementing Ray Tracing Algorithm In Parallel Environment, Tjah Jadi
Dissertations and Theses
Ray tracing is a very popular rendering algorithm in the field of computer graphics because it can generate highly-realistic images from three-dimensional models. Unfortunately, the computational cost is very expensive. To speed up the rendering process we present both static and dynamic scheduling (balancing) strategies for a multiprocessor system. Hence, the load balancing among the processors is the most important problem in parallel processing. The implementation of the algorithm is based on a modified octree structure.
A New General Purpose Systolic Array For Matrix Computations, Hai Van Dinh Le
A New General Purpose Systolic Array For Matrix Computations, Hai Van Dinh Le
Dissertations and Theses
In this thesis, we propose a new systolic architecture which is based on the Faddeev's algorithm. Because Faddeev's algorithm is inherently general purpose, our architecture is able to perform a wide class of matrix computations. And since the architecture is systolic based, it brings massive parallelism to all of its computations. As a result, many matrix operations including addition, multiplication, inversion, LU-decomposition, transpose, and solutions to linear systems of equations can now be performed extremely fast. In addition, our design introduces several concepts which are new to systolic architectures:
- It can be re-configured during run time to perform different …
Quadtree-Based Processing Of Digital Images, Ramin Naderi
Quadtree-Based Processing Of Digital Images, Ramin Naderi
Dissertations and Theses
Image representation plays an important role in image processing applications, which usually. contain a huge amount of data. An image is a two-dimensional array of points, and each point contains information (eg: color). A 1024 by 1024 pixel image occupies 1 mega byte of space in the main memory. In actual circumstances 2 to 3 mega bytes of space are needed to facilitate the various image processing tasks. Large amounts of secondary memory are also required to hold various data sets.
In this thesis, two different operations on the quadtree are presented.
There are, in general, two types of data …
Two New Parallel Processors For Real Time Classification Of 3-D Moving Objects And Quad Tree Generation, Farjam Majd
Two New Parallel Processors For Real Time Classification Of 3-D Moving Objects And Quad Tree Generation, Farjam Majd
Dissertations and Theses
Two related image processing problems are addressed in this thesis. First, the problem of identification of 3-D objects in real time is explored. An algorithm to solve this problem and a hardware system for parallel implementation of this algorithm are proposed. The classification scheme is based on the "Invariant Numerical Shape Modeling" (INSM) algorithm originally developed for 2-D pattern recognition such as alphanumeric characters. This algorithm is then extended to 3-D and is used for general 3-D object identification. The hardware system is an SIMD parallel processor, designed in bit slice fashion for expandability. It consists of a library of …