Engineering | Open Access Articles | Digital Commons Network™

Scalable Load And Store Processing In Latency Tolerant Processors, Amit Vasant Gandhi Oct 2006

Dissertations and Theses

Memory latency-tolerant architectures support thousands of in-flight instructions without proportionate scaling of cycle-critical processor resources, and thousands of useful instructions can complete in parallel with a long-latency miss to memory. These architectures, however, require large queues to track all loads and stores executed while a long-latency miss is pending. Hierarchical designs alleviate the cycle-time impact of these structures, but the Content-Addressable-Memory (CAM) and search functions required to enforce memory ordering and provide data forwarding place high demands on area and power.

Many recent proposals address the complexity of load and store queues. However, none of these proposals addresses the fundamental source of …

Static Compaction Of Test Sequences For Synchronous Sequential Circuits, Lijie Qi Jan 1998

Dissertations and Theses

Today, VLSI design has progressed to a stage where it needs to incorporate methods of testing circuits. Automatic Test Pattern Generation (ATPG) is a very attractive method that is feasible for almost any combinational or sequential circuit.

Currently available automatic test pattern generators (ATPGs) generate test sets that may be excessively long. Because the cost of testing depends on the test length, compaction techniques have been used to reduce that length. The motivation for studying test compaction is twofold. Firstly, by reducing the test sequence length, the memory requirements during the test application and the test application time are reduced. …

Methodology For Accurate Speedup Prediction, Aruna Chittor Dec 1994

Dissertations and Theses

The effective use of computational resources requires a good understanding of parallel architectures and algorithms. The effects of the parallel architecture and the parallel application on the performance of a parallel system become more complex as the number of processors increases. This thesis addresses this issue and develops a methodology to predict the overall execution time of a parallel application as a function of the system and problem size by combining simple analysis with a few experimental results. We show that runtimes and speedup can be predicted more accurately by analyzing the functional forms of the sequential …

Multiplexed Pipelining : A Cost Effective Loop Transformation Technique, Satish Pai Jan 1992

Dissertations and Theses

Parallel processing has gained increasing importance over the last few years. A key aim of parallel processing is to improve the execution times of scientific programs by mapping them to many processors. Loops form an important part of most computational programs and must be processed efficiently to get superior performance in terms of execution times. Important examples of such programs include graphics algorithms, matrix operations (which are used in signal processing and image processing applications), particle simulation, and other scientific applications. Pipelining uses overlapped parallelism to efficiently reduce execution time.

Associative Processing Implemented With Content-Addressable Memories, Luis Sergio Kida Jul 1991

Dissertations and Theses

The associative processing model provides an alternative solution to the von Neumann bottleneck. The memory of an associative computer takes some of the responsibility for processing. Only intermediate results are exchanged between memory and processor, which greatly reduces the amount of communication between them. Content-addressable memories are one implementation of memory for this computational model. Associative computers implemented with CAMs have reported performance improvements of three orders of magnitude, which is equivalent to the performance of the same application running on a conventional computer with clock frequencies on the order of GHz. Among the benefits of content-addressable memories to the …

Exploiting And/Or Parallelism In Prolog, Bankim Shah May 1991

Dissertations and Theses

Logic programming languages have generated increasing interest over the last few years. Logic programming languages like Prolog are being explored for different applications. Prolog is inherently parallel. Attempts are being made to utilize this inherent parallelism. There are two kinds of parallelism present in Prolog, OR parallelism and AND parallelism. OR parallelism is relatively easy to exploit while AND parallelism poses interesting issues. One of the main issues is dependencies between literals.

It is very important to use the AND parallelism available in the language structure as not exploiting it would result in a substantial loss of parallelism. Any system …

A Practical Parallel Algorithm For The Minimization Of Krönecker Reed-Muller Expansions, Paul John Gilliam Jan 1991

Dissertations and Theses

A number of recent developments have increased the desirability of using exclusive OR (XOR) gates in the synthesis of switching functions. This has, in turn, led naturally to an increased interest in algorithms for the minimization of Exclusive-Or Sum of Products (ESOP) forms. Although this is an active area of research, it is not nearly as developed as that of the traditional Sum of Products forms. Computer programs to find minimum ESOPs are not readily available, and those that do exist are impractical to use as investigative tools because they are too slow and/or require too much memory. A practical tool would …

Threaded Octree Structures For Fast Neighbor Voxel Processing In A Parallel Ray Tracer, B.R. Naveen Chandra Jan 1990

Dissertations and Theses

In the field of Computer Graphics, Ray Tracing has so far been the best algorithm for rendering realistic three-dimensional images created by mathematical models. Ray Tracing is also known for its very large computation times, where the computation depends on the picture resolution, the number of objects and the complexity of the scene.

Parallel Architectures For Solving Combinatorial Problems Of Logic Design, Phuong Minh Ho Jan 1989

Dissertations and Theses

This thesis presents a new, practical approach to solving various NP-hard combinatorial problems of logic synthesis, logic programming, graph theory and related areas. A problem to be solved is reduced in polynomial time to one of several generic combinatorial problems which can be expressed in the form of the Generalized Propositional Formula (GPF): a Boolean product of clauses, where each clause is a sum of products of negated or non-negated literals.
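The GPF form described in this abstract can be illustrated with a small sketch. The nested-list encoding, the function names `eval_gpf` and `satisfying_assignments`, and the brute-force enumeration are all assumptions made for illustration; they are not the thesis's method, which targets parallel hardware.

```python
from itertools import product

# Hypothetical encoding (not from the thesis):
#   literal = (variable_name, is_positive)
#   term    = list of literals, ANDed together (a product)
#   clause  = list of terms, ORed together (a sum of products)
#   gpf     = list of clauses, ANDed together (a product of clauses)

def eval_gpf(gpf, assignment):
    """Return True if the assignment satisfies every clause of the GPF."""
    return all(
        any(all(assignment[var] == positive for var, positive in term)
            for term in clause)
        for clause in gpf
    )

def satisfying_assignments(gpf):
    """Enumerate all satisfying assignments by exhaustive search."""
    variables = sorted({var for clause in gpf for term in clause for var, _ in term})
    for values in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if eval_gpf(gpf, assignment):
            yield assignment

# ((a AND NOT b) OR c) AND (b OR NOT c)
example = [
    [[("a", True), ("b", False)], [("c", True)]],
    [[("b", True)], [("c", False)]],
]
solutions = list(satisfying_assignments(example))
```

Exhaustive search takes time exponential in the number of variables, which is consistent with the abstract's interest in parallel architectures for such NP-hard problems.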

Implementing Ray Tracing Algorithm In Parallel Environment, Tjah Jadi Dec 1988

Dissertations and Theses

Ray tracing is a very popular rendering algorithm in the field of computer graphics because it can generate highly realistic images from three-dimensional models. Unfortunately, its computational cost is very high. To speed up the rendering process we present both static and dynamic scheduling (balancing) strategies for a multiprocessor system, since load balancing among the processors is the most important problem in parallel processing. The implementation of the algorithm is based on a modified octree structure.

A New General Purpose Systolic Array For Matrix Computations, Hai Van Dinh Le Jan 1988

Dissertations and Theses

In this thesis, we propose a new systolic architecture based on Faddeev's algorithm. Because Faddeev's algorithm is inherently general purpose, our architecture is able to perform a wide class of matrix computations. And since the architecture is systolic, it brings massive parallelism to all of its computations. As a result, many matrix operations including addition, multiplication, inversion, LU-decomposition, transpose, and solutions to linear systems of equations can now be performed extremely fast. In addition, our design introduces several concepts which are new to systolic architectures:

- It can be re-configured during run time to perform different …

Quadtree-Based Processing Of Digital Images, Ramin Naderi Jan 1986

Dissertations and Theses

Image representation plays an important role in image processing applications, which usually involve a huge amount of data. An image is a two-dimensional array of points, and each point contains information (e.g., color). A 1024 by 1024 pixel image occupies 1 megabyte of space in the main memory. In actual circumstances 2 to 3 megabytes of space are needed to facilitate the various image processing tasks. Large amounts of secondary memory are also required to hold various data sets.
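As a rough check of the storage figure quoted above, the sketch below computes the raw size of an uncompressed image. The default of 8 bits per pixel is an assumption (the abstract does not state a pixel depth), and `image_bytes` is a hypothetical helper name.

```python
def image_bytes(width, height, bits_per_pixel=8):
    """Raw storage for an uncompressed image, in bytes.

    bits_per_pixel=8 is an assumed depth, not a value given in the abstract.
    """
    return width * height * bits_per_pixel // 8

size = image_bytes(1024, 1024)   # 1024 * 1024 * 1 byte
size_in_mib = size / 2**20       # expressed in units of 2**20 bytes
```

Under that assumption a 1024 by 1024 image takes 1,048,576 bytes, which matches the 1 megabyte figure; a deeper pixel format would scale the result proportionally.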

In this thesis, two different operations on the quadtree are presented.

There are, in general, two types of data …

Two New Parallel Processors For Real Time Classification Of 3-D Moving Objects And Quad Tree Generation, Farjam Majd Jan 1985

Dissertations and Theses

Two related image processing problems are addressed in this thesis. First, the problem of identification of 3-D objects in real time is explored. An algorithm to solve this problem and a hardware system for parallel implementation of this algorithm are proposed. The classification scheme is based on the "Invariant Numerical Shape Modeling" (INSM) algorithm originally developed for 2-D pattern recognition such as alphanumeric characters. This algorithm is then extended to 3-D and is used for general 3-D object identification. The hardware system is an SIMD parallel processor, designed in bit slice fashion for expandability. It consists of a library of …
