Open Access. Powered by Scholars. Published by Universities.®

Computer Engineering Commons



Full-Text Articles in Computer Engineering

An Overlay Architecture For Pattern Matching, Rasha Elham Karakchi Apr 2020


Theses and Dissertations

Deterministic and Non-deterministic Finite Automata (DFA and NFA) comprise the fundamental unit of work for many emerging big data applications, motivating recent efforts to develop Domain-Specific Architectures (DSAs) that exploit the fine-grained parallelism available in automata workloads.

This dissertation presents NAPOLY (Non-Deterministic Automata Processor OverLaY), an overlay architecture and associated software that attempt to maximally exploit on-chip memory parallelism for NFA evaluation. To avoid the upper bound on NFA size that commonly limits prior efforts, NAPOLY is optimized for runtime reconfiguration, allowing full reconfiguration in tens of microseconds. NAPOLY is also parameterizable, allowing for offline generation of …
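To make "NFA evaluation" concrete, the sketch below simulates an NFA in software by tracking the set of active states and expanding each one on every input symbol. It is a minimal illustration under stated assumptions, not NAPOLY's design: the transition-table encoding, the function name `nfa_find_matches`, and the example pattern are all hypothetical, and epsilon transitions are assumed absent. The point it shows is that each active state is expanded independently, which is the kind of fine-grained parallelism automata processors spread across on-chip memories.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical NFA encoding (not NAPOLY's format): maps (state, symbol) to the
# set of successor states. Epsilon transitions are assumed to be absent.
Transitions = Dict[Tuple[int, str], Set[int]]


def nfa_find_matches(text: str, start: int, accepting: Set[int],
                     delta: Transitions) -> List[int]:
    """Return every offset in `text` at which some match ends.

    The start state is re-activated at every position, so a match may begin
    anywhere (unanchored search).
    """
    active: Set[int] = set()
    matches: List[int] = []
    for i, ch in enumerate(text):
        active.add(start)
        nxt: Set[int] = set()
        # Each active state is expanded independently of the others; this
        # per-state independence is what hardware can evaluate in parallel.
        for state in active:
            nxt |= delta.get((state, ch), set())
        active = nxt
        if active & accepting:
            matches.append(i)
    return matches


# Example NFA for the pattern "aa|ab": state 0 branches on 'a' to two paths.
delta: Transitions = {
    (0, "a"): {1, 3},
    (1, "a"): {2},
    (3, "b"): {4},
}
print(nfa_find_matches("aab", 0, {2, 4}, delta))  # [1, 2]
```

On the input "aab", both branches are alive after the first 'a'; the "aa" branch accepts at offset 1 and the "ab" branch accepts at offset 2.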


Accuracy, Cost And Performance Trade-Offs For Streaming Set-Wise Floating Point Accumulation On FPGAs, Krishna Kumar Nagar Jan 2013


Theses and Dissertations

Set-wise summation is perhaps one of the most fundamental and widely used operations in scientific applications. In these applications, maintaining the accuracy of the summation is also important, since floating-point operations carry inherent rounding error. Designing floating-point accumulators presents a unique set of challenges: double-precision addition is usually deeply pipelined, and without special micro-architectural or data-scheduling techniques, the data hazard that arises because each addition depends on the running sum from the previous one cannot be avoided. There have been several efforts to design floating-point accumulators and accurate summation architectures on FPGAs using different algorithms, but these two problems have been addressed separately. In this dissertation, we present …
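The two problems named above, pipeline hazards and summation accuracy, can be illustrated in software. The sketch below is not the dissertation's architecture. `naive_sum` is the hazard-prone sequential loop. `interleaved_sum` models one generic latency-hiding idea: rotating inputs over several independent partial sums, with the hypothetical parameter `lanes` standing in for the adder pipeline depth. `neumaier_sum` is the well-known Kahan-Babuska/Neumaier compensated summation, swapped in here only as an example of an accuracy-oriented algorithm. All function names are assumptions.

```python
from typing import List, Sequence


def naive_sum(values: Sequence[float]) -> float:
    """Sequential accumulation: each addition needs the previous result,
    which is the data hazard a deeply pipelined adder cannot tolerate."""
    total = 0.0
    for v in values:
        total += v
    return total


def interleaved_sum(values: Sequence[float], lanes: int) -> float:
    """Hide adder latency by rotating inputs over `lanes` independent partial
    sums (hypothetical model: `lanes` ~ adder pipeline depth), then reducing
    the partials. This reorders the additions, so rounding may differ."""
    partials: List[float] = [0.0] * lanes
    for i, v in enumerate(values):
        partials[i % lanes] += v
    total = 0.0
    for p in partials:
        total += p
    return total


def neumaier_sum(values: Sequence[float]) -> float:
    """Compensated (Kahan-Babuska/Neumaier) summation: accumulate the
    rounding error of each addition in `comp` and add it back at the end."""
    total = 0.0
    comp = 0.0
    for v in values:
        t = total + v
        if abs(total) >= abs(v):
            comp += (total - t) + v
        else:
            comp += (v - t) + total
        total = t
    return total + comp


data = [1e16, 1.0, -1e16]
print(naive_sum(data))           # 0.0 (the 1.0 is lost to rounding)
print(interleaved_sum(data, 2))  # 1.0 (reordering happens to help here)
print(neumaier_sum(data))        # 1.0
```

The interleaved result differs from the sequential one only because the additions were reordered. Reordering can just as easily make the error worse, which is why latency hiding and accuracy interact rather than being independent concerns.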


Exploiting Matrix Symmetry To Improve FPGA-Accelerated Conjugate Gradient, Jason D. Bakos, Krishna K. Nagar Apr 2009


Faculty Publications

In this paper we describe a new approach for accelerating the Conjugate Gradient (CG) method using an FPGA co-processor. As in previous approaches, our co-processor performs a double-precision sparse matrix-vector multiplication. However, our implementation doubles the amount of computation per unit of input data by exploiting the symmetry of the input matrix and processing the upper and lower triangles of the input matrix in parallel. Using a Virtex-2 Pro 100 FPGA, we have achieved an observed computational throughput of 1155 MFLOPS.
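The arithmetic behind "doubling the computation per unit of input data" can be sketched sequentially: store only the upper triangle of a symmetric matrix and use each off-diagonal value twice, once for its own row and once for the mirrored lower-triangle entry. This is a minimal software sketch, not the paper's FPGA design. The CSR storage layout and the helper names `upper_csr` and `sym_spmv` are assumptions for illustration, and the sequential loop does not model how the hardware processes the two triangles in parallel.

```python
from typing import List, Sequence, Tuple


def upper_csr(dense: List[List[float]]) -> Tuple[List[int], List[int], List[float]]:
    """Store only the upper triangle (diagonal included) of a symmetric
    matrix in CSR form: (row_ptr, col_idx, vals)."""
    row_ptr: List[int] = [0]
    col_idx: List[int] = []
    vals: List[float] = []
    for i, row in enumerate(dense):
        for j in range(i, len(row)):
            if row[j] != 0.0:
                col_idx.append(j)
                vals.append(row[j])
        row_ptr.append(len(vals))
    return row_ptr, col_idx, vals


def sym_spmv(row_ptr: Sequence[int], col_idx: Sequence[int],
             vals: Sequence[float], x: Sequence[float]) -> List[float]:
    """y = A x for symmetric A given only its upper triangle.
    Each stored off-diagonal entry a_ij is used twice (as a_ij for row i and
    as a_ji for row j), so every value read feeds two multiply-adds."""
    n = len(row_ptr) - 1
    y = [0.0] * n
    for i in range(n):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            j, a = col_idx[k], vals[k]
            y[i] += a * x[j]        # upper-triangle contribution
            if i != j:
                y[j] += a * x[i]    # mirrored lower-triangle contribution
    return y


A = [[4.0, 1.0, 0.0],
     [1.0, 3.0, 2.0],
     [0.0, 2.0, 5.0]]
print(sym_spmv(*upper_csr(A), [1.0, 2.0, 3.0]))  # [6.0, 13.0, 19.0]
```

Only 5 of the 7 nonzeros are stored, yet all 7 multiply-adds of the full product are performed, because the two off-diagonal values are each applied twice.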