Open Access. Powered by Scholars. Published by Universities.®

Computer Engineering Commons

FPGA

Articles 61 - 82 of 82

Full-Text Articles in Computer Engineering

System Designs To Perform Bioinformatics Sequence Alignment, Çağlar Yilmaz, Mustafa Gök Jan 2013

Turkish Journal of Electrical Engineering and Computer Sciences

The emerging field of bioinformatics uses computing as a tool to understand biology. Biological data of organisms (nucleotide and amino acid sequences) are stored in databases that contain billions of records. In order to process the vast amount of data in a reasonable time, high-performance analysis systems are developed. The main operation shared by the analysis tools is the search for matching patterns between sequences of data (sequence alignment). In this paper, we present two systems that can perform pairwise and multiple sequence alignment operations. Through optimized design methods, the proposed systems achieve up to 3.6 times higher performance compared …


Accuracy, Cost And Performance Trade-Offs For Streaming Set-Wise Floating Point Accumulation On Fpgas, Krishna Kumar Nagar Jan 2013

Theses and Dissertations

The set-wise summation operation is perhaps one of the most fundamental and widely used operations in scientific applications. In these applications, maintaining the accuracy of the summation is also important as floating point operations have inherent errors associated with them. Designing floating-point accumulators presents a unique set of challenges: double-precision addition is usually deeply pipelined and without special micro-architectural or data scheduling techniques, the data hazard that exists. There have been several efforts to design floating point accumulators and accurate summation architecture using different algorithms on FPGAs but these problems have been dealt with separately. In this dissertation, we present …


Hardware-Software Co-Design, Acceleration And Prototyping Of Control Algorithms On Reconfigurable Platforms, Desta Kumsa Edosa Dec 2012

UNLV Theses, Dissertations, Professional Papers, and Capstones

Differential equations play a significant role in many disciplines of science and engineering. Solving and implementing Ordinary Differential Equations (ODEs) and Partial Differential Equations (PDEs) effectively is essential, as most complex dynamic systems are modeled based on these equations. High Performance Computing (HPC) methodologies are required to compute and implement complex and data-intensive applications modeled by differential equations at higher speed. There are, however, some challenges and limitations in implementing dynamic systems, modeled by non-linear ordinary differential equations, on digital hardware. Modeling an integrator involves data approximation, which results in accuracy error if data values are not considered …


Air: Adaptive Dynamic Precision Iterative Refinement, Jun Kyu Lee Aug 2012

Doctoral Dissertations

In high performance computing, applications often require very accurate solutions while minimizing runtimes and power consumption. Improving the ratio of the number of logic gates implementing floating point arithmetic operations to the total number of logic gates enables greater efficiency, potentially with higher performance and lower power consumption. Software executing on the fixed hardware in von Neumann architectures faces limitations on improving this ratio, since processors require extensive supporting logic to fetch and decode instructions while employing arithmetic units with statically defined precision. This dissertation explores novel approaches to improve computing architectures for linear system applications not only by designing application-specific …


Low-Cost Stereo Vision On An Fpga, Chris A. Murphy, Daniel Lindquist, Ann Marie Rynning, Thomas Cecil, Sarah Leavitt, Mark L. Chang Jul 2012

Mark L. Chang

We present a low-cost stereo vision implementation suitable for use in autonomous vehicle applications and designed with agricultural applications in mind. This implementation utilizes the Census transform algorithm to calculate depth maps from a stereo pair of automotive-grade CMOS cameras. The final prototype utilizes commodity hardware, including a Xilinx Spartan-3 FPGA, to process 320×240 pixel images at greater than 150 frames per second and deliver them via a USB 2.0 interface.
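
As a rough illustration of the Census transform named in this abstract, the sketch below encodes each pixel as a bit string recording which neighbours are darker than it; stereo matching then compares codes from the left and right images by Hamming distance. This is a generic software version under assumed conventions (a 3×3 window, row-major bit order, borders set to 0), not the authors' FPGA design; the function names `census_transform` and `hamming` are hypothetical.

```python
def census_transform(image, radius=1):
    """Census code per pixel for a grayscale image given as a list of rows.

    Assumption: border pixels closer than `radius` to an edge are left as 0.
    """
    height, width = len(image), len(image[0])
    codes = [[0] * width for _ in range(height)]
    for y in range(radius, height - radius):
        for x in range(radius, width - radius):
            center = image[y][x]
            bits = 0
            # Visit neighbours in row-major order, skipping the centre pixel.
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    if dy == 0 and dx == 0:
                        continue
                    darker = 1 if image[y + dy][x + dx] < center else 0
                    bits = (bits << 1) | darker
            codes[y][x] = bits
    return codes


def hamming(a, b):
    """Number of differing bits between two census codes (matching cost)."""
    return bin(a ^ b).count("1")
```

In a depth-map pipeline, the disparity for a pixel would typically be the horizontal shift that minimizes the Hamming cost between the two images' codes; that search is omitted here.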


Extending The Hybridthread Smp Model For Distributed Memory Systems, Eugene Anthony Cartwright III May 2012

Graduate Theses and Dissertations

Memory hierarchy is of growing importance in system design today. As Moore's Law allows system designers to include more processors within their designs, data locality becomes a priority. Traditional multiprocessor systems on chip (MPSoC) experience difficulty scaling as the quantity of processors increases. This challenge stems from the behavior of memory accesses in a shared memory environment, which causes memory bandwidth to decrease as processor numbers increase. In order to provide the necessary levels of scalability, the computer architecture community has sought to decentralize memory accesses by distributing memory throughout the system. Distributed memory offers greater bandwidth due to decoupled …


Floating-Point Divide And Square Root For Efficient Fpga Implementation Of Image And Signal Processing Algorithms, Xiaojun Wang, Miriam Leeser Apr 2012

Miriam Leeser

Division and square root are important operations in many high performance signal processing applications. We have implemented floating point division and square root based on Taylor series for the variable precision floating point library developed at the Reconfigurable Computing Laboratory at Northeastern. Our results show that they are very well suited to FPGA implementations, and lead to a good tradeoff of area and latency. We implemented a floating-point K-means clustering algorithm and applied it to multispectral satellite images. The mean update is moved from host to FPGA hardware with the new fp_div module to reduce the communication between host and …


Floating Point Division And Square Root And The Applications, Xiaojun Wang, Miriam Leeser Apr 2012

Miriam Leeser

Division and square root are important operations in many high performance signal processing applications. We have implemented floating point division and square root based on Taylor series for the variable precision floating point library developed at the Reconfigurable Computing Laboratory at Northeastern. Our results show that they are very well suited to FPGA implementations, and lead to a good tradeoff of area and latency. We implemented a floating-point K-means clustering algorithm and applied it to multispectral satellite images. The mean update is moved from host to FPGA hardware with the new fp_div module to reduce the communication between host and …


Accelerating Pattern Recognition Algorithms On Parallel Computing Architectures, Kenneth Rice Dec 2011

All Dissertations

The move to more parallel computing architectures places more responsibility on the programmer to achieve greater performance. The programmer must now have a greater understanding of the underlying architecture and the inherent algorithmic parallelism. Using parallel computing architectures for exploiting algorithmic parallelism can be a complex task. This dissertation demonstrates various techniques for using parallel computing architectures to exploit algorithmic parallelism. Specifically, three pattern recognition (PR) approaches are examined for acceleration across multiple parallel computing architectures, namely field programmable gate arrays (FPGAs) and general purpose graphical processing units (GPGPUs).
Phase-only filter correlation for fingerprint identification was studied as the first …


Towards Securing Virtualization Using A Reconfigurable Platform, Tushar Janefalkar May 2011

All Theses

Virtualization is no longer limited to mainstream processors and servers. Virtualization software for General Purpose Processors (GPP) that allows one Operating System (OS) to run as an application in another OS has become commonplace. To exploit the full potential of the available hardware, virtualization is now prevalent across all systems big and small. Besides GPPs, state-of-the-art embedded processors are now capable of running rich operating systems and their virtualization is now a hot topic of research. However, this technological progress also opens doors for attackers to snoop on data that is not only confined to storage servers but also …


Field-Programmable Gate Array Implementation Of A Scalable Integral Image Architecture Based On Systolic Arrays, Juan Alberto De La Cruz May 2011

All Graduate Theses and Dissertations, Spring 1920 to Summer 2023

The integral image representation of an image is important for a large number of modern image processing algorithms. Integral image representations can reduce computation and increase the operating speed of certain algorithms, improving real-time performance. Due to increasing demand for real-time image processing performance, an integral image architecture capable of accelerating the calculation based on the amount of available resources is presented. Use of the proposed accelerator allows for subsequent stages of a design to have data sooner and execute in parallel. It is shown here how, with some additional resources used in the Field Programmable Gate Array (FPGA), a …
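
For readers unfamiliar with the integral image representation this abstract refers to, the following is a minimal software sketch: each entry holds the sum of all pixels above and to the left, so any rectangular sum afterwards costs four lookups. It is a plain sequential reference version under assumed conventions (an extra zero row and column, half-open rectangle bounds), not the systolic-array architecture the thesis proposes; `integral_image` and `box_sum` are hypothetical names.

```python
def integral_image(image):
    """Build a (height+1) x (width+1) table of cumulative sums.

    ii[y][x] is the sum of image[r][c] for all r < y and c < x.
    """
    height, width = len(image), len(image[0])
    ii = [[0] * (width + 1) for _ in range(height + 1)]
    for y in range(height):
        row_sum = 0  # running sum of the current row
        for x in range(width):
            row_sum += image[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def box_sum(ii, top, left, bottom, right):
    """Sum of pixels in rows top..bottom-1 and columns left..right-1."""
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]
```

The row-wise running sum is the data dependency a hardware design has to pipeline or parallelize; the sketch makes that dependency visible but does not attempt to model any particular architecture.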


An Fpga Based Implementation Of The Exact Stochastic Simulation Algorithm, Phani Bharadwaj Vanguri Dec 2010

Masters Theses

Mathematical and statistical modeling of biological systems has been a desired goal for many years. Many biochemical models are often evaluated using a deterministic approach, which uses differential equations to describe the chemical interactions. However, such an approach is inaccurate for small species populations as it neglects the discrete representation of population values, presents the possibility of negative populations, and does not represent the stochastic nature of biochemical systems. The Stochastic Simulation Algorithm (SSA) developed by Gillespie is able to properly account for these inherent noise fluctuations. Due to the stochastic nature of the Monte Carlo simulations, large numbers of simulations …
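
To make the idea of Gillespie's SSA concrete, here is a minimal sketch of the commonly described "direct method": draw an exponentially distributed waiting time from the total propensity, then pick one reaction with probability proportional to its propensity and apply its integer population change. Whether the thesis implements this variant is not stated in the excerpt, so treat it as an assumption; the function name `gillespie_direct` and the `(propensity_fn, change)` reaction format are hypothetical.

```python
import math
import random


def gillespie_direct(initial_state, reactions, t_end, rng):
    """Simulate one trajectory.

    reactions: list of (propensity_fn, change) pairs, where propensity_fn
    maps the state dict to a non-negative rate and change maps species
    names to integer population deltas.
    """
    t = 0.0
    state = dict(initial_state)
    trajectory = [(t, dict(state))]
    while t < t_end:
        propensities = [fn(state) for fn, _ in reactions]
        total = sum(propensities)
        if total <= 0.0:
            break  # no reaction can fire any more
        # Waiting time to the next reaction is exponential with rate `total`.
        t += -math.log(1.0 - rng.random()) / total
        if t > t_end:
            break
        # Select a reaction with probability proportional to its propensity.
        threshold = rng.random() * total
        cumulative = 0.0
        chosen = len(reactions) - 1  # fallback guards against rounding
        for index, propensity in enumerate(propensities):
            cumulative += propensity
            if threshold < cumulative:
                chosen = index
                break
        for species, delta in reactions[chosen][1].items():
            state[species] += delta
        trajectory.append((t, dict(state)))
    return trajectory
```

For example, a single decay reaction can be expressed as `[(lambda s: 1.0 * s["A"], {"A": -1})]`; because populations change only in whole steps and a zero population gives zero propensity, the count never goes negative, which is the property the abstract contrasts with the deterministic approach.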


Design Of An Adaptable Run-Time Reconfigurable Software-Defined Radio Processing Architecture, Joshua R. Templin Dec 2010

All Graduate Theses and Dissertations, Spring 1920 to Summer 2023

Processing power is a key technical challenge holding back the development of a high-performance software defined radio (SDR). Traditionally, SDR has utilized digital signal processors (DSPs), but increasingly complex algorithms, higher data rates, and multi-tasking needs have exceeded the processing capabilities of modern DSPs. Reconfigurable computers, such as field-programmable gate arrays (FPGAs), are popular alternatives because of their performance gains over software for streaming data applications like SDR. However, FPGAs have not yet realized the ideal SDR because architectures have not fully utilized their partial reconfiguration (PR) capabilities to bring needed flexibility. A reconfigurable processor architecture is proposed that utilizes …


Acceleration Of Biomolecular Simulations Using Fpga-Based Reconfigurable Computing, Ananth Nallamuthu May 2010

All Theses

A paradigm shift is occurring in the way compute-intensive scientific applications are developed. Thanks to advancements in commercially viable hybrid architectures for High-Performance Computing (HPC), the focus has shifted from improving performance by merely scaling algorithms on von Neumann computing nodes to fully exploiting additional computational capabilities provided by accelerators such as FPGAs (Field Programmable Gate Arrays) and GPGPUs (General Purpose Graphical Processing Units).
Computational chemists use Molecular Dynamics (MD) simulations like LAMMPS (Large-scale Atomic/Molecular Massively Parallel Simulator) and NAMD (NAnoscale Molecular Dynamics) to simulate biomolecular behaviour such as protein folding and small molecule docking to proteins. MD …


Accelerated Frame Data Relocation On Xilinx Field Programmable Gate Array, Ramachandra Kallam May 2010

All Graduate Theses and Dissertations, Spring 1920 to Summer 2023

Emerging reconfiguration techniques that include partial dynamic reconfiguration and partial bitstream relocation have been addressed in the past in order to expose the flexibility of field programmable gate arrays at runtime. Partial bitstream relocation is a technique used to target a partial bitstream of a partially reconfigurable region (PRR) onto other identical reconfigurable regions inside an FPGA, while partial dynamic reconfiguration is used to target a single reconfigurable region. Prior works in this domain aim to minimize "relocation time" with the help of on-chip or on-line processing. In this thesis, a novel PRR-PRR relocation algorithm is proposed and implemented both …


Hardware Certification For Real-Time Safety-Critical Systems: State Of The Art, Andrew J. Kornecki, Janusz Zalewski Jan 2010

Department of Electrical Engineering and Computer Science - Daytona Beach

This paper discusses issues related to the RTCA document DO-254 Design Assurance Guidance for Airborne Electronic Hardware and its consequences for hardware certification. In particular, problems related to circuits’ compliance with DO-254 in avionics and other industries are considered. An extensive literature review of the subject is given, including current views on and experiences of chip manufacturers and the EDA industry with qualification of hardware design tools, including formal approaches to hardware verification. Some results of the authors’ own study on tool qualification are presented.


New Approach Fpga-Based Implementation Of Discontinuous Svpwm, Tole Sutikno, Auzani Jidin, Nik Rumzi Nik Idris Jan 2010

Turkish Journal of Electrical Engineering and Computer Sciences

The discontinuous space vector pulse-width modulation (DSVPWM) is a well-known technique offering lower switching losses than continuous SVPWM. At the same average switching frequency, or a switching frequency 1.5 times higher than that utilized in continuous SVPWM, the discontinuous SVPWM results in lower current harmonic distortion than that obtained in continuous SVPWM at high modulation indices. This paper is concerned with the design and realization of a new FPGA approach based on a 5-segment discontinuous SVPWM operated at a 40 kHz switching frequency. It will be shown that the implementation of the discontinuous SVPWM utilized in FPGA, to execute some complex tasks, is simplified …


Memory Architecture Template For Fast Block Matching Algorithms On Field Programmable Gate Arrays, Shant Chandrakar Dec 2009

All Graduate Theses and Dissertations, Spring 1920 to Summer 2023

Fast Block Matching (FBM) algorithms for video compression are well suited for acceleration using parallel data-path architectures on Field Programmable Gate Arrays (FPGAs). However, designing an efficient on-chip memory subsystem to provide the required throughput to this parallel data-path architecture is a complex problem. This thesis presents a memory architecture template that can be parameterized for a given FBM algorithm, number of parallel Processing Elements (PEs), and block size. The template can be parameterized with well known exploration techniques to design efficient on-chip memory subsystems. The memory subsystems are derived for two existing FBM algorithms and are implemented on a …


Exploiting Matrix Symmetry To Improve Fpga-Accelerated Conjugate Gradient, Jason D. Bakos, Krishna K. Nagar Apr 2009

Faculty Publications

In this paper we describe a new approach for accelerating the Conjugate Gradient (CG) method using an FPGA co-processor. As in previous approaches, our co-processor performs a double-precision sparse matrix-vector multiplication. However, our implementation doubles the amount of computation per unit of input data by exploiting the symmetry of the input matrix and computing the upper and lower triangle of the input matrix in parallel. Using a Virtex-2 Pro 100 FPGA, we have achieved an observed computational throughput of 1155 MFLOPS.
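
The symmetry trick described in this abstract can be sketched in software as follows: when only the upper triangle of a symmetric matrix is stored, each off-diagonal entry contributes to two output elements, so every value read from memory yields twice the arithmetic. This is a sequential illustration under an assumed coordinate-list storage format, not the paper's co-processor design; `symmetric_spmv` is a hypothetical name.

```python
def symmetric_spmv(n, upper_entries, x):
    """Compute y = A @ x for a symmetric n x n matrix A.

    upper_entries: iterable of (row, col, value) with row <= col, listing
    only the non-zeros on or above the diagonal (assumed storage format).
    """
    y = [0.0] * n
    for i, j, value in upper_entries:
        y[i] += value * x[j]      # contribution of A[i][j]
        if i != j:
            y[j] += value * x[i]  # mirrored contribution of A[j][i]
    return y
```

In hardware the two updates can be issued in parallel, which is the doubling of computation per unit of input data that the abstract refers to; in this sketch they simply run one after the other.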


Application Specific Customization And Scalability Of Soft Multiprocessors, Deepak C. Unnikrishnan Jan 2009

Masters Theses 1911 - February 2014

Soft multiprocessor systems exploit the plentiful computational resources available in field programmable devices. By virtue of their adaptability and ability to support coarse-grained parallelism, they serve as excellent platforms for rapid prototyping and design space exploration of embedded multiprocessor applications. As complex applications emerge, careful mapping, processor and interconnect customization are critical to the overall performance of the multiprocessor system. In this thesis, we have developed an automated scalable framework to efficiently map applications written in a high-level programmer-friendly language to customizable soft-cores. The framework allows the user to specify the application in a high-level language called StreamIt. After …


Using System-On-A-Programmable-Chip Technology To Design Embedded Systems, Tyson S. Hall, James O. Hamblen Sep 2006

Faculty Works

This paper describes the tools, techniques, and devices used to design embedded products with system-on-a-chip (SoC) type solutions using a large Field Programmable Gate Array (FPGA) with an internal processor core. This new FPGA-based approach is called system-on-a-programmable-chip (SoPC). The performance tradeoffs present in SoPC systems are compared to more traditional design approaches. Commercial devices, processor cores, and CAD tool flows are described.

The issues in SoPC hardware/software design tradeoffs are examined and three example SoPC designs are presented as case studies.


A Field Programmable Gate Array Architecture For Two-Dimensional Partial Reconfiguration, Fei Wang Jan 2006

Browse all Theses and Dissertations

Reconfigurable machines can accelerate many applications by adapting to their needs through hardware reconfiguration. Partial reconfiguration allows the reconfiguration of a portion of a chip while the rest of the chip is busy working on tasks. Operating system models have been proposed for partially reconfigurable machines to handle the scheduling and placement of tasks. They are called OS4RC in this dissertation. The main goal of this research is to address some problems that come from the gap between OS4RC and existing chip architectures and the gap between OS4RC models and practical applications. Some existing OS4RC models are based on an …