Computational Engineering | Open Access Articles

Power-Efficient And Highly Scalable Parallel Graph Sampling Using FPGAs, Usman Tariq, Umer Cheema, Fahad Saeed Oct 2017

Parallel Computing and Data Science Lab Technical Reports

Energy efficiency is a crucial problem in data centers, where big data is generally represented by directed or undirected graphs. Analyzing these graphs is challenging because of the volume and velocity of the data and the irregular memory access patterns involved. Graph sampling is one of the most effective ways to reduce the size of a graph while maintaining its crucial characteristics. In this paper we present the design and implementation of an FPGA-based graph sampling method that is both time- and energy-efficient. This is in contrast to existing parallel approaches, which include distributed-memory clusters, multicore processors, and GPUs. Our …

Scalable Data Structure To Compress Next-Generation Sequencing Files And Its Application To Compressive Genomics, Sandino Vargas-Perez, Fahad Saeed Oct 2017

Parallel Computing and Data Science Lab Technical Reports

It is now possible to compress and decompress large-scale Next-Generation Sequencing files taking advantage of high-performance computing techniques. To this end, we have recently introduced a scalable hybrid parallel algorithm, called phyNGSC, which allows fast compression as well as decompression of big FASTQ datasets using distributed and shared memory programming models via MPI and OpenMP. In this paper we present the design and implementation of a novel parallel data structure which lessens the dependency on decompression and facilitates the handling of DNA sequences in their compressed state using fine-grained decompression in a technique that is identified as in …

GPU-ArraySort: A Parallel, In-Place Algorithm For Sorting Large Number Of Arrays, Muaaz Awan, Fahad Saeed Aug 2016

Parallel Computing and Data Science Lab Technical Reports

Modern analytics deals with big datasets from diverse fields. For many applications, the data takes the form of an array that consists of a large number of smaller arrays. Existing techniques focus on sorting a single large array and cannot efficiently sort a large number of smaller arrays. Currently, no algorithm is available that can sort such a large number of arrays using the massively parallel architecture of GPU devices. In this paper we present a highly scalable parallel algorithm, called GPU-ArraySort, for sorting a large number of arrays using a GPU. Our algorithm performs …
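
To make the problem setting concrete, here is a minimal CPU-side sketch in Python of sorting many small arrays that are packed into one flat buffer. It only illustrates the data layout and the fact that each small array can be sorted independently; it is not the GPU-ArraySort algorithm, whose details are not given in the excerpt above. The function name `sort_segments_in_place` and the fixed `segment_len` layout are assumptions made for illustration.

```python
def sort_segments_in_place(flat, segment_len):
    """Sort each fixed-length segment of `flat` independently.

    `flat` is one large list holding many small arrays back to back
    (a hypothetical layout chosen for this sketch). Results are written
    back into the same list, so no second full-size buffer is kept.
    """
    if segment_len <= 0 or len(flat) % segment_len != 0:
        raise ValueError("len(flat) must be a positive multiple of segment_len")
    for start in range(0, len(flat), segment_len):
        end = start + segment_len
        # Segments never overlap, so each one is an independent unit of
        # work; that independence is what a parallel device could exploit.
        flat[start:end] = sorted(flat[start:end])
    return flat
```

Because no segment depends on any other, the loop iterations could run in any order or concurrently, which is the property that makes the problem a natural fit for a massively parallel device.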

Big Data Proteogenomics And High Performance Computing: Challenges And Opportunities, Fahad Saeed Oct 2015

Parallel Computing and Data Science Lab Technical Reports

Proteogenomics is an emerging field of systems biology research at the intersection of proteomics and genomics. Two high-throughput technologies, Mass Spectrometry (MS) for proteomics and Next Generation Sequencing (NGS) machines for genomics, are required to conduct proteogenomics studies. Independently, both MS and NGS technologies are afflicted by a data deluge, which creates problems of storage, transfer, analysis, and visualization. Integrating these big data sets (NGS+MS) for proteogenomics studies compounds all of the associated computational problems. Existing sequential algorithms for analyzing these proteogenomics datasets are inadequate for big data, and high performance computing (HPC) solutions are almost non-existent. The purpose of this …

A Parallel Algorithm For Compression Of Big Next-Generation Sequencing Datasets, Sandino N. Vargas Perez, Fahad Saeed Aug 2015

Parallel Computing and Data Science Lab Technical Reports

With the advent of high-throughput next-generation sequencing (NGS) techniques, the amount of data being generated presents challenges, including the storage, analysis, and transport of huge datasets. One solution for storing and transmitting the data is compression using specialized compression algorithms. However, these specialized algorithms scale poorly as datasets grow, and the best available solutions can take hours to compress gigabytes of data. In this paper we introduce paraDSRC, a parallel implementation of the DSRC algorithm using a message-passing model that reduces the compression time complexity by a factor of O(1/p). Our experimental results show …
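
The general idea of block-parallel compression can be sketched in a few lines of Python. This is an illustration under stated assumptions, not paraDSRC itself: `zlib` stands in for the DSRC compressor, a thread pool stands in for message-passing processes, and `p` is assumed to denote the number of workers (the excerpt above does not define it). The helper names `split_blocks`, `compress_parallel`, and `decompress_blocks` are hypothetical.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def split_blocks(data: bytes, p: int) -> list[bytes]:
    """Split `data` into at most `p` contiguous blocks of near-equal size."""
    if p < 1:
        raise ValueError("p must be at least 1")
    size = max(1, -(-len(data) // p))  # ceiling division, never zero
    return [data[i:i + size] for i in range(0, len(data), size)]


def compress_parallel(data: bytes, p: int) -> list[bytes]:
    """Compress each block independently using p workers.

    zlib is only a stand-in compressor for this sketch.
    """
    blocks = split_blocks(data, p)
    with ThreadPoolExecutor(max_workers=p) as pool:
        # pool.map preserves input order, so block order is kept.
        return list(pool.map(zlib.compress, blocks))


def decompress_blocks(compressed: list[bytes]) -> bytes:
    """Decompress the blocks in order and join them back together."""
    return b"".join(zlib.decompress(block) for block in compressed)
```

If each worker handles roughly 1/p of the input and coordination overhead is small, the compression phase takes about 1/p of the serial time, which is one way to read the O(1/p) factor. Compressing blocks independently usually costs some compression ratio, since redundancy across block boundaries is not exploited.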
