Engineering | Open Access Articles | Digital Commons Network™

Scalable Data Structure To Compress Next-Generation Sequencing Files And Its Application To Compressive Genomics, Sandino Vargas-Perez, Fahad Saeed

Parallel Computing and Data Science Lab Technical Reports

It is now possible to compress and decompress large-scale Next-Generation Sequencing files by taking advantage of high-performance computing techniques. To this end, we have recently introduced a scalable hybrid parallel algorithm, called phyNGSC, which allows fast compression as well as decompression of big FASTQ datasets using distributed and shared memory programming models via MPI and OpenMP. In this paper we present the design and implementation of a novel parallel data structure that lessens the dependency on decompression and facilitates the handling of DNA sequences in their compressed state, using fine-grained decompression in a technique that is identified as in …
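The idea of fine-grained decompression can be illustrated with a minimal sketch: if records are compressed in independent blocks, a single record can be retrieved by decompressing only the block that holds it. This is not the phyNGSC data structure, whose details the abstract does not give; the function names `compress_blocks` and `fetch_record`, the use of `zlib`, and the fixed `block_size` are all assumptions made for illustration.

```python
import zlib

def compress_blocks(records, block_size):
    """Compress records in independent blocks of block_size records each.

    Hypothetical helper: each block is a separate zlib stream, so any
    block can be decompressed without touching the others.
    """
    blocks = []
    for start in range(0, len(records), block_size):
        chunk = "\n".join(records[start:start + block_size]).encode("ascii")
        blocks.append(zlib.compress(chunk))
    return blocks

def fetch_record(blocks, block_size, i):
    """Return record i by decompressing only the block that contains it."""
    block = zlib.decompress(blocks[i // block_size]).decode("ascii")
    return block.split("\n")[i % block_size]

# Toy DNA reads (made-up data, for illustration only).
reads = ["ACGT", "GGCA", "TTAG", "CCGA", "ATAT"]
blocks = compress_blocks(reads, block_size=2)
print(fetch_record(blocks, 2, 3))  # touches only the second block
```

Smaller blocks make each lookup cheaper but usually reduce the compression ratio, so the block size is a trade-off rather than a fixed choice.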


A Hybrid MPI-OpenMP Strategy To Speedup The Compression Of Big Next-Generation Sequencing Datasets, Sandino Vargas-Perez, Fahad Saeed

Parallel Computing and Data Science Lab Technical Reports

DNA sequencing has moved into the realm of Big Data due to the rapid development of high-throughput, low-cost Next-Generation Sequencing (NGS) technologies. Sequential data compression solutions that were once sufficient to store and distribute this information efficiently are now falling behind. In this paper we introduce phyNGSC, a hybrid MPI-OpenMP strategy to speed up the compression of big NGS data by combining the features of both distributed and shared memory architectures. Our algorithm balances the workload among processes and threads, alleviates memory latency by exploiting locality, and accelerates I/O by reducing excessive read/write operations and inter-node message exchange. To make the …
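One common way to balance work across processes and threads is a two-level split of the input into near-equal contiguous ranges: first one range per process, then one sub-range per thread inside each process. The sketch below shows only that arithmetic, in plain Python with no MPI or OpenMP calls; `split_range` and `two_level_split` are hypothetical names, and the abstract does not say that phyNGSC partitions its input this way.

```python
def split_range(start, stop, parts):
    """Split [start, stop) into `parts` contiguous ranges whose sizes differ by at most 1."""
    base, extra = divmod(stop - start, parts)
    ranges = []
    lo = start
    for k in range(parts):
        size = base + (1 if k < extra else 0)  # first `extra` ranges get one more item
        ranges.append((lo, lo + size))
        lo += size
    return ranges

def two_level_split(total, processes, threads):
    """Assign each (process, thread) pair its own sub-range of [0, total)."""
    plan = {}
    for rank, (p_lo, p_hi) in enumerate(split_range(0, total, processes)):
        for tid, sub in enumerate(split_range(p_lo, p_hi, threads)):
            plan[(rank, tid)] = sub
    return plan

# Example: 10 work items, 3 processes, 2 threads per process.
plan = two_level_split(10, processes=3, threads=2)
for key in sorted(plan):
    print(key, plan[key])
```

For a real FASTQ file the ranges would also have to be adjusted to record boundaries, which this sketch does not attempt.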


GPU-PCC: A GPU Based Technique To Compute Pairwise Pearson’s Correlation Coefficients For Big fMRI Data, Taban Eslami, Muaaz Gul Awan, Fahad Saeed

Parallel Computing and Data Science Lab Technical Reports

Functional Magnetic Resonance Imaging (fMRI) is a non-invasive brain imaging technique for studying the brain’s functional activities. Pearson’s Correlation Coefficient is an important measure for capturing dynamic behaviors and functional connectivity between brain components. One bottleneck in computing Correlation Coefficients is the time it takes to process big fMRI data. In this paper, we propose GPU-PCC, a GPU-based algorithm built on the vector dot product that computes pairwise Pearson’s Correlation Coefficients while performing the computation only once for each pair. Our method is able to compute Correlation Coefficients in an ordered fashion without the need to do post-processing reordering …
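The link between Pearson's correlation and the dot product can be shown with a small CPU-only sketch: once each time series is mean-centered and scaled to unit length, the correlation of two series is simply the dot product of their normalized vectors. Looping over pairs with i < j computes each pair once and emits the results in a fixed order. This is not the GPU-PCC kernel; `normalize` and `pairwise_pcc` are hypothetical names, and the code assumes no series is constant (a constant series would have zero norm).

```python
import math

def normalize(series):
    """Mean-center a series and scale it to unit Euclidean length."""
    mean = sum(series) / len(series)
    centered = [x - mean for x in series]
    norm = math.sqrt(sum(c * c for c in centered))  # assumed non-zero
    return [c / norm for c in centered]

def pairwise_pcc(rows):
    """Return correlations for all pairs (i, j) with i < j, in row-major order."""
    unit = [normalize(r) for r in rows]
    result = []
    for i in range(len(unit)):
        for j in range(i + 1, len(unit)):
            # Pearson's r is the dot product of the two normalized vectors.
            result.append(sum(a * b for a, b in zip(unit[i], unit[j])))
    return result

# Toy time series (made-up data, for illustration only).
series = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [3.0, 2.0, 1.0]]
print(pairwise_pcc(series))
```

For n series the output holds n(n-1)/2 values, ordered as (0,1), (0,2), ..., (n-2,n-1), so no reordering step is needed afterwards.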


An Out-Of-Core GPU Based Dimensionality Reduction Algorithm For Big Mass Spectrometry Data And Its Application In Bottom-Up Proteomics, Muaaz Awan, Fahad Saeed

Parallel Computing and Data Science Lab Technical Reports

Modern high-resolution Mass Spectrometry instruments can generate millions of spectra in a single systems biology experiment. Each spectrum consists of thousands of peaks, but only a small number of peaks actively contribute to the deduction of peptides. Therefore, pre-processing of MS data to detect noisy and non-useful peaks is an active area of research. Most sequential noise-reducing algorithms are impractical to use as a pre-processing step because of their high time complexity. In this paper, we present a GPU-based dimensionality-reduction algorithm, called G-MSR, for MS2 spectra. Our proposed algorithm uses novel data structures which optimize the memory and …

