Open Access. Powered by Scholars. Published by Universities.®

Engineering Commons

Open Access. Powered by Scholars. Published by Universities.®

Series

2017

Western Michigan University

Data Storage Systems

Articles 1 - 1 of 1

Full-Text Articles in Engineering

Scalable Data Structure To Compress Next-Generation Sequencing Files And Its Application To Compressive Genomics, Sandino Vargas-Perez, Fahad Saeed Oct 2017

Scalable Data Structure To Compress Next-Generation Sequencing Files And Its Application To Compressive Genomics, Sandino Vargas-Perez, Fahad Saeed

Parallel Computing and Data Science Lab Technical Reports

It is now possible to compress and decompress large-scale Next-Generation Sequencing files taking advantage of high-performance computing techniques. To this end, we have recently introduced a scalable hybrid parallel algorithm, called phyNGSC, which allows fast compression as well as decompression of big FASTQ datasets using distributed and shared memory programming models via MPI and OpenMP. In this paper we present the design and implementation of a novel parallel data structure which lessens the dependency on decompression and facilitates the handling of DNA sequences in their compressed state using fine-grained decompression in a technique that is identified as in …