Open Access. Powered by Scholars. Published by Universities.®

Life Sciences Commons

Genetics and Genomics

Western Michigan University

Parallel Computing and Data Science Lab Technical Reports

Full-Text Articles in Life Sciences

An Out-Of-Core GPU Based Dimensionality Reduction Algorithm For Big Mass Spectrometry Data And Its Application In Bottom-Up Proteomics, Muaaz Awan, Fahad Saeed, Jan 2017

Parallel Computing and Data Science Lab Technical Reports

Modern high-resolution mass spectrometry (MS) instruments can generate millions of spectra in a single systems biology experiment. Each spectrum consists of thousands of peaks, but only a small number of peaks actively contribute to the deduction of peptides. Therefore, pre-processing of MS data to detect noisy and non-useful peaks is an active area of research. Most sequential noise-reduction algorithms are impractical to use as a pre-processing step because of their high time complexity. In this paper, we present a GPU-based dimensionality-reduction algorithm, called G-MSR, for MS2 spectra. Our proposed algorithm uses novel data structures which optimize the memory and …


A Parallel Algorithm For Compression Of Big Next-Generation Sequencing Datasets, Sandino N. Vargas Perez, Fahad Saeed, Aug 2015

Parallel Computing and Data Science Lab Technical Reports

With the advent of high-throughput next-generation sequencing (NGS) techniques, the amount of data being generated presents challenges, including the storage, analysis, and transport of huge datasets. One solution for the storage and transmission of data is compression using specialized compression algorithms. However, these specialized algorithms scale poorly as dataset size increases, and the best available solutions can take hours to compress gigabytes of data. In this paper we introduce paraDSRC, a parallel implementation of the DSRC algorithm using a message passing model that reduces the compression time complexity by a factor of O(1/p). Our experimental results show …