Open Access. Powered by Scholars. Published by Universities.®

Engineering Commons

Computer Engineering

2015

Western Michigan University

Articles 1 - 3 of 3

Full-Text Articles in Engineering

Big Data Proteogenomics And High Performance Computing: Challenges And Opportunities, Fahad Saeed Oct 2015

Parallel Computing and Data Science Lab Technical Reports

Proteogenomics is an emerging field of systems biology research at the intersection of proteomics and genomics. Two high-throughput technologies, Mass Spectrometry (MS) for proteomics and Next Generation Sequencing (NGS) machines for genomics, are required to conduct proteogenomics studies. Independently, both MS and NGS technologies are afflicted by a data deluge, which creates problems of storage, transfer, analysis, and visualization. Integrating these big data sets (NGS+MS) for proteogenomics studies compounds all of the associated computational problems. Existing sequential algorithms for analyzing these proteogenomics datasets are inadequate for big data, and high performance computing (HPC) solutions are almost non-existent. The purpose of this …


A Parallel Algorithm For Compression Of Big Next-Generation Sequencing Datasets, Sandino N. Vargas Perez, Fahad Saeed Aug 2015

Parallel Computing and Data Science Lab Technical Reports

With the advent of high-throughput next-generation sequencing (NGS) techniques, the amount of data being generated presents challenges, including the storage, analysis, and transport of huge datasets. One solution for storing and transmitting the data is compression using specialized compression algorithms. However, these specialized algorithms scale poorly as dataset size increases, and the best available solutions can take hours to compress gigabytes of data. In this paper we introduce paraDSRC, a parallel implementation of the DSRC algorithm using a message passing model that reduces the compression time complexity by a factor of O(1/p). Our experimental results show …


Design And Implementation Of Network Transfer Protocol For Big Genomic Data, Mohammed Aledhari, Fahad Saeed Jan 2015

Parallel Computing and Data Science Lab Technical Reports

Genomic data is growing exponentially due to next-generation sequencing (NGS) technologies and their ability to produce massive amounts of data in a short time. NGS technologies generate big genomic data that needs to be exchanged between different locations efficiently and reliably. Current network transfer protocols rely on the Transmission Control Protocol (TCP) or the User Datagram Protocol (UDP), ignoring data size and type. Universal application-layer protocols such as HTTP are designed for a wide variety of data types and are not particularly efficient for genomic data. Therefore, we present a new data-aware transfer protocol for genomic data that increases network …