Open Access. Powered by Scholars. Published by Universities.®

Life Sciences Commons

Articles 1 - 4 of 4

Full-Text Articles in Life Sciences

Ultra-Fast And Memory-Efficient Lookups For Cloud, Networked Systems, And Massive Data Management, Ye Yu Jan 2018

Theses and Dissertations--Computer Science

Systems that process big data (e.g., high-traffic networks and large-scale storage) prefer data structures and algorithms with a small memory footprint and fast processing speed. Efficient and fast algorithms play an essential role in system design, despite improvements in hardware. This dissertation is organized around a novel algorithm called Othello Hashing. Othello Hashing supports ultra-fast and memory-efficient key-value lookup, and it fits the requirements of the core algorithms of many large-scale systems and big data applications. Using Othello Hashing, combined with domain expertise in cloud, computer networks, big data, and bioinformatics, I developed the following applications that resolve several major …


Novel Computational Methods For Sequencing Data Analysis: Mapping, Query, And Classification, Xinan Liu Jan 2018

Theses and Dissertations--Computer Science

Over the past decade, the evolution of next-generation sequencing technology has considerably advanced genomics research. As a consequence, fast and accurate computational methods are needed for analyzing the large volumes of data in different applications. The research presented in this dissertation focuses on three areas: RNA-seq read mapping, large-scale data query, and metagenomics sequence classification.

A critical step of RNA-seq data analysis is to map the RNA-seq reads onto a reference genome. This dissertation presents a novel splice alignment tool, MapSplice3. It achieves high read alignment and base mapping yields and is able to detect splice junctions, gene fusions, and circular …


Scalable Feature Selection And Extraction With Applications In Kinase Polypharmacology, Derek Jones Jan 2018

Theses and Dissertations--Computer Science

To reduce the time and cost of drug discovery, machine learning is being used to automate much of the work in this process. However, the size and complex nature of molecular data make the application of machine learning especially challenging. Much work must go into engineering the features that are then used to train machine learning models, which costs considerable time and requires the knowledge of domain experts to be most effective. The purpose of this work is to demonstrate data-driven approaches to perform the feature selection and extraction steps in …


Automated Tree-Level Forest Quantification Using Airborne Lidar, Hamid Hamraz Jan 2018

Theses and Dissertations--Computer Science

Traditional forest management relies on small field samples and interpretation of aerial photography, which are not only costly to execute but also yield inaccurate estimates of the entire forest in question. Airborne light detection and ranging (LiDAR) is a remote sensing technology that records point clouds representing the 3D structure of a forest canopy and the terrain underneath. We present a method for segmenting individual trees from the LiDAR point clouds without making prior assumptions about tree crown shapes and sizes. We then present a method that vertically stratifies the point cloud into an overstory and multiple understory tree …