Open Access. Powered by Scholars. Published by Universities.®

Digital Commons Network

Articles 1 - 11 of 11

Full-Text Articles in Entire DC Network

Similarities And Differences Between Variants Called With Human Reference Genome Hg19 Or Hg38, Bohu Pan, Rebecca Kusko, Wenming Xiao, Yuantin Zheng, Zhichao Liu, Chunlin Xiao, Sugunadevi Sakkiah, Wenjing Guo, Ping Gong, Chaoyang Zhang, Weigong Ge, Leming Shi, Weida Tong, Huixiao Hong Mar 2019

Faculty Publications

Background: Reference genome selection is a prerequisite for successful analysis of next generation sequencing (NGS) data. Current practice employs one of the two most recent human reference genome versions: HG19 or HG38. To date, the impact of genome version on SNV identification has not been rigorously assessed.

Results: We conducted an analysis comparing the SNVs identified using HG19 vs. HG38, leveraging whole genome sequencing (WGS) data from the genome-in-a-bottle (GIAB) project. First, SNVs were called using 26 different bioinformatics pipelines with either HG19 or HG38. Next, two tools were used to convert the called SNVs between HG19 and …


Efficient Alignment Algorithms For DNA Sequencing Data, Nilesh Vinod Khiste Jan 2018

Electronic Thesis and Dissertation Repository

DNA Next Generation Sequencing (NGS) technologies produce data at a low cost, enabling their application to many ambitious fields such as cancer research, disease control, and personalized medicine. However, even after a decade of research, modern aligners and assemblers are far from providing efficient and error-free genome alignments and assemblies, respectively. This is due to the inherent complexity of the genome alignment and assembly problems. Many algorithms addressing these problems have been proposed over the years, but there is still considerable scope for improvement in this research area.

Many new genome …


Measuring The Human Gut Microbiome: New Tools And Non Alcoholic Fatty Liver Disease, Ruth G. Wong Jun 2016

Electronic Thesis and Dissertation Repository

With the advent of next generation DNA and RNA sequencing, scientists can obtain a more comprehensive snapshot of the bacterial communities on the human body (known as the 'human microbiome'), leading to information about the bacterial composition, what genes are present, and what proteins are produced. The scientific community is in a phase of developing the experiments and accompanying statistical techniques to investigate the mechanisms by which the human microbiome affects health and disease. In this thesis, I explore alternatives to the standard weighted and unweighted UniFrac distance metrics that measure the difference between microbiome samples. These alternative weightings allow …


Investigating Metastatic Lineage In Colorectal Cancer By Single Cell DNA Sequencing, Marco Leung May 2016

Dissertations & Theses (Open Access)

Metastasis is the primary cause of human cancer deaths. Patients with metastatic colorectal cancer (mCRC) have a 5-year survival rate of only 11%, compared with 92% for patients without local or distant metastases. Understanding CRC tumor evolution may provide valuable insights into how to improve treatment in patients with mCRC. However, the genomic basis of metastasis has been difficult to study, in part due to the extensive intratumor heterogeneity at both the primary and metastatic tumor sites, and the low frequency of subclones with metastatic potential. Previous studies have applied conventional bulk next-generation sequencing (NGS) methods, which have …


The Evolution Of Thermotolerance: A Characterization Of A Directionally Evolved Cyanobacterium, Nathen Emil Bopp Nov 2015

Masters Theses

Chaperone proteins are essential components in the maintenance and turnover of the proteome. Many chaperones play integral roles in the folding and unfolding of cellular substrates under many conditions, including heat stress. Most chaperones fall into one of two categories: the typical ATP-dependent chaperones and the ATP-independent chaperones. One class of ATP-independent chaperones is the Small Heat Shock Proteins (sHSPs), which act as molecular life vests and are thought to protect misfolding proteins from irreversible aggregation. The cyanobacterium Synechocystis sp. PCC 6803 is an excellent model organism for the study and understanding of these proteins and their …


Prokaryotic Diversity In The Rhizosphere Of Organic, Intensive, And Transitional Coffee Farms In Brazil, Adam Caldwell, Livia Silva, Cynthia Da Silva, Cleber Ouverney Jun 2015

Faculty Publications, Biological Sciences

Despite a continuous rise in coffee consumption over the past 60 years and recent studies showing benefits to human health, intensive coffee farming practices have been associated with environmental damage, risks to human health, and reductions in biodiversity. In contrast, organic farming has become an increasingly popular alternative, with both environmental and health benefits. This study aimed to characterize and determine the differences in the prokaryotic soil microbiology of three Brazilian coffee farms: one practicing intensive farming, one practicing organic farming, and one undergoing a transition from intensive to organic practices. Soil samples were collected from 20 …


Translesion Synthesis And Mutations: On The Mutagenic Properties Of The Two DNA Lesions, 8-Oxo-G And Pt-GG, And The Functions Of Y-Family DNA Polymerases And Rev3L On The Bypass Of Each Of The DNA Lesions In Mammalian Cells, Lizhen Guo Apr 2015

Electronic Thesis and Dissertation Repository

I studied the capacity of two DNA lesions, 8-oxo-guanine (8-oxo-G) and the cisplatin intrastrand crosslink 1,2-d(GpG) (Pt-GG), to cause mutations in mammalian cells. Using isogenic cell lines generated from mice with selective gene knockouts of distinct DNA polymerases as models, I deduced the biological functions of the translesion DNA polymerases Pol eta, Pol kappa, Pol iota, Rev1 and Rev3L in bypassing each of the lesions 8-oxo-G and Pt-GG. My study takes advantage of Next Generation Sequencing (NGS) technology to determine the mutagenic effects of the DNA lesions in vivo and the effects of translesion DNA polymerases on bypassing the lesions. Through …


Zero-Inflated Models To Identify Transcription Factor Binding Sites In ChIP-Seq Experiments, Sameera Dhananjaya Viswakula Apr 2015

Mathematics & Statistics Theses & Dissertations

Determining protein-DNA binding sites is essential to understanding many biological processes. A transcription factor is a particular type of protein that binds to DNA and controls gene regulation in living organisms. Chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) is considered the gold standard for locating these binding sites, and programs used to identify DNA-transcription factor binding sites are known as peak-callers. ChIP-seq data are known to exhibit considerable background noise and other biases. In this study, we propose a negative binomial model (NB), a zero-inflated Poisson model (ZIP) and a zero-inflated negative binomial model (ZINB) for peak-calling. …


Cancer Risk Prediction With Next Generation Sequencing Data Using Machine Learning, Nihir Patel Jan 2015

Theses

The use of computational biology for next generation sequencing (NGS) analysis is rapidly increasing in genomics research. However, the effectiveness of NGS data in predicting disease abundance is not yet clear. This research investigates the problem in the whole exome NGS data of chronic lymphocytic leukemia (CLL) available at dbGaP. Initially, raw reads from samples are aligned to the human reference genome using the Burrows-Wheeler Aligner. From the samples, variants, namely Single Nucleotide Polymorphisms (SNPs) and Insertions/Deletions (INDELs), are identified and filtered using SAMtools as well as the Genome Analysis Toolkit (GATK). Subsequently, the variants are …


Mapping The Human Vasculature By In Vivo Phage Display, Julianna Bronk Aug 2014

Dissertations & Theses (Open Access)

In vivo phage display screenings by intravenous injection of a random phage-displayed peptide library allow for the selection of peptides that localize to specific vascular beds. At the University of Texas MD Anderson Cancer Center, we have had the opportunity to perform phage display screenings in cancer patients in order to select for cancer specific targets directly in humans. These targets serve to define biochemical diversity of endothelial cell surfaces and can be validated and explored towards the design of vascular-targeted pharmacology. In the most recent patient screen, samples were recovered from hepatocellular carcinoma (HCC) as well as 26 additional …


Node-Oriented Workflow (NOW): A Command Template Workflow Management Tool For High Throughput Data Analysis Pipelines, Eric B. Lipsky, Brian R. King, Gerard Tromp Jun 2014

Faculty Journal Articles

Next generation sequencing (NGS) systems produce vast quantities of data that require substantial computational resources for typical analysis tasks. In addition, data generated by different NGS systems are not homogeneous. Moreover, there is an overwhelming number of tools available for performing typical tasks. Managing NGS workflows involves writing custom scripts that quickly grow in complexity, often resulting in unwieldy workflows that underutilize typical high performance compute resources and increase the demands on the staff managing these workflows. We present Node-Oriented Workflow (NOW), a dynamic command template workflow engine for high performance distributed computing (HPC) systems. Our system provides …