Open Access. Powered by Scholars. Published by Universities.®

Computational Biology Commons

Articles 1 - 17 of 17

Full-Text Articles in Computational Biology

Adjusting For Gene-Specific Covariates To Improve RNA-Seq Analysis, Hyeongseon Jeon, Kyu-Sang Lim, Yet Nguyen, Dan Nettleton Jan 2023

Mathematics & Statistics Faculty Publications

Summary

This paper proposes a novel positive false discovery rate (pFDR) controlling method for testing gene-specific hypotheses using a gene-specific covariate, such as gene length. We suppose the null probability depends on the covariate. In this context, we propose a rejection rule that accounts for heterogeneity among tests by employing two distinct types of null probabilities. We establish a pFDR estimator for a given rejection rule by following Storey's q-value framework. A condition on a type 1 error posterior probability is provided that equivalently characterizes our rejection rule. We also present a suitable procedure for selecting a tuning …
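For readers unfamiliar with the q-value framework the summary refers to, the following is a minimal Python sketch of the standard Storey q-value computation with a single, covariate-independent null proportion. It is not the covariate-adjusted rejection rule proposed in the paper; the function name `storey_qvalues` and the fixed tuning value `lam=0.5` are illustrative assumptions.

```python
def storey_qvalues(pvalues, lam=0.5):
    """Standard Storey q-values with one global null proportion (assumption:
    no covariate adjustment, fixed tuning parameter lam)."""
    m = len(pvalues)
    if m == 0:
        return []

    # Estimate the proportion of true nulls from the p-values above lam,
    # which are assumed to come mostly from null hypotheses.
    pi0 = sum(p > lam for p in pvalues) / (m * (1.0 - lam))
    pi0 = min(1.0, pi0)  # note: pi0 is 0 if no p-value exceeds lam

    # Walk from the largest p-value down, enforcing monotone q-values.
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        candidate = pi0 * m * pvalues[i] / rank
        running_min = min(running_min, candidate)
        qvalues[i] = running_min
    return qvalues


example_q = storey_qvalues([0.01, 0.02, 0.03, 0.6, 0.8])
```

Rejecting every hypothesis whose q-value falls below a chosen threshold gives the usual q-value decision rule; a covariate-dependent null probability, as described in the summary, would replace the single `pi0` used here.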


Improved Radiation Expression Profiling In Blood By Sequential Application Of Sensitive And Specific Gene Signatures, Eliseos J. Mucaki, Ben C. Shirley, Peter K. Rogan Oct 2021

Biochemistry Publications

Purpose. Combinations of expressed genes can discriminate radiation-exposed from normal control blood samples by machine learning-based signatures (with 8 to 20% misclassification rates). These signatures can quantify therapeutically relevant as well as accidental radiation exposures. The prodromal symptoms of Acute Radiation Syndrome (ARS) overlap those present in Influenza and Dengue Fever infections. Surprisingly, these human radiation signatures misclassified gene expression profiles of virally infected samples as false positive exposures. The present study investigates these and other confounders, and then mitigates their impact on signature accuracy.

Methods. This study investigated recall by previous and novel radiation signatures independently derived …


Characterization Of The Growth Factor Receptor Network Oncogenes In Lung Cancer, Ashley Duche Aug 2021

Pharmaceutical Sciences (MS) Theses

Lung cancer remains the leading cause of cancer-related deaths worldwide, reportedly contributing to 1.8 million of the 10.0 million deaths documented in 2020. Although advancements have been made in therapeutics and diagnostic methods, the formulation of effective treatments and the development of drug resistance continue to be challenges. These challenges arise from our lack of understanding of intricate signaling pathways, such as the Growth Factor Receptor Network (GFRN), which contributes to complex lung tumor heterogeneity, allowing drug resistance to develop. In this study, gene expression signatures of six GFRN oncogenes overexpressed in human mammary epithelial cells (HMECs) were …


Statistical Approaches Of Gene Set Analysis With Quantitative Trait Loci For High-Throughput Genomic Studies., Samarendra Das Dec 2020

Electronic Theses and Dissertations

Recently, gene set analysis has become the first choice for gaining insights into the underlying complex biology of diseases through high-throughput genomic studies, such as microarrays, bulk RNA sequencing, and single-cell RNA sequencing. It also reduces the complexity of statistical analysis and enhances the explanatory power of the obtained results. However, the statistical structure and steps common to these approaches have not yet been comprehensively discussed, which limits their utility. Hence, a comprehensive overview of the available gene set analysis approaches used for different high-throughput genomic studies is provided. The analysis of gene sets is usually carried out based on …


Region Based Gene Expression Via Reanalysis Of Publicly Available Microarray Data Sets., Ernur Saka May 2018

Electronic Theses and Dissertations

A DNA microarray is a high-throughput technology used to identify relative gene expression. One of the most widely used platforms is the Affymetrix® GeneChip® technology which detects gene expression levels based on probe sets composed of a set of twenty-five nucleotide probes designed to hybridize with specific gene targets. Given a particular Affymetrix® GeneChip® platform, the design of the probes is fixed. However, the method of analysis is dynamic in nature due to the ability to annotate and group probes into uniquely defined groupings. This is particularly important since publicly available repositories of microarray datasets, such as ArrayExpress and NCBI’s …


Chromatin Accessibility Dynamics In The Arabidopsis Root Epidermis And Endodermis During Cold Acclimation, Shawn Hoogstra Nov 2017

Electronic Thesis and Dissertation Repository

Understanding cell-type-specific transcriptional responses to environmental conditions is limited by a lack of knowledge of transcriptional control due to epigenetic dynamics. Additionally, cell-type analyses are limited by difficulties in applying current technologies to single cell types. A novel DNase-seq protocol and analysis procedure, termed DNase-DTS, was developed to identify DNase I hypersensitive sites (DHSs) in the Arabidopsis epidermis and endodermis under control and cold acclimation conditions. Results identified thousands of DHSs within each cell type and experimental condition. DHSs showed strong association with gene expression, DNA methylation, and histone modifications. A priori mapping of existing DNA binding motifs within accessible genes and the cold C-repeat/dehydration …


Statistical Methods For Two Problems In Cancer Research: Analysis Of RNA-Seq Data From Archival Samples And Characterization Of Onset Of Multiple Primary Cancers, Jialu Li May 2017

Dissertations & Theses (Open Access)

My dissertation is focused on quantitative methodology development and application for two important topics in translational and clinical cancer research.

The first topic was motivated by the challenge of applying transcriptome sequencing (RNA-seq) to formalin-fixed, paraffin-embedded (FFPE) tumor samples for reliable diagnostic development. We designed a biospecimen study to directly compare gene expression results from different protocols to prepare libraries for RNA-seq from human breast cancer tissues, with randomization to fresh-frozen (FF) or FFPE conditions. To comprehensively evaluate the FFPE RNA-seq data quality for expression profiling, we developed multiple computational methods for assessment, such as the uniformity and continuity …


A Gene-Based Association Method For Mapping Traits Using Reference Transcriptome Data, Eric R. Gamazon, Heather Wheeler, Kaanan P. Shah, Sahar V. Mozaffari, Keston Aquino-Michaels, Robert J. Carroll, Anne E. Eyler, Joshua C. Denny, Dan L. Nicolae, Nancy J. Cox, Hae Kyung Im Aug 2016

Heather Wheeler

Genome-wide association studies (GWAS) have identified thousands of variants robustly associated with complex traits. However, the biological mechanisms underlying these associations are, in general, not well understood. We propose a gene-based association method called PrediXcan that directly tests the molecular mechanisms through which genetic variation affects phenotype. The approach estimates the component of gene expression determined by an individual’s genetic profile and correlates ‘imputed’ gene expression with the phenotype under investigation to identify genes involved in the etiology of the phenotype. Genetically regulated gene expression is estimated using whole-genome tissue-dependent prediction models trained with reference transcriptome data sets. PrediXcan enjoys …
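The two steps described in this abstract (estimating genetically regulated expression from genotype, then correlating it with the phenotype) can be sketched roughly as follows. This is a toy Python illustration under stated assumptions, not the PrediXcan software: the per-variant `weights`, the function names, and the example data are all hypothetical, and a plain Pearson correlation stands in for whatever association test the method actually uses.

```python
import math


def impute_expression(dosages, weights):
    """Predict one gene's expression per individual as a weighted sum of
    variant dosages (0, 1 or 2 copies of the effect allele).
    The weights are hypothetical stand-ins for a trained prediction model."""
    return [sum(w * d for w, d in zip(weights, row)) for row in dosages]


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Four individuals, two variants (made-up data for illustration only).
dosages = [[0, 1], [1, 1], [2, 0], [2, 2]]
weights = [0.5, -0.2]

predicted = impute_expression(dosages, weights)
# A phenotype constructed to depend linearly on predicted expression.
phenotype = [2.0 * e + 1.0 for e in predicted]
association = pearson(predicted, phenotype)
```

Repeating the second step gene by gene, with a suitable test and multiple-testing correction, would give a gene-level association scan in the spirit of the approach described.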


Genome-Wide Detection And Analysis Of Multifunctional Genes, Yuri Pritykin, Dario Ghersi, Mona Singh Oct 2015

Interdisciplinary Informatics Faculty Publications

Many genes can play a role in multiple biological processes or molecular functions. Identifying multifunctional genes at the genome-wide level and studying their properties can shed light upon the complexity of molecular events that underpin cellular functioning, thereby leading to a better understanding of the functional landscape of the cell. However, to date, genome-wide analysis of multifunctional genes (and the proteins they encode) has been limited. Here we introduce a computational approach that uses known functional annotations to extract genes playing a role in at least two distinct biological processes. We leverage functional genomics data sets for three organisms—H. sapiens, …


A Gene-Based Association Method For Mapping Traits Using Reference Transcriptome Data, Eric R. Gamazon, Heather Wheeler, Kaanan P. Shah, Sahar V. Mozaffari, Keston Aquino-Michaels, Robert J. Carroll, Anne E. Eyler, Joshua C. Denny, GTEx Consortium, Dan L. Nicolae, Nancy J. Cox, Hae Kyung Im Sep 2015

Bioinformatics Faculty Publications

Genome-wide association studies (GWAS) have identified thousands of variants robustly associated with complex traits. However, the biological mechanisms underlying these associations are, in general, not well understood. We propose a gene-based association method called PrediXcan that directly tests the molecular mechanisms through which genetic variation affects phenotype. The approach estimates the component of gene expression determined by an individual’s genetic profile and correlates ‘imputed’ gene expression with the phenotype under investigation to identify genes involved in the etiology of the phenotype. Genetically regulated gene expression is estimated using whole-genome tissue-dependent prediction models trained with reference transcriptome data sets. PrediXcan enjoys …


p53 Maintains Hepatic Cell Identity During Liver Regeneration, Zeynep Hande Coban Akdemir May 2014

Dissertations & Theses (Open Access)

Zeynep Hande Coban Akdemir, B.S., M.A.

Advisory Professor: Michelle Craig Barton, Ph.D.

p53 is a tumor suppressor that has been well studied in tumor-derived, cultured cells. However, its functions in normal proliferating cells and tissues are generally overlooked. We propose that p53 functions during the G1-S transition can be studied in normal, differentiated cells during surgery-induced liver regeneration. Two-thirds partial hepatectomy (PH) of mouse liver offers a unique model to compare p53 functions in regenerating versus sham (control) cells. My hypothesis is that the intersection of global expression analyses (microarray and RNA sequencing) and …


Linear Methods For Analysis And Quality Control Of Relative Expression Ratios From Quantitative Real-Time Polymerase Chain Reaction Experiments, Robert B. Page, Arnold J. Stromberg Jan 2011

Biology Faculty Publications

Relative expression quantitative real-time polymerase chain reaction (RT-qPCR) experiments are a common means of estimating transcript abundances across biological groups and experimental treatments. One of the most frequently used expression measures that results from such experiments is the relative expression ratio (RE), which describes expression in experimental samples (i.e., RNA isolated from organisms, tissues, and/or cells that were exposed to one or more experimental or nonbaseline conditions) in terms of fold change relative to calibrator samples (i.e., RNA isolated from organisms, tissues, and/or cells that were exposed to a control or baseline condition). Over the past decade, several …
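To make the idea of a relative expression ratio concrete, here is a small Python sketch of the widely used delta-delta-Ct calculation. The abstract does not say which formula the authors use, so this choice is an assumption, as are the function name, the use of a single reference gene, and the default amplification efficiency of 2.0 (perfect doubling per cycle).

```python
def relative_expression(ct_target_exp, ct_ref_exp,
                        ct_target_cal, ct_ref_cal,
                        efficiency=2.0):
    """Fold change of a target gene in an experimental sample relative to a
    calibrator sample, normalized to a reference gene (delta-delta-Ct).
    Assumes the same amplification efficiency for both genes."""
    delta_exp = ct_target_exp - ct_ref_exp   # normalized Ct, experimental
    delta_cal = ct_target_cal - ct_ref_cal   # normalized Ct, calibrator
    return efficiency ** -(delta_exp - delta_cal)


# Target crosses threshold 2 cycles earlier (after normalization) in the
# experimental sample than in the calibrator, so expression is 4-fold higher.
fold_change = relative_expression(22, 18, 24, 18)  # → 4.0
```

Because the ratio is an exponential function of cycle differences, analyses are often carried out on the log scale, where the differences are additive.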


Selecting 'Significant' Differentially Expressed Genes From The Combined Perspective Of The Null And The Alternative, Beatrijs Moerkerke, Els Goetghebeur Apr 2006

Harvard University Biostatistics Working Paper Series

No abstract provided.


Cluster Analysis Of Genomic Data With Applications In R, Katherine S. Pollard, Mark J. Van Der Laan Jan 2005

U.C. Berkeley Division of Biostatistics Working Paper Series

In this paper, we provide an overview of existing partitioning and hierarchical clustering algorithms in R. We discuss statistical issues and methods in choosing the number of clusters, the choice of clustering algorithm, and the choice of dissimilarity matrix. In particular, we illustrate how the bootstrap can be employed as a statistical method in cluster analysis to establish the reproducibility of the clusters and the overall variability of the followed procedure. We also show how to visualize a clustering result by plotting ordered dissimilarity matrices in R. We present a new R package, hopach, which implements the hybrid clustering method, …


Classification Using Generalized Partial Least Squares, Beiying Ding, Robert Gentleman May 2004

Bioconductor Project Working Papers

Advances in computational biology have made simultaneous monitoring of thousands of features possible. High-throughput technologies not only bring about a much richer information context in which to study various aspects of gene function, but they also present the challenge of analyzing data with a large number of covariates and few samples. As an integral part of machine learning, classification of samples into two or more categories is almost always of interest to scientists. In this paper, we address the question of classification in this setting by extending partial least squares (PLS), a popular dimension reduction tool in chemometrics, in …


Mixture Models For Assessing Differential Expression In Complex Tissues Using Microarray Data, Debashis Ghosh Feb 2004

The University of Michigan Department of Biostatistics Working Paper Series

The use of DNA microarrays has become quite popular in many scientific and medical disciplines, such as in cancer research. One common goal of these studies is to determine which genes are differentially expressed between cancer and healthy tissue, or more generally, between two experimental conditions. A major complication in the molecular profiling of tumors using gene expression data is that the data represent a combination of tumor and normal cells. Much of the methodology developed for assessing differential expression with microarray data has assumed that tissue samples are homogeneous. In this article, we outline a general framework for determining …


Statistical Inference For Simultaneous Clustering Of Gene Expression Data, Katherine S. Pollard, Mark J. Van Der Laan Jul 2001

U.C. Berkeley Division of Biostatistics Working Paper Series

Current methods for analysis of gene expression data are mostly based on clustering and classification of either genes or samples. We offer support for the idea that more complex patterns can be identified in the data if genes and samples are considered simultaneously. We formalize the approach and propose a statistical framework for two-way clustering. A simultaneous clustering parameter is defined as a function of the true data generating distribution, and an estimate is obtained by applying this function to the empirical distribution. We illustrate that a wide range of clustering procedures, including generalized hierarchical methods, can be defined as …