Contributions To Statistical Testing, Prediction, And Modeling, 2017 University of New Mexico
Contributions To Statistical Testing, Prediction, And Modeling, John C. Pesko
Mathematics & Statistics ETDs
1. "Parametric Bootstrap (PB) and Objective Bayesian (OB) Testing with Applications to Heteroscedastic ANOVA": For one-way heteroscedastic ANOVA, we show a close relationship between the PB and OB approaches to significance testing, demonstrating the conditions under which the two approaches are equivalent. Using a simulation study, we compare PB and OB performance to a test based on the predictive distribution as well as the unweighted test of Akritas & Papadatos (2004). We extend this work to the RCBD-with-subsampling model, and prove a repeated-sampling property and a large-sample property for general OB significance testing.
2. "Early Identification of Binswanger ...
Integration Of Multi-Platform High-Dimensional Omic Data, 2016 The University of Texas Graduate School of Biomedical Sciences at Houston
Integration Of Multi-Platform High-Dimensional Omic Data, Xuebei An
UT GSBS Dissertations and Theses (Open Access)
The development of high-throughput biotechnologies has made data accessible from different platforms, including RNA sequencing, copy number variation, DNA methylation, protein lysate arrays, etc. The high-dimensional omic data derived from different technological platforms have been extensively used to facilitate a comprehensive understanding of disease mechanisms and to determine personalized health treatments. Although vital to the progress of clinical research, high-dimensional multi-platform data impose new challenges for data analysis. Numerous methods have been proposed to integrate multi-platform omic data; however, few have efficiently and simultaneously addressed the problems that arise from high dimensionality and complex correlations.
In my dissertation, I ...
Hpcnmf: A High-Performance Toolbox For Non-Negative Matrix Factorization, 2016 Fox Chase Cancer Center
Hpcnmf: A High-Performance Toolbox For Non-Negative Matrix Factorization, Karthik Devarajan, Guoli Wang
COBRA Preprint Series
Non-negative matrix factorization (NMF) is a widely used machine learning algorithm for dimension reduction of large-scale data. It has found successful applications in a variety of fields such as computational biology, neuroscience, natural language processing, information retrieval, image processing and speech recognition. In bioinformatics, for example, it has been used to extract patterns and profiles from genomic and text-mining data as well as in protein sequence and structure analysis. While the scientific performance of NMF is very promising in dealing with high dimensional data sets and complex data structures, its computational cost is high and sometimes could be critical for ...
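The factorization at the core of such toolboxes can be illustrated with the classical multiplicative-update rules of Lee and Seung. This is a generic NumPy sketch of plain NMF, not the high-performance HPCNMF implementation described above; all names and defaults here are illustrative:

```python
import numpy as np

def nmf(V, k, n_iter=200, eps=1e-10, seed=0):
    """Factor a non-negative matrix V (m x n) as W @ H, with W (m x k)
    and H (k x n) non-negative, using Lee-Seung multiplicative updates
    that monotonically decrease the squared reconstruction error."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, k)) + eps
    H = rng.random((k, n)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)   # update coefficients
        W *= (V @ H.T) / (W @ H @ H.T + eps)   # update basis vectors
    return W, H
```

Because the updates are purely multiplicative, non-negativity of W and H is preserved automatically; the `eps` term only guards against division by zero.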
Models For Hsv Shedding Must Account For Two Levels Of Overdispersion, 2016 University of Washington - Seattle Campus
Models For Hsv Shedding Must Account For Two Levels Of Overdispersion, Amalia Magaret
UW Biostatistics Working Paper Series
We have frequently implemented crossover studies to evaluate new therapeutic interventions for genital herpes simplex virus infection. The outcome measured to assess the efficacy of interventions on herpes disease severity is the viral shedding rate, defined as the frequency of detection of HSV on the genital skin and mucosa. We performed a simulation study to ascertain whether our standard model, used in previous studies, appropriately accounted for all the necessary features of the shedding data to provide correct inference. We simulated shedding data under our standard, validated assumptions and assessed the ability of 5 different models to reproduce the ...
A Weighted Gene Co-Expression Network Analysis For Streptococcus Sanguinis Microarray Experiments, 2016 Virginia Commonwealth University
A Weighted Gene Co-Expression Network Analysis For Streptococcus Sanguinis Microarray Experiments, Erik C. Dvergsten
Theses and Dissertations
Streptococcus sanguinis is a gram-positive, non-motile bacterium native to human mouths. It is the primary cause of endocarditis and is also responsible for tooth decay. Two-component systems (TCSs) are commonly found in bacteria. In response to environmental signals, TCSs may regulate the expression of virulence factor genes.
Gene co-expression networks are exploratory tools used to analyze system-level gene functionality. A gene co-expression network consists of gene expression profiles represented as nodes and gene connections, which occur if two genes are significantly co-expressed. An adjacency function transforms the similarity matrix containing co-expression similarities into the adjacency matrix containing connection strengths. Gene ...
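The adjacency function described above is, in weighted gene co-expression network analysis (WGCNA), typically a soft-thresholding power transform of the correlation matrix. A minimal NumPy sketch for an unsigned network follows; the power `beta = 6` is a commonly used default, not a value taken from this thesis:

```python
import numpy as np

def wgcna_adjacency(expr, beta=6):
    """Soft-threshold adjacency for an unsigned co-expression network.
    expr: samples x genes expression matrix.
    Returns a genes x genes adjacency matrix with entries in [0, 1]."""
    cor = np.corrcoef(expr, rowvar=False)  # gene-gene Pearson correlations
    adj = np.abs(cor) ** beta              # soft thresholding: a_ij = |cor_ij|^beta
    np.fill_diagonal(adj, 0.0)             # no self-connections
    return adj
```

Raising |cor| to the power beta pushes weak correlations toward zero while keeping strong ones, which tends to produce the approximately scale-free connectivity that WGCNA assumes.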
Development In Normal Mixture And Mixture Of Experts Modeling, 2016 University of Kentucky
Development In Normal Mixture And Mixture Of Experts Modeling, Meng Qi
Theses and Dissertations--Statistics
In this dissertation, we first consider the problem of testing homogeneity and order in a contaminated normal model when the data are correlated under a known covariance structure. To address this problem, we develop a moment-based homogeneity and order test, and design weights for the test statistics to increase the power of the homogeneity test. We apply our test to microarray data on Down's syndrome. This dissertation also studies a singular Bayesian information criterion (sBIC) for a bivariate hierarchical mixture model with varying weights, and develops a new data-dependent information criterion (sFLIC). We apply our model and criteria to birth-weight ...
Loss-Based Estimation With Cross-Validation: Applications To Microarray Data Analysis And Motif Finding, 2015 Division of Biostatistics, School of Public Health, University of California, Berkeley
Loss-Based Estimation With Cross-Validation: Applications To Microarray Data Analysis And Motif Finding, Sandrine Dudoit, Mark J. Van Der Laan, Sunduz Keles, Annette M. Molinaro, Sandra E. Sinisi, Siew Leng Teng
Mark J. van der Laan
Current statistical inference problems in genomic data analysis involve parameter estimation for high-dimensional multivariate distributions, with typically unknown and intricate correlation patterns among variables. Addressing these inference questions satisfactorily requires: (i) an intensive and thorough search of the parameter space to generate good candidate estimators, (ii) an approach for selecting an optimal estimator among these candidates, and (iii) a method for reliably assessing the performance of the resulting estimator. We propose a unified loss-based methodology for estimator construction, selection, and performance assessment with cross-validation. In this approach, the parameter of interest is defined as the risk minimizer for a suitable ...
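The selection step (ii) can be illustrated with a generic cross-validation sketch: each candidate estimator is assigned its cross-validated empirical risk, and the minimizer is selected. The function names and candidate estimators below are illustrative, not the authors' implementation:

```python
import numpy as np

def kfold_cv_risk(X, y, fit, loss, k=5, seed=0):
    """Estimate the risk of an estimator by k-fold cross-validation.
    `fit(X, y)` must return a predictor function; `loss(y_true, y_pred)`
    returns the empirical loss on a held-out fold."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    risks = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        predictor = fit(X[train], y[train])
        risks.append(loss(y[test], predictor(X[test])))
    return float(np.mean(risks))

def select_estimator(X, y, candidates, loss, k=5):
    """Return the index of the candidate with the smallest CV risk."""
    risks = [kfold_cv_risk(X, y, fit, loss, k) for fit in candidates]
    return int(np.argmin(risks))
```

For example, with squared-error loss and candidates {constant-mean predictor, least-squares fit} on data with a genuine linear signal, the selector picks the least-squares candidate.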
Transcriptomic Analyses Of Onecut1 And Onecut2 Deficient Retinas, 2015 Iowa State University
Transcriptomic Analyses Of Onecut1 And Onecut2 Deficient Retinas, Jillian J. Goetz, Jeffrey M. Trimarchi
Genetics, Development and Cell Biology Publications
In this article, we further explore the data generated for the research article “Onecut1 and Onecut2 play critical roles in the development of the mouse retina”. To better understand the functionality of the Onecut family of transcription factors in retinogenesis, we investigated the retinal transcriptomes of developing and mature mice to identify genes with differential expression. This data article reports the full transcriptomes resulting from these experiments and provides tables detailing the differentially expressed genes between wildtype and Onecut1 or 2 deficient retinas. The raw array data of our transcriptomes as generated using Affymetrix microarrays are available on the NCBI ...
Methods For Integrative Analysis Of Genomic Data, 2014 Virginia Commonwealth University
Methods For Integrative Analysis Of Genomic Data, Paul Manser
Theses and Dissertations
In recent years, the development of new genomic technologies has allowed for the investigation of many regulatory epigenetic marks, besides expression levels, on a genome-wide scale. As the price of these technologies continues to decrease, not only will study sizes increase, but several different assays are beginning to be used on the same samples. It is therefore desirable to develop statistical methods that integrate multiple data types and can handle the increased computational burden of incorporating large data sets. Furthermore, it is important to develop sound quality control and normalization methods, as technical errors can compound when integrating multiple genomic ...
Normal Mixture And Contaminated Model With Nuisance Parameter And Applications, 2014 University of Kentucky
Normal Mixture And Contaminated Model With Nuisance Parameter And Applications, Qian Fan
Theses and Dissertations--Statistics
This paper intends to find an appropriate hypothesis and test statistic for testing the existence of bilateral contamination in the presence of a nuisance parameter. The test statistic is based on method-of-moments estimators. A union-intersection test is used to test whether the population distribution can be modeled by a bilaterally contaminated normal model with unknown variance. This paper also develops a hierarchical normal mixture model (HNM) and applies it to birth-weight data. An EM algorithm is employed for parameter estimation, and a singular Bayesian information criterion (sBIC) is applied to choose the number of components. We also propose a singular flexible information ...
Contaminated Chi-Square Modeling And Its Application In Microarray Data Analysis, 2014 University of Kentucky
Contaminated Chi-Square Modeling And Its Application In Microarray Data Analysis, Feng Zhou
Theses and Dissertations--Statistics
Mixture modeling has numerous applications; one of particular interest is microarray data analysis. My dissertation research focuses on contaminated chi-square (CCS) modeling and its application to microarray data. A moment-based method and two likelihood-based methods, the modified likelihood ratio test (MLRT) and the expectation-maximization (EM) test, are developed for testing the omnibus null hypothesis of no contamination of a central chi-square distribution by a non-central chi-square distribution. When the omnibus null hypothesis is rejected, we further develop the moment-based test and the EM test for testing an extra component in the contaminated chi-square (CCS+EC) model. The moment-based approach is easy ...
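The contaminated chi-square model being tested is a two-component mixture of a central and a non-central chi-square distribution. A minimal density sketch, assuming SciPy and illustrative parameter names (`pi` for the contamination proportion, `nc` for the non-centrality):

```python
import numpy as np
from scipy import stats

def ccs_density(x, pi, df, nc):
    """Density of a contaminated chi-square model: with probability 1 - pi
    an observation comes from a central chi-square(df), and with
    probability pi from a non-central chi-square(df, nc). The omnibus
    null hypothesis of no contamination corresponds to pi = 0 (or nc = 0)."""
    return (1 - pi) * stats.chi2.pdf(x, df) + pi * stats.ncx2.pdf(x, df, nc)
```

Under the null the mixture collapses to the central chi-square density, which is what makes the testing problem non-regular and motivates modified-likelihood and EM-type tests.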
A Two-Step Hierarchical Hypothesis Set Testing Framework, With Applications To Gene Expression Data On Ordered Categories, 2013 university of colorado denver
A Two-Step Hierarchical Hypothesis Set Testing Framework, With Applications To Gene Expression Data On Ordered Categories, Yihan Li, Debashis Ghosh
BACKGROUND: In complex large-scale experiments, in addition to simultaneously considering a large number of features, multiple hypotheses are often tested for each feature. This leads to a problem of multi-dimensional multiple testing. For example, in gene expression studies over ordered categories (such as time-course or dose-response experiments), interest is often in testing differential expression across several categories for each gene. In this paper, we consider a framework for testing multiple sets of hypotheses, which can be applied to a wide range of problems.
RESULTS: We adopt the concept of the overall false discovery rate (OFDR) for controlling false discoveries ...
Meta-Analysis Based On Weighted Ordered P-Values For Genomic Data With Heterogeneity, 2013 university of colorado denver
Meta-Analysis Based On Weighted Ordered P-Values For Genomic Data With Heterogeneity, Yihan Li, Debashis Ghosh
BACKGROUND: Meta-analysis has become increasingly popular in recent years, especially in genomic data analysis, due to the fast growth of available data and studies that target the same questions. Many methods have been developed, including classical ones such as Fisher's combined probability test and Stouffer's Z-test. However, not all meta-analyses have the same goal in mind. Some aim at combining information to find signals in at least one of the studies, while others hope to find more consistent signals across the studies. While many classical meta-analysis methods are developed with the former goal in mind, the latter goal ...
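The two classical combination tests named above have simple closed forms: Fisher's statistic is minus twice the sum of log p-values, chi-square with 2k degrees of freedom under the null, and Stouffer's statistic is a (possibly weighted) sum of probit-transformed p-values. A stdlib-only sketch, assuming one-sided p-values:

```python
import math
from statistics import NormalDist

def fisher_combined(pvals):
    """Fisher's method: X = -2 * sum(ln p_i) ~ chi-square with 2k df
    under H0. For even df = 2k the chi-square survival function has the
    closed form P(X > x) = exp(-x/2) * sum_{i<k} (x/2)^i / i!."""
    k = len(pvals)
    half = -sum(math.log(p) for p in pvals)  # x/2
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(k))

def stouffer_z(pvals, weights=None):
    """Stouffer's Z-test: combine probit-transformed one-sided p-values,
    optionally with per-study weights."""
    nd = NormalDist()
    w = weights or [1.0] * len(pvals)
    z = sum(wi * nd.inv_cdf(1.0 - p) for wi, p in zip(w, pvals))
    z /= math.sqrt(sum(wi * wi for wi in w))
    return 1.0 - nd.cdf(z)
```

Fisher's method is sensitive to a strong signal in even one study, while Stouffer's (especially with weights) rewards consistent moderate signals, which is exactly the distinction between the two meta-analytic goals discussed above.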
Bayesian Joint Selection Of Genes And Pathways: Applications In Multiple Myeloma Genomics, 2013 The University of Texas MD Anderson Cancer Center
Bayesian Joint Selection Of Genes And Pathways: Applications In Multiple Myeloma Genomics, Lin Zhang, Jeffrey S. Morris, Jiexin Zhang, Robert Orlowski, Veerabhadran Baladandayuthapani
Jeffrey S. Morris
It is well established that the development of a disease, especially cancer, is a complex process that results from the joint effects of multiple genes involved in various molecular signaling pathways. In this article, we propose methods to discover genes and molecular pathways significantly associated with clinical outcomes in cancer samples. We exploit the natural hierarchical structure of genes related to a given pathway as a group of interacting genes to conduct selection of both pathways and genes. We pose the problem in a hierarchical structured variable selection (HSVS) framework to analyze the corresponding gene expression data. HSVS methods conduct ...
Integrative Biomarker Identification And Classification Using High Throughput Assays, 2013 The University of Texas Graduate School of Biomedical Sciences at Houston
Integrative Biomarker Identification And Classification Using High Throughput Assays, Pan Tong
UT GSBS Dissertations and Theses (Open Access)
It is well accepted that tumorigenesis is a multi-step procedure involving aberrant functioning of genes regulating cell proliferation, differentiation, apoptosis, genome stability, angiogenesis and motility. To obtain a full understanding of tumorigenesis, it is necessary to collect information on all aspects of cell activity. Recent advances in high throughput technologies allow biologists to generate massive amounts of data, more than might have been imagined decades ago. These advances have made it possible to launch comprehensive projects such as The Cancer Genome Atlas (TCGA) and the International Cancer Genome Consortium (ICGC), which systematically characterize the molecular fingerprints of cancer cells using gene expression, methylation, copy number, microRNA and SNP microarrays ...
Global Quantitative Assessment Of The Colorectal Polyp Burden In Familial Adenomatous Polyposis Using A Web-Based Tool, 2013 The University of Texas M.D. Anderson Cancer Center
Global Quantitative Assessment Of The Colorectal Polyp Burden In Familial Adenomatous Polyposis Using A Web-Based Tool, Patrick M. Lynch, Jeffrey S. Morris, William A. Ross, Miguel A. Rodriguez-Bigas, Juan Posadas, Rossa Khalaf, Diane M. Weber, Valerie O. Sepeda, Bernard Levin, Imad Shureiqi
Jeffrey S. Morris
Background: Accurate measures of the total polyp burden in familial adenomatous polyposis (FAP) are lacking. Current assessment tools include polyp quantitation in limited-field photographs and qualitative total colorectal polyp burden by video.
Objective: To develop global quantitative tools of the FAP colorectal adenoma burden.
Design: A single-arm, phase II trial.
Patients: Twenty-seven patients with FAP.
Intervention: Treatment with celecoxib for 6 months, with before-treatment and after-treatment videos posted to an intranet with an interactive site for scoring.
Main Outcome Measurements: Global adenoma counts and sizes (grouped into categories: less than 2 mm, 2-4 mm, and greater than 4 mm) were ...
Integrative Analysis Of Prognosis Data On Multiple Cancer Subtypes, 2012 Yale University
Integrative Analysis Of Prognosis Data On Multiple Cancer Subtypes, Shuangge Ma
In cancer research, profiling studies have been extensively conducted, searching for genes/SNPs associated with prognosis. Cancer is diverse. Examining similarity and difference in the genetic basis of multiple subtypes of the same cancer can lead to a better understanding of their connections and distinctions. Classic meta-analysis methods analyze each subtype separately and then compare analysis results across subtypes. Integrative analysis methods, in contrast, analyze the raw data on multiple subtypes simultaneously and can outperform meta-analysis methods. In this study, prognosis data on multiple subtypes of the same cancer are analyzed. An AFT (accelerated failure time) model is adopted to ...
Bayesian Methods For Expression-Based Integration, 2012 Texas A&M University
Bayesian Methods For Expression-Based Integration, Elizabeth M. Jennings, Jeffrey S. Morris, Raymond J. Carroll, Ganiraju C. Manyam, Veera Baladandayuthapani
Jeffrey S. Morris
We propose methods to integrate data across several genomic platforms using a hierarchical Bayesian analysis framework that incorporates the biological relationships among the platforms to identify genes whose expression is related to clinical outcomes in cancer. This integrated approach combines information across all platforms, leading to increased statistical power in finding these predictive genes, and further provides mechanistic information about the manner in which the gene affects the outcome. We demonstrate the advantages of the shrinkage estimation used by this approach through a simulation, and finally, we apply our method to a Glioblastoma Multiforme dataset and identify several genes potentially ...
A Bayesian Model For Pooling Gene Expression Studies That Incorporates Co-Regulation Information, 2012 University of Massachusetts - Amherst
A Bayesian Model For Pooling Gene Expression Studies That Incorporates Co-Regulation Information, Erin M. Conlon, Bradley L. L. Postier, Barbara A. Methé, Kelly P. Nevin, Derek R. Lovley
Erin M. Conlon
Current Bayesian microarray models that pool multiple studies assume gene expression is independent of other genes. However, in prokaryotic organisms, genes are arranged in units that are co-regulated (called operons). Here, we introduce a new Bayesian model for pooling gene expression studies that incorporates operon information into the model. Our Bayesian model borrows information from other genes within the same operon to improve estimation of gene expression. The model produces the gene-specific posterior probability of differential expression, which is the basis for inference. We found in simulations and in biological studies that incorporating co-regulation information improves upon the independence model ...
Identification Of Biologically Relevant Subtypes Via Preweighted Sparse Clustering, 2012 University of North Carolina at Chapel Hill
Identification Of Biologically Relevant Subtypes Via Preweighted Sparse Clustering, Sheila Gaynor, Eric Bair
The University of North Carolina at Chapel Hill Department of Biostatistics Technical Report Series
Cluster analysis methods are used to identify homogeneous subgroups in a data set. Frequently one applies cluster analysis in order to identify biologically interesting subgroups. In particular, one may wish to identify subgroups that are associated with a particular outcome of interest. Conventional clustering methods often fail to identify such subgroups, particularly when there are a large number of high-variance features in the data set. Conventional methods may instead identify clusters associated with these high-variance features even when one wishes to obtain secondary clusters that are more interesting biologically or more strongly associated with a particular outcome of interest. We describe a ...