Physical Sciences and Mathematics | Open Access Articles

Measuring Variability In Model Performance Measures, Matthew Rutledge Jan 2020

Theses and Dissertations--Statistics

As data become increasingly available, statisticians are confronted with both larger sample sizes and larger numbers of predictors. While both of these factors are beneficial in building better predictive models and allowing for better inference, models can become difficult to interpret and often include variables of little practical significance. This dissertation provides methods that help model builders better understand and select from a collection of candidate models. We study the asymptotic distribution of AIC and propose a graphical tool to assist practitioners in comparing and contrasting candidate models. Real-world examples show how this graphic might be used and a …

Nonparametric Tests Of Lack Of Fit For Multivariate Data, Yan Xu Jan 2020

Theses and Dissertations--Statistics

A common problem in regression analysis (linear or nonlinear) is assessing the lack-of-fit. Existing methods make parametric or semi-parametric assumptions to model the conditional mean or covariance matrices. In this dissertation, we propose fully nonparametric methods that make only additive error assumptions. Our nonparametric approach relies on ideas from nonparametric smoothing to reduce the test of association (lack-of-fit) problem into a nonparametric multivariate analysis of variance. A major problem that arises in this approach is that the key assumptions of independence and constant covariance matrix among the groups will be violated. As a result, the standard asymptotic theory is not …

Statistical Intervals For Various Distributions Based On Different Inference Methods, Yixuan Zou Jan 2020

Theses and Dissertations--Statistics

Statistical intervals (e.g., confidence, prediction, or tolerance) are widely used to quantify uncertainty, but complex settings can make it challenging to obtain intervals that possess the desired properties. My thesis addresses diverse data settings and approaches that are shown empirically to perform well. We first give a focused treatment of using a single-layer bootstrap calibration to improve the coverage probabilities of two-sided parametric tolerance intervals for non-normal distributions. We then turn to zero-inflated data, which are commonly found in, among other areas, pharmaceutical and quality control applications. However, the inference problem often becomes difficult in the presence of …

Algebraic And Geometric Properties Of Hierarchical Models, Aida Maraj Jan 2020

Theses and Dissertations--Mathematics

In this dissertation, filtrations of ideals arising from hierarchical models in statistics related by a group action are studied. These filtrations lead to ideals in polynomial rings in infinitely many variables, which require innovative tools. Regular languages and finite automata are used to prove and explicitly compute the rationality of some multivariate power series that record important quantitative information about the ideals. Some work regarding Markov bases for non-reducible models is shown, together with advances in the polyhedral geometry of binary hierarchical models.

Bayesian Kinetic Modeling For Tracer-Based Metabolomic Data, Xu Zhang Jan 2020

Theses and Dissertations--Statistics

Kinetic modeling of the time dependence of metabolite concentrations, including the unstable isotope labeled species, is an important approach to simulating metabolic pathway dynamics. It is also essential for quantitative metabolic flux analysis using tracer data. However, because metabolic networks are complex, with extensive compartmentation and interconnections, estimating the parameters of the enzymes that catalyze the individual reactions needed for kinetic modeling is challenging. As the parameter space is large and multi-dimensional while kinetic data are comparatively sparse, the estimation procedure (especially point estimation methods) often encounters multiple local maxima, such that standard maximum likelihood methods may yield …

Moment Kernels For T-Central Subspace, Weihang Ren Jan 2020

Theses and Dissertations--Statistics

The T-central subspace allows one to perform sufficient dimension reduction for any statistical functional of interest. We propose a general estimator using a third moment kernel to estimate the T-central subspace. In particular, in this dissertation we develop sufficient dimension reduction methods for the central mean subspace via the regression mean function, the central subspace via the Fourier transform, the central quantile subspace via a quantile estimator, and the central expectile subspace via an expectile estimator. Theoretical results are established, and simulation studies show the advantages of our proposed methods.

Simultaneous Tolerance Intervals For Response Surface And Mixture Designs Using The Adjusted Product Set Method, Aisaku Nakamura Jan 2020

Theses and Dissertations--Statistics

Various methods for constructing simultaneous tolerance intervals for regression models have been developed over the years, but all of them can be shown to be conservative. In this thesis, extensive simulations are conducted to evaluate the degree of conservatism with respect to their coverage probabilities. A new strategy to fit simultaneous tolerance intervals on linear models is proposed by modifying an existing method, which we call the adjusted product set (APS) method. The APS method will also be used to construct simultaneous tolerance bands on response surface and mixture designs.

Unitary And Symmetric Structure In Deep Neural Networks, Kehelwala Dewage Gayan Maduranga Jan 2020

Theses and Dissertations--Mathematics

Recurrent neural networks (RNNs) have been successfully used on a wide range of sequential data problems. A well-known difficulty in using RNNs is the vanishing or exploding gradient problem. Recently, several RNN architectures have been proposed that try to mitigate this issue by maintaining an orthogonal or unitary recurrent weight matrix. One such architecture is the scaled Cayley orthogonal recurrent neural network (scoRNN), which parameterizes the orthogonal recurrent weight matrix through a scaled Cayley transform. This parametrization contains a diagonal scaling matrix with entries of positive or negative one that cannot be optimized by gradient descent. Thus the …

Orthogonal Recurrent Neural Networks And Batch Normalization In Deep Neural Networks, Kyle Eric Helfrich Jan 2020

Theses and Dissertations--Mathematics

Despite the recent success of various machine learning techniques, there are still numerous obstacles that must be overcome. One obstacle is known as the vanishing/exploding gradient problem, which refers to gradients that either become zero or grow without bound. This well-known problem commonly occurs in Recurrent Neural Networks (RNNs). In this work we describe how this problem can be mitigated, establish three different architectures that are designed to avoid this issue, and derive update schemes for each architecture. Another portion of this work focuses on the often-used technique of batch normalization. Although found to be successful …

Nonparametric Analysis Of Clustered And Multivariate Data, Yue Cui Jan 2020

Theses and Dissertations--Statistics

In this dissertation, we investigate three distinct but interrelated problems for nonparametric analysis of clustered data and multivariate data in pre-post factorial design.

In the first project, we propose a nonparametric approach for one-sample clustered data in pre-post intervention design. In particular, we consider the situation where, for some clusters, all members are observed only at either pre- or post-intervention but not both. We refer to this type of clustered data as partially complete clustered data. Unlike most of its parametric counterparts, we do not assume specific models for data distributions, intra-cluster dependence structure or variability, in effect …

Cancer Phylogenetic Analysis Based On RNA-Seq Data, Tingting Zhai Jan 2020

Theses and Dissertations--Statistics

Studying tumor evolution is a major task in understanding the biological mechanism of carcinogenesis, developing new cancer therapies, and preventing drug resistance. We focus on two important questions in tumor evolution. The first question is how to quantify intra-tumor heterogeneity, in which multiple subclones of tumor cells have distinct transcriptomic profiles. The second question is how to estimate the temporal order of alteration of key cancer pathways during tumor evolution. We present a new statistical method to 1) reconstruct the evolutionary history and population frequency of the subclonal lineages of tumor cells and 2) infer temporal order of pathway alterations in tumor evolution for …

Semiparametric And Nonparametric Methods For Comparing Biomarker Levels Between Groups, Yuntong Li Jan 2020

Theses and Dissertations--Statistics

Comparing the distribution of biomarker measurements between two groups under either an unpaired or paired design is a common goal in many biomarker studies. However, analyzing biomarker data is sometimes challenging because the data may not be normally distributed and may contain a large fraction of zero values or missing values. Although several statistical methods have been proposed, they either require a normality assumption or are inefficient. We propose a novel two-part semiparametric method for data under an unpaired setting and a nonparametric method for data under a paired setting. The semiparametric method considers a two-part model, a logistic regression for …

Estimation Of The Treatment Effect With Bayesian Adjustment For Covariates, Li Xu Jan 2020

Theses and Dissertations--Statistics

The Bayesian adjustment for confounding (BAC) is a Bayesian model averaging method to select and adjust for confounding factors when evaluating the average causal effect of an exposure on a certain outcome. We extend the BAC method to time-to-event outcomes. Specifically, the posterior distribution of the exposure effect on a time-to-event outcome is calculated as a weighted average of posterior distributions from a number of candidate proportional hazards models, weighting each model by its ability to adjust for confounding factors. The Bayesian Information Criterion based on the partial likelihood is used to compare different models and approximate the Bayes factor. …

Measuring Change: Prediction Of Early Onset Sepsis, Aric Schadler Jan 2020

Theses and Dissertations--Statistics

Sepsis occurs in a patient when an infection enters the bloodstream and spreads throughout the body, causing a cascading response from the immune system. Sepsis is one of the leading causes of morbidity and mortality in today’s hospitals. This is despite published and accepted guidelines for timely and appropriate interventions for septic patients. The largest barrier to applying these interventions is the early identification of septic patients. Early identification and treatment lead to better outcomes, shorter lengths of stay, and financial savings for healthcare institutions. In order to increase the lead time in recognizing patients trending towards septicemia …
