Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Applied Statistics

University of Kentucky

Articles 1 - 30 of 44

Full-Text Articles in Physical Sciences and Mathematics

Potential Alzheimer's Disease Plasma Biomarkers, Taylor Estepp Jan 2023

Theses and Dissertations--Epidemiology and Biostatistics

In this series of studies, we examined the potential of a variety of blood-based plasma biomarkers for the identification of Alzheimer's disease (AD) progression and cognitive decline. With the end goal of studying these biomarkers via mixture modeling, we began with a literature review of the methodology. An examination of the biomarkers with demographics and other health factors found evidence of minimal risk of confounding along the causal pathway from biomarkers to cognitive performance. Further study examined the usefulness of linear combinations of biomarkers, achieved via partial least squares (PLS) analysis, as predictors of various cognitive assessment scores and clinical …


Statistical Intervals For Neural Network And Its Relationship With Generalized Linear Model, Sheng Yuan Jan 2023

Theses and Dissertations--Statistics

Neural networks have experienced widespread adoption and have become integral in cutting-edge domains like computer vision, natural language processing, and various contemporary fields. However, addressing the statistical aspects of neural networks has been a persistent challenge, with limited satisfactory results. In my research, I focused on exploring statistical intervals applied to neural networks, specifically confidence intervals and tolerance intervals. I employed variance estimation methods, such as direct estimation and resampling, to assess neural networks and their performance under outlier scenarios. Remarkably, when outliers were present, the resampling method with infinitesimal jackknife estimation yielded confidence intervals that closely aligned with nominal …


Deriving The Distributions And Developing Methods Of Inference For R2-Type Measures, With Applications To Big Data Analysis, Gregory S. Hawk Jan 2022

Theses and Dissertations--Statistics

As computing capabilities and cloud-enhanced data sharing have accelerated exponentially in the 21st century, our access to Big Data has revolutionized the way we see data around the world, from healthcare to investments to manufacturing to retail and supply chain. In many areas of research, however, the cost of obtaining each data point makes more than just a few observations impossible. While machine learning and artificial intelligence (AI) are improving our ability to make predictions from datasets, we need better statistical methods to improve our ability to understand and translate models into meaningful and actionable insights.

A central goal in the …


Beta Mixture And Contaminated Model With Constraints And Application With Micro-Array Data, Ya Qi Jan 2022

Theses and Dissertations--Statistics

This dissertation research concentrates on the Contaminated Beta (CB) model and its application in micro-array data analysis. The Modified Likelihood Ratio Test (MLRT) introduced by Chen et al. (2001) is used for testing the omnibus null hypothesis of no contamination of Beta(1,1) (Dai and Charnigo, 2008). We design constraints for the two-component CB model, which put the mode toward the left end of the distribution to reflect the abundance of small p-values in micro-array data, to increase the test power. A three-component CB model might be useful when distinguishing highly differentially expressed genes from moderately differentially expressed genes. If the null hypothesis above …


Investigations Into The Genetics Of Mixed Pathologies In Dementia, Adam Dugan Jan 2021

Theses and Dissertations--Epidemiology and Biostatistics

Alzheimer’s disease (AD) is an irreversible, progressive brain disorder that leads to a loss of memory and thinking skills. While tremendous progress has been made in our understanding of the genetics underlying AD, currently known genetic variants explain only approximately 30% of the heritable risk of developing AD. One hurdle to AD research is that it can only be definitively diagnosed at autopsy, making cruder, clinic-based diagnoses more common. In recent years, several brain pathologies that mimic AD’s clinical presentation have been identified including brain arteriolosclerosis, hippocampal sclerosis (HS), and, most recently, limbic-predominant age-related TDP-43 encephalopathy (LATE). It has become …


Dimension Reduction Techniques In Regression, Pei Wang Jan 2021

Theses and Dissertations--Statistics

Because of the advances of modern technology, the size of the collected data nowadays is larger and the structure is more complex. To deal with such kinds of data, sufficient dimension reduction (SDR) and reduced rank (RR) regression are two powerful tools. This dissertation focuses on these two tools and it is composed of three projects. In the first project, we introduce a new SDR method through a novel approach of feature filter to recover the central mean subspace exhaustively along with a method to determine the dimension, two variable selection methods, and extensions to multivariate response and large p …


Nonparametric Analysis Of Clustered And Multivariate Data, Yue Cui Jan 2020

Theses and Dissertations--Statistics

In this dissertation, we investigate three distinct but interrelated problems for nonparametric analysis of clustered data and multivariate data in pre-post factorial design.

In the first project, we propose a nonparametric approach for one-sample clustered data in pre-post intervention design. In particular, we consider the situation where for some clusters all members are only observed at either pre or post intervention but not both. This type of clustered data is referred to as partially complete clustered data. Unlike most of its parametric counterparts, we do not assume specific models for data distributions, intra-cluster dependence structure or variability, in effect …


Semiparametric And Nonparametric Methods For Comparing Biomarker Levels Between Groups, Yuntong Li Jan 2020

Theses and Dissertations--Statistics

Comparing the distribution of biomarker measurements between two groups under either an unpaired or paired design is a common goal in many biomarker studies. However, analyzing biomarker data is sometimes challenging because the data may not be normally distributed and may contain a large fraction of zero values or missing values. Although several statistical methods have been proposed, they either require a normality assumption or are inefficient. We proposed a novel two-part semiparametric method for data under an unpaired setting and a nonparametric method for data under a paired setting. The semiparametric method considers a two-part model, a logistic regression for …


Measuring Change: Prediction Of Early Onset Sepsis, Aric Schadler Jan 2020

Theses and Dissertations--Statistics

Sepsis occurs in a patient when an infection enters the bloodstream and spreads throughout the body, causing a cascading response from the immune system. Sepsis is one of the leading causes of morbidity and mortality in today’s hospitals. This is despite published and accepted guidelines for timely and appropriate interventions for septic patients. The largest barrier to applying these interventions is the early identification of septic patients. Early identification and treatment lead to better outcomes, shorter lengths of stay, and financial savings for healthcare institutions. In order to increase the lead time in recognizing patients trending towards septicemia …


Orthogonal Recurrent Neural Networks And Batch Normalization In Deep Neural Networks, Kyle Eric Helfrich Jan 2020

Theses and Dissertations--Mathematics

Despite the recent success of various machine learning techniques, there are still numerous obstacles that must be overcome. One obstacle is known as the vanishing/exploding gradient problem. This problem refers to gradients that either become zero or unbounded. This is a well-known problem that commonly occurs in Recurrent Neural Networks (RNNs). In this work we describe how this problem can be mitigated, establish three different architectures that are designed to avoid this issue, and derive update schemes for each architecture. Another portion of this work focuses on the often-used technique of batch normalization. Although found to be successful …


Nonparametric Tests Of Lack Of Fit For Multivariate Data, Yan Xu Jan 2020

Theses and Dissertations--Statistics

A common problem in regression analysis (linear or nonlinear) is assessing the lack-of-fit. Existing methods make parametric or semi-parametric assumptions to model the conditional mean or covariance matrices. In this dissertation, we propose fully nonparametric methods that make only additive error assumptions. Our nonparametric approach relies on ideas from nonparametric smoothing to reduce the test of association (lack-of-fit) problem into a nonparametric multivariate analysis of variance. A major problem that arises in this approach is that the key assumptions of independence and constant covariance matrix among the groups will be violated. As a result, the standard asymptotic theory is not …


Statistical Intervals For Various Distributions Based On Different Inference Methods, Yixuan Zou Jan 2020

Theses and Dissertations--Statistics

Statistical intervals (e.g., confidence, prediction, or tolerance) are widely used to quantify uncertainty, but complex settings can create challenges to obtain such intervals that possess the desired properties. My thesis will address diverse data settings and approaches that are shown empirically to have good performance. We first introduce a focused treatment on using a single-layer bootstrap calibration to improve the coverage probabilities of two-sided parametric tolerance intervals for non-normal distributions. We then turn to zero-inflated data, which are commonly found in, among other areas, pharmaceutical and quality control applications. However, the inference problem often becomes difficult in the presence of …


Bayesian Kinetic Modeling For Tracer-Based Metabolomic Data, Xu Zhang Jan 2020

Theses and Dissertations--Statistics

Kinetic modeling of the time dependence of metabolite concentrations including the unstable isotope labeled species is an important approach to simulate metabolic pathway dynamics. It is also essential for quantitative metabolic flux analysis using tracer data. However, as the metabolic networks are complex including extensive compartmentation and interconnections, the parameter estimation for enzymes that catalyze individual reactions needed for kinetic modeling is challenging. As the parameter space is large and multi-dimensional while kinetic data are comparatively sparse, the estimation procedure (especially the point estimation methods) often encounters multiple local maxima such that standard maximum likelihood methods may yield …


Unitary And Symmetric Structure In Deep Neural Networks, Kehelwala Dewage Gayan Maduranga Jan 2020

Theses and Dissertations--Mathematics

Recurrent neural networks (RNNs) have been successfully used on a wide range of sequential data problems. A well-known difficulty in using RNNs is the vanishing or exploding gradient problem. Recently, there have been several different RNN architectures that try to mitigate this issue by maintaining an orthogonal or unitary recurrent weight matrix. One such architecture is the scaled Cayley orthogonal recurrent neural network (scoRNN), which parameterizes the orthogonal recurrent weight matrix through a scaled Cayley transform. This parameterization contains a diagonal scaling matrix consisting of positive or negative one entries that cannot be optimized by gradient descent. Thus the …


Tobacco Smoking And Dementia In A Kentucky Cohort: A Competing Risk Analysis, Erin L. Abner, Peter T. Nelson, Gregory A. Jicha, Gregory E. Cooper, David W. Fardo, Frederick A. Schmitt, Richard J. Kryscio Mar 2019

Epidemiology and Environmental Health Faculty Publications

Tobacco smoking was examined as a risk for dementia and neuropathological burden in 531 initially cognitively normal older adults followed longitudinally at the University of Kentucky’s Alzheimer’s Disease Center. The cohort was followed for an average of 11.5 years; 111 (20.9%) participants were diagnosed with dementia, while 242 (45.6%) died without dementia. At baseline, 49 (9.2%) participants reported current smoking (median pack-years = 47.3) and 231 (43.5%) former smoking (median pack-years = 24.5). The hazard ratio (HR) for dementia for former smokers versus never smokers based on the Cox model was 1.64 (95% CI: 1.09, 2.46), while the HR for …


Automatic 13C Chemical Shift Reference Correction Of Protein Nmr Spectral Data Using Data Mining And Bayesian Statistical Modeling, Xi Chen Jan 2019

Theses and Dissertations--Molecular and Cellular Biochemistry

Nuclear magnetic resonance (NMR) is a highly versatile analytical technique for studying molecular configuration, conformation, and dynamics, especially of biomacromolecules such as proteins. However, due to the intrinsic properties of NMR experiments, results from the NMR instruments require a referencing step before the down-the-line analysis. Poor chemical shift referencing, especially for 13C in protein NMR experiments, fundamentally limits and even prevents effective study of biomacromolecules via NMR. There is no available method that can re-reference carbon chemical shifts from protein NMR without secondary experimental information such as structure or resonance assignment.

To solve this problem, we …


Serial Testing For Detection Of Multilocus Genetic Interactions, Zaid T. Al-Khaledi Jan 2019

Theses and Dissertations--Statistics

A method to detect relationships between disease susceptibility and multilocus genetic interactions is the Multifactor-Dimensionality Reduction (MDR) technique pioneered by Ritchie et al. (2001). Since its introduction, many extensions have been pursued to deal with non-binary outcomes and/or account for multiple interactions simultaneously. Studying the effects of multilocus genetic interactions on continuous traits (blood pressure, weight, etc.) is one case that MDR does not handle. Culverhouse et al. (2004) and Gui et al. (2013) proposed two different methods to analyze such a case. In their research, Gui et al. (2013) introduced the Quantitative Multifactor-Dimensionality Reduction (QMDR) that uses the overall …


Improved Methods And Selecting Classification Types For Time-Dependent Covariates In The Marginal Analysis Of Longitudinal Data, I-Chen Chen Jan 2018

Theses and Dissertations--Epidemiology and Biostatistics

Generalized estimating equations (GEE) are popularly utilized for the marginal analysis of longitudinal data. In order to obtain consistent regression parameter estimates, these estimating equations must be unbiased. However, when certain types of time-dependent covariates are presented, these equations can be biased unless an independence working correlation structure is employed. Moreover, in this case regression parameter estimation can be very inefficient because not all valid moment conditions are incorporated within the corresponding estimating equations. Therefore, approaches using the generalized method of moments or quadratic inference functions have been proposed for utilizing all valid moment conditions. However, we have found that …


Modeling And Mapping Location-Dependent Human Appearance, Zachary Bessinger Jan 2018

Theses and Dissertations--Computer Science

Human appearance is highly variable and depends on individual preferences, such as fashion, facial expression, and makeup. These preferences depend on many factors including a person's sense of style, what they are doing, and the weather. These factors, in turn, are dependent upon geographic location and time. In our work, we build computational models to learn the relationship between human appearance, geographic location, and time. The primary contributions are a framework for collecting and processing geotagged imagery of people, a large dataset collected by our framework, and several generative and discriminative models that use our dataset to learn the relationship …


Occurrence And Attributes Of Two Echinoderm-Bearing Faunas From The Upper Mississippian (Chesterian; Lower Serpukhovian) Ramey Creek Member, Slade Formation, Eastern Kentucky, U.S.A., Ann Well Harris Jan 2018

Theses and Dissertations--Earth and Environmental Sciences

Well-preserved echinoderm faunas are rare in the fossil record, and when uncovered, understanding their occurrence can be useful in interpreting other faunas. In this study, two such faunas of the same age from separate localities in the shallow-marine Ramey Creek Member of the Slade Formation in the Upper Mississippian (Chesterian) rocks of eastern Kentucky are examined. Of the more than 5,000 fossil specimens from both localities, only 9–34 percent were echinoderms from 3–5 classes. Nine non-echinoderm (8 invertebrate and one vertebrate) classes occurred at both localities, but of these, bryozoans, brachiopods and sponges dominated. To understand the attributes of both …


Accounting For Matching Uncertainty In Photographic Identification Studies Of Wild Animals, Amanda R. Ellis Jan 2018

Theses and Dissertations--Statistics

I consider statistical modelling of data gathered by photographic identification in mark-recapture studies and propose a new method that incorporates the inherent uncertainty of photographic identification in the estimation of abundance, survival and recruitment. A hierarchical model is proposed which accepts scores assigned to pairs of photographs by pattern recognition algorithms as data and allows for uncertainty in matching photographs based on these scores. The new models incorporate latent capture histories that are treated as unknown random variables informed by the data, in contrast to past models, which treat the capture histories as fixed. The methods properly account for uncertainty in the matching …


Mixtures-Of-Regressions With Measurement Error, Xiaoqiong Fang Jan 2018

Theses and Dissertations--Statistics

Finite mixture models have been studied for a long time; however, traditional methods assume that the variables are measured without error. The mixtures-of-regressions model with measurement error poses challenges for statisticians, since both the mixture structure and the existence of measurement error can lead to inconsistent estimates of the regression coefficients. In order to resolve the inconsistency, we propose a series of methods to estimate the mixture likelihood of the mixtures-of-regressions model when there is measurement error, both in the responses and predictors. Different estimators of the parameters are derived and compared with respect to their relative efficiencies. The simulation results …


Automated Tree-Level Forest Quantification Using Airborne Lidar, Hamid Hamraz Jan 2018

Theses and Dissertations--Computer Science

Traditional forest management relies on a small field sample and interpretation of aerial photography that not only are costly to execute but also yield inaccurate estimates of the entire forest in question. Airborne light detection and ranging (LiDAR) is a remote sensing technology that records point clouds representing the 3D structure of a forest canopy and the terrain underneath. We present a method for segmenting individual trees from the LiDAR point clouds without making prior assumptions about tree crown shapes and sizes. We then present a method that vertically stratifies the point cloud to an overstory and multiple understory tree …


Improved Standard Error Estimation For Maintaining The Validities Of Inference In Small-Sample Cluster Randomized Trials And Longitudinal Studies, Whitney Ford Tanner Jan 2018

Theses and Dissertations--Epidemiology and Biostatistics

Data arising from Cluster Randomized Trials (CRTs) and longitudinal studies are correlated, and generalized estimating equations (GEE) are a popular analysis method for correlated data. Previous research has shown that analyses using GEE could result in liberal inference due to the use of the empirical sandwich covariance matrix estimator, which can yield negatively biased standard error estimates when the number of clusters or subjects is not large. Many techniques have been presented to correct this negative bias; however, use of these corrections can still result in biased standard error estimates and thus test sizes that are not consistently at their …


The Family Of Conditional Penalized Methods With Their Application In Sufficient Variable Selection, Jin Xie Jan 2018

Theses and Dissertations--Statistics

When scientists know in advance that some features (variables) are important in modeling a dataset, these important features should be kept in the model. How can we utilize this prior information to effectively find other important features? This dissertation provides a solution that uses such prior information. We propose the Conditional Adaptive Lasso (CAL) estimates to exploit this knowledge. By choosing a meaningful conditioning set, namely the prior information, CAL shows better performance in both variable selection and model estimation. We also propose Sufficient Conditional Adaptive Lasso Variable Screening (SCAL-VS) and Conditioning Set Sufficient Conditional Adaptive Lasso Variable …


An Exploratory Statistical Method For Finding Interactions In A Large Dataset With An Application Toward Periodontal Diseases, Joshua Lambert Jan 2017

Theses and Dissertations--Epidemiology and Biostatistics

It is estimated that periodontal diseases affect up to 90% of the adult population. Given the complexity of the host environment, many factors contribute to expression of the disease. Age, gender, socioeconomic status, smoking status, and race/ethnicity are all known risk factors, as well as a handful of known comorbidities. Certain vitamins and minerals have been shown to be protective for the disease, while some toxins and chemicals have been associated with an increased prevalence. The role of toxins, chemicals, vitamins, and minerals in relation to disease is believed to be complex and potentially modified by known risk factors. A …


Informational Index And Its Applications In High Dimensional Data, Qingcong Yuan Jan 2017

Theses and Dissertations--Statistics

We introduce a new class of measures for testing independence between two random vectors, which uses the expected difference of conditional and marginal characteristic functions. By choosing a particular weight function in the class, we propose a new index for measuring independence and study its properties. Two empirical versions are developed, and their properties, asymptotics, connection with existing measures and applications are discussed. Implementation and Monte Carlo results are also presented.

We propose a two-stage sufficient variable selection method based on the new index to deal with large p small n data. The method does not require model specification and especially focuses …


Nonparametric Compound Estimation, Derivative Estimation, And Change Point Detection, Sisheng Liu Jan 2017

Theses and Dissertations--Statistics

First, we reviewed some popular nonparametric regression methods from the past several decades. Then we extended the compound estimation (Charnigo and Srinivasan [2011]) to accommodate random design points and heteroskedasticity and proposed a modified Cp criterion for tuning parameter selection. Moreover, we developed a DCp criterion for the tuning parameter selection problem in general nonparametric derivative estimation. This extends the GCp criterion in Charnigo, Hall and Srinivasan [2011] to random design points and heteroskedasticity. Next, we proposed a change point detection method via compound estimation for both the fixed design and random design cases; the adaptation to heteroskedasticity was considered for the method. …


Extending The Latent Multinomial Model With Complex Error Processes And Dynamic Markov Bases, Simon J. Bonner, Matthew R. Schofield, Patrik Noren, Steven J. Price Jan 2016

Forestry and Natural Resources Faculty Publications

The latent multinomial model (LMM) of Link et al. [Biometrics 66 (2010) 178–185] provides a framework for modelling mark-recapture data with potential identification errors. Key is a Markov chain Monte Carlo (MCMC) scheme for sampling configurations of the latent counts of the true capture histories that could have generated the observed data. Assuming a linear map between the observed and latent counts, the MCMC algorithm uses vectors from a basis of the kernel to move between configurations of the latent data. Schofield and Bonner [Biometrics 71 (2015) 1070–1080] show that this is sufficient for some models within the …


Empirical Likelihood And Differentiable Functionals, Zhiyuan Shen Jan 2016

Theses and Dissertations--Statistics

Empirical likelihood (EL) is a recently developed nonparametric method of statistical inference. It has been shown by Owen (1988, 1990) and many others that the empirical likelihood ratio (ELR) method can be used to produce nice confidence intervals or regions. Owen (1988) shows that -2 log ELR converges to a chi-square distribution with one degree of freedom subject to a linear statistical functional in terms of distribution functions. However, a generalization of Owen's result to the right censored data setting is difficult since no explicit maximization can be obtained under constraint in terms of distribution functions. Pan and Zhou (2002), instead, study the …