Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Articles 1 - 14 of 14

Full-Text Articles in Physical Sciences and Mathematics

Incorporation And Measurement Of Uncertainty In Clustered And Spatial Data, Yuan Hong Oct 2020

Theses and Dissertations

Analyzing population-representative datasets for local estimation and prediction over time is important for monitoring public health issues; however, such analyses pose many statistical challenges. Mixed effect models are a common option: they can incorporate temporal and spatial effects, and the related inference is well established.

In the first part of this dissertation, to estimate area-level prevalence using individual-level data, small area estimation (SAE) with post-stratified mixed effect models was used, with sampling weights also incorporated. However, whether poststratification, which requires more computational effort, can improve estimation accuracy is …


Estimation And Inference Under Model Uncertainty, Yizheng Wei Oct 2020

Theses and Dissertations

Chapter 1 of this dissertation proposes a consistent and locally efficient estimator to estimate the model parameters for a logistic mixed effect model with random slopes. Our approach relaxes two typical assumptions: the random effects being normally distributed, and the covariates and random effects being independent of each other. Adhering to these assumptions is particularly difficult in health studies where in many cases we have limited resources to design experiments and gather data in long-term studies, while new findings from other fields might emerge, suggesting the violation of such assumptions. So it is crucial if we could have an estimator …


Categorical And Fuzzy Ensemble-Based Algorithms For Cluster Analysis, Bridget Nicole Manning Oct 2020

Theses and Dissertations

This dissertation focuses on improving multivariate methods of cluster analysis. In Chapter 3 we discuss methods relevant to the categorical clustering of tertiary data while Chapter 4 considers the clustering of quantitative data using ensemble algorithms. Lastly, in Chapter 5, future research plans are discussed to investigate the clustering of spatial binary data.

Cluster analysis is an unsupervised methodology whose results may be influenced by the types of variables recorded on observations. When dealing with the clustering of categorical data, solutions produced may not accurately reflect the structure of the process that generated them. Increased variability within the latent structure …


Network-Based Statistical Analysis Of Functional Magnetic Resonance Imaging Data From Aphasia Patients, Xingpei Zhao Jul 2020

Theses and Dissertations

Functional magnetic resonance imaging (fMRI) is a neuroimaging technique that provides insight into brain function and activity. Network models of fMRI signals can reveal functional connectivity related to certain brain disorders, such as post-stroke aphasia. This thesis aims to identify the functional connections that distinguish anomic and Broca’s aphasia by comparing the resting-state fMRI from the patients with these two types of aphasia. The network-based statistic (NBS) approach is used to detect such connections. After the analytic pipeline is applied to the fMRI data, the NBS approach identifies a distinct subnetwork between the two types of aphasia, which involves the …


High-Dimensional Inference Based On The Leave-One-Covariate-Out Regularization Path, Xiangyang Cao Jul 2020

Theses and Dissertations

The increasingly rapid emergence of high-dimensional data, where the number of variables p may be larger than the sample size n, has necessitated the development of new statistical methodologies. The LASSO and its variants have been proposed and are among the most popular estimators for high-dimensional regression models. However, not much work has focused on analyzing and summarizing the information contained in the entire solution path of the LASSO. This dissertation consists of three research projects that propose and extend the Leave-One-Covariate-Out (LOCO) solution path statistic to regression and graphical models.

In the first chapter, we propose a new …
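
For readers unfamiliar with the term, the LASSO solution path mentioned above is commonly written as follows. This is the standard textbook form, not a formula taken from the dissertation; the symbols X, y and the 1/(2n) scaling are a notational assumption.

```latex
% Standard LASSO objective; the solution path is the map lambda -> beta_hat(lambda).
\hat{\beta}(\lambda)
  = \arg\min_{\beta \in \mathbb{R}^{p}}
    \frac{1}{2n}\,\lVert y - X\beta \rVert_2^2 + \lambda \lVert \beta \rVert_1,
  \qquad \lambda \ge 0 .
```

A leave-one-covariate-out comparison would, presumably, contrast this path with the path obtained after dropping one column of X; the exact statistic is defined in the dissertation itself.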


The Practical Advantages And Disadvantages Of Laplace Regression As An Alternative To Cox Proportional Hazards Model: A Comparison Via Simulation, Sydney Smith Jul 2020

Theses and Dissertations

The Cox proportional hazards model is the most common regression technique for survival analysis. However, the proportional hazards assumption restricts its use to a limited group of multiplicative models. Laplace regression is a flexible quantile regression technique for censored observations that is appropriate in a wider variety of applications than the Cox proportional hazards model. Instead of estimating a hazard ratio, Laplace regression, which is free from a proportionality assumption, can be used to estimate many adjusted percentiles of survival time, allowing for a more complete description of the association of interest. This paper compares the performance of …
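
As a reminder of what the proportionality assumption means, the Cox model is usually written as below. This is the standard form rather than notation from the thesis; h_0, x and β are the customary baseline hazard, covariate vector and coefficient vector.

```latex
% Cox proportional hazards model and the resulting time-constant hazard ratio.
h(t \mid x) = h_0(t)\,\exp\!\left(x^{\top}\beta\right),
\qquad
\frac{h(t \mid x_1)}{h(t \mid x_2)} = \exp\!\left((x_1 - x_2)^{\top}\beta\right).
```

The ratio on the right does not depend on t, which is the multiplicative restriction the abstract refers to.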


Bayesian Zero-Inflated Model For Ordinal Data, Huizhong Yang Jul 2020

Theses and Dissertations

Datasets with a relatively large number of zeros are commonly seen in medical applications. Although models such as the zero-inflated Poisson (ZIP) model have been proposed for count data, there are still some issues with ordinal data that have excess zeros. In this paper, we developed a Bayesian approach to accommodate the excess zeros in ordinal data. Intellectual disability (ID), also known as mental retardation (MR), is a disability characterized by below-average intelligence or mental ability and a lack of the skills necessary for daily life. A person with intellectual disability has limitations in intellectual functioning and adaptive behaviors. Intellectual disability is a …
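
To make the zero-inflation idea concrete, here is a minimal sketch of the standard ZIP probability mass function for count data. It is not the ordinal model developed in the thesis; the function name `zip_pmf` and the parameter names `pi` (probability of a structural zero) and `lam` (Poisson mean) are chosen here for illustration.

```python
import math


def zip_pmf(y: int, pi: float, lam: float) -> float:
    """P(Y = y) for a zero-inflated Poisson.

    pi  -- probability of a structural (excess) zero, assumed in [0, 1]
    lam -- mean of the Poisson component, assumed > 0
    """
    if y < 0:
        return 0.0
    # Ordinary Poisson probability of observing y events.
    poisson = math.exp(-lam) * lam ** y / math.factorial(y)
    if y == 0:
        # A zero arises either structurally or from the Poisson component.
        return pi + (1.0 - pi) * poisson
    return (1.0 - pi) * poisson
```

Setting `pi = 0` recovers the ordinary Poisson distribution, so `pi` directly measures how many more zeros there are than a Poisson model would predict.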


Semiparametric Regression Analysis Of Survival Data And Panel Count Data, Lu Wang Jul 2020

Theses and Dissertations

Both censored survival data and panel count data arise commonly in real-life studies in many fields such as epidemiology, social science, and medical research. In these studies, subjects are usually examined multiple times at periodic or irregular follow-up examinations. Censored data are studied when the exact failure times of the events are of interest but not all of these exact times are directly observed. Some of the failure times of the event of interest are only known to fall within intervals formed by the observation times. Panel count data are under investigation when the exact times of the recurrent events …


Bayesian Analysis Of Binary Diagnostic Tests And Panel Count Data, Chunling Wang Apr 2020

Theses and Dissertations

This dissertation mainly explores several challenging topics that arise in diagnostic tests and panel count data in the Bayesian framework. Binary diagnostic tests, particularly multiple diagnostic tests with repeated measures and diagnostic procedures with a large number of raters, are studied. For panel count data, most traditional methods only handle panel count data for a single type of recurrent event. In this dissertation, we primarily focus on the case with multiple types of recurrent events.

In Chapter 1, an introduction to binary diagnostic test data and panel count data is presented and the related literature is briefly reviewed. To …


Multivariate Joint Models And Dynamic Predictions, Md Akhtar Hossain Apr 2020

Theses and Dissertations

The joint modeling of longitudinal and time-to-event data is an active area of statistical research that has received a lot of attention. The standard joint models, referred to as univariate joint models, allow simultaneous modeling of a single longitudinal outcome and a single time-to-event under an assumption of independent censoring. The majority of the joint modeling research in the last two decades has focused on extending and improving the univariate joint models. While many of the practical applications involve data on multivariate longitudinal outcomes and multiple time-to-events possibly informatively censored by some other terminal time-to-event, the developments of joint …


Preparing For The Future: The Effects Of Financial Literacy On Financial Planning For Young Professionals, Tanay Singh Apr 2020

Senior Theses

Purpose – Many people between the ages of 20 and 34 have not considered planning financially for the future in any significant capacity, and in doing so they limit their potential savings. The purpose of this study is to examine what financial expectations are for people in the early stages of their careers and to determine whether improving financial literacy and revealing financial realities helps to produce more accurate or realistic expectations. Ultimately, the goal is to better prepare participants in the study for the working world and increased responsibilities outside of the college/university environment by getting them to start thinking …


Studies Of Group Fused Lasso And Probit Model For Right-Censored Data, Tuan Quoc Do Apr 2020

Theses and Dissertations

This document is composed of three main chapters. In the first chapter, we study the mixture of experts, a powerful machine learning model in which each expert handles a different region of the covariate space. However, it is crucial to choose an appropriate number of experts to avoid overfitting or underfitting. A group fused lasso (GFL) term is added to the model with the goal of making the coefficients of the experts and the gating network closer together. An algorithm to optimize the problem is also developed using block-wise coordinate descent in the dual counterpart. Numerical results on simulated and …


Flexible Regression Models For Survival Data, Ennan Gu Apr 2020

Theses and Dissertations

Survival analysis is a branch of statistics for analyzing time-to-event data, also called survival data. One important feature of survival data is censoring, which means that not all subjects’ survival times are observed directly. Among survival data, right-censored data are the most common type and consist of some exactly observed survival times and some right-censored observations. In this dissertation, we focus on studying flexible regression models for complicated right-censored survival data when the classical proportional hazards (PH) assumption is not satisfied. Flexible semiparametric regression models can largely avoid misspecification of parametric distributions and thus provide more modeling …


Boom Or Bust: Examining The Relationship Between High School Recruiting Rankings And The NFL Draft, Nicholas E. Tice Apr 2020

Senior Theses

The goal of this thesis is to model the probability that a high school football player is drafted, based on information taken from their recruiting profile. The response variable is binary and defined as drafted (1) or undrafted (0). The independent variables, collected by scraping the recruiting websites, include height, weight, position, hometown, recruiting grade and other socioeconomic factors based on the player’s high school. 247Sports and ESPN were the two recruiting services used and compared in this study. Because of the binary nature of the dependent variable, logistic regression and decision trees were chosen …
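
A minimal sketch of how a fitted logistic regression turns a recruiting profile into a draft probability is shown below. The feature names, coefficients and intercept are hypothetical values chosen for illustration; they are not estimates from the thesis.

```python
import math


def logistic(z: float) -> float:
    """Map a linear predictor to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def draft_probability(features: dict, coefs: dict, intercept: float) -> float:
    """Predicted P(drafted = 1) for one player under a logistic model."""
    z = intercept + sum(coefs[name] * features[name] for name in coefs)
    return logistic(z)


# Hypothetical coefficients and player profile, for illustration only.
coefs = {"recruiting_grade": 0.08, "height_in": 0.02}
player = {"recruiting_grade": 90.0, "height_in": 75.0}
p = draft_probability(player, coefs, intercept=-11.0)
```

Each coefficient is a change in the log-odds of being drafted per unit of the feature, which is why the linear predictor is passed through `logistic` before being read as a probability.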