Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Articles 31 - 60 of 106

Full-Text Articles in Physical Sciences and Mathematics

Semiparametric Regression Analysis Of Survival Data And Panel Count Data, Lu Wang Jul 2020

Theses and Dissertations

Both censored survival data and panel count data arise commonly in real-life studies in many fields such as epidemiology, social science, and medical research. In these studies, subjects are usually examined multiple times at periodic or irregular follow-up examinations. Censored data are studied when the exact failure times of the events are of interest but not all of these exact times are directly observed. Some failure times of the event of interest are known only to fall within intervals formed by the observation times. Panel count data are under investigation when the exact times of the recurrent events …


Bayesian Zero-Inflated Model For Ordinal Data, Huizhong Yang Jul 2020

Theses and Dissertations

Datasets with a relatively large number of zeros are commonly seen in medical applications. Although models such as the zero-inflated Poisson (ZIP) model have been proposed for count data, there are still some issues with ordinal data that have excess zeros. In this paper, we developed a Bayesian approach to accommodate the excess zeros in ordinal data. Intellectual disability (ID), also known as mental retardation (MR), is a disability characterized by below-average intelligence or mental ability and a lack of the skills necessary for daily life. A person with intellectual disability has limitations in intellectual functioning and adaptive behavior. Intellectual disability is a …


The Practical Advantages And Disadvantages Of Laplace Regression As An Alternative To Cox Proportional Hazards Model: A Comparison Via Simulation, Sydney Smith Jul 2020

Theses and Dissertations

The Cox proportional hazards model is the most common regression technique for survival analysis. However, the proportional hazards assumption restricts its use to a limited group of multiplicative models. Laplace regression is a flexible quantile regression technique for censored observations that is appropriate in a wider variety of applications than the Cox proportional hazards model. Instead of estimating a hazard ratio, Laplace regression, which is free from a proportionality assumption, can be used to estimate many adjusted percentiles of survival time, allowing for a more complete description of the association of interest. This paper compares the performance of …


Multivariate Joint Models And Dynamic Predictions, Md Akhtar Hossain Apr 2020

Theses and Dissertations

The joint modeling of longitudinal and time-to-event data is an active area of statistical research that has received a lot of attention. The standard joint models, referred to as univariate joint models, allow simultaneous modeling of a single longitudinal outcome and a single time-to-event under an assumption of independent censoring. The majority of the joint modeling research in the last two decades has focused on extending and improving the univariate joint models. While many of the practical applications involve data on multivariate longitudinal outcomes and multiple time-to-events possibly informatively censored by some other terminal time-to-event, the developments of joint …


Studies Of Group Fused Lasso And Probit Model For Right-Censored Data, Tuan Quoc Do Apr 2020

Theses and Dissertations

This document is composed of three main chapters. In the first chapter, we study the mixture of experts, a powerful machine learning model in which each expert handles a different region of the covariate space. However, it is crucial to choose an appropriate number of experts to avoid overfitting or underfitting. A group fused lasso (GFL) term is added to the model with the goal of making the coefficients of the experts and the gating network closer together. An algorithm to optimize the problem is also developed using block-wise coordinate descent in the dual counterpart. Numerical results on simulated and …


Flexible Regression Models For Survival Data, Ennan Gu Apr 2020

Theses and Dissertations

Survival analysis is a branch of statistics that analyzes time-to-event data, also called survival data. One important feature of survival data is censoring, which means that not all subjects’ survival times are observed directly. Among all the survival data, right-censored data are the most common type and consist of some exactly observed survival times and some right-censored observations. In this dissertation, we focus on studying flexible regression models for complicated right-censored survival data when the classical proportional hazards (PH) assumption is not satisfied. Flexible semiparametric regression models can largely avoid misspecification of parametric distributions and thus provide more modeling …


Bayesian Analysis Of Binary Diagnostic Tests And Panel Count Data, Chunling Wang Apr 2020

Theses and Dissertations

This dissertation mainly explores several challenging topics that arise in diagnostic tests and panel count data in the Bayesian framework. Binary diagnostic tests, particularly multiple diagnostic tests with repeated measures and diagnostic procedures with a large number of raters, are studied. For panel count data, most traditional methods only handle panel count data for a single type of recurrent event. In this dissertation, we primarily focus on the case with multiple types of recurrent events.

In Chapter 1, an introduction to binary diagnostic test data and panel count data is presented and the related literature is briefly reviewed. To …


Time Series Analysis Of Weather Data In South Carolina, Geophrey Odero Oct 2019

Theses and Dissertations

This thesis discusses time series analysis of weather data in South Carolina for the last fifteen years (January 2003 to December 2017) for Columbia, Greenville and North Myrtle Beach. The first part presents a brief overview of the variables used in the analysis: temperature, dew point, humidity and sea level pressure. A short discussion of time series data is also introduced. The second part is about modeling the variables. The models of choice are presented and fitted, and model diagnostics are carried out. In the third part, we discuss background on climates of the cities and model …


Statistical Analysis Of Interval-Censored Data Subject To Additional Complications, Qiang Zheng Jul 2019

Theses and Dissertations

Survival analysis is an important branch of statistics that studies time-to-event data (or survival data), in which the response variable is the time to a certain event of interest. The most prominent feature of survival data is that the response is not exactly observed due to limits of the study design or the nature of the event of interest. Interval-censored data are a common type of survival data and occur frequently in real-life studies where subjects are examined at periodic follow-ups. The response time is usually not observed, but the status of the event of interest is known …


Estimation Problems For Pooled Data, Xichen Mou Jul 2019

Theses and Dissertations

In epidemiological applications, individual specimens (e.g., blood, urine, etc.) are often pooled together to detect the presence of disease or to measure the concentration level of a specific biomarker. Due to the advantage of cost efficiency, pooled data are also seen in diverse areas such as genetics, animal ecology, and environmental science. With pooled data, individual observations are masked and new statistical methods are needed to estimate characteristics such as disease prevalence, the underlying density function of a biomarker, etc. We focus on three estimation problems for pooled data. Chapters 2 and 3 propose nonparametric estimators for the density function …


Multivariate Probit Models For Interval-Censored Failure Time Data, Yifan Zhang Jul 2019

Theses and Dissertations

Survival analysis is an important branch of statistics that analyzes time-to-event data. The events of interest can be death, disease occurrence, the failure of a machine part, etc. One important feature of this type of data is censoring: information on time to event is not observed exactly due to loss to follow-up or non-occurrence of the event of interest before the trial ends. Censored data are commonly observed in clinical trials and epidemiological studies, since monitoring a person’s health over time after treatment is often required in medical or health studies. In this dissertation we focus on studying multivariate …


Extension Of Risk-Based Measure Of Time-Varying Prognostic Discrimination For Survival Models, Shujie Chen Jul 2019

Theses and Dissertations

The Cox proportional hazards (PH) model and time dependent PH model are the most popular survival models in survival analysis. The hazard discrimination summary HDS(t) proposed by Liang and Heagerty [2017] is used to evaluate the mean hazard difference between cases and controls at time t. Liang and Heagerty [2017] evaluated the discrimination performance under the PH model and time dependent PH model with right censoring.

In this thesis, we first further investigate their method via comprehensive simulations, including: 1) extending the simulation in Liang and Heagerty [2017] under the PH model by adding more scenarios such as different …


Investigations On Multiple Interval Estimators, Taeho Kim Jul 2019

Theses and Dissertations

Multiple interval estimation for a set of parameters is investigated. To begin, a strategy of optimization for a multiple interval estimator (MIE) is introduced. This approach allocates distinct optimized levels to individual interval estimators so that the global expected content can be minimized while the global coverage probability is still maintained at a global level. This optimal allocation is achieved by a decision theoretic procedure which consists of two global risk functions. The major part of this manuscript is devoted to two multiple interval estimation procedures. Both procedures adopt prior information added to the classical setting, but these procedures do …


Randomization Analysis Driven Software, Steph-Yves Louis Apr 2019

Theses and Dissertations

The application of a method of randomization for a clinical trial frequently comes down to using Simple Randomization. Even though the latter method provides favorable characteristics, if the collected sample is not large enough, it still presents the highest chance of imbalance both marginally in the treatment groups and locally in terms of the covariates. Methods of Permuted Block Randomization, Urn Randomization, Stratified Permuted Block Randomization, and Minimization represent popular alternative methods that one should consider depending on the goal of the study. A comparison of the previously mentioned methods is carried out to evaluate their performance with samples that are not …


Cluster Analysis Of Mixed-Mode Data, Yawei Liang Apr 2019

Theses and Dissertations

In the modern world, data have become increasingly complex and often contain different types of features. Two very common types of features are continuous and discrete variables. Clustering mixed-mode data, which include both continuous and discrete variables, can be done in various ways. Furthermore, a continuous variable can take any value between its minimum and maximum. Types of continuous variables include bounded or unbounded normal variables, uniform variables, circular variables, etc. Discrete variables include types other than continuous variables, such as binary variables, categorical (nominal) variables, Poisson variables, etc. Difficulties in clustering mixed-mode data include handling the association …


Regression For Pooled Testing Data With Biomedical Applications, Juexin Lin Apr 2019

Theses and Dissertations

Since it was first introduced by Dorfman in 1943, pooled testing has been widely used as a cost- and time-effective testing protocol in a variety of applications. This dissertation consists of three projects that reveal the use of pooling techniques in disease prevention from the perspective of regression. For disease monitoring and control, individual covariate information is often of practical interest and yields meaningful interpretations. It is natural to model the outcome of interest, which can be either a disease status (binary) or a biomarker concentration index (continuous), with individual-specific covariates through a regression analysis. Chapter 2 focuses on …


Spatio-Temporal Analysis Of Precipitation And Flood Data From South Carolina, Haigang Liu Apr 2019

Theses and Dissertations

Spatio-temporal data are everywhere: we encounter them on TV, in newspapers, on computer screens, on tablets, and on plain paper maps. As a result, researchers in diverse areas are increasingly faced with the task of modeling geographically-referenced and temporally-correlated data. In this dissertation, we propose two different spatio-temporal models to capture the behavior of rainfall and flood data in the state of South Carolina.

Both models are built using a Bayesian hierarchical framework, which involves specifying the true underlying process in the first level and the spatio-temporal random effect in the second level of the hierarchy. The …


Inflated Standard Errors Of MCMC Estimates In IRT, Dongho Shin Apr 2019

Theses and Dissertations

Two widely used algorithms for estimating item response theory (IRT) parameters are Markov chain Monte Carlo (MCMC) and the EM algorithm. In general, the MCMC algorithm has advantages over the EM algorithm: for example, the MCMC algorithm allows one to estimate the desired posterior distribution and also works more straightforwardly with complex IRT models. This ease of use allows one to implement the MCMC algorithm without careful consideration. Previous studies, Hendrix (2011) and Lee (2016), noted that the estimated standard errors from the MCMC algorithm are larger than those from the EM algorithm. Therefore, this study investigates the reason …


MLE And Bayesian Methods To Analyze Data With Missing Values Below The Limit Of Detection, Xinxin Hu Apr 2019

Theses and Dissertations

As pesticides are widely used in agriculture, more and more people who work at places such as farms are exposed to pesticides. According to environmental research [Villarejo, 2003; Reigart and Roberts, 1999], exposure to certain pesticides, such as organophosphorus (OP) insecticides, has significantly affected the health of farmworkers and their families. Currently, the actual level of pesticides can be detected only with some limitations. In particular, it is hard to detect when the level is below the limit of detection (LOD). Therefore, the goal of our research is to propose several different methods to analyze data …


Adjusting For Mis-Reporting In Count Data, Gelareh Rahimighazikalayeh Jan 2018

Theses and Dissertations

Any counting system is prone to recording errors including underreporting and overreporting. Ignoring the misreporting pattern in count data can give rise to bias in the estimation of model parameters. Accordingly, Poisson, negative binomial and generalized Poisson regression have been expanded in some instances to capture reporting biases. However, to our knowledge, no program has been developed to allow users to apply all of these models when needed. In the first part of the dissertation, we review the available models for underreported counts and develop a Stata command to estimate Poisson, negative binomial and generalized Poisson regression models for underreported …


Bayesian Semiparametric Methods For Analyzing Panel Count Data, Jianhong Wang Jan 2018

Theses and Dissertations

Panel count data commonly arise in epidemiological, social science, and medical studies, in which subjects have repeated measurements on the recurrent events of interest at different observation times. Since the subjects are not under continuous monitoring, the exact times of those recurrent events are not observed but the counts of such events within the adjacent observation times are known. Panel count data can be considered as a special type of longitudinal data with a count response variable in the literature. Compared to the frequentist literature, very limited Bayesian approaches have been developed to analyze panel count data. In this dissertation, several …


Estimation Procedures For Complex Survival Models And Their Applications In Epidemiology Studies, Jie Zhou Jan 2018

Theses and Dissertations

In this dissertation, we aim to address three important questions in practice, which can be solved through complex survival models. The first project focuses on studying the longitudinal fitness effect on cardiovascular disease (CVD) mortality. In the second project, we study the disease-death relation between CVD and all-cause mortality and evaluate important covariate effects on the disease or death transitions. In the third project, we compare antiretroviral treatment (ART) for HIV patients and consider both treatment effect and side effect of the drugs. The first two projects are motivated by the Aerobics Center Longitudinal Study (ACLS) datasets and the third …


Goodness Of Fit Via Residual Plots In Item Response Theory, Bryonna Bowen Jan 2018

Theses and Dissertations

Goodness-of-fit criteria developed for the evaluation of item response functions have been examined by many scholars using different theories and criteria. A number of potential graphical analysis approaches, such as residual plots, have been described in the literature, but have received little attention from researchers. While many tests of goodness-of-fit are available, those that incorporate the analysis of residuals may be most useful. The unmistakable presence of a pattern in the residual plot for the logistic model item response functions, even when we know the model fits, raises a red flag and calls for greater analysis. This study explores different …


Semiparametric Statistical Estimation And Inference With Latent Information, Qianqian Wang Jan 2018

Theses and Dissertations

In Chapter 1, we predicted disease risk by transformation models in the presence of missing subgroup identifiers. When a discrete covariate defining subgroup membership is missing for some of the subjects in a study, the distribution of the outcome follows a mixture distribution of the subgroup-specific distributions. Taking into account the uncertain distribution of the group membership and the covariates, we model the relation between the disease onset time and the covariates through transformation models in each sub-population, and develop a nonparametric maximum likelihood based estimation implemented through the EM algorithm along with its inference procedure. We further propose methods to …


A Rotatable Asymmetric Variable Compensation MIRT Model, Xinchu Zhao Jan 2018

Theses and Dissertations

The purpose of this study is to develop, estimate, and interpret a new variable compensation multidimensional item response theory (MIRT) model, named the Rotatable Asymmetric Variable Compensation Model (RAVCM), that allows for transformation between different correlation structures. Since the model is rotatable like the common compensatory models (CM), it is not necessary to specify or estimate the correlation of abilities to recover the model. Also, it can approximate the existing MIRT models well. In simulation, the RAVCM is shown to estimate the parameters with small error, especially when the non-compensatory model (NCM) is the true model and the correlation of …


Classification Of High-Dimensional Data Based On Multiple Testing Methods, Chong Ma Jan 2018

Theses and Dissertations

Supervised and unsupervised classification are common topics in machine learning in both scientific and industrial fields, which usually involve three tasks: prediction, exploration, and explanation. False discovery rate (FDR) theory has a close connection to classical classification theory, which must be employed in a sophisticated way to achieve good performance in various contexts. The study aims to explore novel supervised classifiers and unsupervised classification approaches for functional data and high-dimensional data in genome study by using FDR, respectively. One work develops a novel classifier for functional data by casting the classification problem into a multiple testing task, which involves using …


Discovery Of Community Structures In Static And Dynamic Networks, Shiwen Shen Jan 2018

Theses and Dissertations

With the development of computer technology, researchers are able to observe and collect enormous amounts of data, where the independent and identically distributed assumption is violated. For example, in sociology, individuals in an organization interact with each other to change the underlying social structure; in biology, understanding the gene-gene interaction helps researchers to detect potential diseases; in politics, voters are mutually influenced before the election via private/public speeches and parades, which might ultimately change the election results. It is crucial to study how individuals interact with each other from the data, which would lead to tremendous contributions to society. …


Semiparametric Regression In The Presence Of Measurement Error, Xiang Li Jan 2018

Theses and Dissertations

The error-in-covariates problem has received great attention over the past two decades among researchers who study semiparametric and nonparametric inference for regression models. Without correcting for the measurement error in covariates, estimators for covariate effects are usually biased. To account for measurement error, much research has been done in mean regression (Liang et al., 1999; Fuller, 2009; Carroll et al., 2006) and quantile regression (He and Liang, 2000; Hardle et al., 2000; Wei and Carroll, 2009). In contrast, there is little research in mode regression and this motivates us to propose semiparametric methods to address this error-in-covariates problem in Chapters …


The South Carolina Safety Belt Study: Large-Scale Location Sampling, Stephanie Jones Jan 2018

Theses and Dissertations

The South Carolina Safety Belt Study is a statewide survey completed yearly to assess the prevalence of safety belt usage on South Carolina roads through observations from different locations across the state. Every five years the sites for observation are resampled. This thesis breaks down the most recent sampling done for the years of 2018 through 2022. Both the methodology of large-scale location sampling and the mathematical idea behind the strategy employed are covered. Further, three different software packages were utilized: R, SAS, and ArcGIS. The steps that were taken and the written function code run for each …


Comparison Of The Performance Of Simple Linear Regression And Quantile Regression With Non-Normal Data: A Simulation Study, Marjorie Howard Jan 2018

Theses and Dissertations

Linear regression is a widely used method for analysis that is well understood across a wide variety of disciplines. In order to use linear regression, a number of assumptions must be met. These assumptions, specifically normality and homoscedasticity of the error distribution, can at best be met only approximately with real data. Quantile regression requires fewer assumptions, which offers a potential advantage over linear regression. In this simulation study, we compare the performance of linear (least squares) regression to quantile regression when these assumptions are violated, in order to investigate under what conditions quantile regression becomes the more advantageous method …