Open Access. Powered by Scholars. Published by Universities.®

Digital Commons Network

Physical Sciences and Mathematics

East Tennessee State University

Theses/Dissertations

Missing data

Articles 1 - 6 of 6

Performance Comparison Of Imputation Methods For Mixed Data Missing At Random With Small And Large Sample Data Set With Different Variability, Kyei Afari Aug 2021

Electronic Theses and Dissertations

One concern in statistics is missing data, which can bias parameter estimates and produce inaccurate results. Multiple imputation is one remedy for handling missing data. This study sought the best multiple imputation methods for mixed-variable data sets with different sample sizes, different variability, and different levels of missingness. The study employed predictive mean matching, classification and regression trees, and random forest imputation. For each data set, the multiple regression parameter estimates for the complete data sets were compared to the multiple regression parameter …
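Predictive mean matching, one of the methods compared above, can be sketched as follows. This is a simplified, single-predictor illustration with hypothetical names (`fit_ols`, `pmm_impute`, donor pool size `k`); it uses the least-squares fit directly, whereas the MICE implementation also draws the regression coefficients from their posterior, so it is not the thesis's code.

```python
import random


def fit_ols(x, y):
    """Return (intercept, slope) of the least-squares line of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def pmm_impute(x, y, k=5, rng=None):
    """Fill the None entries of y by predictive mean matching on predictor x.

    For each missing case, the k observed cases whose predicted means are
    closest to the missing case's predicted mean form a donor pool, and the
    observed y of one randomly chosen donor is copied in.
    """
    rng = rng or random.Random()
    observed = [i for i, yi in enumerate(y) if yi is not None]
    b0, b1 = fit_ols([x[i] for i in observed], [y[i] for i in observed])
    predicted = [b0 + b1 * xi for xi in x]
    filled = list(y)
    for i, yi in enumerate(y):
        if yi is None:
            donors = sorted(observed, key=lambda j: abs(predicted[j] - predicted[i]))[:k]
            filled[i] = y[rng.choice(donors)]
    return filled


x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, None, 8.2, 9.8, None, 14.1, 16.0]
print(pmm_impute(x, y, k=3, rng=random.Random(1)))
```

Because every imputed value is copied from an observed case, PMM never produces values outside the observed range.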


Performance Comparison Of Multiple Imputation Methods For Quantitative Variables For Small And Large Data With Differing Variability, Vincent Onyame May 2021

Electronic Theses and Dissertations

Missing data continues to be one of the main problems in data analysis, as it reduces sample representativeness and consequently causes biased estimates. Multiple imputation has been established as an effective method of handling missing data. In this study, we examined multiple imputation methods for quantitative variables on twelve data sets of varied size and variability that were pseudo-generated from an original data set. The methods examined are predictive mean matching, Bayesian linear regression, and non-Bayesian linear regression in the MICE (Multivariate Imputation by Chained Equations) package for the statistical software R. The parameter estimates generated from …
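Bayesian and non-Bayesian linear regression imputation differ in whether the regression parameters are redrawn before imputing. The sketch below follows the standard Bayesian linear regression imputation algorithm (draw σ from a scaled inverse chi-square, draw β given σ, then add residual noise) for a single assumed predictor; the name `norm_impute` is hypothetical, and replacing the drawn σ and β with their least-squares estimates would give the non-Bayesian variant.

```python
import math
import random


def norm_impute(x, y, rng=None):
    """Bayesian linear regression imputation of the None entries of y, one predictor x."""
    rng = rng or random.Random()
    obs = [i for i, yi in enumerate(y) if yi is not None]
    xo = [x[i] for i in obs]
    yo = [y[i] for i in obs]
    n = len(obs)
    # Sufficient statistics for the design matrix [1, x] and the response.
    sx, sxx = sum(xo), sum(v * v for v in xo)
    sy, sxy = sum(yo), sum(a * b for a, b in zip(xo, yo))
    det = n * sxx - sx * sx
    # V = (X'X)^-1 for the 2 x 2 case.
    v11, v12, v22 = sxx / det, -sx / det, n / det
    # Least-squares estimates beta_hat = V X'y.
    b0 = v11 * sy + v12 * sxy
    b1 = v12 * sy + v22 * sxy
    ssr = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(xo, yo))
    # sigma*^2 = SSR / g with g ~ chi-square(n - 2), i.e. Gamma(shape (n - 2) / 2, scale 2).
    sigma = math.sqrt(ssr / rng.gammavariate((n - 2) / 2, 2))
    # beta* = beta_hat + sigma* L z, where L is the lower Cholesky factor of V.
    l11 = math.sqrt(v11)
    l21 = v12 / l11
    l22 = math.sqrt(v22 - l21 ** 2)
    z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
    b0_star = b0 + sigma * l11 * z1
    b1_star = b1 + sigma * (l21 * z1 + l22 * z2)
    # Impute: prediction under the drawn parameters plus residual noise.
    filled = list(y)
    for i, yi in enumerate(y):
        if yi is None:
            filled[i] = b0_star + b1_star * x[i] + rng.gauss(0, sigma)
    return filled


x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, None, 8.2, 9.8, None, 14.1, 16.0]
print(norm_impute(x, y, rng=random.Random(0)))
```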


Investigation Of Multiple Imputation Methods For Categorical Variables, Samantha Miranda May 2020

Electronic Theses and Dissertations

We compare different multiple imputation methods for categorical variables using the MICE package in R. We take a complete data set, remove values to create different levels of missingness, and evaluate the imputation methods at each level. Logistic regression imputation and linear discriminant analysis (LDA) are used for binary variables, multinomial logit imputation and LDA for nominal variables, and ordered logit imputation and LDA for ordinal variables. After imputation, the regression coefficients, percent deviation index (PDI) values, and relative frequency tables were found for each imputed data set at each level of missingness and compared to …
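One of the comparisons described above, relative frequency tables for a categorical variable before and after imputation, is simple to sketch. The helper names `relative_frequencies` and `compare_frequencies` and the sample values are hypothetical; the percent deviation index is not reproduced here.

```python
from collections import Counter


def relative_frequencies(values):
    """Map each category to its share of the values."""
    counts = Counter(values)
    total = len(values)
    return {level: count / total for level, count in counts.items()}


def compare_frequencies(complete, imputed):
    """Return {level: (complete share, imputed share, imputed minus complete)}."""
    before = relative_frequencies(complete)
    after = relative_frequencies(imputed)
    table = {}
    for level in sorted(set(before) | set(after)):
        b, a = before.get(level, 0.0), after.get(level, 0.0)
        table[level] = (b, a, a - b)
    return table


complete = ["low", "mid", "mid", "high", "low", "mid"]
imputed = ["low", "mid", "low", "high", "low", "mid"]
for level, (b, a, d) in compare_frequencies(complete, imputed).items():
    print(f"{level:>5}: complete {b:.3f}  imputed {a:.3f}  difference {d:+.3f}")
```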


Comparison Of Imputation Methods For Mixed Data Missing At Random, Kaitlyn Heidt May 2019

Electronic Theses and Dissertations

A statistician's job is to produce statistical models. When these models are precise and unbiased, we can relate them to new data appropriately. However, when data sets have missing values, the assumptions of statistical methods are violated, producing biased results. The statistician's objective is to implement methods that produce unbiased and accurate results. Research on missing data is growing as modern methods that produce unbiased and accurate results emerge, such as the MICE package for the statistical software R. Using real data, we compare four common imputation methods in the MICE package at different levels of missingness. The …
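Comparisons like this one start from a complete data set and delete values to create missingness at random (MAR), where the chance that a value is missing depends only on observed variables. A hedged sketch of one way to do this: the weighting (cases with x above the median are `ratio` times as likely to lose y) and the name `make_mar` are assumptions for illustration, not the deletion scheme used in the thesis. An exact number of values is removed by weighted sampling without replacement using Efraimidis-Spirakis keys.

```python
import random
import statistics


def make_mar(x, y, rate, ratio=3.0, rng=None):
    """Return a copy of y with round(rate * n) entries set to None.

    Missingness depends only on the fully observed x: cases with x above the
    median get weight `ratio`, the rest weight 1, and cases are drawn without
    replacement with probability proportional to weight.
    """
    rng = rng or random.Random()
    n = len(y)
    cut = statistics.median(x)
    weights = [ratio if xi > cut else 1.0 for xi in x]
    # Efraimidis-Spirakis: key u ** (1 / w); the m largest keys form the sample.
    keys = [rng.random() ** (1.0 / w) for w in weights]
    m = round(rate * n)
    chosen = sorted(range(n), key=lambda i: keys[i], reverse=True)[:m]
    amputed = list(y)
    for i in chosen:
        amputed[i] = None
    return amputed


x = list(range(20))
y = [2 * xi + 1 for xi in x]
for rate in (0.1, 0.2, 0.3):
    print(rate, make_mar(x, y, rate, rng=random.Random(42)))
```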


Performance Comparison Of Imputation Algorithms On Missing At Random Data, Evans Dapaa Addo May 2018

Electronic Theses and Dissertations

Missing data continues to be an issue not only in the field of statistics but in any field that deals with data. This is because almost all widely accepted, standard statistical software and methods assume complete data for every variable included in the analysis. As a result, in most studies statistical power is weakened and parameter estimates are biased, leading to weak conclusions and generalizations.

Many studies have established that multiple imputation methods are effective ways of handling missing data. This paper examines three different imputation methods (predictive mean matching, Bayesian linear regression and …
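Comparisons of multiple imputation methods typically judge pooled parameter estimates. Pooling across the m imputed data sets is conventionally done with Rubin's rules; the sketch below is a generic illustration of those rules with hypothetical numbers and a hypothetical function name `pool_rubin`, not a reproduction of how this thesis pooled its results.

```python
import math
import statistics


def pool_rubin(estimates, variances):
    """Pool one parameter across m imputed data sets with Rubin's rules.

    estimates: the parameter estimate from each imputed data set.
    variances: the squared standard error from each imputed data set.
    Returns (pooled estimate, total variance, pooled standard error).
    """
    m = len(estimates)
    q_bar = statistics.fmean(estimates)   # pooled point estimate
    u_bar = statistics.fmean(variances)   # within-imputation variance
    b = statistics.variance(estimates)    # between-imputation variance (divisor m - 1)
    total = u_bar + (1 + 1 / m) * b
    return q_bar, total, math.sqrt(total)


print(pool_rubin([1.0, 1.2, 0.8], [0.04, 0.05, 0.06]))
```

The `(1 + 1/m)` factor inflates the between-imputation variance to account for using a finite number of imputations.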


Using The EM Algorithm To Estimate The Difference In Dependent Proportions In A 2 × 2 Table With Missing Data, Alain Duclaux Talla Souop Aug 2004

Electronic Theses and Dissertations

In this thesis, I am interested in estimating the difference between dependent proportions from a 2 × 2 contingency table when there are missing data. The Expectation-Maximization (EM) algorithm is used to obtain an estimate for the difference between correlated proportions. To obtain the standard error of this difference I employ a resampling technique known as bootstrapping. The performance of the bootstrap standard error is evaluated for different sample sizes and different fractions of missing information. Finally, a 100(1-α)% bootstrap confidence interval is proposed and its coverage is evaluated through simulation.
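The EM approach described above can be sketched for paired binary outcomes (A, B) in which either member of a pair may be missing. The sketch assumes ignorable (MAR) missingness; the function names are hypothetical, outcome 1 is taken as "success", and the target is d = P(A = 1) - P(B = 1) = p10 - p01. Pairs missing both outcomes carry no information and are dropped. The E-step allocates each partially classified pair across the cells of its observed row or column in proportion to the current cell probabilities, and the M-step re-estimates the cell probabilities from the completed counts. The bootstrap standard error resamples pairs with replacement and reruns EM; the normal-approximation interval at the end is one simple choice, not necessarily the interval proposed in the thesis.

```python
import random
import statistics


def em_2x2(pairs, max_iter=1000, tol=1e-10):
    """EM estimates of cell probabilities p[a][b] for paired binary data.

    pairs: list of (a, b) with a, b in {0, 1}, or None for a missing value.
    """
    full = [[0, 0], [0, 0]]  # both outcomes observed
    a_only = [0, 0]          # B missing, counted by A
    b_only = [0, 0]          # A missing, counted by B
    for a, b in pairs:
        if a is not None and b is not None:
            full[a][b] += 1
        elif a is not None:
            a_only[a] += 1
        elif b is not None:
            b_only[b] += 1
    n = sum(map(sum, full)) + sum(a_only) + sum(b_only)
    p = [[0.25, 0.25], [0.25, 0.25]]
    for _ in range(max_iter):
        row = [p[i][0] + p[i][1] for i in range(2)]
        col = [p[0][j] + p[1][j] for j in range(2)]
        # E-step: expected completed counts; M-step: divide by n.
        new = [[(full[i][j]
                 + (a_only[i] * p[i][j] / row[i] if row[i] > 0 else 0.0)
                 + (b_only[j] * p[i][j] / col[j] if col[j] > 0 else 0.0)) / n
                for j in range(2)] for i in range(2)]
        change = max(abs(new[i][j] - p[i][j]) for i in range(2) for j in range(2))
        p = new
        if change < tol:
            break
    return p


def diff_dependent(pairs):
    """d = P(A = 1) - P(B = 1) = p10 - p01 from the EM estimates."""
    p = em_2x2(pairs)
    return p[1][0] - p[0][1]


def bootstrap_se(pairs, reps=500, rng=None):
    """Bootstrap standard error of d: resample pairs, rerun EM, take the SD."""
    rng = rng or random.Random()
    draws = [diff_dependent(rng.choices(pairs, k=len(pairs))) for _ in range(reps)]
    return statistics.stdev(draws)


pairs = ([(0, 0)] * 30 + [(0, 1)] * 8 + [(1, 0)] * 15 + [(1, 1)] * 27
         + [(0, None)] * 6 + [(1, None)] * 7 + [(None, 0)] * 4 + [(None, 1)] * 3)
d = diff_dependent(pairs)
se = bootstrap_se(pairs, rng=random.Random(7))
z = statistics.NormalDist().inv_cdf(0.975)  # 95% normal-approximation interval
print(f"d = {d:.4f}, bootstrap SE = {se:.4f}, 95% CI = ({d - z * se:.4f}, {d + z * se:.4f})")
```

With no missing values, EM converges in one step to the observed cell proportions, so d reduces to the usual (n10 - n01) / n.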