Open Access. Powered by Scholars. Published by Universities.®

Statistics and Probability Commons

Missing data

Articles 1 - 18 of 18

Full-Text Articles in Statistics and Probability

A Multistate Competing Risks Framework For Preconception Prediction Of Pregnancy Outcomes, Kaitlyn Cook, Neil J. Perkins, Enrique Schisterman, Sebastien Haneuse Dec 2022

Statistical and Data Sciences: Faculty Publications

Background: Preconception pregnancy risk profiles—characterizing the likelihood that a pregnancy attempt results in a full-term birth, preterm birth, clinical pregnancy loss, or failure to conceive—can provide critical information during the early stages of a pregnancy attempt, when obstetricians are best positioned to intervene to improve the chances of successful conception and full-term live birth. Yet the task of constructing and validating risk assessment tools for this earlier intervention window is complicated by several statistical features: the final outcome of the pregnancy attempt is multinomial in nature, and it summarizes the results of two intermediate stages, conception and gestation, whose outcomes …


Compare And Contrast Maximum Likelihood Method And Inverse Probability Weighting Method In Missing Data Analysis, Scott Sun May 2021

Mathematical Sciences Technical Reports (MSTR)

Data can be lost for different reasons, but sometimes the missingness is a part of the data collection process. Unbiased and efficient estimation of the parameters governing the response mean model requires the missing data to be appropriately addressed. This paper compares and contrasts the Maximum Likelihood and Inverse Probability Weighting estimators in an Outcome-Dependent Sampling design that deliberately generates incomplete observations. We demonstrate the comparison through numerical simulations under varied conditions: different coefficients of determination, and whether or not the mean model is misspecified.
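The contrast between a naive complete-case estimator and an inverse probability weighting estimator can be sketched in a few lines. This is a minimal illustration, not the paper's simulation design: the mean model, selection probabilities, and sample size below are invented for the example, and selection probabilities are assumed known by design, as in an outcome-dependent sampling scheme.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)       # hypothetical mean model: E[Y] = 1

# Sampling design: Y is observed with a known probability depending on X
pi = 1.0 / (1.0 + np.exp(-x))                # selection probability, known by design
observed = rng.random(n) < pi

# Naive complete-case mean is biased for E[Y]: selection favours large X,
# and Y is positively correlated with X
naive_mean = y[observed].mean()

# IPW (Hajek) estimator: weight each complete case by 1/pi
ipw_mean = np.sum(y[observed] / pi[observed]) / np.sum(1.0 / pi[observed])

true_mean = 1.0                              # E[Y] = 1 + 2 * E[X] = 1
```

With selection depending only on observed X and correctly specified weights, the IPW mean recovers the population mean while the complete-case mean does not.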


Evaluation Of Modern Missing Data Handling Methods For Coefficient Alpha, Katerina Matysova Dec 2019

College of Education and Human Sciences: Dissertations, Theses, and Student Research

When assessing a certain characteristic or trait using a multiple item measure, quality of that measure can be assessed by examining the reliability. To avoid multiple time points, reliability can be represented by internal consistency, which is most commonly calculated using Cronbach’s coefficient alpha. Almost every time human participants are involved in research, there is missing data involved. Missing data means that even though complete data were expected to be collected, some data are missing. Missing data can follow different patterns as well as be the result of different mechanisms. One traditional way to deal with missing data is listwise …
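Coefficient alpha and the listwise-deletion handling of missing data that the abstract calls traditional can be sketched as follows. The 5-item score matrix is a made-up toy example, not data from the dissertation.

```python
import numpy as np

def cronbach_alpha(items):
    """Coefficient alpha for an (n_respondents, k_items) score matrix."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()   # sum of item variances
    total_var = items.sum(axis=1).var(ddof=1)     # variance of total scores
    return k / (k - 1) * (1 - item_vars / total_var)

# Toy 5-item scale; one respondent skipped an item, so listwise
# deletion discards that entire row before computing alpha
scores = np.array([
    [4, 5, 4, 4, 5],
    [2, 1, 2, 2, 1],
    [3, 3, np.nan, 3, 4],
    [5, 4, 5, 5, 5],
    [1, 2, 1, 2, 2],
])
complete = scores[~np.isnan(scores).any(axis=1)]  # listwise deletion
alpha = cronbach_alpha(complete)
```

Listwise deletion is simple but wastes every other answer the incomplete respondent gave, which is one motivation for the modern methods the dissertation evaluates.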


Fixed Choice Design And Augmented Fixed Choice Design For Network Data With Missing Observations, Miles Q. Ott, Matthew T. Harrison, Krista J. Gile, Nancy P. Barnett, Joseph W. Hogan Jan 2019

Statistical and Data Sciences: Faculty Publications

The statistical analysis of social networks is increasingly used to understand social processes and patterns. The association between social relationships and individual behaviors is of particular interest to sociologists, psychologists, and public health researchers. Several recent network studies make use of the fixed choice design (FCD), which induces missing edges in the network data. Because of the complex dependence structure inherent in networks, missing data can pose very difficult problems for valid statistical inference. In this article, we introduce novel methods for accounting for the FCD censoring and introduce a new survey design, which we call the augmented fixed choice …


Impact Of Home Visit Capacity On Genetic Association Studies Of Late-Onset Alzheimer's Disease, David W. Fardo, Laura E. Gibbons, Shubhabrata Mukherjee, M. Maria Glymour, Wayne McCormick, Susan M. McCurry, James D. Bowen, Eric B. Larson, Paul K. Crane Aug 2017

Biostatistics Faculty Publications

INTRODUCTION—Findings for genetic correlates of late-onset Alzheimer's disease (LOAD) in studies that rely solely on clinic visits may differ from those with capacity to follow participants unable to attend clinic visits.

METHODS—We evaluated previously identified LOAD-risk single nucleotide variants in the prospective Adult Changes in Thought study, comparing hazard ratios (HRs) estimated using the full data set of both in-home and clinic visits (n = 1697) to HRs estimated using only data that were obtained from clinic visits (n = 1308). Models were adjusted for age, sex, principal components to account for ancestry, and additional health indicators.

RESULTS …


CRTgeeDR: An R Package For Doubly Robust Generalized Estimating Equations Estimations In Cluster Randomized Trials With Missing Data, Melanie Prague, Rui Wang, Victor De Gruttola Feb 2016

Harvard University Biostatistics Working Paper Series

No abstract provided.


Correction Of Verification Bias Using Log-Linear Models For A Single Binary-Scale Diagnostic Test, Haresh Rochani, Hani M. Samawi, Robert L. Vogel, Jingjing Yin Dec 2015

Biostatistics Faculty Publications

In diagnostic medicine, the test that determines the true disease status without error is referred to as the gold standard. Even when a gold standard exists, it is extremely difficult to verify each patient due to issues of cost-effectiveness and the invasive nature of the procedures. In practice, some of the patients with test results are not selected for verification of the disease status, which results in verification bias for diagnostic tests. The ability of the diagnostic test to correctly identify the patients with and without the disease can be evaluated by measures such as sensitivity, specificity and predictive …
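Verification bias, and one classical correction for it, can be sketched numerically. The paper itself uses log-linear models; as a simpler stand-in, the sketch below applies the Begg and Greenes correction, which is valid when verification depends on the test result only. All prevalence, accuracy, and verification rates are invented for the illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
d = rng.random(n) < 0.2                       # true disease status, prevalence 0.2
t = np.where(d, rng.random(n) < 0.9,          # test: sensitivity 0.9,
             rng.random(n) < 0.1)             #       specificity 0.9
t = t.astype(bool)

# Verification depends on the test result only: positives verified more often
v = rng.random(n) < np.where(t, 0.8, 0.1)

# Naive sensitivity from the verified sample alone is biased upward,
# because test-positive diseased patients are over-represented
naive_sens = (t & v & d).sum() / (v & d).sum()

# Begg-Greenes correction: estimate P(D|T) from verified patients,
# P(T) from everyone, then recombine via Bayes' rule
p_t = t.mean()
p_d_tpos = d[v & t].mean()
p_d_tneg = d[v & ~t].mean()
corrected_sens = p_t * p_d_tpos / (p_t * p_d_tpos + (1 - p_t) * p_d_tneg)
```

The correction works here because the true test results are recorded for all patients, so the marginal distribution of T is estimable even though disease status is only partially verified.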


Integrating Data Transformation In Principal Components Analysis, Mehdi Maadooliat, Jianhua Z. Huang, Jianhua Hu Mar 2015

Mathematics, Statistics and Computer Science Faculty Research and Publications

Principal component analysis (PCA) is a popular dimension-reduction method to reduce the complexity and obtain the informative aspects of high-dimensional datasets. When the data distribution is skewed, data transformation is commonly used prior to applying PCA. Such transformation is usually obtained from previous studies, prior knowledge, or trial-and-error. In this work, we develop a model-based method that integrates data transformation in PCA and finds an appropriate data transformation using the maximum profile likelihood. Extensions of the method to handle functional data and missing values are also developed. Several numerical algorithms are provided for efficient computation. The proposed method is illustrated …
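The idea of choosing a transformation by maximum profile likelihood before PCA can be sketched as follows. This is a simplification of the paper's method: instead of integrating the transformation into the PCA model itself, the sketch picks a Box-Cox parameter per column by maximizing an ordinary Gaussian profile likelihood, then runs PCA on the transformed data. The data-generating process is invented (log-normal columns, for which the right transformation is the log, i.e. lambda near 0).

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2_000
z = rng.normal(size=(n, 3)) @ np.array([[1.0, 0.5, 0.2],
                                        [0.0, 1.0, 0.5],
                                        [0.0, 0.0, 1.0]])
y = np.exp(z)                                 # skewed, log-normal columns

def boxcox(col, lam):
    if abs(lam) < 1e-8:
        return np.log(col)
    return (col**lam - 1.0) / lam

def profile_loglik(col, lam):
    """Gaussian profile log-likelihood of one Box-Cox-transformed column."""
    x = boxcox(col, lam)
    return -0.5 * len(col) * np.log(x.var()) + (lam - 1.0) * np.log(col).sum()

# Grid-search lambda per column by profile likelihood, then run PCA via SVD
grid = np.linspace(-1, 2, 61)
lams = [max(grid, key=lambda l: profile_loglik(y[:, j], l))
        for j in range(y.shape[1])]
xt = np.column_stack([boxcox(y[:, j], l) for j, l in enumerate(lams)])
xt -= xt.mean(axis=0)
_, s, vt = np.linalg.svd(xt, full_matrices=False)
explained = s**2 / (s**2).sum()               # variance explained by each PC
```

For log-normal columns the selected lambdas should sit near zero, recovering the log transform that makes the data approximately Gaussian before PCA.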


Phylogenetic Linkage Among Hiv-Infected Village Residents In Botswana: Estimation Of Clustering Rates In The Presence Of Missing Data, Nicole Bohme Carnegie, Rui Wang, Vladimir Novitsky, Victor G. Degruttola Jun 2013

Harvard University Biostatistics Working Paper Series

No abstract provided.


Targeted Estimation Of Variable Importance Measures With Interval-Censored Outcomes, Stephanie Sapp, Mark J. Van Der Laan, Kimberly Page Feb 2013

U.C. Berkeley Division of Biostatistics Working Paper Series

In most experimental and observational studies, participants are not followed in continuous time. Instead, data is collected about participants only at certain monitoring times. These monitoring times are random, and often participant specific. As a result, outcomes are only known up to random time intervals, resulting in interval-censored data. In contrast, when estimating variable importance measures on interval-censored outcomes, practitioners often ignore the presence of interval-censoring, and instead treat the data as continuous or right-censored, applying ad-hoc approaches to mask the true interval-censoring. In this paper, we describe Targeted Minimum Loss-based Estimation methods tailored for estimation of variable importance measures …


In Praise Of Simplicity Not Mathematistry! Ten Simple Powerful Ideas For The Statistical Scientist, Roderick J. Little Jan 2013

The University of Michigan Department of Biostatistics Working Paper Series

Ronald Fisher was by all accounts a first-rate mathematician, but he saw himself as a scientist, not a mathematician, and he railed against what George Box called (in his Fisher lecture) "mathematistry". Mathematics is the indispensable foundation for statistics, but our subject is constantly under assault by people who want to turn statistics into a branch of mathematics, making the subject as impenetrable to non-mathematicians as possible. Valuing simplicity, I describe ten simple and powerful ideas that have influenced my thinking about statistics, in my areas of research interest: missing data, causal inference, survey sampling, and statistical modeling in general. …


A Cautionary Note On Generalized Linear Models For Covariance Of Unbalanced Longitudinal Data, Jianhua Z. Huang, Min Chen, Mehdi Maadooliat, Mohsen Pourahmadi Mar 2012

Mathematics, Statistics and Computer Science Faculty Research and Publications

Missing data in longitudinal studies can create enormous challenges in data analysis when coupled with the positive-definiteness constraint on a covariance matrix. For complete balanced data, the Cholesky decomposition of a covariance matrix makes it possible to remove the positive-definiteness constraint and use a generalized linear model setup to jointly model the mean and covariance using covariates (Pourahmadi, 2000). However, this approach may not be directly applicable when the longitudinal data are unbalanced, as coherent regression models for the dependence across all times and subjects may not exist. Within the existing generalized linear model framework, we show how to overcome …
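The Cholesky-based reparameterization of Pourahmadi (2000) that the abstract builds on can be sketched for the balanced case. This is an illustration of the decomposition itself, not of the paper's unbalanced-data analysis; the AR(1) covariance below is a toy example.

```python
import numpy as np

def modified_cholesky(sigma):
    """Return (T, D) with T unit lower-triangular and T @ sigma @ T.T = D diagonal.

    Rows of T hold the negatives of the autoregressive coefficients of each
    measurement on its predecessors; D holds the innovation variances. T is
    unconstrained and log(diag(D)) is unconstrained, which is what removes the
    positive-definiteness constraint in Pourahmadi's GLM formulation."""
    L = np.linalg.cholesky(sigma)           # sigma = L @ L.T
    d = np.diag(L).copy()
    T = np.linalg.inv(L / d)                # unit lower-triangular
    return T, np.diag(d**2)

# AR(1)-style covariance for 4 equally spaced repeated measures
rho, m = 0.6, 4
sigma = rho ** np.abs(np.subtract.outer(np.arange(m), np.arange(m)))

T, D = modified_cholesky(sigma)
recon = np.linalg.inv(T) @ D @ np.linalg.inv(T).T   # rebuild sigma from (T, D)
```

For this AR(1) example, T is bidiagonal with subdiagonal entries equal to minus the lag-one autoregressive coefficient, which is the structure that breaks down once measurement times differ across subjects, the situation the paper cautions about.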


Proxy Pattern-Mixture Analysis For A Binary Variable Subject To Nonresponse, Rebecca H. Andridge, Roderick J. Little Nov 2011

The University of Michigan Department of Biostatistics Working Paper Series

We consider assessment of the impact of nonresponse for a binary survey variable Y subject to nonresponse, when there is a set of covariates observed for nonrespondents and respondents. To reduce dimensionality and for simplicity we reduce the covariates to a continuous proxy variable X that has the highest correlation with Y, estimated from a probit regression analysis of respondent data. We extend our previously proposed proxy pattern-mixture analysis (PPMA) for continuous outcomes to the binary outcome using a latent variable approach. The method does not assume data are missing at random, and creates a framework for sensitivity analyses. Maximum …


Multiple Imputation For The Comparison Of Two Screening Tests In Two-Phase Alzheimer Studies, Ofer Harel, Xiao-Hua Zhou Sep 2006

UW Biostatistics Working Paper Series

Two-phase designs are common in epidemiological studies of dementia, and especially in Alzheimer research. In the first phase, all subjects are screened using a common screening test(s), while in the second phase, only a subset of these subjects is tested using a more definitive verification assessment, i.e., a gold standard test. When comparing the accuracy of two screening tests in a two-phase study of dementia, inferences are commonly made using only the verified sample. It is well documented that in that case, there is a risk for bias, called verification bias. When the two screening tests have only two values (e.g. …


Multiple Imputation For Correcting Verification Bias, Ofer Harel, Xiao-Hua Zhou May 2005

UW Biostatistics Working Paper Series

In the case in which all subjects are screened using a common test, and only a subset of these subjects is tested using a gold standard test, it is well documented that there is a risk for bias, called verification bias. When the test has only two levels (e.g. positive and negative) and we are trying to estimate the sensitivity and specificity of the test, one is actually constructing a confidence interval for a binomial proportion. Since it is well documented that this estimation is not trivial even with complete data, we adopt the multiple imputation (MI) framework for verification bias …
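A generic multiple-imputation treatment of unverified disease status can be sketched as follows. This is not the authors' exact procedure, only an MI illustration for the same design: missing disease statuses are drawn from P(D | T) fit on the verified subjects, with a fresh Beta posterior draw per imputation so that between-imputation variability reflects estimation uncertainty. All rates below are invented.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 50_000
d = rng.random(n) < 0.2                                   # true disease status
t = np.where(d, rng.random(n) < 0.85,                     # test result
             rng.random(n) < 0.05).astype(bool)
v = rng.random(n) < np.where(t, 0.7, 0.15)                # verification, MAR given T

# Multiple imputation of the unverified disease statuses
m_imp, sens_draws = 20, []
for _ in range(m_imp):
    d_imp = d.copy()
    for tv in (True, False):
        ver = v & (t == tv)
        k, nn = d[ver].sum(), ver.sum()
        p = rng.beta(k + 1, nn - k + 1)                   # draw of P(D=1 | T=tv)
        miss = ~v & (t == tv)
        d_imp[miss] = rng.random(miss.sum()) < p          # impute missing D
    sens_draws.append((t & d_imp).sum() / d_imp.sum())    # sensitivity estimate

mi_sens = float(np.mean(sens_draws))                      # Rubin's rule point estimate
```

Rubin's rules would also combine the within- and between-imputation variances into an interval estimate; only the point estimate is shown here.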


Non-Parametric Estimation Of Roc Curves In The Absence Of A Gold Standard, Xiao-Hua Zhou, Pete Castelluccio, Chuan Zhou Jul 2004

UW Biostatistics Working Paper Series

In evaluation of diagnostic accuracy of tests, a gold standard on the disease status is required. However, in many complex diseases, it is impossible or unethical to obtain such a gold standard. If an imperfect standard is used as if it were a gold standard, the estimated accuracy of the tests would be biased. This type of bias is called imperfect gold standard bias. In this paper we develop a maximum likelihood (ML) method for estimating ROC curves and their areas for ordinal-scale tests in the absence of a gold standard. Our simulation study shows the proposed estimates for the …


Does Weighting For Nonresponse Increase The Variance Of Survey Means?, Rod Little, Sonya L. Vartivarian Apr 2004

The University of Michigan Department of Biostatistics Working Paper Series

Nonresponse weighting is a common method for handling unit nonresponse in surveys. A widespread view is that the weighting method is aimed at reducing nonresponse bias, at the expense of an increase in variance. Hence, the efficacy of weighting adjustments becomes a bias-variance trade-off. This note suggests that this view is an oversimplification -- nonresponse weighting can in fact lead to a reduction in variance as well as bias. A covariate for a weighting adjustment must have two characteristics to reduce nonresponse bias - it needs to be related to the probability of response, and it needs to be related …
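The weighting-class adjustment the note discusses can be sketched with a covariate that has both characteristics: related to response and related to the outcome. The cell structure, response rates, and cell means below are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000
cell = rng.integers(0, 2, size=n)                 # adjustment cell from a covariate
y = np.where(cell == 1, 5.0, 1.0) + rng.normal(size=n)
respond = rng.random(n) < np.where(cell == 1, 0.3, 0.9)   # cell 1 under-responds

# Unweighted respondent mean is biased: the high-Y cell is under-represented
unweighted = y[respond].mean()

# Weighting-class estimator: weight each respondent by N_c / r_c for its cell
weights = np.empty(n)
for c in (0, 1):
    in_cell = cell == c
    weights[in_cell] = in_cell.sum() / (in_cell & respond).sum()
weighted = np.average(y[respond], weights=weights[respond])

true_mean = 0.5 * 1.0 + 0.5 * 5.0                 # population mean = 3.0
```

Here the covariate predicts both response and the outcome, so weighting removes the nonresponse bias; because it also explains outcome variance, this is exactly the setting in which the note argues weighting can reduce variance rather than inflate it.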


Mixtures Of Varying Coefficient Models For Longitudinal Data With Discrete Or Continuous Non-Ignorable Dropout, Joseph W. Hogan, Xihong Lin, Benjamin A. Herman May 2003

The University of Michigan Department of Biostatistics Working Paper Series

The analysis of longitudinal repeated measures data is frequently complicated by missing data due to informative dropout. We describe a mixture model for the joint distribution of longitudinal repeated measures, where the dropout distribution may be continuous and the dependence between response and dropout is semiparametric. Specifically, we assume that responses follow a varying coefficient random effects model conditional on dropout time, where the regression coefficients depend on dropout time through unspecified nonparametric functions that are estimated using step functions when dropout time is discrete (e.g., for panel data) and using smoothing splines when dropout time is continuous. Inference under the …