Articles 1 - 18 of 18
Full-Text Articles in Statistics and Probability
A Multistate Competing Risks Framework For Preconception Prediction Of Pregnancy Outcomes, Kaitlyn Cook, Neil J. Perkins, Enrique Schisterman, Sebastien Haneuse
Statistical and Data Sciences: Faculty Publications
Background: Preconception pregnancy risk profiles—characterizing the likelihood that a pregnancy attempt results in a full-term birth, preterm birth, clinical pregnancy loss, or failure to conceive—can provide critical information during the early stages of a pregnancy attempt, when obstetricians are best positioned to intervene to improve the chances of successful conception and full-term live birth. Yet the task of constructing and validating risk assessment tools for this earlier intervention window is complicated by several statistical features: the final outcome of the pregnancy attempt is multinomial in nature, and it summarizes the results of two intermediate stages, conception and gestation, whose outcomes …
Compare And Contrast Maximum Likelihood Method And Inverse Probability Weighting Method In Missing Data Analysis, Scott Sun
Mathematical Sciences Technical Reports (MSTR)
Data can be lost for different reasons, but sometimes the missingness is a part of the data collection process. Unbiased and efficient estimation of the parameters governing the response mean model requires the missing data to be appropriately addressed. This paper compares and contrasts the Maximum Likelihood and Inverse Probability Weighting estimators in an Outcome-Dependent Sampling design that deliberately generates incomplete observations. We demonstrate the comparison through numerical simulations under varied conditions: different coefficients of determination, and whether or not the mean model is misspecified.
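The contrast the report draws can be seen in a few lines of simulation. The sketch below is illustrative only, not the report's code: the mean model, sample size, and selection probabilities are invented for the example. It generates outcome-dependent missingness with known selection probabilities and compares the naive complete-case mean with a Hajek-type inverse probability weighted estimate.

```python
import numpy as np

# Illustrative simulation only: mean model and sampling probabilities invented.
rng = np.random.default_rng(0)
n = 100_000
x = rng.normal(size=n)
y = 2.0 + 1.5 * x + rng.normal(size=n)        # overall mean of y is 2.0

# Outcome-dependent sampling: large responses are observed more often,
# with selection probabilities assumed known to the analyst.
pi = np.where(y > 2.0, 0.9, 0.3)
observed = rng.random(n) < pi

naive = y[observed].mean()                    # complete-case mean: biased upward
# Hajek-type IPW mean: reweight observed responses by 1 / Pr(observed)
ipw = np.sum(y[observed] / pi[observed]) / np.sum(1.0 / pi[observed])
print(f"naive={naive:.3f}  ipw={ipw:.3f}  truth=2.000")
```

Because over-sampled large outcomes are down-weighted, the IPW estimate recovers the true mean while the complete-case mean does not.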
Evaluation Of Modern Missing Data Handling Methods For Coefficient Alpha, Katerina Matysova
College of Education and Human Sciences: Dissertations, Theses, and Student Research
When assessing a characteristic or trait using a multiple-item measure, the quality of that measure can be assessed by examining its reliability. To avoid requiring multiple measurement occasions, reliability can be represented by internal consistency, which is most commonly calculated using Cronbach’s coefficient alpha. Almost every time human participants are involved in research, there are missing data: even though complete data were expected to be collected, some data are absent. Missing data can follow different patterns as well as be the result of different mechanisms. One traditional way to deal with missing data is listwise …
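For reference, coefficient alpha itself is simple to compute, and listwise deletion, the traditional approach the abstract mentions, is a one-line filter. A minimal sketch (function and variable names are mine, not the dissertation's):

```python
import numpy as np

def cronbach_alpha(items):
    """Coefficient alpha for an (n_persons, k_items) score matrix with no missing cells."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)        # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)    # variance of the summed scale score
    return k / (k - 1) * (1.0 - item_vars.sum() / total_var)

def alpha_listwise(items):
    """Traditional listwise deletion: drop every respondent with any missing item (NaN)."""
    items = np.asarray(items, dtype=float)
    complete = ~np.isnan(items).any(axis=1)
    return cronbach_alpha(items[complete])
```

With perfectly correlated items alpha equals 1; the modern methods evaluated in work like this (e.g. full-information maximum likelihood or multiple imputation) instead use the incomplete rows that listwise deletion discards.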
Fixed Choice Design And Augmented Fixed Choice Design For Network Data With Missing Observations, Miles Q. Ott, Matthew T. Harrison, Krista J. Gile, Nancy P. Barnett, Joseph W. Hogan
Statistical and Data Sciences: Faculty Publications
The statistical analysis of social networks is increasingly used to understand social processes and patterns. The association between social relationships and individual behaviors is of particular interest to sociologists, psychologists, and public health researchers. Several recent network studies make use of the fixed choice design (FCD), which induces missing edges in the network data. Because of the complex dependence structure inherent in networks, missing data can pose very difficult problems for valid statistical inference. In this article, we introduce novel methods for accounting for the FCD censoring and introduce a new survey design, which we call the augmented fixed choice …
Impact Of Home Visit Capacity On Genetic Association Studies Of Late-Onset Alzheimer's Disease, David W. Fardo, Laura E. Gibbons, Shubhabrata Mukherjee, M. Maria Glymour, Wayne Mccormick, Susan M. Mccurry, James D. Bowen, Eric B. Larson, Paul K. Crane
Biostatistics Faculty Publications
INTRODUCTION—Findings for genetic correlates of late-onset Alzheimer's disease (LOAD) in studies that rely solely on clinic visits may differ from those with capacity to follow participants unable to attend clinic visits.
METHODS—We evaluated previously identified LOAD-risk single nucleotide variants in the prospective Adult Changes in Thought study, comparing hazard ratios (HRs) estimated using the full data set of both in-home and clinic visits (n = 1697) to HRs estimated using only data that were obtained from clinic visits (n = 1308). Models were adjusted for age, sex, principal components to account for ancestry, and additional health indicators.
RESULTS …
CRTgeeDR: An R Package For Doubly Robust Generalized Estimating Equations Estimations In Cluster Randomized Trials With Missing Data, Melanie Prague, Rui Wang, Victor De Gruttola
Harvard University Biostatistics Working Paper Series
No abstract provided.
Correction Of Verification Bias Using Log-Linear Models For A Single Binary-Scale Diagnostic Test, Haresh Rochani, Hani M. Samawi, Robert L. Vogel, Jingjing Yin
Biostatistics Faculty Publications
In diagnostic medicine, the test that determines the true disease status without error is referred to as the gold standard. Even when a gold standard exists, it is extremely difficult to verify each patient due to issues of cost-effectiveness and the invasive nature of the procedures. In practice, some patients with test results are not selected for verification of the disease status, which results in verification bias for diagnostic tests. The ability of the diagnostic test to correctly identify the patients with and without the disease can be evaluated by measures such as sensitivity, specificity and predictive …
Integrating Data Transformation In Principal Components Analysis, Mehdi Maadooliat, Jianhua Z. Huang, Jianhua Hu
Mathematics, Statistics and Computer Science Faculty Research and Publications
Principal component analysis (PCA) is a popular dimension-reduction method to reduce the complexity and obtain the informative aspects of high-dimensional datasets. When the data distribution is skewed, data transformation is commonly used prior to applying PCA. Such transformation is usually obtained from previous studies, prior knowledge, or trial-and-error. In this work, we develop a model-based method that integrates data transformation in PCA and finds an appropriate data transformation using the maximum profile likelihood. Extensions of the method to handle functional data and missing values are also developed. Several numerical algorithms are provided for efficient computation. The proposed method is illustrated …
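A simplified stand-in for the idea: choose a Box-Cox transformation for each column by maximizing the standard univariate profile likelihood, then apply PCA to the transformed data. The paper estimates the transformation jointly within the PCA model; this per-column version (all names invented, positive data assumed) only illustrates replacing trial-and-error with a likelihood criterion.

```python
import numpy as np

def boxcox(x, lam):
    # Box-Cox transform; requires x > 0
    return np.log(x) if abs(lam) < 1e-8 else (x ** lam - 1.0) / lam

def boxcox_loglik(x, lam):
    """Standard univariate Box-Cox profile log-likelihood under a normal model,
    with mean and variance profiled out and the Jacobian term included."""
    z = boxcox(x, lam)
    return -0.5 * x.size * np.log(z.var()) + (lam - 1.0) * np.log(x).sum()

def transform_then_pca(X, lams=np.linspace(-2, 2, 81)):
    """Pick a Box-Cox lambda per column by maximum likelihood, then PCA via SVD."""
    X = np.asarray(X, dtype=float)
    Z = np.column_stack([
        boxcox(col, max(lams, key=lambda l: boxcox_loglik(col, l)))
        for col in X.T
    ])
    Zc = Z - Z.mean(axis=0)                      # center before PCA
    U, s, Vt = np.linalg.svd(Zc, full_matrices=False)
    return Zc @ Vt.T, Vt                         # scores, loadings
```

For lognormal columns the likelihood criterion selects a lambda near 0, i.e. the log transform one would otherwise pick by trial and error.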
Phylogenetic Linkage Among Hiv-Infected Village Residents In Botswana: Estimation Of Clustering Rates In The Presence Of Missing Data, Nicole Bohme Carnegie, Rui Wang, Vladimir Novitsky, Victor G. Degruttola
Harvard University Biostatistics Working Paper Series
No abstract provided.
Targeted Estimation Of Variable Importance Measures With Interval-Censored Outcomes, Stephanie Sapp, Mark J. Van Der Laan, Kimberly Page
U.C. Berkeley Division of Biostatistics Working Paper Series
In most experimental and observational studies, participants are not followed in continuous time. Instead, data are collected about participants only at certain monitoring times. These monitoring times are random and often participant-specific. As a result, outcomes are only known up to random time intervals, resulting in interval-censored data. In practice, when estimating variable importance measures on interval-censored outcomes, practitioners often ignore the presence of interval-censoring and instead treat the data as continuous or right-censored, applying ad hoc approaches to mask the true interval-censoring. In this paper, we describe Targeted Minimum Loss-based Estimation methods tailored for estimation of variable importance measures …
In Praise Of Simplicity Not Mathematistry! Ten Simple Powerful Ideas For The Statistical Scientist, Roderick J. Little
The University of Michigan Department of Biostatistics Working Paper Series
Ronald Fisher was by all accounts a first-rate mathematician, but he saw himself as a scientist, not a mathematician, and he railed against what George Box called (in his Fisher lecture) "mathematistry". Mathematics is the indispensable foundation for statistics, but our subject is constantly under assault by people who want to turn statistics into a branch of mathematics, making the subject as impenetrable to non-mathematicians as possible. Valuing simplicity, I describe ten simple and powerful ideas that have influenced my thinking about statistics, in my areas of research interest: missing data, causal inference, survey sampling, and statistical modeling in general. …
A Cautionary Note On Generalized Linear Models For Covariance Of Unbalanced Longitudinal Data, Jianhua Z. Huang, Min Chen, Mehdi Maadooliat, Mohsen Pourahmadi
Mathematics, Statistics and Computer Science Faculty Research and Publications
Missing data in longitudinal studies can create enormous challenges in data analysis when coupled with the positive-definiteness constraint on a covariance matrix. For complete balanced data, the Cholesky decomposition of a covariance matrix makes it possible to remove the positive-definiteness constraint and use a generalized linear model setup to jointly model the mean and covariance using covariates (Pourahmadi, 2000). However, this approach may not be directly applicable when the longitudinal data are unbalanced, as coherent regression models for the dependence across all times and subjects may not exist. Within the existing generalized linear model framework, we show how to overcome …
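The unconstrained reparameterization the abstract builds on (Pourahmadi's modified Cholesky decomposition) is easy to state in code. A sketch for the balanced case, with invented inputs: any real vector of generalized autoregressive parameters and log innovation variances maps to a valid covariance matrix, which is what lets both pieces be modeled with GLM-style linear predictors.

```python
import numpy as np

def covariance_from_unconstrained(phi, log_innov_var):
    """Modified Cholesky reparameterization (balanced case):
    Sigma^{-1} = T' D^{-1} T, where T is unit lower triangular with the
    generalized autoregressive parameters phi below its diagonal and D holds
    the innovation variances. Any unconstrained phi and log-variances yield a
    positive-definite covariance matrix."""
    m = log_innov_var.size
    T = np.eye(m)
    T[np.tril_indices(m, k=-1)] = -phi           # m*(m-1)/2 free parameters
    D = np.diag(np.exp(log_innov_var))           # innovation variances > 0
    Tinv = np.linalg.inv(T)
    return Tinv @ D @ Tinv.T                     # Sigma = T^{-1} D T^{-T}

# Demo with arbitrary unconstrained parameters (invented numbers)
rng = np.random.default_rng(0)
m = 5
Sigma = covariance_from_unconstrained(rng.normal(size=m * (m - 1) // 2),
                                      rng.normal(size=m))
```

The cautionary point of the paper is that this clean construction does not carry over directly to unbalanced data, where subjects are observed at different time points.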
Proxy Pattern-Mixture Analysis For A Binary Variable Subject To Nonresponse., Rebecca H. Andridge, Roderick J. Little
The University of Michigan Department of Biostatistics Working Paper Series
We consider assessment of the impact of nonresponse for a binary survey variable Y subject to nonresponse, when there is a set of covariates observed for nonrespondents and respondents. To reduce dimensionality and for simplicity we reduce the covariates to a continuous proxy variable X that has the highest correlation with Y, estimated from a probit regression analysis of respondent data. We extend our previously proposed proxy pattern-mixture analysis (PPMA) for continuous outcomes to the binary outcome using a latent variable approach. The method does not assume data are missing at random, and creates a framework for sensitivity analyses. Maximum …
Multiple Imputation For The Comparison Of Two Screening Tests In Two-Phase Alzheimer Studies, Ofer Harel, Xiao-Hua Zhou
UW Biostatistics Working Paper Series
Two-phase designs are common in epidemiological studies of dementia, and especially in Alzheimer research. In the first phase, all subjects are screened using a common screening test(s), while in the second phase, only a subset of these subjects is tested using a more definitive verification assessment, i.e., a gold standard test. When comparing the accuracy of two screening tests in a two-phase study of dementia, inferences are commonly made using only the verified sample. It is well documented that in that case there is a risk of bias, called verification bias. When the two screening tests have only two values (e.g. …
Multiple Imputation For Correcting Verification Bias, Ofer Harel, Xiao-Hua Zhou
UW Biostatistics Working Paper Series
In the case in which all subjects are screened using a common test, and only a subset of these subjects is tested using a gold standard test, it is well documented that there is a risk of bias, called verification bias. When the test has only two levels (e.g. positive and negative) and we are trying to estimate the sensitivity and specificity of the test, one is actually constructing a confidence interval for a binomial proportion. Since it is well documented that this estimation is not trivial even with complete data, we adopt a multiple imputation (MI) framework for verification bias …
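A toy version of the MI idea can make the mechanics concrete. This sketch is mine, not the paper's method, and assumes verification is ignorable given the binary test result: impute each unverified disease status from a posterior draw of P(D=1 | T=t) based on the verified subjects with the same test result, estimate sensitivity on each completed data set, and pool with Rubin's combining rules.

```python
import numpy as np

def mi_sensitivity(test, disease, verified, n_imp=50, seed=0):
    """Toy MI correction for verification bias (verification ignorable given T).
    disease values are only used where verified is True."""
    rng = np.random.default_rng(seed)
    test = np.asarray(test)
    disease = np.asarray(disease)
    verified = np.asarray(verified)
    ests, wvars = [], []
    for _ in range(n_imp):
        d = disease.astype(float).copy()
        for t in (0, 1):
            ver = verified & (test == t)
            k, m = disease[ver].sum(), ver.sum()
            p = rng.beta(k + 0.5, m - k + 0.5)     # Jeffreys-posterior draw of P(D=1|T=t)
            miss = ~verified & (test == t)
            d[miss] = rng.random(miss.sum()) < p   # impute unverified statuses
        sens = (test[d == 1] == 1).mean()          # P(T=1 | D=1) in completed data
        ests.append(sens)
        wvars.append(sens * (1 - sens) / d.sum())  # within-imputation variance
    ests, wvars = np.array(ests), np.array(wvars)
    # Rubin's rules: total variance = mean within + (1 + 1/M) * between
    return ests.mean(), wvars.mean() + (1 + 1 / n_imp) * ests.var(ddof=1)
```

On simulated data where verification is more likely after a positive test, the complete-case sensitivity is badly inflated while the pooled MI estimate is close to the truth.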
Non-Parametric Estimation Of Roc Curves In The Absence Of A Gold Standard, Xiao-Hua Zhou, Pete Castelluccio, Chuan Zhou
UW Biostatistics Working Paper Series
In evaluation of the diagnostic accuracy of tests, a gold standard for the disease status is required. However, in many complex diseases, it is impossible or unethical to obtain such a gold standard. If an imperfect standard is used as if it were a gold standard, the estimated accuracy of the tests would be biased. This type of bias is called imperfect gold standard bias. In this paper we develop a maximum likelihood (ML) method for estimating ROC curves and their areas for ordinal-scale tests in the absence of a gold standard. Our simulation study shows the proposed estimates for the …
Does Weighting For Nonresponse Increase The Variance Of Survey Means?, Rod Little, Sonya L. Vartivarian
The University of Michigan Department of Biostatistics Working Paper Series
Nonresponse weighting is a common method for handling unit nonresponse in surveys. A widespread view is that the weighting method is aimed at reducing nonresponse bias, at the expense of an increase in variance. Hence, the efficacy of weighting adjustments becomes a bias-variance trade-off. This note suggests that this view is an oversimplification -- nonresponse weighting can in fact lead to a reduction in variance as well as bias. A covariate for a weighting adjustment must have two characteristics to reduce nonresponse bias - it needs to be related to the probability of response, and it needs to be related …
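The two conditions in the note's argument are easy to see in a toy weighting-class adjustment (all numbers invented): a binary covariate that predicts both the probability of response and the outcome. Weighting respondents by the inverse of the estimated cell response rate removes the nonresponse bias of the unweighted respondent mean:

```python
import numpy as np

# Toy illustration: x predicts both response and the survey outcome y.
rng = np.random.default_rng(0)
n = 100_000
x = rng.integers(0, 2, size=n)                        # adjustment-cell covariate
y = 1.0 + 2.0 * x + rng.normal(size=n)                # true population mean = 2.0
respond = rng.random(n) < np.where(x == 1, 0.8, 0.4)  # response related to x

unweighted = y[respond].mean()        # respondent mean: over-represents x = 1

# Weighting-class adjustment: weight = 1 / (estimated response rate in cell)
w = np.empty(n)
for cell in (0, 1):
    in_cell = x == cell
    w[in_cell] = in_cell.sum() / (in_cell & respond).sum()
weighted = np.average(y[respond], weights=w[respond])
print(f"unweighted={unweighted:.3f}  weighted={weighted:.3f}  truth=2.000")
```

Here the unweighted respondent mean is biased upward while the weighted mean recovers the population mean; the note's further point is that because the covariate also predicts y, such an adjustment can reduce variance as well as bias.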
Mixtures Of Varying Coefficient Models For Longitudinal Data With Discrete Or Continuous Non-Ignorable Dropout, Joseph W. Hogan, Xihong Lin, Benjamin A. Herman
The University of Michigan Department of Biostatistics Working Paper Series
The analysis of longitudinal repeated measures data is frequently complicated by missing data due to informative dropout. We describe a mixture model for the joint distribution of longitudinal repeated measures and dropout times, where the dropout distribution may be continuous and the dependence between response and dropout is semiparametric. Specifically, we assume that responses follow a varying coefficient random effects model conditional on dropout time, where the regression coefficients depend on dropout time through unspecified nonparametric functions that are estimated using step functions when dropout time is discrete (e.g., for panel data) and using smoothing splines when dropout time is continuous. Inference under the …