Open Access. Powered by Scholars. Published by Universities.®

Statistical Models Commons

2003

Articles 1 - 30 of 54

Full-Text Articles in Statistical Models

Robust Likelihood-Based Analysis Of Multivariate Data With Missing Values, Rod Little, An Hyonggin Dec 2003

The University of Michigan Department of Biostatistics Working Paper Series

The model-based approach to inference from multivariate data with missing values is reviewed. Regression prediction is most useful when the covariates are predictive of the missing values and the probability of being missing, and in these circumstances predictions are particularly sensitive to model misspecification. The use of penalized splines of the propensity score is proposed to yield robust model-based inference under the missing at random (MAR) assumption, assuming monotone missing data. Simulation comparisons with other methods suggest that the method works well in a wide range of populations, with little loss of efficiency relative to parametric models when the latter …


Uncertainty And The Value Of Diagnostic Information With Application To Axillary Lymph Node Dissection In Breast Cancer, Giovanni Parmigiani Dec 2003

Johns Hopkins University, Dept. of Biostatistics Working Papers

In clinical decision making, it is common to ask whether, and how much, a diagnostic procedure is contributing to subsequent treatment decisions. Statistically, quantification of the value of the information provided by a diagnostic procedure can be carried out using decision trees with multiple decision points, representing both the diagnostic test and the subsequent treatments that may depend on the test's results. This article investigates probabilistic sensitivity analysis approaches for exploring and communicating parameter uncertainty in such decision trees. Complexities arise because uncertainty about a model's inputs determines uncertainty about optimal decisions at all decision nodes of a tree. We …


Marginalized Transition Models For Longitudinal Binary Data With Ignorable And Nonignorable Dropout, Brenda F. Kurland, Patrick J. Heagerty Dec 2003

UW Biostatistics Working Paper Series

We extend the marginalized transition model of Heagerty (2002) to accommodate nonignorable monotone dropout. Using a selection model, weakly identified dropout parameters are held constant and their effects evaluated through sensitivity analysis. For data missing at random (MAR), efficiency of inverse probability of censoring weighted generalized estimating equations (IPCW-GEE) is as low as 40% compared to a likelihood-based marginalized transition model (MTM) with comparable modeling burden. MTM and IPCW-GEE regression parameters both display misspecification bias for MAR and nonignorable missing data, and both reduce bias noticeably by improving model fit.


Marginal Modeling Of Multilevel Binary Data With Time-Varying Covariates, Diana Miglioretti, Patrick Heagerty Dec 2003

UW Biostatistics Working Paper Series

We propose and compare two approaches for regression analysis of multilevel binary data when clusters are not necessarily nested: a GEE method that relies on a working independence assumption coupled with a three-step method for obtaining empirical standard errors; and a likelihood-based method implemented using Bayesian computational techniques. Implications of time-varying endogenous covariates are addressed. The methods are illustrated using data from the Breast Cancer Surveillance Consortium to estimate mammography accuracy from a repeatedly screened population.


Survival Model Predictive Accuracy And ROC Curves, Patrick Heagerty, Yingye Zheng Dec 2003

UW Biostatistics Working Paper Series

The predictive accuracy of a survival model can be summarized using extensions of the proportion of variation explained by the model, or R^2, commonly used for continuous response models, or using extensions of sensitivity and specificity which are commonly used for binary response models.

In this manuscript we propose new time-dependent accuracy summaries based on time-specific versions of sensitivity and specificity calculated over risk sets. We connect the accuracy summaries to a previously proposed global concordance measure which is a variant of Kendall's tau. In addition, we show how standard Cox regression output can be used to obtain estimates of …


Partly Conditional Survival Models For Longitudinal Data, Yingye Zheng, Patrick Heagerty Dec 2003

UW Biostatistics Working Paper Series

It is common in longitudinal studies to collect information on the time until a key clinical event, such as death, and to measure markers of patient health at multiple follow-up times. One approach to the joint analysis of survival and repeated measures data adopts a time-varying covariate regression model for the event time hazard. Using this standard approach the instantaneous risk of death at time t is specified as a possibly semi-parametric function of covariate information that has accrued through time t. In this manuscript we decouple the time scale for modeling the hazard from the time scale for accrual …


Semiparametric Estimation Of Time-Dependent ROC Curves For Longitudinal Marker Data, Yingye Zheng, Patrick Heagerty Dec 2003

UW Biostatistics Working Paper Series

One approach to evaluating the strength of association between a longitudinal marker process and a key clinical event time is through predictive regression methods such as a time-dependent covariate hazard model. For example, a time-varying covariate Cox model specifies the instantaneous risk of the event as a function of the time-varying marker and additional covariates. In this manuscript we explore a second complementary approach which characterizes the distribution of the marker as a function of both the measurement time and the ultimate event time. Our goal is to flexibly extend the standard diagnostic accuracy concepts of sensitivity and specificity to …


Comparison Of The Inverse Probability Of Treatment Weighted (IPTW) Estimator With A Naïve Estimator In The Analysis Of Longitudinal Data With Time-Dependent Confounding: A Simulation Study, Thaddeus Haight, Romain Neugebauer, Ira B. Tager, Mark J. Van Der Laan Dec 2003

U.C. Berkeley Division of Biostatistics Working Paper Series

A simulation study was conducted to compare estimates from a naïve estimator, using standard conditional regression, and an IPTW (Inverse Probability of Treatment Weighted) estimator, to true causal parameters for a given MSM (Marginal Structural Model). The study was extracted from a larger epidemiological study (Longitudinal Study of Effects of Physical Activity and Body Composition on Functional Limitation in the Elderly, by Tager et al. [accepted, Epidemiology, September 2003]), which examined the causal effects of physical activity and body composition on functional limitation. The simulation emulated the larger study in terms of the exposure and outcome variables of interest-- physical …
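
As a rough illustration of the contrast the abstract describes, the sketch below compares an unadjusted difference in means with an inverse-probability-of-treatment-weighted difference. It is a deliberate simplification, not the authors' simulation: it uses a single time point rather than longitudinal data, one binary confounder L, and hypothetical data generated as Y = A + 2L, so the true treatment effect is 1. The function names and data are assumptions made for illustration only.

```python
from collections import defaultdict

def naive_difference(rows):
    """Unadjusted difference in mean outcome, treated minus untreated."""
    treated = [y for _, a, y in rows if a == 1]
    untreated = [y for _, a, y in rows if a == 0]
    return sum(treated) / len(treated) - sum(untreated) / len(untreated)

def iptw_difference(rows):
    """Weighted difference in means with weights 1 / P(A = a | L = l).

    The treatment probabilities are estimated as empirical proportions
    within each level of the confounder L.
    """
    counts = defaultdict(lambda: [0, 0])  # l -> [n untreated, n treated]
    for l, a, _ in rows:
        counts[l][a] += 1
    num = [0.0, 0.0]  # weighted outcome totals, indexed by treatment a
    den = [0.0, 0.0]  # weight totals, indexed by treatment a
    for l, a, y in rows:
        propensity = counts[l][a] / sum(counts[l])
        weight = 1.0 / propensity
        num[a] += weight * y
        den[a] += weight
    return num[1] / den[1] - num[0] / den[0]

# Hypothetical rows of (L, A, Y) with Y = A + 2 * L.
# Units with L = 1 are more likely to be treated, which confounds A and Y.
rows = (
    [(0, 1, 1.0)] * 2 + [(0, 0, 0.0)] * 6
    + [(1, 1, 3.0)] * 6 + [(1, 0, 2.0)] * 2
)
naive = naive_difference(rows)
iptw = iptw_difference(rows)
```

In this toy data set the unadjusted comparison overstates the effect because treated units are concentrated where L = 1, while the weighted comparison recovers the effect built into the data.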


Optimization Of Breast Cancer Screening Modalities, Yu Shen, Giovanni Parmigiani Dec 2003

Johns Hopkins University, Dept. of Biostatistics Working Papers

Mathematical models and decision analyses based on microsimulations have been shown to be useful in evaluating relative merits of various screening strategies in terms of cost and mortality reduction. Most investigations regarding the balance between mortality reduction and costs have focused on a single modality, mammography. A systematic evaluation of the relative expenses and projected benefit of combining clinical breast examination and mammography is not at present available. The purpose of this report is to provide methodologic details including assumptions and data used in the process of modeling for complex decision analyses, when searching for optimal breast cancer screening strategies …


Modeling The Incubation Period Of Anthrax, Ron Brookmeyer, Elizabeth Johnson, Sarah Barry Dec 2003

Johns Hopkins University, Dept. of Biostatistics Working Papers

Models of the incubation period of anthrax are important to public health planners because they can be used to predict the delay before outbreaks are detected, the size of an outbreak and the duration of time that persons should remain on antibiotics to prevent disease. The difficulty is that there is little direct data about the incubation period in humans. The objective of this paper is to develop and apply models for the incubation period of anthrax. Mechanistic models that account for the biology of spore clearance and germination are developed based on a competing risks formulation. The models predict …


Unified Cross-Validation Methodology For Selection Among Estimators And A General Cross-Validated Adaptive Epsilon-Net Estimator: Finite Sample Oracle Inequalities And Examples, Mark J. Van Der Laan, Sandrine Dudoit Nov 2003

U.C. Berkeley Division of Biostatistics Working Paper Series

In Part I of this article we propose a general cross-validation criterion for selecting among a collection of estimators of a particular parameter of interest based on n i.i.d. observations. It is assumed that the parameter of interest minimizes the expectation (w.r.t. the distribution of the observed data structure) of a particular loss function of a candidate parameter value and the observed data structure, possibly indexed by a nuisance parameter. The proposed cross-validation criterion is defined as the empirical mean over the validation sample of the loss function at the parameter estimate based on the training sample, averaged over …
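
The selection rule described above can be sketched in a few lines. This is a minimal illustration, not the authors' procedure: the abstract is truncated before it says what the criterion is averaged over, so the V-fold splitting used here is an assumption, as are the function names, the squared-error loss, the three candidate estimators, and the data.

```python
import random
from statistics import mean, median

def cv_risk(data, estimator, loss, n_folds=5, seed=0):
    """Cross-validated risk of one candidate estimator.

    For each split, fit on the training sample, then take the empirical
    mean of the loss over the validation sample; average over splits.
    The V-fold scheme is an assumption of this sketch.
    """
    idx = list(range(len(data)))
    random.Random(seed).shuffle(idx)
    folds = [idx[k::n_folds] for k in range(n_folds)]
    risks = []
    for fold in folds:
        held_out = set(fold)
        train = [data[i] for i in idx if i not in held_out]
        valid = [data[i] for i in fold]
        theta = estimator(train)  # estimate uses the training sample only
        risks.append(mean(loss(theta, x) for x in valid))
    return mean(risks)

def select_estimator(data, candidates, loss, **kwargs):
    """Return the name of the candidate with the smallest CV risk, plus all risks."""
    risks = {name: cv_risk(data, est, loss, **kwargs)
             for name, est in candidates.items()}
    return min(risks, key=risks.get), risks

def squared_error(theta, x):
    return (theta - x) ** 2

# Hypothetical candidates and data for estimating a location parameter.
candidates = {"mean": mean, "median": median, "zero": lambda xs: 0.0}
data = [4.0, 5.0, 6.0, 5.5, 4.5, 5.0, 6.5, 3.5, 5.0, 5.0]
best, risks = select_estimator(data, candidates, squared_error)
```

Because every candidate is scored on observations it was not fitted to, a poor candidate such as the constant "zero" estimator receives a visibly larger risk than the data-driven ones.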


Underestimation Of Standard Errors In Multi-Site Time Series Studies, Michael Daniels, Francesca Dominici, Scott L. Zeger Nov 2003

Johns Hopkins University, Dept. of Biostatistics Working Papers

Multi-site time series studies of air pollution and mortality and morbidity have figured prominently in the literature as comprehensive approaches for estimating acute effects of air pollution on health. Hierarchical models are generally used to combine site-specific information and estimate pooled air pollution effects, taking into account both within-site statistical uncertainty and across-site heterogeneity.

Within a site, characteristics of time series data of air pollution and health (small pollution effects, missing data, highly correlated predictors, nonlinear confounding, etc.) make modelling all sources of uncertainty challenging. One potential consequence is underestimation of the statistical variance of the site-specific effects to …


Estimating Predictors For Long- Or Short-Term Survivors, Lu Tian, Wei Wang, L. J. Wei Nov 2003

Harvard University Biostatistics Working Paper Series

No abstract provided.


Time-Series Studies Of Particulate Matter, Michelle L. Bell, Jonathan M. Samet, Francesca Dominici Nov 2003

Johns Hopkins University, Dept. of Biostatistics Working Papers

Studies of air pollution and human health have evolved from descriptive studies of the early phenomena of large increases in adverse health effects following extreme air pollution episodes, to time-series analyses and the development of sophisticated regression models. In fact, advanced statistical methods are necessary to address the many challenges inherent in the detection of a small pollution risk in the presence of many confounders. This paper reviews the history, methods, and findings of the time-series studies estimating health risks associated with short-term exposure to particulate matter, though much of the discussion is applicable to epidemiological studies of air pollution …


Smooth Quantile Ratio Estimation With Regression: Estimating Medical Expenditures For Smoking Attributable Diseases, Francesca Dominici, Scott L. Zeger Nov 2003

Johns Hopkins University, Dept. of Biostatistics Working Papers

In this paper we introduce a semi-parametric regression model for estimating the difference in the expected value of two positive and highly skewed random variables as a function of covariates. Our method extends Smooth Quantile Ratio Estimation (SQUARE), a novel estimator of the mean difference of two positive random variables, to a regression model.

The methodological development of this paper is motivated by a common problem in econometrics where we are interested in estimating the difference in the average expenditures between two populations, say with and without a disease, taking covariates into account. Let Y1 and Y2 be two positive …


A Corrected Pseudo-Score Approach For Additive Hazards Model With Longitudinal Covariates Measured With Error, Xiao Song, Yijian Huang Nov 2003

UW Biostatistics Working Paper Series

In medical studies, it is often of interest to characterize the relationship between a time-to-event and covariates, both time-independent and time-dependent. Time-dependent covariates are generally measured intermittently and with error. Recent interest has focused on the proportional hazards framework, with longitudinal data jointly modeled through a mixed effects model. However, approaches under this framework depend on the normality assumption of the error, and might encounter intractable numerical difficulties in practice. This motivates us to consider an alternative framework, that is, the additive hazards model, under which little has been done when time-dependent covariates are measured with error. We propose …


A Nonparametric Comparison Of Conditional Distributions With Nonnegligible Cure Fractions, Yi Li, Jin Feng Nov 2003

Harvard University Biostatistics Working Paper Series

No abstract provided.


Survival Analysis With Heterogeneous Covariate Measurement Error, Yi Li, Louise Ryan Nov 2003

Harvard University Biostatistics Working Paper Series

No abstract provided.


To Model Or Not To Model? Competing Modes Of Inference For Finite Population Sampling, Rod Little Nov 2003

The University of Michigan Department of Biostatistics Working Paper Series

Finite population sampling is perhaps the only area of statistics where the primary mode of analysis is based on the randomization distribution, rather than on statistical models for the measured variables. This article reviews the debate between design and model-based inference. The basic features of the two approaches are illustrated using the case of inference about the mean from stratified random samples. Strengths and weaknesses of design-based and model-based inference for surveys are discussed. It is suggested that models that take into account the sample design and make weak parametric assumptions can produce reliable and efficient inferences in survey settings. …
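
For the stratified-sample case mentioned in the abstract, the standard design-based estimate of a population mean weights each stratum's sample mean by that stratum's share of the population, N_h / N. The sketch below shows only that calculation; the function name, stratum sizes, and sample values are hypothetical and are not taken from the paper.

```python
from statistics import mean

def stratified_mean(strata):
    """Design-based estimate of a population mean from stratified samples.

    strata: list of (N_h, sample_values) pairs, where N_h is the known
    population size of stratum h (hypothetical inputs for illustration).
    """
    total = sum(n_h for n_h, _ in strata)
    # Weight each stratum's sample mean by its population share N_h / N.
    return sum((n_h / total) * mean(values) for n_h, values in strata)

# Hypothetical data: two strata with population sizes 300 and 100.
# The smaller stratum is oversampled relative to its population share.
strata = [(300, [10.0, 12.0, 14.0]), (100, [20.0, 24.0])]
estimate = stratified_mean(strata)

# Pooling the sample and ignoring the design gives a different answer.
unweighted = mean([10.0, 12.0, 14.0, 20.0, 24.0])
```

The gap between the two numbers comes entirely from the unequal sampling rates across strata, which is why the sample design has to enter the analysis in one form or another.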


Loss Function Based Ranking In Two-Stage, Hierarchical Models, Rongheng Lin, Thomas A. Louis, Susan M. Paddock, Greg Ridgeway Nov 2003

Johns Hopkins University, Dept. of Biostatistics Working Papers

Several authors have studied the performance of optimal, squared error loss (SEL) estimated ranks. Though these are effective, in many applications interest focuses on identifying the relatively good (e.g., in the upper 10%) or relatively poor performers. We construct loss functions that address this goal and evaluate candidate rank estimates, some of which optimize specific loss functions. We study performance for a fully parametric hierarchical model with a Gaussian prior and Gaussian sampling distributions, evaluating performance for several loss functions. Results show that though SEL-optimal ranks and percentiles do not specifically focus on classifying with respect to a percentile cut …


Joint Modeling And Estimation For Recurrent Event Processes And Failure Time Data, Chiung-Yu Huang, Mei-Cheng Wang Nov 2003

Johns Hopkins University, Dept. of Biostatistics Working Papers

Recurrent event data are commonly encountered in longitudinal follow-up studies related to biomedical science, econometrics, reliability, and demography. In many studies, recurrent events serve as important measurements for evaluating disease progression, health deterioration, or insurance risk. When analyzing recurrent event data, an independent censoring condition is typically required for the construction of statistical methods. Nevertheless, in some situations, the terminating time for observing recurrent events could be correlated with the recurrent event process and, as a result, the assumption of independent censoring is violated. In this paper, we consider joint modeling of a recurrent event process and a failure time …


Semi-Parametric Box-Cox Power Transformation Models For Censored Survival Observations, Tianxi Cai, Lu Tian, L. J. Wei Oct 2003

Harvard University Biostatistics Working Paper Series

No abstract provided.


Unification Of Variance Components And Haseman-Elston Regression For Quantitative Trait Linkage Analysis, Wei-Min Chen, Karl W. Broman, Kung-Yee Liang Oct 2003

Johns Hopkins University, Dept. of Biostatistics Working Papers

Two of the major approaches for linkage analysis with quantitative traits in humans are variance components and Haseman-Elston regression. Previously, these have been viewed as quite separate methods. We describe a general model, fit by use of generalized estimating equations (GEE), for which the variance components and Haseman-Elston methods (including many of the extensions to the original Haseman-Elston method) are special cases, corresponding to different choices for a working covariance matrix. We also show that the regression-based test of Sham et al. (2002) is equivalent to a robust score statistic derived from our GEE approach. These results have several important implications. …


Smooth Quantile Ratio Estimation, Francesca Dominici, Leslie Cope, Daniel Q. Naiman, Scott L. Zeger Oct 2003

Johns Hopkins University, Dept. of Biostatistics Working Papers

In a study of health care expenditures attributable to smoking, we seek to compare the distribution of medical costs for persons with lung cancer or chronic obstructive pulmonary disease (cases) to those without (controls) using a national survey which includes hundreds of cases and thousands of controls. The distribution of costs is highly skewed toward larger values, making estimates of the mean from the smaller sample dependent on a small fraction of the biggest values. One approach to deal with the smaller sample is to rely on a simple parametric model such as the log-normal, but this makes the undesirable …


Hierarchical Bivariate Time Series Models: A Combined Analysis Of The Effects Of Particulate Matter On Morbidity And Mortality, Francesca Dominici, Antonella Zanobetti, Scott L. Zeger, Joel Schwartz, Jonathan M. Samet Oct 2003

Johns Hopkins University, Dept. of Biostatistics Working Papers

In this paper we develop a hierarchical bivariate time series model to characterize the relationship between particulate matter less than 10 microns in aerodynamic diameter (PM10) and both mortality and hospital admissions for cardiovascular diseases. The model is applied to time series data on mortality and morbidity for 10 metropolitan areas in the United States from 1986 to 1993. We postulate that these time series should be related through a shared relationship with PM10.

At the first stage of the hierarchy, we fit two seemingly unrelated Poisson regression models to produce city-specific estimates of the log relative rates of mortality …


Statistical Inferences Based On Non-Smooth Estimating Functions, Lu Tian, Jun S. Liu, Mary Zhao, L. J. Wei Oct 2003

Harvard University Biostatistics Working Paper Series

No abstract provided.


On The Cox Model With Time-Varying Regression Coefficients, Lu Tian, David Zucker, L. J. Wei Oct 2003

Harvard University Biostatistics Working Paper Series

No abstract provided.


Nonparametric Estimation Of The Bivariate Recurrence Time Distribution, Chiung-Yu Huang, Mei-Cheng Wang Oct 2003

Johns Hopkins University, Dept. of Biostatistics Working Papers

This paper considers statistical models in which two different types of events, such as the diagnosis of a disease and the remission of the disease, occur alternately over time and are observed subject to right censoring. We propose nonparametric estimators for the joint distribution of bivariate recurrence times and the marginal distribution of the first recurrence time. In general, the marginal distribution of the second recurrence time cannot be estimated due to an identifiability problem, but a conditional distribution of the second recurrence time can be estimated non-parametrically. In the literature, statistical methods have been developed to estimate the joint distribution …


A Nested Unsupervised Approach To Identifying Novel Molecular Subtypes, Elizabeth Garrett, Giovanni Parmigiani Oct 2003

Johns Hopkins University, Dept. of Biostatistics Working Papers

In classification problems arising in genomics research it is common to study populations for which a broad class assignment is known (say, normal versus diseased) and one seeks to find undiscovered subclasses within one or both of the known classes. Formally, this problem can be thought of as an unsupervised analysis nested within a supervised one. Here we take the view that the nested unsupervised analysis can successfully utilize information from the entire data set for constructing and/or selecting useful predictors. Specifically, we propose a mixture model approach to the nested unsupervised problem, where the supervised information is used to …


A Population Pharmacokinetic Model With Time-Dependent Covariates Measured With Errors, Lang Li, Xihong Lin, Mort B. Brown, Suneel Gupta, Kyung-Hoon Lee Oct 2003

The University of Michigan Department of Biostatistics Working Paper Series

We propose a population pharmacokinetic (PK) model with time-dependent covariates measured with errors. This model is used to model S-oxybutynin's kinetics following an oral administration of Ditropan, and allows the distribution rate to depend on time-dependent covariates blood pressure and heart rate, which are measured with errors. We propose two two-step estimation methods: the second order two-step method with numerical solutions of differential equations (2orderND), and the second order two-step method with closed form approximate solutions of differential equations (2orderAD). The proposed methods are computationally easy and require fitting a linear mixed model at the first step and a nonlinear …