Open Access. Powered by Scholars. Published by Universities.®

SelectedWorks: General Biostatistics (Statistics and Probability)

Full-Text Articles in Physical Sciences and Mathematics (Articles 1 - 21 of 21)

Estimating Controlled Direct Effects Of Restrictive Feeding Practices In The 'Early Dieting In Girls' Study, Yeying Zhu, Debashis Ghosh, Donna L. Coffman, Jennifer S. Williams Jan 2015

In this article, we examine the causal effect of parental restrictive feeding practices on children’s weight status. An important mediator we are interested in is children’s self-regulation status. Traditional mediation analysis (Baron and Kenny, 1986) applies a structural equation modelling (SEM) approach and decomposes the intent-to-treat (ITT) effect into direct and indirect effects. More recent approaches interpret the mediation effects based on the potential outcomes framework. In practice, there often exist confounders that jointly influence the mediator and the outcome. Inverse probability weighting based on propensity scores is used to adjust for confounding and simultaneously reduce the dimensionality of the confounders. …
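
To make the weighting step concrete, here is a minimal sketch of inverse probability weighted estimation of an average treatment effect with a logistic-regression propensity model; the function and variable names are illustrative, not taken from the paper.

```python
# Minimal IPW sketch (illustrative, not the paper's implementation):
# estimate propensity scores by logistic regression, then weight
# outcomes by the inverse probability of the treatment received.
import numpy as np
from sklearn.linear_model import LogisticRegression

def ipw_ate(X, treat, y):
    """Average treatment effect via inverse probability weighting.

    X: (n, p) confounder matrix; treat: 0/1 treatment indicator; y: outcome.
    """
    ps = LogisticRegression(max_iter=1000).fit(X, treat).predict_proba(X)[:, 1]
    w1 = treat / ps              # weights for treated subjects
    w0 = (1 - treat) / (1 - ps)  # weights for controls
    return np.sum(w1 * y) / np.sum(w1) - np.sum(w0 * y) / np.sum(w0)
```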


The Number Of Subjects Per Variable Required In Linear Regression Analyses, Peter Austin, Ewout Steyerberg Jan 2015

Objectives: To determine the number of independent variables that can be included in a linear regression model.

Study Design and Setting: We used a series of Monte Carlo simulations to examine the impact of the number of subjects per variable (SPV) on the accuracy of estimated regression coefficients and standard errors, on the empirical coverage of estimated confidence intervals, and on the accuracy of the estimated R2 of the fitted model.

Results: A minimum of approximately two SPV tended to result in estimation of regression coefficients with relative bias of less than 10%. Furthermore, with this minimum number of SPV, …
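
A hypothetical sketch of the kind of subjects-per-variable simulation described, assuming standard-normal covariates and a common true coefficient; all settings are illustrative rather than the paper's design.

```python
# Monte Carlo sketch of the subjects-per-variable (SPV) design: simulate
# linear-model data at a given SPV and record the relative bias of the
# OLS coefficient estimates. All parameters here are assumed values.
import numpy as np

rng = np.random.default_rng(0)

def relative_bias(spv, p=10, beta=1.0, reps=500):
    n = spv * p                          # subjects = SPV x number of variables
    est = np.empty((reps, p))
    for r in range(reps):
        X = rng.standard_normal((n, p))
        y = X @ np.full(p, beta) + rng.standard_normal(n)
        est[r] = np.linalg.lstsq(X, y, rcond=None)[0]
    return (est.mean(axis=0) - beta) / beta   # per-coefficient relative bias
```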


Penalized Regression Procedures For Variable Selection In The Potential Outcomes Framework, Debashis Ghosh, Yeying Zhu, Donna L. Coffman Jan 2013

A recent topic of much interest in causal inference is model selection. In this article, we describe a framework in which to consider penalized regression approaches to variable selection for causal effects. The framework leads to a simple 'impute, then select' class of procedures that is agnostic to both the imputation algorithm and the penalized regression method used. It also clarifies how model selection involves a multivariate regression model, and shows that these methods can be applied to identify subgroups in which treatment effects are homogeneous. Analogies and links with the literature on machine learning methods, missing data and imputation …
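
A rough sketch of one way an 'impute, then select' procedure could look, with arm-specific linear regressions as the imputation step and the lasso as the penalized regression; both choices are assumptions for illustration, since the framework is agnostic to them.

```python
# 'Impute, then select' sketch under assumed details: impute each unit's
# missing potential outcome with an arm-specific regression, then lasso the
# imputed unit-level contrasts on the covariates to select variables
# associated with heterogeneous treatment effects.
import numpy as np
from sklearn.linear_model import LinearRegression, Lasso

def impute_then_select(X, treat, y, alpha=0.1):
    m1 = LinearRegression().fit(X[treat == 1], y[treat == 1])
    m0 = LinearRegression().fit(X[treat == 0], y[treat == 0])
    y1 = np.where(treat == 1, y, m1.predict(X))   # impute Y(1) where missing
    y0 = np.where(treat == 0, y, m0.predict(X))   # impute Y(0) where missing
    contrast = y1 - y0                            # unit-level effect estimates
    sel = Lasso(alpha=alpha).fit(X, contrast)
    return np.flatnonzero(sel.coef_)              # indices of selected variables
```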


A Data-Adaptive Strategy For Inverse Weighted Estimation Of Causal Effects, Yeying Zhu, Debashis Ghosh, Bhramar Mukherjee, Nandita Mitra Jan 2013

In most nonrandomized observational studies, differences between treatment groups may arise not only due to the treatment but also because of the effect of confounders. Therefore, causal inference regarding the treatment effect is not as straightforward as in a randomized trial. To adjust for confounding due to measured covariates, the average treatment effect is often estimated by using propensity scores. In this article, we focus on the use of inverse probability weighted (IPW) estimation methods. Typically, propensity scores are estimated by logistic regression. More recent suggestions have been to employ nonparametric classification algorithms from machine learning. In this article, we …
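
For the machine-learning variant, only the propensity model changes; the weighting step is the same as in the ipw_ate sketch above. A boosted-tree classifier is used here purely as one example of a nonparametric classification algorithm, not as the paper's specific choice.

```python
# Propensity scores from a generic machine-learning classifier
# (settings assumed); plug these into the IPW estimator in place of
# the logistic-regression scores.
from sklearn.ensemble import GradientBoostingClassifier

def ml_propensity(X, treat):
    clf = GradientBoostingClassifier(random_state=0).fit(X, treat)
    return clf.predict_proba(X)[:, 1]
```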


Predictive Accuracy Of Risk Factors And Markers: A Simulation Study Of The Effect Of Novel Markers On Different Performance Measures For Logistic Regression Models, Peter C. Austin Jan 2013

The change in c-statistic is frequently used to summarize the change in predictive accuracy when a novel risk factor is added to an existing logistic regression model. We explored the relationship between the absolute change in the c-statistic, Brier score, generalized R², and the discrimination slope when a risk factor was added to an existing model in an extensive set of Monte Carlo simulations. The increase in model accuracy due to the inclusion of a novel marker was proportional to both the prevalence of the marker and the odds ratio relating the marker to the outcome but inversely …
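
An illustrative computation of the change in c-statistic when a marker is added to a baseline logistic model, using the AUC as the c-statistic; the data layout and settings are assumed.

```python
# Change in c-statistic (AUC) from adding a novel marker to a baseline
# logistic regression model; purely a sketch of the quantity studied.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

def delta_c(X_base, marker, y):
    base = LogisticRegression(max_iter=1000).fit(X_base, y)
    X_full = np.column_stack([X_base, marker])
    full = LogisticRegression(max_iter=1000).fit(X_full, y)
    c_base = roc_auc_score(y, base.predict_proba(X_base)[:, 1])
    c_full = roc_auc_score(y, full.predict_proba(X_full)[:, 1])
    return c_full - c_base   # absolute change in the c-statistic
```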


James-Stein Estimation And The Benjamini-Hochberg Procedure, Debashis Ghosh Jan 2012

For the problem of multiple testing, the Benjamini-Hochberg (B-H) procedure has become a very popular method in applications. Based on a spacings theory representation of the B-H procedure, we are able to motivate the use of shrinkage estimators for modifying the B-H procedure. Several generalizations in the paper are discussed, and the methodology is applied to real and simulated datasets.
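
For reference, the unmodified Benjamini-Hochberg step-up procedure that the shrinkage modifications build on can be written in a few lines:

```python
# Standard Benjamini-Hochberg step-up procedure: sort p-values, find the
# largest rank k with p_(k) <= q*k/m, and reject all hypotheses up to it.
import numpy as np

def benjamini_hochberg(pvals, q=0.05):
    """Return a boolean rejection mask controlling FDR at level q."""
    p = np.asarray(pvals)
    m = p.size
    order = np.argsort(p)
    passed = np.flatnonzero(p[order] <= q * np.arange(1, m + 1) / m)
    reject = np.zeros(m, dtype=bool)
    if passed.size:
        reject[order[:passed[-1] + 1]] = True  # reject up to the largest passing rank
    return reject
```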


Comparing The Cohort Design And The Nested Case-Control Design In The Presence Of Both Time-Invariant And Time-Dependent Treatment And Competing Risks: Bias And Precision, Peter C. Austin Jan 2012

Purpose: Observational studies using electronic administrative health care databases are often used to estimate the effects of treatments and exposures. Traditionally, a cohort design has been used to estimate these effects, but increasingly studies are using a nested case-control (NCC) design. The relative statistical efficiency of these two designs has not been examined in detail.

Methods: We used Monte Carlo simulations to compare these two designs in terms of the bias and precision of effect estimates. We examined three different settings: (A): treatment occurred at baseline and there was a single outcome of interest; (B): treatment was time-varying and there …


Using Ensemble-Based Methods For Directly Estimating Causal Effects: An Investigation Of Tree-Based G-Computation, Peter C. Austin Jan 2012

Researchers are increasingly using observational or nonrandomized data to estimate causal treatment effects. Essential to the production of high-quality evidence is the ability to reduce or minimize the confounding that frequently occurs in observational studies. When using the potential outcome framework to define causal treatment effects, one requires the potential outcome under each possible treatment. However, only the outcome under the actual treatment received is observed, whereas the potential outcomes under the other treatments are considered missing data. Some authors have proposed that parametric regression models be used to estimate potential outcomes. In this study, we examined the use of …
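
A hedged sketch of G-computation with a tree ensemble standing in for a parametric outcome model: fit the outcome on treatment and covariates, predict both potential outcomes for every subject, and average the contrast. A random forest is used here as a generic ensemble; the paper's specific tree-based methods may differ.

```python
# G-computation with an ensemble outcome model (illustrative choice):
# predict Y(1) and Y(0) for every subject, then average the difference.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def gcomp_ate(X, treat, y):
    design = np.column_stack([treat, X])
    model = RandomForestRegressor(n_estimators=500, random_state=0).fit(design, y)
    y1 = model.predict(np.column_stack([np.ones_like(treat), X]))   # predicted Y(1)
    y0 = model.predict(np.column_stack([np.zeros_like(treat), X]))  # predicted Y(0)
    return np.mean(y1 - y0)
```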


Generating Survival Times To Simulate Cox Proportional Hazards Models With Time-Varying Covariates., Peter C. Austin Jan 2012

Simulations and Monte Carlo methods serve an important role in modern statistical research. They allow for an examination of the performance of statistical procedures in settings in which analytic and mathematical derivations may not be feasible. A key element in any statistical simulation is the existence of an appropriate data-generating process: one must be able to simulate data from a specified statistical model. We describe data-generating processes for the Cox proportional hazards model with time-varying covariates when event times follow an exponential, Weibull, or Gompertz distribution. We consider three types of time-varying covariates: first, a dichotomous time-varying covariate that can …
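
The inversion method underlying such data-generating processes, shown here for the simpler time-fixed case with a Weibull baseline hazard, T = (-log U / (λ exp(βx)))^(1/ν); the time-varying case in the paper extends this by inverting a piecewise cumulative hazard, and the parameter values below are assumptions.

```python
# Inversion-method sketch: draw U ~ Uniform(0, 1) and invert the Weibull
# cumulative hazard scaled by the Cox linear predictor. lam (scale) and
# nu (shape) are assumed values, not the paper's.
import numpy as np

rng = np.random.default_rng(0)

def weibull_cox_times(linpred, lam=0.01, nu=1.5):
    """Event times from a Cox model with Weibull baseline hazard."""
    u = rng.uniform(size=np.shape(linpred))
    return (-np.log(u) / (lam * np.exp(linpred))) ** (1.0 / nu)
```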


Propensity Score Modelling In Observational Studies Using Dimension Reduction Methods, Debashis Ghosh Jan 2011

Conditional independence assumptions are very important in causal inference modelling as well as in dimension reduction methodologies. These are two strikingly different statistical literatures, and we study links between the two in this article. The concept of covariate sufficiency plays an important role, and we provide theoretical justification for when dimension reduction and partial least squares methods allow valid causal inference to be performed. The methods are illustrated with application to a medical study and to simulated data.


Links Between Analysis Of Surrogate Endpoints And Endogeneity, Debashis Ghosh, Jeremy M. Taylor, Michael R. Elliott Jan 2010

There has been substantive interest in the assessment of surrogate endpoints in medical research. These are measures that could potentially replace "true" endpoints in clinical trials and lead to studies that require less follow-up. Recent research in the area has focused on assessments using causal inference frameworks. Beginning with a simple model for associating the surrogate and true endpoints in the population, we approach the problem as one of endogenous covariates. An instrumental variables estimator and a general two-stage algorithm are proposed. Existing surrogacy frameworks are then evaluated in the context of the model. A numerical example is used to illustrate …
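
A generic two-stage least squares sketch of the kind of instrumental variables estimator described, with z as the instrument, s as the endogenous surrogate, and y as the true endpoint; these roles and the linear models are assumptions for illustration.

```python
# Two-stage least squares: first regress the endogenous surrogate on the
# instrument, then regress the true endpoint on the stage-1 fitted values.
import numpy as np

def two_stage_ls(z, s, y):
    # Stage 1: fit s on the instrument z (with intercept).
    Z = np.column_stack([np.ones_like(z), z])
    s_hat = Z @ np.linalg.lstsq(Z, s, rcond=None)[0]
    # Stage 2: fit y on the fitted surrogate values.
    S = np.column_stack([np.ones_like(s_hat), s_hat])
    return np.linalg.lstsq(S, y, rcond=None)[0][1]  # slope on the surrogate
```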


Meta-Analysis For Surrogacy: Accelerated Failure Time Models And Semicompeting Risks Modelling, Debashis Ghosh, Jeremy M. Taylor, Daniel J. Sargent Jan 2010

There has been great recent interest in the medical and statistical literature in the assessment and validation of surrogate endpoints as proxies for clinical endpoints in medical studies. More recently, authors have focused on using meta-analytical methods for quantification of surrogacy. In this article, we extend existing procedures for analysis based on the accelerated failure time model to this setting. An advantage of this approach relative to the proportional hazards model is that it allows for analysis in the semi-competing risks setting, where we constrain the surrogate endpoint to occur before the true endpoint. A novel principal components procedure is …


Spline-Based Models For Predictiveness Curves, Debashis Ghosh, Michael Sabel Jan 2010

A biomarker is defined to be a biological characteristic that is objectively measured and evaluated as an indicator of normal biologic processes, pathogenic processes, or pharmacologic responses to a therapeutic intervention. The use of biomarkers in cancer has been advocated for a variety of purposes, which include use as surrogate endpoints, early detection of disease, proxies for environmental exposure, and risk prediction. We deal with the latter issue in this paper. Several authors have proposed use of the predictiveness curve for assessing the capacity of a biomarker for risk prediction. For most situations, it is reasonable to assume monotonicity of …
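
A minimal, spline-free sketch of an empirical predictiveness curve: model risk as a function of the biomarker and pair the fitted risks with the marker's quantile levels. The logistic risk model is a stand-in for the paper's spline-based models.

```python
# Empirical predictiveness curve sketch: R(v) = fitted risk at the v-th
# quantile of the marker. A single-covariate logistic model is assumed.
import numpy as np
from sklearn.linear_model import LogisticRegression

def predictiveness_curve(marker, disease):
    m = marker.reshape(-1, 1)
    risk = LogisticRegression(max_iter=1000).fit(m, disease).predict_proba(m)[:, 1]
    order = np.argsort(marker)
    v = np.arange(1, marker.size + 1) / marker.size   # quantile levels
    return v, risk[order]
```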


Combining Multiple Models With Survival Data: The Phase Algorithm, Debashis Ghosh, Zheng Yuan Jan 2010

In many scientific studies, one common goal is to develop good prediction rules based on a set of available measurements. This paper proposes a model averaging methodology using proportional hazards regression models to construct new estimators of predicted survival probabilities. A screening step based on an adaptive searching algorithm is used to handle large numbers of covariates. The finite-sample properties of the proposed methodology are assessed using simulation studies. An application of the method to a cancer biomarker study is also given.


Direct Effect Models, Mark J. Van Der Laan, Maya L. Petersen Jan 2008

The causal effect of a treatment on an outcome is generally mediated by several intermediate variables. Estimation of the component of the causal effect of a treatment that is not mediated by an intermediate variable (the direct effect of the treatment) is often relevant to mechanistic understanding and to the design of clinical and public health interventions. Robins, Greenland and Pearl develop counterfactual definitions for two types of direct effects, natural and controlled, and discuss assumptions, beyond those of sequential randomization, required for the identifiability of natural direct effects. Building on their earlier work and that of others, this article …


Multiple Testing Procedures Under Confounding, Debashis Ghosh Jan 2008

While multiple testing procedures have been the focus of much statistical research, an important facet of the problem is how to deal with possible confounding. Procedures have been developed by authors in genetics and statistics. In this chapter, we relate these proposals. We propose two new multiple testing approaches within this framework. The first combines sensitivity analysis methods with false discovery rate estimation procedures. The second involves construction of shrinkage estimators that utilize the mixture model for multiple testing. The procedures are illustrated with applications to a gene expression profiling experiment in prostate cancer.


Joint Variable Selection And Classification With Immunohistochemical Data, Debashis Ghosh, Ratna Chakrabarti Jan 2008

To determine if candidate cancer biomarkers have utility in a clinical setting, validation using immunohistochemical methods is typically done. Most analyses of such data have not incorporated the multivariate nature of the staining profiles. In this article, we consider modelling such data using recently developed ideas from the machine learning community. In particular, we consider the joint goals of feature selection and classification. We develop estimation procedures for the analysis of immunohistochemical profiles using the least absolute shrinkage and selection operator. These lead to novel and flexible models and algorithms for the analysis of compositional data. The techniques are …


An Improved Model Averaging Scheme For Logistic Regression, Debashis Ghosh, Zheng Yuan Jan 2008

Recently, penalized regression methods have attracted much attention in the statistical literature. In this article, we argue that such methods can be improved for the purposes of prediction by utilizing model averaging ideas. We propose a new algorithm that combines penalized regression with model averaging for improved prediction. We also discuss the issue of model selection versus model averaging and propose a diagnostic based on the notion of generalized degrees of freedom. The proposed methods are studied using both simulated and real data.
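
As a point of comparison, a common generic model-averaging scheme for logistic regression weights candidate models by AIC; the paper's algorithm, which combines penalization with averaging, is not reproduced here.

```python
# AIC-based model-averaging weights for a set of already-fitted logistic
# regression models (a generic scheme, assumed for illustration).
import numpy as np

def aic_weights(fitted_models, designs, y):
    """fitted_models: sklearn LogisticRegressions fit to (designs[i], y)."""
    aic = []
    for model, X in zip(fitted_models, designs):
        p = model.predict_proba(X)[:, 1]
        loglik = np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
        k = X.shape[1] + 1                 # coefficients plus intercept
        aic.append(2 * k - 2 * loglik)
    aic = np.asarray(aic)
    w = np.exp(-(aic - aic.min()) / 2)     # relative likelihoods
    return w / w.sum()                     # normalized averaging weights
```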


Statistical Learning Of Origin-Specific Statically Optimal Individualized Treatment Rules, Mark J. Van Der Laan, Maya L. Petersen Apr 2007

Consider a longitudinal observational or controlled study in which one collects chronological data over time on a random sample of subjects. The time-dependent process one observes on each subject contains time-dependent covariates, time-dependent treatment actions, and an outcome process or single final outcome of interest. A statically optimal individualized treatment rule (as introduced in van der Laan et al. (2005) and Petersen et al. (2007)) is a treatment rule which at any point in time conditions on a user-supplied subset of the past, computes the future static treatment regimen that maximizes a (conditional) mean future outcome of interest, and applies the …


Causal Effect Models For Realistic Individualized Treatment And Intention To Treat Rules, Mark J. Van Der Laan, Maya L. Petersen Mar 2007

Marginal structural models (MSM) are an important class of models in causal inference. Given a longitudinal data structure observed on a sample of n independent and identically distributed experimental units, MSM model the counterfactual outcome distribution corresponding with a static treatment intervention, conditional on user-supplied baseline covariates. Identification of a static treatment regimen-specific outcome distribution based on observational data requires, beyond the standard sequential randomization assumption, the assumption that each experimental unit has positive probability of following the static treatment regimen. The latter assumption is called the experimental treatment assignment (ETA) assumption, and is parameter-specific. In many studies the ETA …


Identifying Important Explanatory Variables For Time-Varying Outcomes., Oliver Bembom, Maya L. Petersen, Mark J. Van Der Laan Dec 2006

This chapter describes a systematic and targeted approach for estimating the impact of each of a large number of baseline covariates on an outcome that is measured repeatedly over time. These variable importance estimates can be adjusted for a user-specified set of confounders and lend themselves in a straightforward way to obtaining confidence intervals and p-values. Hence, they can in particular be used to identify a subset of baseline covariates that are the most important explanatory variables for the time-varying outcome of interest. We illustrate the methodology in a data analysis aimed at finding mutations of the human immunodeficiency virus …