Open Access. Powered by Scholars. Published by Universities.®

Statistical Models Commons

Articles 1 - 14 of 14

Full-Text Articles in Statistical Models

Using Methods From The Data-Mining And Machine-Learning Literature For Disease Classification And Prediction: A Case Study Examining Classification Of Heart Failure Subtypes, Peter C. Austin Jan 2013

Peter Austin

OBJECTIVE: Physicians classify patients into those with or without a specific disease. Furthermore, there is often interest in classifying patients according to disease etiology or subtype. Classification trees are frequently used to classify patients according to the presence or absence of a disease. However, classification trees can suffer from limited accuracy. In the data-mining and machine-learning literature, alternate classification schemes have been developed. These include bootstrap aggregation (bagging), boosting, random forests, and support vector machines.

STUDY DESIGN AND SETTING: We compared the performance of these classification methods with that of conventional classification trees to classify patients with heart failure (HF) …


Predictive Accuracy Of Risk Factors And Markers: A Simulation Study Of The Effect Of Novel Markers On Different Performance Measures For Logistic Regression Models, Peter C. Austin Jan 2013

Peter Austin

The change in c-statistic is frequently used to summarize the change in predictive accuracy when a novel risk factor is added to an existing logistic regression model. We explored the relationship between the absolute change in the c-statistic, Brier score, generalized R², and the discrimination slope when a risk factor was added to an existing model in an extensive set of Monte Carlo simulations. The increase in model accuracy due to the inclusion of a novel marker was proportional to both the prevalence of the marker and to the odds ratio relating the marker to the outcome but inversely …
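As a rough illustration of three of the performance measures named in this abstract, the sketch below computes the c-statistic, the Brier score, and the discrimination slope from observed binary outcomes and predicted probabilities. The function names and the toy data are assumptions made for illustration and are not taken from the article; the generalized R² is omitted.

```python
def c_statistic(y, p):
    """Share of (event, non-event) pairs in which the event has the higher
    predicted probability; tied predictions count as one half."""
    events = [pi for yi, pi in zip(y, p) if yi == 1]
    non_events = [pi for yi, pi in zip(y, p) if yi == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for pe in events:
        for pn in non_events:
            if pe > pn:
                concordant += 1.0
            elif pe == pn:
                concordant += 0.5
    return concordant / (len(events) * len(non_events))


def brier_score(y, p):
    """Mean squared difference between predicted probability and outcome."""
    return sum((pi - yi) ** 2 for yi, pi in zip(y, p)) / len(y)


def discrimination_slope(y, p):
    """Mean predicted probability among events minus that among non-events."""
    events = [pi for yi, pi in zip(y, p) if yi == 1]
    non_events = [pi for yi, pi in zip(y, p) if yi == 0]
    return sum(events) / len(events) - sum(non_events) / len(non_events)


# Hypothetical toy data: the same four subjects scored by a baseline model
# and by a model that also includes a new marker.
y = [0, 0, 1, 1]
p_base = [0.2, 0.6, 0.4, 0.8]
p_new = [0.1, 0.4, 0.6, 0.9]

delta_c = c_statistic(y, p_new) - c_statistic(y, p_base)
delta_brier = brier_score(y, p_new) - brier_score(y, p_base)
delta_slope = discrimination_slope(y, p_new) - discrimination_slope(y, p_base)
```

A lower Brier score indicates better accuracy, so an improvement shows up as a negative `delta_brier` alongside positive `delta_c` and `delta_slope`.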


Comparing The Cohort Design And The Nested Case-Control Design In The Presence Of Both Time-Invariant And Time-Dependent Treatment And Competing Risks: Bias And Precision, Peter C. Austin Jan 2012

Peter Austin

Purpose: Observational studies using electronic administrative health care databases are often used to estimate the effects of treatments and exposures. Traditionally, a cohort design has been used to estimate these effects, but increasingly studies are using a nested case-control (NCC) design. The relative statistical efficiency of these two designs has not been examined in detail.

Methods: We used Monte Carlo simulations to compare these two designs in terms of the bias and precision of effect estimates. We examined three different settings: (A): treatment occurred at baseline and there was a single outcome of interest; (B): treatment was time-varying and there …


Using Ensemble-Based Methods For Directly Estimating Causal Effects: An Investigation Of Tree-Based G-Computation, Peter C. Austin Jan 2012

Peter Austin

Researchers are increasingly using observational or nonrandomized data to estimate causal treatment effects. Essential to the production of high-quality evidence is the ability to reduce or minimize the confounding that frequently occurs in observational studies. When using the potential outcome framework to define causal treatment effects, one requires the potential outcome under each possible treatment. However, only the outcome under the actual treatment received is observed, whereas the potential outcomes under the other treatments are considered missing data. Some authors have proposed that parametric regression models be used to estimate potential outcomes. In this study, we examined the use of …


Regression Trees For Predicting Mortality In Patients With Cardiovascular Disease: What Improvement Is Achieved By Using Ensemble-Based Methods?, Peter C. Austin Jan 2012

Peter Austin

In biomedical research, the logistic regression model is the most commonly used method for predicting the probability of a binary outcome. While many clinical researchers have expressed an enthusiasm for regression trees, this method may have limited accuracy for predicting health outcomes. We aimed to evaluate the improvement that is achieved by using ensemble-based methods, including bootstrap aggregation (bagging) of regression trees, random forests, and boosted regression trees. We analyzed 30-day mortality in two large cohorts of patients hospitalized with either acute myocardial infarction (N = 16,230) or congestive heart failure (N = 15,848) in two distinct eras (1991-2001 and …


Generating Survival Times To Simulate Cox Proportional Hazards Models With Time-Varying Covariates, Peter C. Austin Jan 2012

Peter Austin

Simulations and Monte Carlo methods serve an important role in modern statistical research. They allow for an examination of the performance of statistical procedures in settings in which analytic and mathematical derivations may not be feasible. A key element in any statistical simulation is the existence of an appropriate data-generating process: one must be able to simulate data from a specified statistical model. We describe data-generating processes for the Cox proportional hazards model with time-varying covariates when event times follow an exponential, Weibull, or Gompertz distribution. We consider three types of time-varying covariates: first, a dichotomous time-varying covariate that can …
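One way such a data-generating process can be built is by inverting the cumulative hazard function. The sketch below does this for a single illustrative case: an exponential baseline hazard and a binary covariate that switches from 0 to 1 at a known time `t0`. The abstract's own description of the covariate types is truncated, so this scenario is an assumption, as are the function and parameter names (`lam`, `beta`, `t0`).

```python
import math
import random


def cumulative_hazard(t, lam, beta, t0):
    """Cumulative hazard when the hazard is lam before t0 and
    lam * exp(beta) from t0 onward (covariate switches 0 -> 1 at t0)."""
    if t < t0:
        return lam * t
    return lam * t0 + lam * math.exp(beta) * (t - t0)


def invert_cumulative_hazard(e, lam, beta, t0):
    """Return the time t at which the cumulative hazard equals e."""
    if e < lam * t0:
        return e / lam
    return t0 + (e - lam * t0) / (lam * math.exp(beta))


def simulate_event_time(lam, beta, t0, rng):
    """Draw one event time by inverse-transform sampling: the cumulative
    hazard evaluated at the event time follows a unit exponential."""
    e = -math.log(1.0 - rng.random())  # 1 - U lies in (0, 1], so log is defined
    return invert_cumulative_hazard(e, lam, beta, t0)


rng = random.Random(2012)  # fixed seed so the draws are reproducible
times = [simulate_event_time(lam=0.1, beta=math.log(2.0), t0=5.0, rng=rng)
         for _ in range(1000)]
```

Setting `beta = 0` reduces the scheme to ordinary exponential event times with rate `lam`, which gives a quick sanity check on the simulated data.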


Comparing Paired Vs. Non-Paired Statistical Methods Of Analyses When Making Inferences About Absolute Risk Reductions In Propensity-Score Matched Samples, Peter C. Austin Jan 2011

Peter Austin

Propensity-score matching allows one to reduce the effects of treatment-selection bias or confounding when estimating the effects of treatments when using observational data. Some authors have suggested that methods of inference appropriate for independent samples can be used for assessing the statistical significance of treatment effects when using propensity-score matching. Indeed, many authors in the applied medical literature use methods for independent samples when making inferences about treatment effects using propensity-score matched samples. Dichotomous outcomes are common in healthcare research. In this study, we used Monte Carlo simulations to examine the effect on inferences about risk differences (or absolute risk …


Optimal Caliper Widths For Propensity-Score Matching When Estimating Differences In Means And Differences In Proportions In Observational Studies, Peter C. Austin Jan 2011

Peter Austin

In a study comparing the effects of two treatments, the propensity score is the probability of assignment to one treatment conditional on a subject's measured baseline covariates. Propensity-score matching is increasingly being used to estimate the effects of exposures using observational data. In the most common implementation of propensity-score matching, pairs of treated and untreated subjects are formed whose propensity scores differ by at most a pre-specified amount (the caliper width). There has been little research into the optimal caliper width. We conducted an extensive series of Monte Carlo simulations to determine the optimal caliper width for estimating differences …
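The pairing rule described in this abstract can be sketched as a greedy 1:1 match without replacement, in which a pair is kept only if the two propensity scores differ by no more than the caliper width. The greedy ordering, the function name `caliper_match`, and the toy scores are assumptions for illustration; the article's own algorithm and its recommended caliper are not reproduced here.

```python
def caliper_match(treated, untreated, caliper):
    """Greedy 1:1 matching without replacement on the propensity score.

    treated, untreated: dicts mapping subject id -> propensity score.
    caliper: largest allowed absolute difference in propensity score.
    Returns a list of (treated_id, untreated_id) pairs.
    """
    available = dict(untreated)  # copy, so the caller's dict is not modified
    pairs = []
    for t_id, t_score in treated.items():
        if not available:
            break  # no untreated subjects left to match
        # Closest remaining untreated subject to this treated subject.
        best = min(available, key=lambda u: abs(available[u] - t_score))
        if abs(available[best] - t_score) <= caliper:
            pairs.append((t_id, best))
            del available[best]  # each untreated subject is used at most once
        # Otherwise the treated subject stays unmatched.
    return pairs


# Hypothetical propensity scores.
treated = {"t1": 0.30, "t2": 0.70}
untreated = {"u1": 0.32, "u2": 0.50, "u3": 0.95}

narrow = caliper_match(treated, untreated, caliper=0.05)
wide = caliper_match(treated, untreated, caliper=0.25)
```

With the narrow caliper, `t2` has no acceptable partner and is dropped; widening the caliper keeps more treated subjects at the cost of less similar pairs, which is the trade-off the simulations examine.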


A Tutorial And Case Study In Propensity Score Analysis: An Application To Estimating The Effect Of In-Hospital Smoking Cessation Counseling On Mortality, Peter C. Austin Jan 2011

Peter Austin

Propensity score methods allow investigators to estimate causal treatment effects using observational or nonrandomized data. In this article we provide a practical illustration of the appropriate steps in conducting propensity score analyses. For illustrative purposes, we use a sample of current smokers who were discharged alive after being hospitalized with a diagnosis of acute myocardial infarction. The exposure of interest was receipt of smoking cessation counseling prior to hospital discharge and the outcome was mortality within 3 years of hospital discharge. We illustrate the following concepts: first, how to specify the propensity score model; second, how to match treated and …


An Introduction To Propensity-Score Methods For Reducing Confounding In Observational Studies, Peter C. Austin Dec 2010

Peter Austin

The propensity score is the probability of treatment assignment conditional on observed baseline characteristics. The propensity score allows one to design and analyze an observational (non-randomized) study so that it mimics some of the particular characteristics of a randomized controlled trial. In particular, the propensity score is a balancing score: conditional on the propensity score, the distribution of observed baseline covariates will be similar between treated and untreated subjects. We describe four different propensity score methods: matching on the propensity score, stratification on the propensity score, inverse probability of treatment weighting using the propensity score, and covariate adjustment using the …
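Of the four methods listed in this abstract, inverse probability of treatment weighting is the simplest to sketch. The example below assumes that propensity scores have already been estimated and uses the standard weights, 1/e for treated subjects and 1/(1 - e) for untreated subjects. The function names and data are illustrative assumptions, not code from the article.

```python
def iptw_weights(z, e):
    """Inverse probability of treatment weights.

    z: treatment indicators (1 = treated, 0 = untreated).
    e: estimated propensity scores, each strictly between 0 and 1.
    """
    return [zi / ei + (1 - zi) / (1.0 - ei) for zi, ei in zip(z, e)]


def weighted_mean_difference(y, z, w):
    """Weighted mean outcome among treated minus that among untreated."""
    treated_sum = sum(wi * yi for yi, zi, wi in zip(y, z, w) if zi == 1)
    treated_wt = sum(wi for zi, wi in zip(z, w) if zi == 1)
    control_sum = sum(wi * yi for yi, zi, wi in zip(y, z, w) if zi == 0)
    control_wt = sum(wi for zi, wi in zip(z, w) if zi == 0)
    return treated_sum / treated_wt - control_sum / control_wt


# Hypothetical data: four subjects with already-estimated propensity scores.
z = [1, 1, 0, 0]
e = [0.5, 0.25, 0.5, 0.2]
y = [1, 0, 0, 1]

w = iptw_weights(z, e)
effect = weighted_mean_difference(y, z, w)
```

Treated subjects with a low propensity score receive large weights, so in practice the distribution of the weights is usually inspected before the weighted estimate is trusted.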


Statistical Criteria For Selecting The Optimal Number Of Untreated Subjects Matched To Each Treated Subject When Using Many-To-One Matching On The Propensity Score, Peter C. Austin Jan 2010

Peter Austin

Propensity-score matching is increasingly being used to estimate the effects of treatments using observational data. In many-to-one (M:1) matching on the propensity score, M untreated subjects are matched to each treated subject using the propensity score. The authors used Monte Carlo simulations to examine the effect of the choice of M on the statistical performance of matched estimators. They considered matching 1–5 untreated subjects to each treated subject using both nearest-neighbor matching and caliper matching in 96 different scenarios. Increasing the number of untreated subjects matched to each treated subject tended to increase the bias in the estimated treatment effect; …


The Performance Of Different Propensity-Score Methods For Estimating Differences In Proportions (Risk Differences Or Absolute Risk Reductions) In Observational Studies, Peter C. Austin Jan 2010

Peter Austin

Propensity score methods are increasingly being used to estimate the effects of treatments on health outcomes using observational data. There are four methods for using the propensity score to estimate treatment effects: covariate adjustment using the propensity score, stratification on the propensity score, propensity-score matching, and inverse probability of treatment weighting (IPTW) using the propensity score. When outcomes are binary, the effect of treatment on the outcome can be described using odds ratios, relative risks, risk differences, or the number needed to treat. Several clinical commentators suggested that risk differences and numbers needed to treat are more meaningful for clinical …
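The four effect measures for binary outcomes named in this abstract are related by simple arithmetic on the two event proportions. The sketch below computes them from raw counts; the function name and the example counts are hypothetical and do not come from the article.

```python
import math


def effect_measures(events_treated, n_treated, events_control, n_control):
    """Risk difference, relative risk, odds ratio, and number needed to
    treat, computed from event counts in two groups.

    Assumes both event proportions lie strictly between 0 and 1.
    """
    p1 = events_treated / n_treated   # risk in the treated group
    p0 = events_control / n_control   # risk in the untreated group
    risk_difference = p1 - p0
    return {
        "risk_difference": risk_difference,
        "relative_risk": p1 / p0,
        "odds_ratio": (p1 / (1.0 - p1)) / (p0 / (1.0 - p0)),
        # NNT is the reciprocal of the absolute risk difference.
        "nnt": math.inf if risk_difference == 0 else 1.0 / abs(risk_difference),
    }


# Hypothetical counts: 10 of 100 treated and 20 of 100 untreated subjects die.
measures = effect_measures(10, 100, 20, 100)
```

For these counts the relative risk is 0.5 while the risk difference is -0.1, corresponding to a number needed to treat of 10, which shows how a relative measure and an absolute measure can convey quite different impressions of the same data.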


Balance Diagnostics For Comparing The Distribution Of Baseline Covariates Between Treatment Groups In Propensity-Score Matched Samples, Peter C. Austin Jan 2009

Peter Austin

The propensity score is a subject’s probability of treatment, conditional on observed baseline covariates. Conditional on the true propensity score, treated and untreated subjects have similar distributions of observed baseline covariates. Propensity-score matching is a popular method of using the propensity score in the medical literature. Using this approach, matched sets of treated and untreated subjects with similar values of the propensity score are formed. Inferences about treatment effect made using propensity-score matching are valid only if, in the matched sample, treated and untreated subjects have similar distributions of measured baseline covariates. In this paper we discuss the following methods …
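The abstract's list of diagnostics is truncated, so the following is only one commonly used check and may not match the article's own selection: the standardized difference of a continuous covariate between treated and untreated subjects in the matched sample. The function name and the data are assumptions for illustration.

```python
import math
import statistics


def standardized_difference(treated_values, untreated_values):
    """Difference in group means divided by the pooled standard deviation,
    where the pooled variance is the average of the two sample variances."""
    mean_t = statistics.mean(treated_values)
    mean_u = statistics.mean(untreated_values)
    var_t = statistics.variance(treated_values)   # sample variance (n - 1)
    var_u = statistics.variance(untreated_values)
    return (mean_t - mean_u) / math.sqrt((var_t + var_u) / 2.0)


# Hypothetical ages in a matched sample.
age_treated = [61.0, 64.0, 70.0, 73.0]
age_untreated = [60.0, 66.0, 69.0, 74.0]

d_age = standardized_difference(age_treated, age_untreated)
```

Because the standardized difference is not tied to sample size, it can be compared across covariates and across matched samples of different sizes, unlike a p-value from a significance test.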


Are (The Log-Odds Of) Hospital Mortality Rates Normally Distributed In Ontario? Implications For Studying Variations In Outcomes Of Medical Care, Peter C. Austin Dec 2008

Peter Austin

Objective: Hierarchical regression models are used to examine variations in outcomes following the provision of medical care across providers. These models frequently assume a normal distribution for the provider-specific random effects. Poincaré said, “Everyone believes in the normal law, the experimenters because they imagine it a mathematical theorem, and the mathematicians because they think it an experimental fact”. Our objective was to examine the appropriateness of this assumption when examining variations in mortality.

Study design and setting: We used Bayesian model selection methods to compare hierarchical regression models in which the provider-specific random effects were either a normal distribution or …