Full-Text Articles in Statistical Models
Inferring A Consensus Problem List Using Penalized Multistage Models For Ordered Data, Philip S. Boonstra, John C. Krauss
The University of Michigan Department of Biostatistics Working Paper Series
A patient's medical problem list describes his or her current health status and aids in the coordination and transfer of care between providers, among other things. Because a problem list is generated once and then subsequently modified or updated, what is not usually observable is the provider-effect. That is, to what extent does a patient's problem in the electronic medical record actually reflect a consensus communication of that patient's current health status? To that end, we report on and analyze a unique interview-based design in which multiple medical providers independently generate problem lists for each of three patient case abstracts …
Hpcnmf: A High-Performance Toolbox For Non-Negative Matrix Factorization, Karthik Devarajan, Guoli Wang
COBRA Preprint Series
Non-negative matrix factorization (NMF) is a widely used machine learning algorithm for dimension reduction of large-scale data. It has found successful applications in a variety of fields such as computational biology, neuroscience, natural language processing, information retrieval, image processing and speech recognition. In bioinformatics, for example, it has been used to extract patterns and profiles from genomic and text-mining data as well as in protein sequence and structure analysis. While the scientific performance of NMF is very promising in dealing with high dimensional data sets and complex data structures, its computational cost is high and sometimes could be critical for …
Models For HSV Shedding Must Account For Two Levels Of Overdispersion, Amalia Magaret
UW Biostatistics Working Paper Series
We have frequently implemented crossover studies to evaluate new therapeutic interventions for genital herpes simplex virus infection. The outcome measured to assess the efficacy of interventions on herpes disease severity is the viral shedding rate, defined as the frequency of detection of HSV on the genital skin and mucosa. We performed a simulation study to ascertain whether our standard model, which we have used previously, was appropriately considering all the necessary features of the shedding data to provide correct inference. We simulated shedding data under our standard, validated assumptions and assessed the ability of 5 different models to reproduce the …
Net Reclassification Index: A Misleading Measure Of Prediction Improvement, Margaret Sullivan Pepe, Holly Janes, Kathleen F. Kerr, Bruce M. Psaty
UW Biostatistics Working Paper Series
The evaluation of biomarkers to improve risk prediction is a common theme in modern research. Since its introduction in 2008, the net reclassification index (NRI) (Pencina et al. 2008, Pencina et al. 2011) has gained widespread use as a measure of prediction performance with over 1,200 citations as of June 30, 2013. The NRI is considered by some to be more sensitive to clinically important changes in risk than the traditional change in the AUC (Delta AUC) statistic (Hlatky et al. 2009). Recent statistical research has raised questions, however, about the validity of conclusions based on the NRI. (Hilden and …
Attributing Effects To Interactions, Tyler J. Vanderweele, Eric J. Tchetgen Tchetgen
Harvard University Biostatistics Working Paper Series
A framework is presented which allows an investigator to estimate the portion of the effect of one exposure that is attributable to an interaction with a second exposure. We show that when the two exposures are independent, the total effect of one exposure can be decomposed into a conditional effect of that exposure and a component due to interaction. The decomposition applies on difference or ratio scales. We discuss how the components can be estimated using standard regression models, and how these components can be used to evaluate the proportion of the total effect of the primary exposure attributable to …
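The difference-scale decomposition mentioned in this abstract can be sketched as an algebraic identity. The notation here is an assumption for illustration and is not taken from the paper: G is the primary binary exposure, E is the second binary exposure, and p_{ge} = E[Y | G = g, E = e]. If G and E are independent, then:

```latex
\begin{aligned}
E[Y \mid G=1] - E[Y \mid G=0]
  &= (p_{10} - p_{00})\,P(E=0) + (p_{11} - p_{01})\,P(E=1) \\
  &= \underbrace{(p_{10} - p_{00})}_{\text{conditional effect of } G \text{ at } E=0}
   + \underbrace{(p_{11} - p_{10} - p_{01} + p_{00})\,P(E=1)}_{\text{component due to interaction}}
\end{aligned}
```

Under this notation, the proportion of the total effect attributable to interaction would be the second term divided by the left-hand side.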
Estimating Effects On Rare Outcomes: Knowledge Is Power, Laura B. Balzer, Mark J. Van Der Laan
U.C. Berkeley Division of Biostatistics Working Paper Series
Many of the secondary outcomes in observational studies and randomized trials are rare. Methods for estimating causal effects and associations with rare outcomes, however, are limited, and this represents a missed opportunity for investigation. In this article, we construct a new targeted minimum loss-based estimator (TMLE) for the effect of an exposure or treatment on a rare outcome. We focus on the causal risk difference and statistical models incorporating bounds on the conditional risk of the outcome, given the exposure and covariates. By construction, the proposed estimator constrains the predicted outcomes to respect this model knowledge. Theoretically, this bounding provides …
A Regionalized National Universal Kriging Model Using Partial Least Squares Regression For Estimating Annual PM2.5 Concentrations In Epidemiology, Paul D. Sampson, Mark Richards, Adam A. Szpiro, Silas Bergen, Lianne Sheppard, Timothy V. Larson, Joel Kaufman
UW Biostatistics Working Paper Series
Many cohort studies in environmental epidemiology require accurate modeling and prediction of fine scale spatial variation in ambient air quality across the U.S. This modeling requires the use of small spatial scale geographic or “land use” regression covariates and some degree of spatial smoothing. Furthermore, the details of the prediction of air quality by land use regression and the spatial variation in ambient air quality not explained by this regression should be allowed to vary across the continent due to the large scale heterogeneity in topography, climate, and sources of air pollution. This paper introduces a regionalized national universal kriging …
Flexible Distributed Lag Models Using Random Functions With Application To Estimating Mortality Displacement From Heat-Related Deaths, Roger D. Peng
Johns Hopkins University, Dept. of Biostatistics Working Papers
No abstract provided.
Assessing Association For Bivariate Survival Data With Interval Sampling: A Copula Model Approach With Application To AIDS Study, Hong Zhu, Mei-Cheng Wang
Johns Hopkins University, Dept. of Biostatistics Working Papers
In disease surveillance systems or registries, bivariate survival data are typically collected under interval sampling. This refers to a situation in which entry into a registry occurs at the time of the first failure event (e.g., HIV infection) within a calendar time interval, the time of the initiating event (e.g., birth) is retrospectively identified for all the cases in the registry, and subsequently the second failure event (e.g., death) is observed during the follow-up. Sampling bias is induced by the selection process, because the data are collected conditional on the first failure event occurring within a time interval. Consequently, the …
Reduced Bayesian Hierarchical Models: Estimating Health Effects Of Simultaneous Exposure To Multiple Pollutants, Jennifer F. Bobb, Francesca Dominici, Roger D. Peng
Johns Hopkins University, Dept. of Biostatistics Working Papers
Quantifying the health effects associated with simultaneous exposure to many air pollutants is now a research priority of the US EPA. Bayesian hierarchical models (BHM) have been extensively used in multisite time series studies of air pollution and health to estimate health effects of a single pollutant adjusted for potential confounding of other pollutants and other time-varying factors. However, when the scientific goal is to estimate the impacts of many pollutants jointly, a straightforward application of BHM is challenged by the need to specify a random-effect distribution on a high-dimensional vector of nuisance parameters, which often do not have an …
Threshold Regression Models Adapted To Case-Control Studies, And The Risk Of Lung Cancer Due To Occupational Exposure To Asbestos In France, Antoine Chambaz, Dominique Choudat, Catherine Huber, Jean-Claude Pairon, Mark J. Van Der Laan
U.C. Berkeley Division of Biostatistics Working Paper Series
Asbestos has been known for many years as a powerful carcinogen. Our purpose is to quantify the relationship between occupational exposure to asbestos and an increase in the risk of lung cancer. Furthermore, we wish to tackle the very delicate question of the evaluation, in subjects suffering from lung cancer, of how much the amount of exposure to asbestos explains the occurrence of the cancer. For this purpose, we rely on a recent French case-control study. We build a large collection of threshold regression models, data-adaptively select a better model from it by multi-fold likelihood-based cross-validation, then fit the …
Minimum Description Length And Empirical Bayes Methods Of Identifying SNPs Associated With Disease, Ye Yang, David R. Bickel
COBRA Preprint Series
The goal of determining which of hundreds of thousands of SNPs are associated with disease poses one of the most challenging multiple testing problems. Using the empirical Bayes approach, the local false discovery rate (LFDR) estimated using popular semiparametric models has enjoyed success in simultaneous inference. However, the estimated LFDR can be biased because the semiparametric approach tends to overestimate the proportion of the non-associated single nucleotide polymorphisms (SNPs). One of the negative consequences is that, like conventional p-values, such LFDR estimates cannot quantify the amount of information in the data that favors the null hypothesis of no disease-association.
We …
Nonparametric Regression With Missing Outcomes Using Weighted Kernel Estimating Equations, Lu Wang, Andrea Rotnitzky, Xihong Lin
Harvard University Biostatistics Working Paper Series
No abstract provided.
Causal Inference In Epidemiological Studies With Strong Confounding, Kelly L. Moore, Romain S. Neugebauer, Mark J. Van Der Laan, Ira B. Tager
U.C. Berkeley Division of Biostatistics Working Paper Series
One of the identifiability assumptions of causal effects defined by marginal structural model (MSM) parameters is the experimental treatment assignment (ETA) assumption. Practical violations of this assumption frequently occur in data analysis, when certain exposures are rarely observed within some strata of the population. The inverse probability of treatment weighted (IPTW) estimator is particularly sensitive to violations of this assumption; however, we demonstrate that this is a problem for all estimators of causal effects. This is due to the fact that the ETA assumption is about information (or lack thereof) in the data. A new class of causal models, causal …
A Spatio-Temporal Approach For Estimating Chronic Effects Of Air Pollution, Sonja Greven, Francesca Dominici, Scott L. Zeger
Johns Hopkins University, Dept. of Biostatistics Working Papers
Estimating the health risks associated with air pollution exposure is of great importance in public health. In air pollution epidemiology, two study designs have mainly been used. Time series studies estimate acute risk associated with short-term exposure. They compare day-to-day variation of pollution concentrations and mortality rates, and have been criticized for potential confounding by time-varying covariates. Cohort studies estimate chronic effects associated with long-term exposure. They compare long-term average pollution concentrations and time-to-death across cities, and have been criticized for potential confounding by individual risk factors or city-level characteristics.
We propose a new study design and a statistical model, …
Analysis Of Randomized Comparative Clinical Trial Data For Personalized Treatment Selections, Tianxi Cai, Lu Tian, Peggy H. Wong, L. J. Wei
Harvard University Biostatistics Working Paper Series
No abstract provided.
Group Comparison Of Eigenvalues And Eigenvectors Of Diffusion Tensors, Armin Schwartzman, Robert F. Dougherty, Jonathan E. Taylor
Harvard University Biostatistics Working Paper Series
No abstract provided.
Spatial Misalignment In Time Series Studies Of Air Pollution And Health Data, Roger D. Peng, Michelle L. Bell
Johns Hopkins University, Dept. of Biostatistics Working Papers
Time series studies of environmental exposures often involve comparing daily changes in a toxicant measured at a point in space with daily changes in an aggregate measure of health. Spatial misalignment of the exposure and response variables can bias the estimation of health risk and the magnitude of this bias depends on the spatial variation of the exposure of interest. In air pollution epidemiology, there is an increasing focus on estimating the health effects of the chemical components of particulate matter. One issue that is raised by this new focus is the spatial misalignment error introduced by the lack of …
Calibrating Parametric Subject-Specific Risk Estimation, Tianxi Cai, Lu Tian, Hajime Uno, Scott D. Solomon, L. J. Wei
Harvard University Biostatistics Working Paper Series
No abstract provided.
Evaluating Subject-Level Incremental Values Of New Markers For Risk Classification Rule, Tianxi Cai, Lu Tian, Donald M. Lloyd-Jones, L. J. Wei
Harvard University Biostatistics Working Paper Series
No abstract provided.
Joint Spatial Modeling Of Recurrent Infection And Growth With Processes Under Intermittent Observation, Farouk S. Nathoo
COBRA Preprint Series
In this article we present new statistical methodology for longitudinal studies in forestry where trees are subject to recurrent infection and the hazard of infection depends on tree growth over time. Understanding the nature of this dependence has important implications for reforestation and breeding programs. Challenges arise for statistical analysis in this setting with sampling schemes leading to panel data, exhibiting dynamic spatial variability, and incomplete covariate histories for hazard regression. In addition, data are collected at a large number of locations which poses computational difficulties for spatiotemporal modeling. A joint model for infection and growth is developed; wherein, a …
Spatio-Temporal Analysis Of Areal Data And Discovery Of Neighborhood Relationships In Conditionally Autoregressive Models, Subharup Guha, Louise Ryan
Harvard University Biostatistics Working Paper Series
No abstract provided.
Semiparametric Regression Of Multi-Dimensional Genetic Pathway Data: Least Squares Kernel Machines And Linear Mixed Models, Dawei Liu, Xihong Lin, Debashis Ghosh
Harvard University Biostatistics Working Paper Series
No abstract provided.
Statistical Analysis Of Air Pollution Panel Studies: An Illustration, Holly Janes, Lianne Sheppard, Kristen Shepherd
UW Biostatistics Working Paper Series
The panel study design is commonly used to evaluate the short-term health effects of air pollution. Standard statistical methods for analyzing longitudinal data are available, but the literature reveals that the techniques are not well understood by practitioners. We illustrate these methods using data from the 1999 to 2002 Seattle panel study. Marginal, conditional, and transitional approaches for modeling longitudinal data are reviewed and contrasted with respect to their parameter interpretation and methods for accounting for correlation and dealing with missing data. We also discuss and illustrate techniques for controlling for time-dependent and time-independent confounding, and for exploring and summarizing …
Spatial Cluster Detection For Censored Outcome Data, Andrea J. Cook, Diane Gold, Yi Li
Harvard University Biostatistics Working Paper Series
No abstract provided.
Relative Risk Regression In Medical Research: Models, Contrasts, Estimators, And Algorithms, Thomas Lumley, Richard Kronmal, Shuangge Ma
UW Biostatistics Working Paper Series
The relative risk or prevalence ratio is a natural and familiar summary of association between a binary outcome and an exposure or intervention. For rare events, the relative risk can be approximately estimated by logistic regression. For common events estimation is more difficult. We review proposed estimation algorithms for relative risk regression. Some of these give inconsistent estimates or invalid standard errors. We show that the methods that give correct inference can be viewed as arising from a family of quasilikelihood estimating functions for the same generalized linear model, differing in their efficiency and in their robustness to outlying values …
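The rare-event approximation noted in this abstract can be illustrated with a minimal sketch. Logistic regression estimates odds ratios, so the sketch compares the odds ratio with the relative risk in two 2x2 tables. The counts and function names are hypothetical, chosen only for illustration, and are not from the paper.

```python
def relative_risk(a, b, c, d):
    """Risk in the exposed divided by risk in the unexposed.

    a, b: exposed subjects with / without the event
    c, d: unexposed subjects with / without the event
    """
    return (a / (a + b)) / (c / (c + d))


def odds_ratio(a, b, c, d):
    """Odds of the event in the exposed divided by odds in the unexposed."""
    return (a * d) / (b * c)


# Hypothetical counts: a rare outcome (risks of 0.2% and 0.1%)
rare = (20, 9980, 10, 9990)
# Hypothetical counts: a common outcome (risks of 60% and 30%)
common = (600, 400, 300, 700)

rr_rare, or_rare = relative_risk(*rare), odds_ratio(*rare)
rr_common, or_common = relative_risk(*common), odds_ratio(*common)

print(f"rare:   RR = {rr_rare:.3f}, OR = {or_rare:.3f}")
print(f"common: RR = {rr_common:.3f}, OR = {or_common:.3f}")
```

In the rare table the two measures nearly coincide, while in the common table the odds ratio overstates the relative risk, which is why common outcomes call for the dedicated estimation methods the abstract reviews.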
Causal Comparisons In Randomized Trials Of Two Active Treatments: The Effect Of Supervised Exercise To Promote Smoking Cessation, Jason Roy, Joseph W. Hogan
COBRA Preprint Series
In behavioral medicine trials, such as smoking cessation trials, two or more active treatments are often compared. Noncompliance by some subjects with their assigned treatment poses a challenge to the data analyst. Causal parameters of interest might include those defined by subpopulations based on their potential compliance status under each assignment, using the principal stratification framework (e.g., causal effect of new therapy compared to standard therapy among subjects that would comply with either intervention). Even if subjects in one arm do not have access to the other treatment(s), the causal effect of each treatment typically can only be identified from …
Semiparametric Latent Variable Regression Models For Spatio-Temporal Modeling Of Mobile Source Particles In The Greater Boston Area, Alexandros Gryparis, Brent A. Coull, Joel Schwartz, Helen H. Suh
Harvard University Biostatistics Working Paper Series
Traffic particle concentrations show considerable spatial variability within a metropolitan area. We consider latent variable semiparametric regression models for modeling the spatial and temporal variability of black carbon and elemental carbon concentrations in the greater Boston area. Measurements of these pollutants, which are markers of traffic particles, were obtained from several individual exposure studies conducted at specific household locations as well as 15 ambient monitoring sites in the city. The models allow for both flexible, nonlinear effects of covariates and for unexplained spatial and temporal variability in exposure. In addition, the different individual exposure studies recorded different surrogates of traffic …
Model Checking For ROC Regression Analysis, Tianxi Cai, Yingye Zheng
Harvard University Biostatistics Working Paper Series
The Receiver Operating Characteristic (ROC) curve is a prominent tool for characterizing the accuracy of a continuous diagnostic test. To account for factors that might influence the test accuracy, various ROC regression methods have been proposed. However, as in any regression analysis, when the assumed models do not fit the data well, these methods may render invalid and misleading results. To date, practical model checking techniques suitable for validating existing ROC regression models are not yet available. In this paper, we develop cumulative residual based procedures to graphically and numerically assess the goodness-of-fit for some commonly used ROC regression models, and …
Population Intervention Models In Causal Inference, Alan E. Hubbard, Mark J. Van Der Laan
U.C. Berkeley Division of Biostatistics Working Paper Series
Marginal structural models (MSM) provide a powerful tool for estimating the causal effect of a treatment variable or risk variable on the distribution of a disease in a population. These models, as originally introduced by Robins (e.g., Robins (2000a), Robins (2000b), van der Laan and Robins (2002)), model the marginal distributions of treatment-specific counterfactual outcomes, possibly conditional on a subset of the baseline covariates, and its dependence on treatment. Marginal structural models are particularly useful in the context of longitudinal data structures, in which each subject's treatment and covariate history are measured over time, and an outcome is recorded at …