- Keyword
- Causal inference (4)
- Confounding (2)
- Counterfactual (2)
- Double robust estimation (2)
- G-computation estimation (2)
- Point-of-service health plan (2)
- Referral to specialists (2)
- Stroke (2)
- ADL (1)
- Adjacency matrix; disease mapping; epidemiology; Markov processes (1)
- Aging (1)
- Antiretroviral resistance (1)
- Antiretroviral therapy (1)
- BCa bootstrap (1)
- Backfitting algorithm; CAR model; collapsibility; epidemiology; Gauss-Seidel algorithm; iterative weighted least squares algorithm (1)
- Bayesian statistics; Fourier basis; FFT; generalized linear mixed model; geostatistics; spatial statistics (1)
- Bias (1)
- Bioinformatics (1)
- Biomedical signal processing (1)
- Case-cohort design; Censored linear regression; Gehan-type weights; Linear Programming; Monotone estimating function; Newton-type method. (1)
- Causal inference; EM algorithm; General location model; Missing data; Non-compliance (1)
- Cerebrovascular disease (1)
- Constrained MCMC (1)
- Cross-validation (1)
- Data augmentation (1)
- Diagnostic tests (1)
- Dying (1)
- Dynamic treatment regime (1)
- EM algorithm (1)
- EM-algorithm (1)
- Publication
- Johns Hopkins University, Dept. of Biostatistics Working Papers (4)
- U.C. Berkeley Division of Biostatistics Working Paper Series (4)
- UW Biostatistics Working Paper Series (4)
- Harvard University Biostatistics Working Paper Series (3)
- The University of Michigan Department of Biostatistics Working Paper Series (2)
Articles 1 - 18 of 18
Full-Text Articles in Applied Mathematics
Hpcnmf: A High-Performance Toolbox For Non-Negative Matrix Factorization, Karthik Devarajan, Guoli Wang
COBRA Preprint Series
Non-negative matrix factorization (NMF) is a widely used machine learning algorithm for dimension reduction of large-scale data. It has found successful applications in a variety of fields such as computational biology, neuroscience, natural language processing, information retrieval, image processing and speech recognition. In bioinformatics, for example, it has been used to extract patterns and profiles from genomic and text-mining data as well as in protein sequence and structure analysis. While the scientific performance of NMF is very promising in dealing with high dimensional data sets and complex data structures, its computational cost is high and sometimes could be critical for …
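The core computation such a toolbox accelerates can be sketched in a few lines. Below is a minimal NMF using the standard Lee-Seung multiplicative updates for squared Euclidean loss (an illustrative NumPy sketch, not the toolbox's implementation; all names are hypothetical):

```python
import numpy as np

def nmf(V, rank, n_iter=200, seed=0, eps=1e-9):
    """Factor a non-negative matrix V (m x n) as W @ H with W (m x rank)
    and H (rank x n), via Lee-Seung multiplicative updates."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, rank)) + eps
    H = rng.random((rank, n)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)   # update H with W held fixed
        W *= (V @ H.T) / (W @ H @ H.T + eps)   # update W with H held fixed
    return W, H

# Toy example: approximate a small non-negative matrix at rank 2.
V = np.random.default_rng(1).random((6, 5))
W, H = nmf(V, rank=2)
print(np.linalg.norm(V - W @ H))   # reconstruction error
```

The multiplicative form guarantees the factors stay non-negative; it is also the piece that is dominated by dense matrix products, which is why the method parallelizes well on high-performance hardware.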
Flexible Distributed Lag Models Using Random Functions With Application To Estimating Mortality Displacement From Heat-Related Deaths, Roger D. Peng
Johns Hopkins University, Dept. of Biostatistics Working Papers
No abstract provided.
Variable Importance Analysis With The Multipim R Package, Stephan J. Ritter, Nicholas P. Jewell, Alan E. Hubbard
U.C. Berkeley Division of Biostatistics Working Paper Series
We describe the R package multiPIM, including statistical background, functionality and user options. The package is for variable importance analysis, and is meant primarily for analyzing data from exploratory epidemiological studies, though it could certainly be applied in other areas as well. The approach taken to variable importance comes from the causal inference field, and is different from approaches taken in other R packages. By default, multiPIM uses a double robust targeted maximum likelihood estimator (TMLE) of a parameter akin to the attributable risk. Several regression methods/machine learning algorithms are available for estimating the nuisance parameters of the models, including …
Spatio-Temporal Analysis Of Areal Data And Discovery Of Neighborhood Relationships In Conditionally Autoregressive Models, Subharup Guha, Louise Ryan
Harvard University Biostatistics Working Paper Series
No abstract provided.
Gauss-Seidel Estimation Of Generalized Linear Mixed Models With Application To Poisson Modeling Of Spatially Varying Disease Rates, Subharup Guha, Louise Ryan
Harvard University Biostatistics Working Paper Series
Generalized linear mixed models (GLMMs) provide an elegant framework for the analysis of correlated data. Due to the non-closed form of the likelihood, GLMMs are often fit by computational procedures like penalized quasi-likelihood (PQL). Special cases of these models are generalized linear models (GLMs), which are often fit using algorithms like iterative weighted least squares (IWLS). High computational costs and memory space constraints often make it difficult to apply these iterative procedures to data sets with a very large number of cases.
This paper proposes a computationally efficient strategy based on the Gauss-Seidel algorithm that iteratively fits sub-models of the GLMM …
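The blockwise strategy can be illustrated for a plain GLM (the paper's GLMM setting adds random effects on top of this). Each sweep refits one block of coefficients by IWLS while the other blocks' fitted linear predictors are held fixed as an offset. The sketch below is illustrative only, not the authors' code; it uses a Poisson model and hypothetical function names:

```python
import numpy as np

def iwls_poisson(X, y, offset, n_iter=25):
    """Fit a Poisson GLM, log mu = offset + X @ beta, by IWLS."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = offset + X @ beta
        mu = np.exp(eta)
        z = (eta - offset) + (y - mu) / mu   # working response, offset removed
        WX = X * mu[:, None]                 # Poisson working weights W = mu
        beta = np.linalg.solve(X.T @ WX, WX.T @ z)
    return beta

def gauss_seidel_glm(blocks, y, n_sweeps=15):
    """Cycle over covariate blocks, refitting each sub-model by IWLS
    while the other blocks' fitted linear predictors act as an offset."""
    betas = [np.zeros(B.shape[1]) for B in blocks]
    for _ in range(n_sweeps):
        for j in range(len(blocks)):
            offset = sum(B @ b for k, (B, b) in enumerate(zip(blocks, betas)) if k != j)
            betas[j] = iwls_poisson(blocks[j], y, offset)
    return betas

# Sanity check against a joint fit on simulated data.
rng = np.random.default_rng(0)
n = 500
X1 = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
X2 = rng.normal(size=(n, 2))
y = rng.poisson(np.exp(X1 @ [0.5, 0.3, -0.2] + X2 @ [0.1, 0.4]))
b1, b2 = gauss_seidel_glm([X1, X2], y)
b_joint = iwls_poisson(np.column_stack([X1, X2]), y, np.zeros(n))
```

Because each inner IWLS solves only a low-dimensional system, the per-sweep cost and memory footprint stay small even when the full design matrix would be expensive to handle at once.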
Computational Techniques For Spatial Logistic Regression With Large Datasets, Christopher J. Paciorek, Louise Ryan
Harvard University Biostatistics Working Paper Series
In epidemiological work, outcomes are frequently non-normal, sample sizes may be large, and effects are often small. To relate health outcomes to geographic risk factors, fast and powerful methods for fitting spatial models, particularly for non-normal data, are required. We focus on binary outcomes, with the risk surface a smooth function of space. We compare penalized likelihood models, including the penalized quasi-likelihood (PQL) approach, and Bayesian models based on fit, speed, and ease of implementation.
A Bayesian model using a spectral basis representation of the spatial surface provides the best tradeoff of sensitivity and specificity in simulations, detecting real spatial …
A Hybrid Newton-Type Method For The Linear Regression In Case-Cohort Studies, Menggang Yu, Bin Nan
The University of Michigan Department of Biostatistics Working Paper Series
Case-cohort designs are increasingly common in large epidemiological cohort studies. Nan, Yu, and Kalbfleisch (2004) provided the asymptotic results for censored linear regression models in case-cohort studies. In this article, we consider computational aspects of their proposed rank-based estimating methods. We show that the rank-based discontinuous estimating functions for case-cohort studies are monotone, a property established for cohort data in the literature, when generalized Gehan-type weights are used. Though the estimating problem can be formulated as a linear programming problem, as for cohort data, due to its easily uncontrollable large scale even for a …
Data Adaptive Estimation Of The Treatment Specific Mean, Yue Wang, Oliver Bembom, Mark J. Van Der Laan
U.C. Berkeley Division of Biostatistics Working Paper Series
An important problem in epidemiology and medical research is the estimation of the causal effect of a treatment action at a single point in time on the mean of an outcome, possibly within strata of the target population defined by a subset of the baseline covariates. Current approaches to this problem are based on marginal structural models, i.e., parametric models for the marginal distribution of counterfactual outcomes as a function of treatment and effect modifiers. The various estimators developed in this context furthermore each depend on a high-dimensional nuisance parameter whose estimation currently also relies on parametric models. Since misspecification …
History-Adjusted Marginal Structural Models And Statically-Optimal Dynamic Treatment Regimes, Mark J. Van Der Laan, Maya L. Petersen
U.C. Berkeley Division of Biostatistics Working Paper Series
Marginal structural models (MSM) provide a powerful tool for estimating the causal effect of a treatment. These models, introduced by Robins, model the marginal distributions of treatment-specific counterfactual outcomes, possibly conditional on a subset of the baseline covariates. Marginal structural models are particularly useful in the context of longitudinal data structures, in which each subject's treatment and covariate history are measured over time, and an outcome is recorded at a final time point. However, the utility of these models for some applications has been limited by their inability to incorporate modification of the causal effect of treatment by time-varying covariates. …
Studying Effects Of Primary Care Physicians And Patients On The Trade-Off Between Charges For Primary Care And Specialty Care Using A Hierarchical Multivariate Two-Part Model, John W. Robinson, Scott L. Zeger, Christopher B. Forrest
Johns Hopkins University, Dept. of Biostatistics Working Papers
Objective. To examine effects of primary care physicians (PCPs) and patients on the association between charges for primary care and specialty care in a point-of-service (POS) health plan.
Data Source. Claims from 1996 for 3,308 adult male POS plan members, each of whom was assigned to one of the 50 family practitioner-PCPs with the largest POS plan member-loads.
Study Design. A hierarchical multivariate two-part model was fitted using a Gibbs sampler to estimate PCPs' effects on patients' annual charges for two types of services, primary care and specialty care, the associations among PCPs' effects, and within-patient associations between charges for …
A Hierarchical Multivariate Two-Part Model For Profiling Providers' Effects On Healthcare Charges, John W. Robinson, Scott L. Zeger, Christopher B. Forrest
Johns Hopkins University, Dept. of Biostatistics Working Papers
Procedures for analyzing and comparing healthcare providers' effects on health services delivery and outcomes have been referred to as provider profiling. In a typical profiling procedure, patient-level responses are measured for clusters of patients treated by providers that, in turn, can be regarded as statistically exchangeable. Thus, a hierarchical model naturally represents the structure of the data. When provider effects on multiple responses are profiled, a multivariate model, rather than a series of univariate models, can capture associations among responses at both the provider and patient levels. When responses are in the form of charges for healthcare services and sampled …
Non-Parametric Estimation Of Roc Curves In The Absence Of A Gold Standard, Xiao-Hua Zhou, Pete Castelluccio, Chuan Zhou
UW Biostatistics Working Paper Series
In the evaluation of the diagnostic accuracy of tests, a gold standard on the disease status is required. However, in many complex diseases, it is impossible or unethical to obtain such a gold standard. If an imperfect standard is used as if it were a gold standard, the estimated accuracy of the tests will be biased. This type of bias is called imperfect gold standard bias. In this paper we develop a maximum likelihood (ML) method for estimating ROC curves, and the areas under them, for ordinal-scale tests in the absence of a gold standard. Our simulation study shows the proposed estimates for the …
Incorporating Death Into Health-Related Variables In Longitudinal Studies, Paula Diehr, Laura Lee Johnson, Donald L. Patrick, Bruce Psaty
UW Biostatistics Working Paper Series
Background: The aging process can be described as the change in health-related variables over time. Unfortunately, simple graphs of available data may be misleading if some people die, since they may confuse patterns of mortality with patterns of change in health. Methods have been proposed to incorporate death into self-rated health (excellent to poor) and the SF-36 profile scores, but not for other variables.
Objectives: (1) To incorporate death into the following variables: ADLs, IADLs, mini-mental state examination, depressive symptoms, body mass index (BMI), blocks walked per week, bed days, hospitalization, systolic blood pressure, and the timed walk. (2) To …
Cross-Calibration Of Stroke Disability Measures: Bayesian Analysis Of Longitudinal Ordinal Categorical Data Using Negative Dependence, Giovanni Parmigiani, Heidi W. Ashih, Gregory P. Samsa, Pamela W. Duncan, Sue Min Lai, David B. Matchar
Johns Hopkins University, Dept. of Biostatistics Working Papers
It is common to assess disability of stroke patients using standardized scales, such as the Rankin Stroke Outcome Scale (RS) and the Barthel Index (BI). The Rankin Scale, which was designed for applications to stroke, is based on assessing directly the global conditions of a patient. The Barthel Index, which was designed for general applications, is based on a series of questions about the patient’s ability to carry out 10 basic activities of daily living. As both scales are commonly used, but few studies use both, translating between scales is important in gaining an overall understanding of the efficacy of …
An Extended General Location Model For Causal Inference From Data Subject To Noncompliance And Missing Values, Yahong Peng, Rod Little, Trivellore E. Raghunathan
The University of Michigan Department of Biostatistics Working Paper Series
Noncompliance is a common problem in experiments involving randomized assignment of treatments, and standard analyses based on intention-to-treat or treatment received have limitations. An attractive alternative is to estimate the Complier-Average Causal Effect (CACE), which is the average treatment effect for the subpopulation of subjects who would comply under either treatment (Angrist, Imbens and Rubin, 1996, henceforth AIR). We propose an Extended General Location Model to estimate the CACE from data with non-compliance and missing data in the outcome and in baseline covariates. Models for both continuous and categorical outcomes and ignorable and latent ignorable (Frangakis and Rubin, 1999) …
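As a simple complete-data benchmark for the CACE, the AIR framework gives the familiar moment-based (Wald/instrumental-variable) estimator: the intention-to-treat effect on the outcome divided by the intention-to-treat effect on treatment receipt. A minimal illustration (variable names hypothetical, one-sided noncompliance assumed):

```python
import numpy as np

def cace_wald(z, d, y):
    """Moment-based CACE estimate: ITT effect on the outcome Y divided by
    the ITT effect on treatment receipt D, given random assignment Z."""
    z = np.asarray(z, dtype=bool)
    itt_y = y[z].mean() - y[~z].mean()
    itt_d = d[z].mean() - d[~z].mean()
    return itt_y / itt_d

# Toy randomized trial: 60% compliers, one-sided noncompliance,
# true effect of treatment receipt on compliers equal to 1.
rng = np.random.default_rng(0)
n = 20000
z = rng.integers(0, 2, n)                    # randomized assignment
complier = rng.random(n) < 0.6               # latent compliance type
d = ((z == 1) & complier).astype(int)        # treatment actually received
y = 1.0 * d + rng.normal(size=n)
print(cace_wald(z, d, y))                    # close to the true value 1
```

The paper's extended general location model addresses the harder setting this sketch ignores: missing values in the outcome and baseline covariates.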
A Bootstrap Confidence Interval Procedure For The Treatment Effect Using Propensity Score Subclassification, Wanzhu Tu, Xiao-Hua Zhou
UW Biostatistics Working Paper Series
In the analysis of observational studies, propensity score subclassification has been shown to be a powerful method for adjusting for unbalanced covariates when drawing causal inferences. One practical difficulty in carrying out such an analysis is obtaining a correct variance estimate for such inferences while reducing bias in the estimate of the treatment effect due to imbalance in the measured covariates. In this paper, we propose a bootstrap procedure for inference concerning the average treatment effect; our bootstrap method is based on an extension of Efron’s bias-corrected accelerated (BCa) bootstrap confidence interval to a two-sample problem. …
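The two ingredients, the subclassification point estimate and the resampling step, can be sketched as follows. This is illustrative only: it uses a plain percentile bootstrap, whereas the paper's BCa interval further adjusts the percentile endpoints for bias and skewness. All function names are hypothetical:

```python
import numpy as np

def propensity(X, t, n_iter=25):
    """Logistic-regression propensity scores via Newton-Raphson."""
    Xd = np.column_stack([np.ones(len(t)), X])
    b = np.zeros(Xd.shape[1])
    for _ in range(n_iter):
        p = 1 / (1 + np.exp(-Xd @ b))
        b += np.linalg.solve(Xd.T @ (Xd * (p * (1 - p))[:, None]), Xd.T @ (t - p))
    return 1 / (1 + np.exp(-Xd @ b))

def subclass_ate(X, t, y, n_strata=5):
    """ATE by subclassification: stratify on propensity-score quintiles,
    then average within-stratum mean differences weighted by stratum size."""
    ps = propensity(X, t)
    edges = np.quantile(ps, np.linspace(0, 1, n_strata + 1))
    s = np.clip(np.searchsorted(edges, ps, side="right") - 1, 0, n_strata - 1)
    return sum((s == k).mean() *
               (y[(s == k) & (t == 1)].mean() - y[(s == k) & (t == 0)].mean())
               for k in range(n_strata))

def percentile_ci(X, t, y, B=200, alpha=0.05, seed=0):
    """Plain percentile bootstrap CI; BCa adds bias/acceleration corrections."""
    rng = np.random.default_rng(seed)
    n = len(y)
    stats = [subclass_ate(X[i], t[i], y[i])
             for i in (rng.integers(0, n, n) for _ in range(B))]
    return np.quantile(stats, [alpha / 2, 1 - alpha / 2])

# Simulated observational data with confounding; true ATE = 2.
rng = np.random.default_rng(1)
n = 2000
X = rng.normal(size=(n, 2))
t = (rng.random(n) < 1 / (1 + np.exp(-(0.8 * X[:, 0] + 0.4 * X[:, 1])))).astype(int)
y = 2.0 * t + X[:, 0] + X[:, 1] + rng.normal(size=n)
ate = subclass_ate(X, t, y)
lo, hi = percentile_ci(X, t, y)
```

Resampling whole subjects and re-running the entire procedure, including the propensity model, is what lets the interval reflect the variability introduced by estimating the propensity scores themselves.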
Estimating The Accuracy Of Polymerase Chain Reaction-Based Tests Using Endpoint Dilution, Jim Hughes, Patricia Totten
UW Biostatistics Working Paper Series
PCR-based tests for various microorganisms or target DNA sequences are generally acknowledged to be highly "sensitive" yet the concept of sensitivity is ill-defined in the literature on these tests. We propose that sensitivity should be expressed as a function of the number of target DNA molecules in the sample (or specificity when the target number is 0). However, estimating this "sensitivity curve" is problematic since it is difficult to construct samples with a fixed number of targets. Nonetheless, using serially diluted replicate aliquots of a known concentration of the target DNA sequence, we show that it is possible to disentangle …
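One simple working model for such a sensitivity curve (an illustrative assumption, not necessarily the authors' likelihood): each target molecule is detected independently with probability p, a reaction is positive if at least one molecule is detected, and an aliquot with mean target count λ contains a Poisson(λ) number of molecules, so P(positive) = 1 − exp(−pλ). With replicate aliquots at serial dilutions, p can then be estimated by maximum likelihood:

```python
import numpy as np

def fit_detection_prob(lam, pos, n_rep):
    """Grid-search MLE of the per-molecule detection probability p under
    P(positive | mean target count lam) = 1 - exp(-p * lam)."""
    grid = np.linspace(1e-4, 1.0, 5000)
    ll = [np.sum(pos * np.log1p(-np.exp(-p * lam)) - (n_rep - pos) * p * lam)
          for p in grid]
    return grid[int(np.argmax(ll))]

# Simulated endpoint-dilution experiment: 100 replicate aliquots at each of
# six serial mean target counts, true per-molecule sensitivity p = 0.3.
rng = np.random.default_rng(0)
lam = np.array([0.5, 1.0, 2.0, 4.0, 8.0, 16.0])
pos = rng.binomial(100, 1 - np.exp(-0.3 * lam))
p_hat = fit_detection_prob(lam, pos, 100)
# Under this model, sensitivity with exactly k target molecules in the
# sample is 1 - (1 - p_hat) ** k, i.e., a curve in k rather than a scalar.
```

The point of the sketch is the abstract's main idea: even though no sample has a known, fixed number of targets, serial dilution makes the target count a known random variable, which is enough to identify the sensitivity curve.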
An Empirical Study Of Marginal Structural Models For Time-Independent Treatment, Tanya A. Henneman, Mark J. Van Der Laan
U.C. Berkeley Division of Biostatistics Working Paper Series
In non-randomized treatment studies, a significant problem for statisticians is determining how best to adjust for confounders. Marginal structural models (MSMs) and inverse probability of treatment weighted (IPTW) estimators are useful in analyzing the causal effect of treatment in observational studies. Given an IPTW estimator, a doubly robust augmented IPTW (AIPTW) estimator orthogonalizes it, resulting in a more efficient estimator than the IPTW estimator. One purpose of this paper is to make a practical comparison between the IPTW estimator and the doubly robust AIPTW estimator via a series of Monte Carlo simulations. We also consider the selection of the optimal …
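The IPTW-versus-AIPTW comparison can be reproduced in miniature with a Monte Carlo sketch (illustrative only: a correctly specified logistic propensity model and linear outcome regressions, not the authors' simulation design; all names are hypothetical):

```python
import numpy as np

def propensity(X, t, n_iter=25):
    """Logistic-regression propensity scores via Newton-Raphson."""
    Xd = np.column_stack([np.ones(len(t)), X])
    b = np.zeros(Xd.shape[1])
    for _ in range(n_iter):
        p = 1 / (1 + np.exp(-Xd @ b))
        b += np.linalg.solve(Xd.T @ (Xd * (p * (1 - p))[:, None]), Xd.T @ (t - p))
    return 1 / (1 + np.exp(-Xd @ b))

def iptw_aiptw(X, t, y):
    """IPTW and doubly robust augmented IPTW estimates of E[Y(1) - Y(0)]."""
    e = propensity(X, t)
    Xd = np.column_stack([np.ones(len(t)), X])
    m1 = Xd @ np.linalg.lstsq(Xd[t == 1], y[t == 1], rcond=None)[0]
    m0 = Xd @ np.linalg.lstsq(Xd[t == 0], y[t == 0], rcond=None)[0]
    iptw = np.mean(t * y / e - (1 - t) * y / (1 - e))
    aiptw = np.mean(t * (y - m1) / e + m1 - (1 - t) * (y - m0) / (1 - e) - m0)
    return iptw, aiptw

def simulate(rng, n=1000):
    """Confounded observational data with true ATE = 1."""
    X = rng.normal(size=(n, 2))
    e = 1 / (1 + np.exp(-(X[:, 0] - X[:, 1])))
    t = (rng.random(n) < e).astype(int)
    y = t + X[:, 0] + X[:, 1] + rng.normal(size=n)
    return X, t, y

rng = np.random.default_rng(0)
ests = np.array([iptw_aiptw(*simulate(rng)) for _ in range(200)])
# Both estimators are approximately unbiased here; the augmentation term
# absorbs the outcome-regression signal, so AIPTW has markedly lower variance.
```

This mirrors the efficiency gain described above: orthogonalizing the IPTW estimating function against the outcome regressions leaves only the residual noise to be reweighted.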