Applied Mathematics Commons

Articles 1 - 18 of 18

Full-Text Articles in Applied Mathematics

HPCNMF: A High-Performance Toolbox For Non-Negative Matrix Factorization, Karthik Devarajan, Guoli Wang Feb 2016

COBRA Preprint Series

Non-negative matrix factorization (NMF) is a widely used machine learning algorithm for dimension reduction of large-scale data. It has found successful applications in a variety of fields such as computational biology, neuroscience, natural language processing, information retrieval, image processing and speech recognition. In bioinformatics, for example, it has been used to extract patterns and profiles from genomic and text-mining data as well as in protein sequence and structure analysis. While the scientific performance of NMF is very promising in dealing with high-dimensional data sets and complex data structures, its computational cost is high and sometimes could be critical for …
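The abstract describes NMF only in general terms. For orientation, here is a minimal NumPy sketch of the classical Lee-Seung multiplicative updates for the Euclidean loss; it illustrates plain NMF, not the HPCNMF toolbox itself, and the function name and defaults are ours.

```python
# Minimal NMF with Lee-Seung multiplicative updates (Euclidean loss).
# Illustrative only; generic NMF, not the HPCNMF implementation.
import numpy as np

def nmf(V, rank, n_iter=200, eps=1e-10, seed=0):
    """Factor a non-negative matrix V (m x n) as W @ H with W, H >= 0."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, rank))
    H = rng.random((rank, n))
    for _ in range(n_iter):
        # Multiplicative updates preserve non-negativity of W and H.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

V = np.abs(np.random.default_rng(1).normal(size=(100, 40)))
W, H = nmf(V, rank=5)
print(np.linalg.norm(V - W @ H))  # Frobenius reconstruction error
```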


Flexible Distributed Lag Models Using Random Functions With Application To Estimating Mortality Displacement From Heat-Related Deaths, Roger D. Peng Dec 2011

Johns Hopkins University, Dept. of Biostatistics Working Papers

No abstract provided.


Variable Importance Analysis With The Multipim R Package, Stephan J. Ritter, Nicholas P. Jewell, Alan E. Hubbard Jul 2011

U.C. Berkeley Division of Biostatistics Working Paper Series

We describe the R package multiPIM, including statistical background, functionality and user options. The package is for variable importance analysis, and is meant primarily for analyzing data from exploratory epidemiological studies, though it could certainly be applied in other areas as well. The approach taken to variable importance comes from the causal inference field, and is different from approaches taken in other R packages. By default, multiPIM uses a double robust targeted maximum likelihood estimator (TMLE) of a parameter akin to the attributable risk. Several regression methods/machine learning algorithms are available for estimating the nuisance parameters of the models, including …
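multiPIM's default estimator is a TMLE; as a much simpler stand-in, the sketch below computes a plug-in (G-computation) estimate of an attributable-risk-type parameter, psi = E[Y] - E[E[Y | A = 0, W]], on simulated data. It omits the TMLE targeting step and the package's machine-learning options; all names and data-generating values are illustrative.

```python
# Hedged sketch: plug-in (G-computation) estimate of an
# attributable-risk-type parameter, psi = E[Y] - E[ E[Y | A=0, W] ].
# multiPIM's default TMLE adds a targeting step not shown here.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5000
W = rng.normal(size=(n, 2))                      # baseline covariates
A = rng.binomial(1, 1 / (1 + np.exp(-W[:, 0])))  # binary exposure
Y = rng.binomial(1, 1 / (1 + np.exp(-(0.5 * A + W[:, 1]))))  # outcome

# Outcome regression Q(A, W) = P(Y = 1 | A, W).
Q = LogisticRegression().fit(np.column_stack([A, W]), Y)
Q0 = Q.predict_proba(np.column_stack([np.zeros(n), W]))[:, 1]
psi_hat = Y.mean() - Q0.mean()
print(psi_hat)
```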


Spatio-Temporal Analysis Of Areal Data And Discovery Of Neighborhood Relationships In Conditionally Autoregressive Models, Subharup Guha, Louise Ryan Nov 2006

Harvard University Biostatistics Working Paper Series

No abstract provided.


Gauss-Seidel Estimation Of Generalized Linear Mixed Models With Application To Poisson Modeling Of Spatially Varying Disease Rates, Subharup Guha, Louise Ryan Oct 2005

Harvard University Biostatistics Working Paper Series

Generalized linear mixed models (GLMMs) provide an elegant framework for the analysis of correlated data. Because the likelihood has no closed form, GLMMs are often fit by computational procedures such as penalized quasi-likelihood (PQL). Special cases of these models are generalized linear models (GLMs), which are often fit using algorithms like iterative weighted least squares (IWLS). High computational costs and memory constraints often make it difficult to apply these iterative procedures to data sets with a very large number of cases.

This paper proposes a computationally efficient strategy based on the Gauss-Seidel algorithm that iteratively fits sub-models of the GLMM …
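For reference, the IWLS algorithm mentioned above is easy to state for a plain GLM. The sketch below runs IWLS for a Poisson regression with log link on simulated data; it is the textbook iteration, not the paper's Gauss-Seidel GLMM strategy.

```python
# Minimal IWLS for a Poisson GLM with log link; illustrative only.
import numpy as np

def iwls_poisson(X, y, n_iter=25):
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ beta
        mu = np.exp(eta)             # mean under the log link
        z = eta + (y - mu) / mu      # working response
        w = mu                       # working weights
        XtW = X.T * w
        beta = np.linalg.solve(XtW @ X, XtW @ z)  # weighted LS step
    return beta

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(1000), rng.normal(size=1000)])
y = rng.poisson(np.exp(X @ np.array([0.5, 0.3])))
print(iwls_poisson(X, y))  # should be close to (0.5, 0.3)
```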


Computational Techniques For Spatial Logistic Regression With Large Datasets, Christopher J. Paciorek, Louise Ryan Oct 2005

Harvard University Biostatistics Working Paper Series

In epidemiological work, outcomes are frequently non-normal, sample sizes may be large, and effects are often small. To relate health outcomes to geographic risk factors, fast and powerful methods for fitting spatial models, particularly for non-normal data, are required. We focus on binary outcomes, with the risk surface a smooth function of space. We compare penalized likelihood models, including the penalized quasi-likelihood (PQL) approach, and Bayesian models based on fit, speed, and ease of implementation.

A Bayesian model using a spectral basis representation of the spatial surface provides the best tradeoff of sensitivity and specificity in simulations, detecting real spatial …


A Hybrid Newton-Type Method For The Linear Regression In Case-Cohort Studies, Menggang Yu, Bin Nan Dec 2004

The University of Michigan Department of Biostatistics Working Paper Series

Case-cohort designs are increasingly used in large epidemiological cohort studies. Nan, Yu, and Kalbfleisch (2004) provided the asymptotic results for censored linear regression models in case-cohort studies. In this article, we consider computational aspects of their proposed rank-based estimating methods. We show that the rank-based discontinuous estimating functions for case-cohort studies are monotone, a property established for cohort data in the literature, when generalized Gehan-type weights are used. Though the estimating problem can be formulated as a linear programming problem, as for cohort data, due to its easily uncontrollable large scale even for a …


Data Adaptive Estimation Of The Treatment Specific Mean, Yue Wang, Oliver Bembom, Mark J. Van Der Laan Oct 2004

U.C. Berkeley Division of Biostatistics Working Paper Series

An important problem in epidemiology and medical research is the estimation of the causal effect of a treatment action at a single point in time on the mean of an outcome, possibly within strata of the target population defined by a subset of the baseline covariates. Current approaches to this problem are based on marginal structural models, i.e., parametric models for the marginal distribution of counterfactual outcomes as a function of treatment and effect modifiers. The various estimators developed in this context furthermore each depend on a high-dimensional nuisance parameter whose estimation currently also relies on parametric models. Since misspecification …


History-Adjusted Marginal Structural Models And Statically-Optimal Dynamic Treatment Regimes, Mark J. Van Der Laan, Maya L. Petersen Sep 2004

U.C. Berkeley Division of Biostatistics Working Paper Series

Marginal structural models (MSM) provide a powerful tool for estimating the causal effect of a treatment. These models, introduced by Robins, model the marginal distributions of treatment-specific counterfactual outcomes, possibly conditional on a subset of the baseline covariates. Marginal structural models are particularly useful in the context of longitudinal data structures, in which each subject's treatment and covariate history are measured over time, and an outcome is recorded at a final time point. However, the utility of these models for some applications has been limited by their inability to incorporate modification of the causal effect of treatment by time-varying covariates. …


Studying Effects Of Primary Care Physicians And Patients On The Trade-Off Between Charges For Primary Care And Specialty Care Using A Hierarchical Multivariate Two-Part Model, John W. Robinson, Scott L. Zeger, Christopher B. Forrest Aug 2004

Johns Hopkins University, Dept. of Biostatistics Working Papers

Objective. To examine effects of primary care physicians (PCPs) and patients on the association between charges for primary care and specialty care in a point-of-service (POS) health plan.

Data Source. Claims from 1996 for 3,308 adult male POS plan members, each of whom was assigned to one of the 50 family practitioner-PCPs with the largest POS plan member-loads.

Study Design. A hierarchical multivariate two-part model was fitted using a Gibbs sampler to estimate PCPs' effects on patients' annual charges for two types of services, primary care and specialty care, the associations among PCPs' effects, and within-patient associations between charges for …
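As a point of reference for the two-part idea (though not the hierarchical, multivariate, Gibbs-sampled model fitted in the paper), the sketch below combines a logistic model for any use with a log-normal model for positive charges; the data and coefficients are simulated and the names are ours.

```python
# Hedged sketch of a basic (non-hierarchical) two-part model for charges:
# part 1, logistic regression for any use; part 2, log-normal regression
# for positive charges. Simplified illustration only.
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=(n, 1))                            # patient covariate
any_use = rng.binomial(1, 1 / (1 + np.exp(-x[:, 0])))
charge = np.where(any_use == 1,
                  np.exp(3 + 0.4 * x[:, 0] + rng.normal(0, 0.5, n)), 0.0)

part1 = LogisticRegression().fit(x, any_use)           # P(charge > 0 | x)
pos = charge > 0
part2 = LinearRegression().fit(x[pos], np.log(charge[pos]))

# Predicted mean charge combines both parts (log-normal mean correction).
sigma2 = np.var(np.log(charge[pos]) - part2.predict(x[pos]))
mean_charge = (part1.predict_proba(x)[:, 1]
               * np.exp(part2.predict(x) + sigma2 / 2))
print(mean_charge[:5])
```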


A Hierarchical Multivariate Two-Part Model For Profiling Providers' Effects On Healthcare Charges, John W. Robinson, Scott L. Zeger, Christopher B. Forrest Aug 2004

Johns Hopkins University, Dept. of Biostatistics Working Papers

Procedures for analyzing and comparing healthcare providers' effects on health services delivery and outcomes have been referred to as provider profiling. In a typical profiling procedure, patient-level responses are measured for clusters of patients treated by providers that, in turn, can be regarded as statistically exchangeable. Thus, a hierarchical model naturally represents the structure of the data. When provider effects on multiple responses are profiled, a multivariate model, rather than a series of univariate models, can capture associations among responses at both the provider and patient levels. When responses are in the form of charges for healthcare services and sampled …


Non-Parametric Estimation Of ROC Curves In The Absence Of A Gold Standard, Xiao-Hua Zhou, Pete Castelluccio, Chuan Zhou Jul 2004

UW Biostatistics Working Paper Series

In evaluation of the diagnostic accuracy of tests, a gold standard on the disease status is required. However, for many complex diseases, it is impossible or unethical to obtain such a gold standard. If an imperfect standard is used as if it were a gold standard, the estimated accuracy of the tests would be biased. This type of bias is called imperfect gold standard bias. In this paper we develop a maximum likelihood (ML) method for estimating ROC curves and their areas for ordinal-scale tests in the absence of a gold standard. Our simulation study shows the proposed estimates for the …


Incorporating Death Into Health-Related Variables In Longitudinal Studies, Paula Diehr, Laura Lee Johnson, Donald L. Patrick, Bruce Psaty Jan 2004

UW Biostatistics Working Paper Series

Background: The aging process can be described as the change in health-related variables over time. Unfortunately, simple graphs of available data may be misleading if some people die, since such graphs may confuse patterns of mortality with patterns of change in health. Methods have been proposed to incorporate death into self-rated health (excellent to poor) and the SF-36 profile scores, but not for other variables.

Objectives: (1) To incorporate death into the following variables: ADLs, IADLs, mini-mental state examination, depressive symptoms, body mass index (BMI), blocks walked per week, bed days, hospitalization, systolic blood pressure, and the timed walk. (2) To …
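A rough sketch of the recoding idea, assuming the convention of assigning deaths a value below the worst observed health state; the specific codes below are ours, not the authors'.

```python
# Hedged sketch: extend a health variable with an explicit value for
# death so trajectories do not silently drop decedents. Codes are
# illustrative, not the paper's.
import numpy as np

# Self-rated health: 5=excellent ... 1=poor; np.nan after death.
health = np.array([[5, 4, np.nan],
                   [3, 3, 3],
                   [4, np.nan, np.nan]], dtype=float)
died = np.isnan(health)

# Assign death the value 0, below "poor", so means reflect mortality.
health_with_death = np.where(died, 0.0, health)
print(health_with_death.mean(axis=0))  # cohort mean at each wave
```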


Cross-Calibration Of Stroke Disability Measures: Bayesian Analysis Of Longitudinal Ordinal Categorical Data Using Negative Dependence, Giovanni Parmigiani, Heidi W. Ashih, Gregory P. Samsa, Pamela W. Duncan, Sue Min Lai, David B. Matchar Aug 2003

Johns Hopkins University, Dept. of Biostatistics Working Papers

It is common to assess disability of stroke patients using standardized scales, such as the Rankin Stroke Outcome Scale (RS) and the Barthel Index (BI). The Rankin Scale, which was designed for applications to stroke, is based on directly assessing the global condition of a patient. The Barthel Index, which was designed for general applications, is based on a series of questions about the patient's ability to carry out 10 basic activities of daily living. As both scales are commonly used, but few studies use both, translating between scales is important in gaining an overall understanding of the efficacy of …


An Extended General Location Model For Causal Inference From Data Subject To Noncompliance And Missing Values, Yahong Peng, Rod Little, Trivellore E. Raghunathan Aug 2003

The University of Michigan Department of Biostatistics Working Paper Series

Noncompliance is a common problem in experiments involving randomized assignment of treatments, and standard analyses based on intention-to-treat or treatment received have limitations. An attractive alternative is to estimate the Complier-Average Causal Effect (CACE), which is the average treatment effect for the subpopulation of subjects who would comply under either treatment (Angrist, Imbens and Rubin, 1996, henceforth AIR). We propose an Extended General Location Model to estimate the CACE from data with noncompliance and missing data in the outcome and in baseline covariates. Models for both continuous and categorical outcomes and ignorable and latent ignorable (Frangakis and Rubin, 1999) …
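For orientation, the simplest CACE estimator, the Wald or instrumental-variables ratio under one-sided noncompliance, divides the intention-to-treat effect on the outcome by the intention-to-treat effect on treatment received. A minimal simulated sketch, which ignores the missing-data features the paper addresses:

```python
# Hedged sketch of the Wald / IV estimator of the CACE under one-sided
# noncompliance; the paper's extended general location model also
# handles missing outcomes and covariates. Data-generating values ours.
import numpy as np

rng = np.random.default_rng(0)
n = 4000
Z = rng.binomial(1, 0.5, n)            # randomized assignment
complier = rng.binomial(1, 0.6, n)     # latent compliance type
D = Z * complier                       # treatment actually received
Y = 1.0 * D + rng.normal(size=n)       # true CACE = 1.0

itt_y = Y[Z == 1].mean() - Y[Z == 0].mean()   # ITT effect on outcome
itt_d = D[Z == 1].mean() - D[Z == 0].mean()   # ITT effect on receipt
print(itt_y / itt_d)                          # estimated CACE
```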


A Bootstrap Confidence Interval Procedure For The Treatment Effect Using Propensity Score Subclassification, Wanzhu Tu, Xiao-Hua Zhou May 2003

UW Biostatistics Working Paper Series

In the analysis of observational studies, propensity score subclassification has been shown to be a powerful method for adjusting for unbalanced covariates for the purpose of causal inference. One practical difficulty in carrying out such an analysis is obtaining a correct variance estimate for such inferences while reducing bias in the estimate of the treatment effect due to an imbalance in the measured covariates. In this paper, we propose a bootstrap procedure for inference concerning the average treatment effect; our bootstrap method is based on an extension of Efron's bias-corrected accelerated (BCa) bootstrap confidence interval to a two-sample problem. …
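A hedged sketch of the overall recipe on simulated data: estimate propensity scores, subclassify into quintiles, average within-stratum effect estimates, and bootstrap. For brevity it uses a plain percentile interval rather than the two-sample BCa extension the paper develops; all names are ours.

```python
# Propensity score subclassification with a percentile bootstrap CI
# (stand-in for the paper's BCa interval). Simulated data.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 2))                       # measured covariates
A = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))   # nonrandomized treatment
Y = 1.0 * A + X[:, 0] + rng.normal(size=n)        # true effect = 1.0

def subclass_ate(X, A, Y, n_strata=5):
    """Average within-stratum effects over propensity score quintiles."""
    ps = LogisticRegression().fit(X, A).predict_proba(X)[:, 1]
    edges = np.quantile(ps, np.linspace(0, 1, n_strata + 1))
    strata = np.clip(np.searchsorted(edges, ps, side="right") - 1,
                     0, n_strata - 1)
    effects, weights = [], []
    for s in range(n_strata):
        m = strata == s
        if A[m].sum() > 0 and (1 - A[m]).sum() > 0:  # both arms present
            effects.append(Y[m][A[m] == 1].mean() - Y[m][A[m] == 0].mean())
            weights.append(m.sum())
    return np.average(effects, weights=weights)

boot = []
for _ in range(200):
    idx = rng.integers(0, n, n)        # resample cases with replacement
    boot.append(subclass_ate(X[idx], A[idx], Y[idx]))
print(subclass_ate(X, A, Y), np.percentile(boot, [2.5, 97.5]))
```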


Estimating The Accuracy Of Polymerase Chain Reaction-Based Tests Using Endpoint Dilution, Jim Hughes, Patricia Totten Mar 2003

UW Biostatistics Working Paper Series

PCR-based tests for various microorganisms or target DNA sequences are generally acknowledged to be highly "sensitive" yet the concept of sensitivity is ill-defined in the literature on these tests. We propose that sensitivity should be expressed as a function of the number of target DNA molecules in the sample (or specificity when the target number is 0). However, estimating this "sensitivity curve" is problematic since it is difficult to construct samples with a fixed number of targets. Nonetheless, using serially diluted replicate aliquots of a known concentration of the target DNA sequence, we show that it is possible to disentangle …
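One standard way to make this concrete is the single-hit Poisson model: if an aliquot at relative dilution d contains a Poisson(c*d) number of target copies and each copy is detected with probability p, then P(positive) = 1 - exp(-c*d*p), and only the product c*p is identified from a single series. The sketch below fits that product by maximum likelihood to made-up dilution data; it illustrates the idea, not the authors' model.

```python
# Hedged sketch of the single-hit Poisson model for endpoint dilution.
# Fits lambda = c*p, the effective detectable copies per undiluted
# aliquot, by maximum likelihood. Dilution data are made up.
import numpy as np
from scipy.optimize import minimize_scalar

dilutions = np.array([1, 0.1, 0.01, 0.001])   # relative concentration
n_reps = np.array([10, 10, 10, 10])           # replicate aliquots
n_pos = np.array([10, 9, 4, 1])               # positives observed

def neg_log_lik(log_lam):
    prob = 1 - np.exp(-np.exp(log_lam) * dilutions)  # P(positive)
    prob = np.clip(prob, 1e-12, 1 - 1e-12)
    return -(n_pos * np.log(prob)
             + (n_reps - n_pos) * np.log(1 - prob)).sum()

fit = minimize_scalar(neg_log_lik, bounds=(-5, 10), method="bounded")
print(np.exp(fit.x))   # estimated lambda = c*p
```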


An Empirical Study Of Marginal Structural Models For Time-Independent Treatment, Tanya A. Henneman, Mark J. Van Der Laan Oct 2002

U.C. Berkeley Division of Biostatistics Working Paper Series

In non-randomized treatment studies a significant problem for statisticians is determining how best to adjust for confounders. Marginal structural models (MSMs) and inverse probability of treatment weighted (IPTW) estimators are useful in analyzing the causal effect of treatment in observational studies. Given an IPTW estimator, a doubly robust augmented IPTW (AIPTW) estimator orthogonalizes it, resulting in a more efficient estimator than the IPTW estimator. One purpose of this paper is to make a practical comparison between the IPTW estimator and the doubly robust AIPTW estimator via a series of Monte Carlo simulations. We also consider the selection of the optimal …
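For reference, the sketch below computes the basic IPTW estimate of E[Y(1)] - E[Y(0)] for a point treatment on simulated data; the AIPTW estimator studied in the paper would add an outcome-regression augmentation term. Data-generating values are ours.

```python
# Hedged sketch of the IPTW estimator for a point-treatment effect.
# The paper's AIPTW comparator augments this with an outcome regression.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5000
W = rng.normal(size=(n, 2))                       # confounders
A = rng.binomial(1, 1 / (1 + np.exp(-W[:, 0])))   # treatment
Y = 2.0 * A + W[:, 0] + rng.normal(size=n)        # true effect = 2.0

g = LogisticRegression().fit(W, A).predict_proba(W)[:, 1]  # P(A=1 | W)
iptw = np.mean(A * Y / g) - np.mean((1 - A) * Y / (1 - g))
print(iptw)
```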