COBRA

2011


Articles 1 - 30 of 65

Full-Text Articles in Physical Sciences and Mathematics

Identification And Efficient Estimation Of The Natural Direct Effect Among The Untreated, Samuel D. Lendle, Mark J. Van Der Laan Dec 2011


U.C. Berkeley Division of Biostatistics Working Paper Series

The natural direct effect (NDE), or the effect of an exposure on an outcome if an intermediate variable were set to the level it would have been in the absence of the exposure, is often of interest to investigators. In general, the statistical parameter associated with the NDE is difficult to estimate in the non-parametric model, particularly when the intermediate variable is continuous or high dimensional. In this paper we introduce a new causal parameter called the natural direct effect among the untreated, discuss identifiability assumptions, and show that this new parameter is equivalent to the NDE in a randomized …
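
For orientation, using counterfactual notation Y(a, m) for the outcome under exposure a and mediator level m, the parameter can be sketched as the NDE contrast restricted to the untreated (a schematic consistent with the abstract; the authors' formal definition governs):

\[ \mathrm{NDE}_{A=0} = E\{ Y(1, M(0)) - Y(0, M(0)) \mid A = 0 \}. \]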


Flexible Distributed Lag Models Using Random Functions With Application To Estimating Mortality Displacement From Heat-Related Deaths, Roger D. Peng Dec 2011


Johns Hopkins University, Dept. of Biostatistics Working Papers

No abstract provided.


Toxicity Profiling Of Engineered Nanomaterials Via Multivariate Dose Response Surface Modeling, Trina Patel, Donatello Telesca, Saji George, Andre Nel Dec 2011


COBRA Preprint Series

New-generation in vitro high-throughput screening (HTS) assays for the assessment of engineered nanomaterials provide an opportunity to learn how these particles interact at the cellular level, particularly in relation to injury pathways. These types of assays are often characterized by small sample sizes, high measurement error, and high dimensionality, as multiple cytotoxicity outcomes are measured across an array of doses and durations of exposure. In this article we propose a probability model for toxicity profiling of engineered nanomaterials. A hierarchical framework is used to account for the multivariate nature of the data by modeling dependence between outcomes and thereby …
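
One generic way to formalize such a model (an illustration, not the authors' exact specification): let y_ijk be replicate i of cytotoxicity outcome k for nanomaterial j, observed at dose d_i and exposure duration t_i; then

\[ y_{ijk} = m_{jk}(d_i, t_i) + b_j + \varepsilon_{ijk}, \]

with m_jk a smooth dose-by-duration response surface and shared random effects b_j inducing the between-outcome dependence described above.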


Modeling Criminal Careers As Departures From A Unimodal Population Age-Crime Curve: The Case Of Marijuana Use, Donatello Telesca, Elena Erosheva, Derek Kreager, Ross Matsueda Dec 2011


COBRA Preprint Series

A major aim of longitudinal analyses of life course data is to describe the within- and between-individual variability in a behavioral outcome, such as crime. Statistical analyses of such data typically draw on mixture and mixed-effects growth models. In this work, we present a functional analytic point of view and develop an alternative method that models individual crime trajectories as departures from a population age-crime curve. Drawing on empirical and theoretical claims in criminology, we assume a unimodal population age-crime curve and allow individual expected crime trajectories to differ by their levels of offending and patterns of temporal misalignment. We …
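
One generic shape-invariant way to write "departures from a population curve" (an illustration, not necessarily the authors' specification):

\[ E\{ y_i(t) \} = c_i \, f\{ \mu_i(t) \}, \]

where f is the unimodal population age-crime curve, c_i captures individual i's level of offending, and the monotone time transformation \mu_i(t) captures temporal misalignment (earlier or later onset and peak).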


Longitudinal High-Dimensional Data Analysis, Vadim Zipunnikov, Sonja Greven, Brian Caffo, Daniel S. Reich, Ciprian Crainiceanu Nov 2011


Johns Hopkins University, Dept. of Biostatistics Working Papers

We develop a flexible framework for modeling high-dimensional functional and imaging data observed longitudinally. The approach decomposes the observed variability of high-dimensional observations measured at multiple visits into three additive components: a subject-specific functional random intercept that quantifies the cross-sectional variability, a subject-specific functional slope that quantifies the dynamic irreversible deformation over multiple visits, and a subject-visit specific functional deviation that quantifies exchangeable or reversible visit-to-visit changes. The proposed method is very fast, scalable to studies including ultra-high dimensional data, and can easily be adapted to and executed on modest computing infrastructures. The method is applied to the longitudinal analysis …
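
In symbols, the decomposition can be sketched as follows, with v indexing the functional/image domain and T_ij the time of visit j for subject i (a schematic consistent with the abstract, not a verbatim statement of the model):

\[ Y_{ij}(v) = \eta(v, T_{ij}) + X_i^{(0)}(v) + X_i^{(1)}(v)\, T_{ij} + U_{ij}(v), \]

where X_i^{(0)} is the subject-specific functional random intercept, X_i^{(1)} the subject-specific functional slope, and U_ij the subject-visit-specific deviation.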


Assessing Association For Bivariate Survival Data With Interval Sampling: A Copula Model Approach With Application To Aids Study, Hong Zhu, Mei-Cheng Wang Nov 2011


Johns Hopkins University, Dept. of Biostatistics Working Papers

In disease surveillance systems or registries, bivariate survival data are typically collected under interval sampling. This refers to a situation in which entry into a registry occurs at the time of the first failure event (e.g., HIV infection) within a calendar time interval, the time of the initiating event (e.g., birth) is retrospectively identified for all the cases in the registry, and the second failure event (e.g., death) is subsequently observed during follow-up. Sampling bias is induced by the selection process, because the data are collected conditional on the first failure event occurring within a time interval. Consequently, the …
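
For context, a copula model links the joint survival function of the bivariate failure times (X, Y) to its margins through a copula C (standard construction; the authors additionally adjust for the interval-sampling selection described above):

\[ S(x, y) = P(X > x, Y > y) = C\{ S_1(x), S_2(y) \}, \]

so that the dependence of interest is carried entirely by C.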


Likelihood Based Population Independent Component Analysis, Ani Eloyan, Ciprian M. Crainiceanu, Brian S. Caffo Nov 2011


Johns Hopkins University, Dept. of Biostatistics Working Papers

Independent component analysis (ICA) is a widely used technique for blind source separation, used heavily in several scientific research areas including acoustics, electrophysiology, and functional neuroimaging. We propose a scalable two-stage iterative true group ICA methodology for analyzing population-level fMRI data where the number of subjects is very large. The method is based on likelihood estimators of the underlying source densities and the mixing matrix. As opposed to many commonly used group ICA algorithms, the proposed method does not require significant data reduction by a twofold singular value decomposition. In addition, the method can be applied to a large …
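
For orientation, the generic ICA model posits that observed signals are linear mixtures of statistically independent, non-Gaussian sources (standard formulation; the paper's contribution is a likelihood-based group version that also estimates the source densities):

\[ X = A S, \]

with X the observed data, A the mixing matrix, and the rows of S mutually independent sources.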


Corrected Confidence Bands For Functional Data Using Principal Components, Jeff Goldsmith, Sonja Greven, Ciprian M. Crainiceanu Nov 2011


Johns Hopkins University, Dept. of Biostatistics Working Papers

Functional principal components (FPC) analysis is widely used to decompose and express functional observations. Curve estimates implicitly condition on basis functions and other quantities derived from FPC decompositions; however, these objects are unknown in practice. In this paper, we propose a method for obtaining correct curve estimates by accounting for uncertainty in FPC decompositions. Additionally, pointwise and simultaneous confidence intervals that account for both model-based and decomposition-based variability are constructed. Standard mixed-model representations of functional expansions are used to construct curve estimates and variances conditional on a specific decomposition. A bootstrap procedure is implemented to understand the uncertainty in …
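
As background, curve estimates in FPC analysis condition on the truncated Karhunen-Loeve expansion

\[ Y_i(t) = \mu(t) + \sum_{k=1}^{K} \xi_{ik}\, \phi_k(t) + \varepsilon_i(t), \]

in which the mean \mu and eigenfunctions \phi_k must themselves be estimated; treating the estimated \hat\phi_k as fixed is precisely the source of the understated variability the paper corrects.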


Proxy Pattern-Mixture Analysis For A Binary Variable Subject To Nonresponse., Rebecca H. Andridge, Roderick J. Little Nov 2011


The University of Michigan Department of Biostatistics Working Paper Series

We consider assessment of the impact of nonresponse for a binary survey variable Y subject to nonresponse, when there is a set of covariates observed for nonrespondents and respondents. To reduce dimensionality and for simplicity we reduce the covariates to a continuous proxy variable X that has the highest correlation with Y, estimated from a probit regression analysis of respondent data. We extend our previously proposed proxy pattern-mixture analysis (PPMA) for continuous outcomes to the binary outcome using a latent variable approach. The method does not assume data are missing at random, and creates a framework for sensitivity analyses. Maximum …
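
Schematically, the proxy is the respondent-data probit linear predictor (an illustration consistent with the abstract, not the authors' exact notation):

\[ P(Y = 1 \mid Z) = \Phi(\alpha^{\top} Z), \qquad X = \hat\alpha^{\top} Z, \]

so that X is the scalar covariate summary most predictive of Y, with the analysis then stratified by response pattern.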


Estimation Of A Non-Parametric Variable Importance Measure Of A Continuous Exposure, Antoine Chambaz, Pierre Neuvial, Mark J. Van Der Laan Oct 2011


U.C. Berkeley Division of Biostatistics Working Paper Series

We define a new measure of variable importance of an exposure on a continuous outcome, accounting for potential confounders. The exposure features a reference level x0 with positive mass and a continuum of other levels. For the purpose of estimating it, we fully develop the semi-parametric targeted minimum loss based estimation (TMLE) methodology [van der Laan & Rubin, 2006; van der Laan & Rose, 2011]. We cover the whole spectrum of its theoretical study (convergence of the iterative procedure which is at the core of the TMLE methodology; consistency and asymptotic normality of the estimator), practical implementation, simulation …
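
One standard formalization of such a measure at exposure level x, relative to the reference x0 and adjusting for confounders W (a generic sketch; the authors' parameter may involve an additional smoothing or projection step):

\[ \psi(x) = E_W\{ E(Y \mid X = x, W) - E(Y \mid X = x_0, W) \}. \]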


Building A Nomogram For Survey-Weighted Cox Models Using R, Marinela Capanu, Mithat Gonen Oct 2011


Memorial Sloan-Kettering Cancer Center, Dept. of Epidemiology & Biostatistics Working Paper Series

Nomograms have become a very useful tool among clinicians as they provide individualized predictions based on the characteristics of the patient. For complex design survey data with survival outcome, Binder (1992) proposed methods for fitting survey-weighted Cox models, but to the best of our knowledge there is no available software to build a nomogram based on such models. This paper introduces R software to accomplish this goal and illustrates its use on a gastric cancer dataset. Validation and calibration routines are also included.
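
A minimal R sketch of the model-fitting step, assuming the survey package's svydesign()/svycoxph() interface and a hypothetical data frame dat with design variables psu, stratum, and sampwt (the nomogram-building routines the paper introduces are not reproduced here):

library(survival)  # Surv()
library(survey)    # svydesign(), svycoxph()

## hypothetical complex-design survival data
des <- svydesign(ids = ~psu, strata = ~stratum, weights = ~sampwt, data = dat)

## survey-weighted Cox model in the sense of Binder (1992)
fit <- svycoxph(Surv(time, status) ~ age + stage + grade, design = des)
summary(fit)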


Bland-Altman Plots For Evaluating Agreement Between Solid Tumor Measurements, Chaya S. Moskowitz, Mithat Gonen Sep 2011


Memorial Sloan-Kettering Cancer Center, Dept. of Epidemiology & Biostatistics Working Paper Series

Rationale and Objectives. Solid tumor measurements are regularly used in clinical trials of anticancer therapeutic agents and in clinical practice to manage patients' care. Consequently, studies evaluating the reproducibility of solid tumor measurements are important, as lack of reproducibility may directly affect patient management. The authors propose utilizing a modified Bland-Altman plot with a difference metric that lends itself naturally to this situation and facilitates interpretation. Materials and Methods. The modification to the Bland-Altman plot involves replacing the difference plotted on the vertical axis with the relative percent change (RC) between the two measurements. This quantity is the same one used …
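
Under one common convention (an assumption on our part; the authors' exact definition governs), with Y1 and Y2 the first and second measurements of the same lesion,

\[ \mathrm{RC} = 100 \times \frac{Y_2 - Y_1}{Y_1}, \]

and RC replaces the raw difference on the vertical axis of the Bland-Altman plot.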


A Regularization Corrected Score Method For Nonlinear Regression Models With Covariate Error, David M. Zucker, Malka Gorfine, Yi Li, Donna Spiegelman Sep 2011


Harvard University Biostatistics Working Paper Series

No abstract provided.


A Hybrid Bayesian Laplacian Approach For Generalized Linear Mixed Models, Marinela Capanu, Mithat Gonen, Colin B. Begg Sep 2011


Memorial Sloan-Kettering Cancer Center, Dept. of Epidemiology & Biostatistics Working Paper Series

The analytical intractability of generalized linear mixed models (GLMMs) has generated a lot of research in the past two decades. Applied statisticians routinely face the frustrating prospect of widely disparate results produced by the methods that are currently implemented in commercially available software. This article is motivated by this frustration and develops guidance as well as new methods that are computationally efficient and statistically reliable. Two main classes of approximations have been developed: likelihood-based methods and Bayesian methods. Likelihood-based methods such as the penalized quasi-likelihood approach of Breslow and Clayton (1993) have been shown to produce biased estimates especially for …
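
For reference, the GLMM in question has the standard form, with link g, fixed effects beta, and subject-level random effects b_i:

\[ g\{ E(y_{ij} \mid b_i) \} = x_{ij}^{\top} \beta + z_{ij}^{\top} b_i, \qquad b_i \sim N(0, \Sigma). \]

The intractability arises because the marginal likelihood integrates a non-Gaussian response distribution over b_i with no closed form.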


A Proof Of Bell's Inequality In Quantum Mechanics Using Causal Interactions, James M. Robins, Tyler J. Vanderweele, Richard D. Gill Sep 2011


COBRA Preprint Series

We give a simple proof of Bell's inequality in quantum mechanics which, in conjunction with experiments, demonstrates that the local hidden variables assumption is false. The proof sheds light on relationships between the notion of causal interaction and interference between particles.
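
For concreteness, in its CHSH form (a standard statement, not the paper's specific derivation), the inequality bounds the correlations of plus-minus-one-valued measurements A, A' on one particle and B, B' on another under the local hidden variables assumption:

\[ |E(AB) + E(AB') + E(A'B) - E(A'B')| \le 2, \]

a bound that quantum mechanics violates, up to 2\sqrt{2}.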


Longitudinal Analysis Of Spatiotemporal Processes: A Case Study Of Dynamic Contrast-Enhanced Magnetic Resonance Imaging In Multiple Sclerosis, Russell T. Shinohara, Ciprian M. Crainiceanu, Brian S. Caffo, Daniel S. Reich Sep 2011


Johns Hopkins University, Dept. of Biostatistics Working Papers

Multiple sclerosis (MS) is an immune-mediated disease in which inflammatory lesions form in the brain. In many active MS lesions, the blood-brain barrier (BBB) is disrupted and blood flows into white matter; this disruption may be related to morbidity and disability. Dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) allows quantitative study of blood flow and permeability dynamics throughout the brain. This technique involves a subject being imaged sequentially during a study visit as an intravenously administered contrast agent flows into the brain. In regions where flow is abnormal, such as white matter lesions, this allows the quantification of the BBB damage. …


Movelets: A Dictionary Of Movement, Jiawei Bai, Jeff Goldsmith, Brian Caffo, Thomas A. Glass, Ciprian M. Crainiceanu Aug 2011


Johns Hopkins University, Dept. of Biostatistics Working Papers

Recent technological advances provide researchers a way of gathering real-time information on an individual’s movement through the use of wearable devices that record acceleration. In this paper, we propose a method for identifying activity types, like walking, standing, and resting, from acceleration data. Our approach decomposes movements into short components called “movelets”, and builds a reference for each activity type. Unknown activities are predicted by matching new movelets to the reference. We apply our method to data collected from a single, three-axis accelerometer and focus on activities of interest in studying physical function in elderly populations. An important technical advantage …
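
A minimal R sketch of the matching idea, with hypothetical names and a labeled reference dictionary (the authors' implementation details differ):

## split a three-axis acceleration matrix (rows = samples) into overlapping
## windows ("movelets") of L consecutive samples
movelets <- function(acc, L) {
  lapply(seq_len(nrow(acc) - L + 1),
         function(s) acc[s:(s + L - 1), , drop = FALSE])
}

## classify one movelet by nearest L2 distance to labeled references,
## e.g. reference = list(walking = m1, standing = m2, resting = m3)
classify_movelet <- function(m, reference) {
  d <- vapply(reference, function(r) sqrt(sum((m - r)^2)), numeric(1))
  names(which.min(d))
}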


Some Observations On The Wilcoxon Rank Sum Test, Scott S. Emerson Aug 2011


UW Biostatistics Working Paper Series

This manuscript presents some general comments about the Wilcoxon rank sum test. Even the most casual reader will gather that I am not too impressed with the scientific usefulness of the Wilcoxon test. However, the actual motivation is more to illustrate differences between parametric, semiparametric, and nonparametric (distribution-free) inference, and to use this example to illustrate how many misconceptions have been propagated through a focus on (semi)parametric probability models as the basis for evaluating commonly used statistical analysis models. The document itself arose as a teaching tool for courses aimed at graduate students in biostatistics and statistics, with parts of …
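
For reference, writing the two samples as X_1, ..., X_m and Y_1, ..., Y_n, the rank-sum statistic and its Mann-Whitney form are (ignoring ties)

\[ W = \sum_{i=1}^{m} \mathrm{rank}(X_i), \qquad U = W - \frac{m(m+1)}{2} = \sum_{i=1}^{m}\sum_{j=1}^{n} I(X_i > Y_j), \]

so the test's natural functional is P(X > Y), a point central to the parametric-versus-distribution-free distinctions discussed in the manuscript.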


The Importance Of Statistical Theory In Outlier Detection, Sarah C. Emerson, Scott S. Emerson Aug 2011


UW Biostatistics Working Paper Series

We explore the performance of the outlier-sum statistic (Tibshirani and Hastie, Biostatistics 2007; 8:2-8), a proposed method for identifying genes for which only a subset of a group of samples or patients exhibits differential expression levels. Our discussion focuses on this method as an example of how inattention to standard statistical theory can lead to approaches that exhibit some serious drawbacks. In contrast to the results presented by those authors, when comparing this method to several variations of the t-test, we find that the proposed method offers little benefit even in the most idealized scenarios, and suffers from a number …
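
As a reminder of the statistic under discussion (a schematic of the Tibshirani-Hastie construction; consult the original paper for exact centering and cutoff choices): each gene's values are robustly standardized, x'_i = (x_i - median)/MAD, and the outlier sum accumulates the standardized values of disease-group samples beyond an IQR-based cutoff,

\[ W = \sum_{i \in \text{disease}} x'_i \, I\{ x'_i > q_{75} + \mathrm{IQR} \}. \]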


Effectively Selecting A Target Population For A Future Comparative Study, Lihui Zhao, Lu Tian, Tianxi Cai, Brian Claggett, L. J. Wei Aug 2011


Harvard University Biostatistics Working Paper Series

When comparing a new treatment with a control in a randomized clinical study, the treatment effect is generally assessed by evaluating a summary measure over a specific study population. The success of the trial heavily depends on the choice of such a population. In this paper, we show a systematic, effective way to identify a promising population, for which the new treatment is expected to have a desired benefit, using the data from a current study involving similar comparator treatments. Specifically, with the existing data we first create a parametric scoring system using multiple covariates to estimate subject-specific treatment differences. …
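
Schematically, the two steps described above amount to building a covariate score for the subject-specific treatment difference and selecting the region where estimated benefit clears a clinically meaningful threshold c (an illustration, not the authors' notation):

\[ \hat D(z) = \hat E(Y \mid \text{new treatment}, Z = z) - \hat E(Y \mid \text{control}, Z = z), \qquad \hat\Omega_c = \{ z : \hat D(z) \ge c \}. \]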


Targeted Minimum Loss Based Estimation Of An Intervention Specific Mean Outcome, Mark J. Van Der Laan, Susan Gruber Aug 2011


U.C. Berkeley Division of Biostatistics Working Paper Series

Targeted minimum loss based estimation (TMLE) provides a template for the construction of semiparametric locally efficient double robust substitution estimators of the target parameter of the data generating distribution in a semiparametric censored data or causal inference model based on a sample of independent and identically distributed copies from this data generating distribution (van der Laan and Rubin (2006), van der Laan (2008), van der Laan and Rose (2011)). TMLE requires 1) writing the target parameter as a particular mapping from a typically infinite dimensional parameter of the probability distribution of the unit data structure into the parameter space, 2) …
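
The defining property of the resulting estimator can be stated compactly: the targeted update P_n^* solves the empirical efficient influence curve (EIC) equation, and the estimator is the substitution (plug-in) value of the target mapping Psi (a schematic of the general template):

\[ \frac{1}{n}\sum_{i=1}^{n} D^*(P_n^*)(O_i) = 0, \qquad \hat\psi = \Psi(P_n^*). \]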


Population Intervention Causal Effects Based On Stochastic Interventions, Ivan Diaz Munoz, Mark J. Van Der Laan Aug 2011


U.C. Berkeley Division of Biostatistics Working Paper Series

Estimating the causal effect of an intervention on a population typically involves defining parameters in a nonparametric structural equation model (NPSEM; Pearl, 2000) in which the treatment or exposure is deterministically assigned in a static or dynamic way. We define a new causal parameter that takes into account the fact that intervention policies can result in stochastically assigned exposures. The statistical parameter that identifies the causal parameter of interest is established. Inverse probability of treatment weighting (IPTW), augmented IPTW (A-IPTW), and targeted maximum likelihood estimators (TMLE) are developed. A simulation study is performed to demonstrate the properties of these …
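
Concretely, writing g(a | w) for the observed exposure mechanism and g*(a | w) for the stochastically assigned one, the IPTW estimator mentioned above takes the familiar density-ratio-weighted form (standard construction):

\[ \hat\psi_{\mathrm{IPTW}} = \frac{1}{n}\sum_{i=1}^{n} \frac{g^*(A_i \mid W_i)}{g(A_i \mid W_i)}\, Y_i. \]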


Multiple Testing Of Local Maxima For Detection Of Peaks In Chip-Seq Data, Armin Schwartzman, Andrew Jaffe, Yulia Gavrilov, Clifford A. Meyer Aug 2011


Harvard University Biostatistics Working Paper Series

No abstract provided.


Targeted Maximum Likelihood Estimation Of Natural Direct Effect, Wenjing Zheng, Mark J. Van Der Laan Jul 2011


U.C. Berkeley Division of Biostatistics Working Paper Series

In many causal inference problems, one is interested in the direct causal effect of an exposure on an outcome of interest that is not mediated by certain intermediate variables. Robins and Greenland (1992) and Pearl (2000) formalized the definition of two types of direct effects (natural and controlled) under the counterfactual framework. Since then, identifiability conditions for these effects have been studied extensively. By contrast, considerably fewer efforts have been invested in the estimation problem of the natural direct effect. In this article, we propose a semiparametric efficient, multiply robust estimator for the natural direct effect of a binary treatment …
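
For orientation, under the usual identifiability conditions the natural direct effect of a binary treatment A with mediator M, outcome Y, and baseline covariates W is identified by the mediation formula (Pearl, 2000; standard form, shown here for a discrete mediator):

\[ \Psi = E_W\Big[ \sum_m \{ E(Y \mid A=1, M=m, W) - E(Y \mid A=0, M=m, W) \}\, P(M=m \mid A=0, W) \Big]. \]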


On The Covariate-Adjusted Estimation For An Overall Treatment Difference With Data From A Randomized Comparative Clinical Trial, Lu Tian, Tianxi Cai, Lihui Zhao, L. J. Wei Jul 2011


Harvard University Biostatistics Working Paper Series

No abstract provided.


Targeted Minimum Loss Based Estimation Based On Directly Solving The Efficient Influence Curve Equation, Paul Chaffee, Mark J. Van Der Laan Jul 2011


U.C. Berkeley Division of Biostatistics Working Paper Series

Applying targeted maximum likelihood estimation to longitudinal data can be computationally intensive. As the number of time points and/or the number of intermediate factors grows, the computational resources consumed by these algorithms likewise increase. Different TMLE algorithms have different computational speeds and implementation challenges; there may also be efficiency differences among the corresponding estimators. The algorithm we describe here proceeds by solving the empirical efficient influence curve equation directly using numerical computation methods, rather than indirectly (by solving a score equation), which is the usual route. We believe that this estimator is the simplest of the TMLE procedures to implement in …


Variable Importance Analysis With The Multipim R Package, Stephan J. Ritter, Nicholas P. Jewell, Alan E. Hubbard Jul 2011


U.C. Berkeley Division of Biostatistics Working Paper Series

We describe the R package multiPIM, including statistical background, functionality and user options. The package is for variable importance analysis, and is meant primarily for analyzing data from exploratory epidemiological studies, though it could certainly be applied in other areas as well. The approach taken to variable importance comes from the causal inference field, and is different from approaches taken in other R packages. By default, multiPIM uses a double robust targeted maximum likelihood estimator (TMLE) of a parameter akin to the attributable risk. Several regression methods/machine learning algorithms are available for estimating the nuisance parameters of the models, including …
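
For orientation, an attributable-risk-type parameter for a binary exposure A with confounders W can be sketched (up to sign convention; the package's exact parameter is defined in its documentation) as

\[ \psi = E(Y) - E_W\{ E(Y \mid A = 0, W) \}, \]

the excess mean outcome relative to a counterfactual world with the exposure removed.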


Reduced Bayesian Hierarchical Models: Estimating Health Effects Of Simultaneous Exposure To Multiple Pollutants, Jennifer F. Bobb, Francesca Dominici, Roger D. Peng Jul 2011


Johns Hopkins University, Dept. of Biostatistics Working Papers

Quantifying the health effects associated with simultaneous exposure to many air pollutants is now a research priority of the US EPA. Bayesian hierarchical models (BHM) have been extensively used in multisite time series studies of air pollution and health to estimate health effects of a single pollutant adjusted for potential confounding of other pollutants and other time-varying factors. However, when the scientific goal is to estimate the impacts of many pollutants jointly, a straightforward application of BHM is challenged by the need to specify a random-effect distribution on a high-dimensional vector of nuisance parameters, which often do not have an …
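
For orientation, the standard two-stage BHM in multisite time series studies takes, schematically,

\[ \hat\beta_c \mid \beta_c \sim N(\beta_c, V_c), \qquad \beta_c \sim N(\mu, \Sigma), \]

where \hat\beta_c is the site-c risk estimate; the difficulty addressed here is specifying the second-stage distribution when \beta_c is a high-dimensional multi-pollutant vector.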


A Unified Approach To Non-Negative Matrix Factorization And Probabilistic Latent Semantic Indexing, Karthik Devarajan, Guoli Wang, Nader Ebrahimi Jul 2011


COBRA Preprint Series

Non-negative matrix factorization (NMF) by the multiplicative updates algorithm is a powerful machine learning method for decomposing a high-dimensional nonnegative matrix V into two matrices, W and H, each with nonnegative entries, V ~ WH. NMF has been shown to produce a unique parts-based, sparse representation of the data. The nonnegativity constraints in NMF allow only additive combinations of the data, which enables it to learn parts that have distinct physical representations in reality. In the last few years, NMF has been successfully applied in a variety of areas such as natural language processing, information retrieval, image processing, speech recognition …
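
A minimal R sketch of the multiplicative updates referenced above (the Lee-Seung form for squared Euclidean error; a generic illustration, not the unified divergence framework this paper develops):

## Lee & Seung multiplicative updates minimizing ||V - WH||_F^2;
## eps guards against division by zero
nmf_mu <- function(V, r, iters = 200, eps = 1e-9) {
  W <- matrix(runif(nrow(V) * r), nrow(V), r)
  H <- matrix(runif(r * ncol(V)), r, ncol(V))
  for (it in seq_len(iters)) {
    H <- H * (t(W) %*% V) / (t(W) %*% W %*% H + eps)  # ratio is nonnegative,
    W <- W * (V %*% t(H)) / (W %*% H %*% t(H) + eps)  # so W, H stay nonnegative
  }
  list(W = W, H = H)
}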


Multiple Testing Of Local Maxima For Detection Of Unimodal Peaks In 1d, Armin Schwartzman, Yulia Gavrilov, Robert J. Adler Jul 2011


Harvard University Biostatistics Working Paper Series

No abstract provided.