
Full-Text Articles in Physical Sciences and Mathematics

Articles 31–60 of 336

Assessing Association For Bivariate Survival Data With Interval Sampling: A Copula Model Approach With Application To AIDS Study, Hong Zhu, Mei-Cheng Wang Nov 2011

Johns Hopkins University, Dept. of Biostatistics Working Papers

In disease surveillance systems or registries, bivariate survival data are typically collected under interval sampling. This refers to a situation where entry into a registry occurs at the time of the first failure event (e.g., HIV infection) within a calendar time interval, the time of the initiating event (e.g., birth) is retrospectively identified for all cases in the registry, and the second failure event (e.g., death) is subsequently observed during follow-up. Sampling bias is induced because the data are collected conditional on the first failure event occurring within a time interval. Consequently, the …
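
The selection mechanism described above is easy to illustrate by simulation. Below is a minimal sketch with hypothetical distributions (all parameters invented for illustration): keeping only cases whose first event falls in a calendar window shifts the observed distribution of the time to the first event away from its population counterpart.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Hypothetical population: calendar time of the initiating event (birth),
# time from birth to first event (infection), first event to second (death).
B = rng.uniform(1940.0, 1990.0, n)          # birth year
T1 = rng.weibull(2.0, n) * 30.0             # years, birth -> infection
T2 = rng.exponential(10.0, n)               # years, infection -> death

# Interval sampling: a case enters the registry only if the FIRST event
# occurs within the calendar window [1980, 1995].
first_event = B + T1
registry = (first_event >= 1980.0) & (first_event <= 1995.0)

print("population mean of T1:", round(T1.mean(), 2))
print("registry mean of T1:  ", round(T1[registry].mean(), 2))  # differs: selection bias
```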


Corrected Confidence Bands For Functional Data Using Principal Components, Jeff Goldsmith, Sonja Greven, Ciprian M. Crainiceanu Nov 2011

Johns Hopkins University, Dept. of Biostatistics Working Papers

Functional principal components (FPC) analysis is widely used to decompose and express functional observations. Curve estimates implicitly condition on basis functions and other quantities derived from FPC decompositions; however, these objects are unknown in practice. In this paper, we propose a method for obtaining correct curve estimates by accounting for uncertainty in FPC decompositions. Additionally, pointwise and simultaneous confidence intervals that account for both model-based and decomposition-based variability are constructed. Standard mixed-model representations of functional expansions are used to construct curve estimates and variances conditional on a specific decomposition. A bootstrap procedure is implemented to understand the uncertainty in …
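
The flavor of the decomposition-uncertainty problem can be conveyed in a few lines. The sketch below makes simplifying assumptions (FPCA computed by a plain SVD, a nonparametric bootstrap over subjects, toy data): it recomputes the FPC basis on each resample and propagates that variability into pointwise bands. The authors' mixed-model construction is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, K = 100, 50, 3                         # curves, grid points, retained FPCs
t = np.linspace(0, 1, p)
Y = np.sin(2 * np.pi * np.outer(rng.uniform(0.8, 1.2, n), t)) \
    + rng.normal(0, 0.2, (n, p))             # toy functional observations

def fpca(Y, K):
    """Mean, first K FPCs (rows), and scores via SVD of the centered data."""
    mu = Y.mean(axis=0)
    U, S, Vt = np.linalg.svd(Y - mu, full_matrices=False)
    return mu, Vt[:K], U[:, :K] * S[:K]

mu, phi, xi = fpca(Y, K)
fit0 = mu + xi @ phi                         # curve estimates, decomposition fixed

# Bootstrap over subjects: refit the decomposition each time, then project the
# ORIGINAL curves onto the resampled basis, so basis uncertainty enters the band.
boot = []
for _ in range(200):
    mu_b, phi_b, _ = fpca(Y[rng.integers(0, n, n)], K)
    boot.append(mu_b + ((Y - mu_b) @ phi_b.T) @ phi_b)
band = np.percentile(np.stack(boot), [2.5, 97.5], axis=0)   # pointwise 95% band
```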


Estimation Of A Non-Parametric Variable Importance Measure Of A Continuous Exposure, Antoine Chambaz, Pierre Neuvial, Mark J. Van Der Laan Oct 2011

U.C. Berkeley Division of Biostatistics Working Paper Series

We define a new measure of variable importance of an exposure on a continuous outcome, accounting for potential confounders. The exposure features a reference level x0 with positive mass and a continuum of other levels. For the purpose of estimating it, we fully develop the semi-parametric targeted minimum loss estimation (TMLE) methodology [van der Laan & Rubin, 2006; van der Laan & Rose, 2011]. We cover the whole spectrum of its theoretical study (convergence of the iterative procedure at the core of the TMLE methodology; consistency and asymptotic normality of the estimator), practical implementation, simulation …


A Regularization Corrected Score Method For Nonlinear Regression Models With Covariate Error, David M. Zucker, Malka Gorfine, Yi Li, Donna Spiegelman Sep 2011

Harvard University Biostatistics Working Paper Series

No abstract provided.


A Proof Of Bell's Inequality In Quantum Mechanics Using Causal Interactions, James M. Robins, Tyler J. VanderWeele, Richard D. Gill Sep 2011

COBRA Preprint Series

We give a simple proof of Bell's inequality in quantum mechanics which, in conjunction with experiments, demonstrates that the local hidden variables assumption is false. The proof sheds light on relationships between the notion of causal interaction and interference between particles.
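
For reference, the inequality in question is usually stated in CHSH form; the notation below is the textbook one and may differ from the paper's. A local hidden-variable model with correlations

```latex
\[
  E(a,b) \;=\; \int A(a,\lambda)\, B(b,\lambda)\, d\rho(\lambda),
  \qquad |A| \le 1, \; |B| \le 1,
\]
must satisfy
\[
  \bigl| E(a,b) - E(a,b') \bigr| \;+\; \bigl| E(a',b) + E(a',b') \bigr| \;\le\; 2,
\]
```

whereas quantum mechanics predicts correlations reaching 2√2 (Tsirelson's bound) for suitable measurement settings, which is what experiments confirm.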


Effectively Selecting A Target Population For A Future Comparative Study, Lihui Zhao, Lu Tian, Tianxi Cai, Brian Claggett, L. J. Wei Aug 2011

Harvard University Biostatistics Working Paper Series

When comparing a new treatment with a control in a randomized clinical study, the treatment effect is generally assessed by evaluating a summary measure over a specific study population. The success of the trial heavily depends on the choice of such a population. In this paper, we show a systematic, effective way to identify a promising population, for which the new treatment is expected to have a desired benefit, using the data from a current study involving similar comparator treatments. Specifically, with the existing data we first create a parametric scoring system using multiple covariates to estimate subject-specific treatment differences. …
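
A minimal sketch of the scoring idea, using hypothetical data and a deliberately simple linear model (the paper's actual procedure and validation details are not reproduced): fit an outcome model in each arm, and score each subject by the difference of the two predictions.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 500, 3
W = rng.normal(size=(n, p))                   # baseline covariates
A = rng.integers(0, 2, n)                     # randomized arm (1 = new treatment)
Y = W @ np.array([1.0, -0.5, 0.2]) + A * (0.5 + W[:, 0]) + rng.normal(size=n)

X = np.column_stack([np.ones(n), W])          # design with intercept

def ols(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]

beta1 = ols(X[A == 1], Y[A == 1])             # arm-specific outcome models
beta0 = ols(X[A == 0], Y[A == 0])

score = X @ (beta1 - beta0)                   # subject-specific treatment difference
promising = score > 0                         # candidate target population
print("subgroup size:", int(promising.sum()),
      "| mean estimated benefit:", round(float(score[promising].mean()), 2))
```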


Multiple Testing Of Local Maxima For Detection Of Peaks In ChIP-Seq Data, Armin Schwartzman, Andrew Jaffe, Yulia Gavrilov, Clifford A. Meyer Aug 2011

Harvard University Biostatistics Working Paper Series

No abstract provided.


On The Covariate-Adjusted Estimation For An Overall Treatment Difference With Data From A Randomized Comparative Clinical Trial, Lu Tian, Tianxi Cai, Lihui Zhao, L. J. Wei Jul 2011

Harvard University Biostatistics Working Paper Series

No abstract provided.


Variable Importance Analysis With The Multipim R Package, Stephan J. Ritter, Nicholas P. Jewell, Alan E. Hubbard Jul 2011

U.C. Berkeley Division of Biostatistics Working Paper Series

We describe the R package multiPIM, including statistical background, functionality and user options. The package is for variable importance analysis, and is meant primarily for analyzing data from exploratory epidemiological studies, though it could certainly be applied in other areas as well. The approach taken to variable importance comes from the causal inference field, and is different from approaches taken in other R packages. By default, multiPIM uses a double robust targeted maximum likelihood estimator (TMLE) of a parameter akin to the attributable risk. Several regression methods/machine learning algorithms are available for estimating the nuisance parameters of the models, including …


A Unified Approach To Non-Negative Matrix Factorization And Probabilistic Latent Semantic Indexing, Karthik Devarajan, Guoli Wang, Nader Ebrahimi Jul 2011

COBRA Preprint Series

Non-negative matrix factorization (NMF) by the multiplicative updates algorithm is a powerful machine learning method for decomposing a high-dimensional nonnegative matrix V into two matrices, W and H, each with nonnegative entries, such that V ≈ WH. NMF has been shown to yield a unique parts-based, sparse representation of the data. The nonnegativity constraints in NMF allow only additive combinations of the data, which enables it to learn parts that have distinct physical representations in reality. In the last few years, NMF has been successfully applied in a variety of areas such as natural language processing, information retrieval, image processing, speech recognition …
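
The multiplicative updates themselves are compact enough to state in full. Below is a minimal numpy sketch of the standard Lee–Seung updates for the Euclidean objective; the paper's unified framework covers more general divergences, so this is illustrative only.

```python
import numpy as np

def nmf(V, r, n_iter=500, eps=1e-9, seed=0):
    """Factor a nonnegative (m x n) matrix V as V ~ W H, W: m x r, H: r x n."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, r)) + eps
    H = rng.random((r, n)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)    # multiplicative update for H
        W *= (V @ H.T) / (W @ H @ H.T + eps)    # multiplicative update for W
    return W, H

V = np.abs(np.random.default_rng(1).normal(size=(20, 30)))
W, H = nmf(V, r=4)
print("Frobenius reconstruction error:", round(float(np.linalg.norm(V - W @ H)), 3))
```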


Multiple Testing Of Local Maxima For Detection Of Unimodal Peaks In 1D, Armin Schwartzman, Yulia Gavrilov, Robert J. Adler Jul 2011

Harvard University Biostatistics Working Paper Series

No abstract provided.


Component Extraction Of Complex Biomedical Signal And Performance Analysis Based On Different Algorithm, Hemant Pasusangai Kasturiwale Jun 2011

Johns Hopkins University, Dept. of Biostatistics Working Papers

Biomedical signals can arise from one or many sources, including the heart, brain, and endocrine systems. Multiple sources pose a challenge to researchers because recordings may be contaminated with artifacts and noise. Biomedical time series signals include the electroencephalogram (EEG), the electrocardiogram (ECG), and others. The morphology of the cardiac signal is very important in most ECG-based diagnostics, and a diagnosis based on visual observation of a recorded ECG or EEG may not be accurate. To achieve a better understanding, PCA (Principal Component Analysis) and ICA (Independent Component Analysis) algorithms help in analyzing ECG signals. The immense scope in the field of biomedical signal processing of Independent Component Analysis ( …
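
A toy separation example conveys what PCA and ICA do here. The sketch below uses synthetic mixed signals and scikit-learn's PCA and FastICA; the data, mixing matrix, and pipeline are stand-ins, not the paper's.

```python
import numpy as np
from sklearn.decomposition import PCA, FastICA

rng = np.random.default_rng(0)
t = np.linspace(0, 8, 2000)

# Toy sources standing in for a cardiac component and a slow interfering drift.
s1 = np.sign(np.sin(2 * np.pi * 1.2 * t))          # sharp, QRS-like waveform
s2 = np.sin(2 * np.pi * 0.3 * t)                   # baseline wander
S = np.column_stack([s1, s2]) + 0.05 * rng.normal(size=(t.size, 2))

A = np.array([[1.0, 0.5],
              [0.4, 1.0]])                         # unknown mixing
X = S @ A.T                                        # two observed channels

pca_comps = PCA(n_components=2).fit_transform(X)   # decorrelated components
ica_comps = FastICA(n_components=2, random_state=0).fit_transform(X)  # ~independent sources
```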


Propensity Score Analysis With Matching Weights, Liang Li May 2011

COBRA Preprint Series

Propensity score analysis is one of the most widely used methods for studying the causal treatment effect in observational studies. This paper studies treatment effect estimation with the method of matching weights. This method resembles propensity score matching but offers a number of new features, including efficient estimation, rigorous variance calculation, simple asymptotics, statistical tests of balance, a clearly identified target population with an optimal sampling property, and no need to choose a matching algorithm or caliper size. In addition, we propose the mirror histogram as a useful tool for graphically displaying balance. The method also shares some features of the inverse …
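
A minimal sketch, assuming the matching weight takes its usual form in this literature, min(e, 1−e) divided by the probability of the arm actually received, with e the estimated propensity score; the paper's variance formulas and balance tests are not reproduced.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
n = 2000
X = rng.normal(size=(n, 2))                        # confounders
ps = 1 / (1 + np.exp(-(0.5 * X[:, 0] - 0.3 * X[:, 1])))
Z = rng.binomial(1, ps)                            # treatment indicator
Y = 1.0 * Z + X[:, 0] + rng.normal(size=n)         # outcome; true effect = 1

e = LogisticRegression().fit(X, Z).predict_proba(X)[:, 1]   # estimated propensity

# Matching weight: min(e, 1-e) over the probability of the arm received.
mw = np.minimum(e, 1 - e) / np.where(Z == 1, e, 1 - e)

effect = (np.sum(mw * Z * Y) / np.sum(mw * Z)
          - np.sum(mw * (1 - Z) * Y) / np.sum(mw * (1 - Z)))
print("matching-weight estimate:", round(float(effect), 3))
```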


A Broad Symmetry Criterion For Nonparametric Validity Of Parametrically-Based Tests In Randomized Trials, Russell T. Shinohara, Constantine E. Frangakis, Constantine G. Lyketsos Apr 2011

Johns Hopkins University, Dept. of Biostatistics Working Papers

Pilot phases of a randomized clinical trial often suggest that a parametric model may be an accurate description of the trial's longitudinal trajectories. However, parametric models are often not used for fear that they may invalidate tests of null hypotheses of equality between the experimental groups. Existing work has shown that, for some types of data, when certain parametric models are used, the validity of tests of the null is preserved even if the parametric models are incorrect. Here, we provide a broader and easier-to-check characterization of parametric models that can be used to (a) preserve nonparametric validity …


Estimation And Testing In Targeted Group Sequential Covariate-Adjusted Randomized Clinical Trials, Antoine Chambaz, Mark J. Van Der Laan Apr 2011

U.C. Berkeley Division of Biostatistics Working Paper Series

This article is devoted to the construction and asymptotic study of adaptive group sequential covariate-adjusted randomized clinical trials analyzed through the prism of the semiparametric methodology of targeted maximum likelihood estimation (TMLE). We show how to build, as the data accrue group-sequentially, a sampling design which targets a user-supplied optimal design. We also show how to carry out sound TMLE statistical inference based on such an adaptive sampling scheme (thereby extending results known so far only in the i.i.d. setting), and how group-sequential testing applies on top of it. The procedure is robust (i.e., consistent even if the …


Targeted Maximum Likelihood Estimation For Dynamic Treatment Regimes In Sequential Randomized Controlled Trials, Paul Chaffee, Mark J. Van Der Laan Mar 2011

U.C. Berkeley Division of Biostatistics Working Paper Series

Sequential Randomized Controlled Trials (SRCTs) are rapidly becoming essential tools in the search for optimized treatment regimes in ongoing treatment settings. Analyzing data for multiple time-point treatments with a view toward optimal treatment regimes is of interest in many types of afflictions: HIV infection, Attention Deficit Hyperactivity Disorder in children, leukemia, prostate cancer, renal failure, and many others. Methods for analyzing data from SRCTs exist but they are either inefficient or suffer from the drawbacks of estimating equation methodology. We describe an estimation procedure, targeted maximum likelihood estimation (TMLE), which has been fully developed and implemented in point treatment settings, …


Estimating Subject-Specific Treatment Differences For Risk-Benefit Assessment With Competing Risk Event-Time Data, Brian Claggett, Lihui Zhao, Lu Tian, Davide Castagno, L. J. Wei Mar 2011

Harvard University Biostatistics Working Paper Series

No abstract provided.


Simple Examples Of Estimating Causal Effects Using Targeted Maximum Likelihood Estimation, Michael Rosenblum, Mark J. Van Der Laan Mar 2011

Johns Hopkins University, Dept. of Biostatistics Working Papers

We present a brief overview of targeted maximum likelihood for estimating the causal effect of a single time point treatment and of a two time point treatment. We focus on simple examples demonstrating how to apply the methodology developed in (van der Laan and Rubin, 2006; Moore and van der Laan, 2007; van der Laan, 2010a,b). We include R code for the single time point case.
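
The single-time-point recipe is short enough to sketch numerically. The following is a hedged Python translation of the generic steps (initial outcome regression, clever covariate, logistic fluctuation, plug-in), with invented data; the paper's own R code is the authoritative reference.

```python
import numpy as np
import statsmodels.api as sm

def expit(x):
    return 1 / (1 + np.exp(-x))

rng = np.random.default_rng(4)
n = 5000
W = rng.normal(size=n)                              # baseline covariate
A = rng.binomial(1, expit(0.4 * W))                 # treatment depends on W
Y = rng.binomial(1, expit(0.5 * A + 0.8 * W))       # binary outcome

# 1) Initial outcome regression Q(A, W) = E[Y | A, W].
XQ = np.column_stack([np.ones(n), A, W])
Q = sm.GLM(Y, XQ, family=sm.families.Binomial()).fit()
logitQ = XQ @ Q.params
logitQ1 = np.column_stack([np.ones(n), np.ones(n), W]) @ Q.params
logitQ0 = np.column_stack([np.ones(n), np.zeros(n), W]) @ Q.params

# 2) Treatment mechanism g(W) = P(A = 1 | W).
Xg = np.column_stack([np.ones(n), W])
g1 = sm.GLM(A, Xg, family=sm.families.Binomial()).fit().predict(Xg)

# 3) Fluctuate along the "clever covariate" H(A, W) with offset logit(Q).
H = A / g1 - (1 - A) / (1 - g1)
eps = sm.GLM(Y, H[:, None], family=sm.families.Binomial(), offset=logitQ).fit().params[0]

# 4) Targeted update and plug-in average treatment effect.
Q1s = expit(logitQ1 + eps / g1)
Q0s = expit(logitQ0 - eps / (1 - g1))
print("TMLE estimate of the ATE:", round(float(np.mean(Q1s - Q0s)), 3))
```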


Functional Principal Components Model For High-Dimensional Brain Imaging, Vadim Zipunnikov, Brian S. Caffo, David M. Yousem, Christos Davatzikos, Brian S. Schwartz, Ciprian Crainiceanu Jan 2011

Johns Hopkins University, Dept. of Biostatistics Working Papers

We establish a fundamental equivalence between singular value decomposition (SVD) and functional principal components analysis (FPCA) models. The constructive relationship allows us to deploy the numerical efficiency of SVD to fully estimate the components of FPCA, even for extremely high-dimensional functional objects, such as brain images. As an example, a functional mixed effect model is fitted to high-resolution morphometric (RAVENS) images. The main directions of morphometric variation in brain volumes are identified and discussed.
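
The equivalence has a concrete computational payoff: for an n × p data matrix with p huge, everything FPCA needs comes out of an SVD whose cost is driven by n. A minimal numpy sketch of the correspondence (toy data, no smoothing):

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 300, 10_000                      # subjects x voxels; p >> n
Y = rng.normal(size=(n, p))

Yc = Y - Y.mean(axis=0)                 # center over subjects
U, S, Vt = np.linalg.svd(Yc, full_matrices=False)

eigenfunctions = Vt                     # principal components (rows)
scores = U * S                          # FPCA scores, one row per subject
eigenvalues = S**2 / (n - 1)

# Check: same spectrum as the covariance operator, via the n x n Gram trick.
lam = np.linalg.eigvalsh(Yc @ Yc.T / (n - 1))[::-1]
assert np.allclose(lam[: n - 1], eigenvalues[: n - 1])
```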


A Generalized Approach For Testing The Association Of A Set Of Predictors With An Outcome: A Gene Based Test, Benjamin A. Goldstein, Alan E. Hubbard, Lisa F. Barcellos Jan 2011

U.C. Berkeley Division of Biostatistics Working Paper Series

In many analyses, one has data on one level but desires to draw inference on another level. For example, in genetic association studies, one observes units of DNA referred to as SNPs, but wants to determine whether genes, which are composed of SNPs, are associated with disease. While there are some available approaches for addressing this issue, they usually involve making parametric assumptions and are not easily generalizable. A statistical test is proposed for testing the association of a set of variables with an outcome of interest. No assumptions are made about the functional form relating the variables to the …


Oracle And Multiple Robustness Properties Of Survey Calibration Estimator In Missing Response Problem, Kwun Chuen Gary Chan Dec 2010

UW Biostatistics Working Paper Series

In the presence of missing response, reweighting the complete-case subsample by the inverse of the nonmissing probability is both intuitive and easy to implement. However, inverse probability weighting is not efficient in general and is not robust against misspecification of the missing probability model. Calibration was developed by survey statisticians to improve the efficiency of inverse probability weighting estimators when population totals of auxiliary variables are known and when the inclusion probability is known by design. In the missing data problem, we can calibrate auxiliary variables in the complete-case subsample to the full sample. However, the inclusion probability is unknown in general …
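
For concreteness, here is a minimal sketch of the baseline estimator the abstract starts from: the complete-case inverse-probability-weighted mean under missing at random, with the nonmissing probability estimated by logistic regression (invented data; the calibration step itself is not shown).

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
n = 10_000
X = rng.normal(size=n)                       # auxiliary variable, always observed
Y = 2.0 + X + rng.normal(size=n)             # response, missing for some subjects
pi = 1 / (1 + np.exp(-(0.5 + X)))            # P(observed | X): missing at random
R = rng.binomial(1, pi)                      # response indicator

# Estimate the nonmissing probability, then reweight the complete cases.
Xd = np.column_stack([np.ones(n), X])
pi_hat = sm.GLM(R, Xd, family=sm.families.Binomial()).fit().predict(Xd)

mu_ipw = np.sum(R * Y / pi_hat) / np.sum(R / pi_hat)   # Hajek-type IPW mean
print("IPW estimate of E[Y]:", round(float(mu_ipw), 3), "(truth = 2)")
```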


Modification And Improvement Of Empirical Likelihood For Missing Response Problem, Kwun Chuen Gary Chan Dec 2010

UW Biostatistics Working Paper Series

An empirical likelihood (EL) estimator was proposed by Qin and Zhang (2007) for a missing response problem under a missing at random assumption. They showed by simulation studies that the finite sample performance of the EL estimator is better than that of some existing estimators. However, the empirical likelihood estimator does not have a uniformly smaller asymptotic variance than other estimators in general. We consider several modifications to the empirical likelihood estimator and show that the proposed estimator dominates the empirical likelihood estimator and several other existing estimators in terms of asymptotic efficiency. The proposed estimator also attains the minimum asymptotic variance among …


Minimum Description Length Measures Of Evidence For Enrichment, Zhenyu Yang, David R. Bickel Dec 2010

COBRA Preprint Series

In order to functionally interpret differentially expressed genes or other discovered features, researchers seek to detect enrichment in the form of overrepresentation of discovered features associated with a biological process. Most enrichment methods treat the p-value as the measure of evidence, using a statistical test such as the binomial test, Fisher's exact test, or the hypergeometric test. However, the p-value is not interpretable as a measure of evidence apart from adjustments in light of the sample size. As a measure of evidence supporting one hypothesis over the other, the Bayes factor (BF) overcomes this drawback of the p-value but lacks …


Efficient Measurement Error Correction With Spatially Misaligned Data, Adam A. Szpiro, Lianne Sheppard, Thomas Lumley Dec 2010

UW Biostatistics Working Paper Series

Association studies in environmental statistics often involve exposure and outcome data that are misaligned in space. A common strategy is to employ a spatial model such as universal kriging to predict exposures at locations with outcome data and then estimate a regression parameter of interest using the predicted exposures. This results in measurement error because the predicted exposures do not correspond exactly to the true values. We characterize the measurement error by decomposing it into Berkson-like and classical-like components. One correction approach is the parametric bootstrap, which is effective but computationally intensive since it requires solving a nonlinear optimization problem …
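
For readers new to the distinction, the two canonical (non-spatial) error forms are stated below, with X the true exposure and W its measured or predicted surrogate; the paper's decomposition is the spatial analogue of this textbook dichotomy.

```latex
\[
  \text{classical-like:}\quad W = X + U_c, \ \ U_c \perp X,
  \qquad
  \text{Berkson-like:}\quad X = W + U_b, \ \ U_b \perp W.
\]
```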


Minimum Description Length And Empirical Bayes Methods Of Identifying SNPs Associated With Disease, Ye Yang, David R. Bickel Nov 2010

COBRA Preprint Series

The goal of determining which of hundreds of thousands of SNPs are associated with disease poses one of the most challenging multiple testing problems. Using the empirical Bayes approach, the local false discovery rate (LFDR) estimated using popular semiparametric models has enjoyed success in simultaneous inference. However, the estimated LFDR can be biased because the semiparametric approach tends to overestimate the proportion of the non-associated single nucleotide polymorphisms (SNPs). One of the negative consequences is that, like conventional p-values, such LFDR estimates cannot quantify the amount of information in the data that favors the null hypothesis of no disease-association.

We …
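
For reference, the LFDR in the standard two-group mixture model is defined as follows, where π0 is the proportion of null (non-associated) SNPs whose overestimation the abstract discusses, and f0, f1 are the null and alternative densities of the test statistic:

```latex
\[
  \mathrm{LFDR}(z) \;=\; \Pr(H_0 \mid Z = z)
  \;=\; \frac{\pi_0 f_0(z)}{\pi_0 f_0(z) + (1 - \pi_0) f_1(z)}.
\]
```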


Improving The Power Of Chronic Disease Surveillance By Incorporating Residential History, Justin Manjourides, Marcello Pagano Nov 2010

Harvard University Biostatistics Working Paper Series

No abstract provided.


Multilevel Functional Principal Component Analysis For High-Dimensional Data, Vadim Zipunnikov, Brian Caffo, Ciprian Crainiceanu, David M. Yousem, Christos Davatzikos, Brian S. Schwartz Oct 2010

Johns Hopkins University, Dept. of Biostatistics Working Papers

We propose fast and scalable statistical methods for the analysis of hundreds or thousands of high dimensional vectors observed at multiple visits. The proposed inferential methods avoid the difficult task of loading the entire data set at once in the computer memory and use sequential access to data. This allows deployment of our methodology on low-resource computers where computations can be done in minutes on extremely large data sets. Our methods are motivated by and applied to a study where hundreds of subjects were scanned using Magnetic Resonance Imaging (MRI) at two visits roughly five years apart. The original data …
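
The sequential-access idea can be conveyed with the standard n × n Gram-matrix device: stream the data in column blocks, accumulate Y Yᵀ, and eigendecompose that small matrix. The sketch below illustrates that generic trick under invented dimensions; it is not the authors' exact algorithm.

```python
import numpy as np

n, p, block = 200, 200_000, 20_000     # subjects x voxels; p too large to load at once

def data_block(start, stop, n=n):
    """Stand-in for a sequential read from disk: columns start:stop, all subjects."""
    rng = np.random.default_rng(start)  # deterministic per block, for the demo
    return rng.normal(size=(n, stop - start))

# Accumulate the n x n Gram matrix with only one block in memory at a time.
G = np.zeros((n, n))
for start in range(0, p, block):
    B = data_block(start, min(start + block, p))
    G += B @ B.T

# Its eigendecomposition yields the subject-space components (SVD via Gram trick).
eigvals, U = np.linalg.eigh(G)
eigvals, U = eigvals[::-1], U[:, ::-1]  # sort in decreasing order
print("top 3 eigenvalues:", np.round(eigvals[:3], 1))
```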


Landmark Prediction Of Survival, Layla Parast, Tianxi Cai Sep 2010

Harvard University Biostatistics Working Paper Series

No abstract provided.


Longitudinal Penalized Functional Regression, Jeff Goldsmith, Ciprian M. Crainiceanu, Brian Caffo, Daniel Reich Sep 2010

Johns Hopkins University, Dept. of Biostatistics Working Papers

We propose a new regression model and inferential tools for the case when both the outcome and the functional exposures are observed at multiple visits. This data structure is new but increasingly present in applications where functions or images are recorded at multiple times. This raises new inferential challenges that cannot be addressed with current methods and software. Our proposed model generalizes the Generalized Linear Mixed Effects Model (GLMM) by adding functional predictors. Smoothness of the functional coefficients is ensured using roughness penalties estimated by Restricted Maximum Likelihood (REML) in a corresponding mixed effects model. This method is computationally feasible …