Open Access. Powered by Scholars. Published by Universities.®
Physical Sciences and Mathematics Commons™
- Keyword
- Prediction (13)
- Causal inference (11)
- Genetics (11)
- Model selection (11)
- Bootstrap (10)
- Cross-validation (9)
- Adjusted p-value (8)
- Multiple testing (8)
- Type I error rate (8)
- Counterfactual (7)
- False discovery rate (7)
- Censored data (6)
- Classification (6)
- Counting process (6)
- Estimating equation (6)
- Gene expression (6)
- Loss function (6)
- Null distribution (6)
- Risk (6)
- Survival analysis (6)
- Asymptotic control (5)
- Current status data (5)
- Density estimation (5)
- Diagnostic tests (5)
- Generalized family-wise error rate (5)
- Influence curve (5)
- Longitudinal data (5)
- Microarray (5)
- Multiple hypothesis testing (5)
- Proportion of false positives (5)
- Publication
- U.C. Berkeley Division of Biostatistics Working Paper Series (116)
- Harvard University Biostatistics Working Paper Series (73)
- UW Biostatistics Working Paper Series (55)
- Johns Hopkins University, Dept. of Biostatistics Working Papers (43)
- The University of Michigan Department of Biostatistics Working Paper Series (24)
Articles 31 - 60 of 336
Full-Text Articles in Physical Sciences and Mathematics
Assessing Association For Bivariate Survival Data With Interval Sampling: A Copula Model Approach With Application To Aids Study, Hong Zhu, Mei-Cheng Wang
Johns Hopkins University, Dept. of Biostatistics Working Papers
In disease surveillance systems or registries, bivariate survival data are typically collected under interval sampling. This refers to a situation in which entry into a registry occurs at the time of the first failure event (e.g., HIV infection) within a calendar time interval, the time of the initiating event (e.g., birth) is retrospectively identified for all cases in the registry, and the second failure event (e.g., death) is subsequently observed during follow-up. Sampling bias is induced by the selection process, because the data are collected conditional on the first failure event occurring within a time interval. Consequently, the …
Corrected Confidence Bands For Functional Data Using Principal Components, Jeff Goldsmith, Sonja Greven, Ciprian M. Crainiceanu
Johns Hopkins University, Dept. of Biostatistics Working Papers
Functional principal components (FPC) analysis is widely used to decompose and express functional observations. Curve estimates implicitly condition on basis functions and other quantities derived from FPC decompositions; however, these objects are unknown in practice. In this paper, we propose a method for obtaining correct curve estimates by accounting for uncertainty in FPC decompositions. Additionally, pointwise and simultaneous confidence intervals that account for both model-based and decomposition-based variability are constructed. Standard mixed-model representations of functional expansions are used to construct curve estimates and variances conditional on a specific decomposition. A bootstrap procedure is implemented to understand the uncertainty in …
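A generic version of such a decomposition-aware bootstrap can be sketched in a few lines; this is an illustration of the idea (resample subjects, refit the FPC decomposition, recompute the curve estimate, take pointwise percentile bands), not the authors' exact procedure, and the simulated curves, the number of components, and the bootstrap size are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate n noisy curves on a common grid (hypothetical data).
n, p = 100, 50
t = np.linspace(0, 1, p)
scores = rng.normal(size=(n, 2)) * [1.0, 0.5]
basis = np.vstack([np.sin(2 * np.pi * t), np.cos(2 * np.pi * t)])
Y = scores @ basis + rng.normal(scale=0.2, size=(n, p))

def fpc_fit(Y, k=2):
    """Estimate the mean function and first k FPCs from the sample."""
    mu = Y.mean(axis=0)
    _, _, Vt = np.linalg.svd(Y - mu, full_matrices=False)
    phi = Vt[:k]                      # estimated eigenfunctions (rows)
    xi = (Y - mu) @ phi.T             # estimated scores
    return mu, phi, xi

# Curve estimate for subject 0, conditional on the estimated decomposition.
mu, phi, xi = fpc_fit(Y)
fit0 = mu + xi[0] @ phi

# Bootstrap over subjects to propagate decomposition uncertainty
# into the curve estimate.
B = 200
boot = np.empty((B, p))
for b in range(B):
    idx = rng.integers(0, n, n)
    mu_b, phi_b, _ = fpc_fit(Y[idx])
    # Align signs: eigenfunctions are identified only up to sign.
    for j in range(phi_b.shape[0]):
        if phi_b[j] @ phi[j] < 0:
            phi_b[j] = -phi_b[j]
    xi0 = (Y[0] - mu_b) @ phi_b.T
    boot[b] = mu_b + xi0 @ phi_b

# Pointwise 95% percentile band for subject 0's curve.
lo, hi = np.percentile(boot, [2.5, 97.5], axis=0)
```

The spread of `boot` around `fit0` reflects variability in the decomposition itself, which a band conditional on one fixed decomposition would miss.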
Estimation Of A Non-Parametric Variable Importance Measure Of A Continuous Exposure, Antoine Chambaz, Pierre Neuvial, Mark J. Van Der Laan
U.C. Berkeley Division of Biostatistics Working Paper Series
We define a new measure of variable importance of an exposure on a continuous outcome, accounting for potential confounders. The exposure features a reference level x0 with positive mass and a continuum of other levels. For the purpose of estimating it, we fully develop the semi-parametric estimation methodology called targeted minimum loss estimation methodology (TMLE) [van der Laan & Rubin, 2006; van der Laan & Rose, 2011]. We cover the whole spectrum of its theoretical study (convergence of the iterative procedure which is at the core of the TMLE methodology; consistency and asymptotic normality of the estimator), practical implementation, simulation …
A Regularization Corrected Score Method For Nonlinear Regression Models With Covariate Error, David M. Zucker, Malka Gorfine, Yi Li, Donna Spiegelman
Harvard University Biostatistics Working Paper Series
No abstract provided.
A Proof Of Bell's Inequality In Quantum Mechanics Using Causal Interactions, James M. Robins, Tyler J. Vanderweele, Richard D. Gill
COBRA Preprint Series
We give a simple proof of Bell's inequality in quantum mechanics which, in conjunction with experiments, demonstrates that the local hidden variables assumption is false. The proof sheds light on relationships between the notion of causal interaction and interference between particles.
Effectively Selecting A Target Population For A Future Comparative Study, Lihui Zhao, Lu Tian, Tianxi Cai, Brian Claggett, L. J. Wei
Harvard University Biostatistics Working Paper Series
When comparing a new treatment with a control in a randomized clinical study, the treatment effect is generally assessed by evaluating a summary measure over a specific study population. The success of the trial heavily depends on the choice of such a population. In this paper, we show a systematic, effective way to identify a promising population, for which the new treatment is expected to have a desired benefit, using the data from a current study involving similar comparator treatments. Specifically, with the existing data we first create a parametric scoring system using multiple covariates to estimate subject-specific treatment differences. …
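The scoring idea described above can be illustrated with a toy sketch: fit a model with treatment-covariate interactions on existing data, score each subject by the model-predicted treatment difference, and select subjects whose predicted benefit clears a threshold. This is a generic illustration, not the authors' scoring system; the simulated data, linear model form, and the 0.5 threshold are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(8)

# Existing comparative study: covariates X, randomized arm A, outcome Y.
n = 1000
X = rng.normal(size=(n, 2))
A = rng.binomial(1, 0.5, size=n)
# Benefit of the new treatment grows with the first covariate.
Y = 1 + X @ np.array([0.5, 0.2]) + A * (0.8 * X[:, 0]) + rng.normal(size=n)

# Parametric scoring system (sketch): linear model with
# treatment-by-covariate interactions, fit by least squares.
D = np.column_stack([np.ones(n), X, A, A * X[:, 0], A * X[:, 1]])
beta, *_ = np.linalg.lstsq(D, Y, rcond=None)

def predicted_difference(x):
    """Estimated subject-specific treatment difference at covariates x:
    the fitted model at A=1 minus the fitted model at A=0."""
    return beta[3] + beta[4] * x[0] + beta[5] * x[1]

scores = np.array([predicted_difference(x) for x in X])

# Candidate target population: subjects whose predicted benefit exceeds
# a clinically meaningful threshold (0.5 here, purely illustrative).
target = scores > 0.5
```

In practice the scoring model would be built on one study and the selected subgroup evaluated on independent data, to avoid overly optimistic estimates of the benefit in the chosen population.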
Multiple Testing Of Local Maxima For Detection Of Peaks In Chip-Seq Data, Armin Schwartzman, Andrew Jaffe, Yulia Gavrilov, Clifford A. Meyer
Harvard University Biostatistics Working Paper Series
No abstract provided.
On The Covariate-Adjusted Estimation For An Overall Treatment Difference With Data From A Randomized Comparative Clinical Trial, Lu Tian, Tianxi Cai, Lihui Zhao, L. J. Wei
Harvard University Biostatistics Working Paper Series
No abstract provided.
Variable Importance Analysis With The Multipim R Package, Stephan J. Ritter, Nicholas P. Jewell, Alan E. Hubbard
U.C. Berkeley Division of Biostatistics Working Paper Series
We describe the R package multiPIM, including statistical background, functionality and user options. The package is for variable importance analysis, and is meant primarily for analyzing data from exploratory epidemiological studies, though it could certainly be applied in other areas as well. The approach taken to variable importance comes from the causal inference field, and is different from approaches taken in other R packages. By default, multiPIM uses a double robust targeted maximum likelihood estimator (TMLE) of a parameter akin to the attributable risk. Several regression methods/machine learning algorithms are available for estimating the nuisance parameters of the models, including …
A Unified Approach To Non-Negative Matrix Factorization And Probabilistic Latent Semantic Indexing, Karthik Devarajan, Guoli Wang, Nader Ebrahimi
COBRA Preprint Series
Non-negative matrix factorization (NMF) by the multiplicative updates algorithm is a powerful machine learning method for decomposing a high-dimensional nonnegative matrix V into two matrices, W and H, each with nonnegative entries, V ~ WH. NMF has been shown to produce a unique parts-based, sparse representation of the data. The nonnegativity constraints in NMF allow only additive combinations of the data, which enables it to learn parts that have distinct physical representations in reality. In the last few years, NMF has been successfully applied in a variety of areas such as natural language processing, information retrieval, image processing, speech recognition …
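The multiplicative updates can be written in a few lines of NumPy. This is a generic sketch of the Lee-Seung updates for the Frobenius-norm objective, not the authors' unified algorithm; the data matrix, rank, and iteration count are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

# Nonnegative data matrix V (hypothetical) and factorization rank r.
V = rng.random((30, 20))
r = 4
# Positive random initialization keeps the updates well defined.
W = rng.random((30, r)) + 0.1
H = rng.random((r, 20)) + 0.1

err0 = np.linalg.norm(V - W @ H)

eps = 1e-10  # guard against division by zero
for _ in range(200):
    # Lee-Seung multiplicative updates for ||V - WH||_F^2: elementwise
    # multiplications preserve nonnegativity and the objective is
    # non-increasing at every step.
    H *= (W.T @ V) / (W.T @ W @ H + eps)
    W *= (V @ H.T) / (W @ H @ H.T + eps)

err = np.linalg.norm(V - W @ H)
```

Because the updates only ever multiply by nonnegative ratios, no projection step is needed, which is a large part of the algorithm's practical appeal.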
Multiple Testing Of Local Maxima For Detection Of Unimodal Peaks In 1d, Armin Schwartzman, Yulia Gavrilov, Robert J. Adler
Harvard University Biostatistics Working Paper Series
No abstract provided.
Component Extraction Of Complex Biomedical Signal And Performance Analysis Based On Different Algorithm, Hemant Pasusangai Kasturiwale
Johns Hopkins University, Dept. of Biostatistics Working Papers
Biomedical signals can arise from one or more sources, including the heart, the brain, and the endocrine system. Multiple sources pose a challenge to researchers because the recordings may be contaminated with artifacts and noise. Biomedical time series signals include the electroencephalogram (EEG), the electrocardiogram (ECG), and others. The morphology of the cardiac signal is very important in most ECG-based diagnostics, but a diagnosis based on visual observation of a recorded ECG or EEG may not be accurate. To achieve a better understanding, Principal Component Analysis (PCA) and Independent Component Analysis (ICA) algorithms help in analyzing ECG signals. The immense scope in the field of biomedical signal processing Independent Component Analysis ( …
Propensity Score Analysis With Matching Weights, Liang Li
COBRA Preprint Series
The propensity score analysis is one of the most widely used methods for studying the causal treatment effect in observational studies. This paper studies treatment effect estimation with the method of matching weights. This method resembles propensity score matching but offers a number of new features including efficient estimation, rigorous variance calculation, simple asymptotics, statistical tests of balance, clearly identified target population with optimal sampling property, and no need for choosing matching algorithm and caliper size. In addition, we propose the mirror histogram as a useful tool for graphically displaying balance. The method also shares some features of the inverse …
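A rough numerical illustration of the weighting idea follows. It assumes the commonly used matching-weight definition, min(e, 1-e) divided by the probability of the treatment actually received; the simulated data and the hand-rolled logistic fit are hypothetical, and this is not the paper's full methodology (no variance calculation or balance tests).

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated observational data: confounder X, treatment Z, outcome Y.
n = 2000
X = rng.normal(size=n)
e_true = 1 / (1 + np.exp(-X))          # true propensity score
Z = rng.binomial(1, e_true)
Y = 1.0 * Z + X + rng.normal(size=n)   # true treatment effect = 1

# Estimate the propensity score by Newton-Raphson logistic regression
# (sketch; a packaged GLM routine would normally be used).
Xd = np.column_stack([np.ones(n), X])
beta = np.zeros(2)
for _ in range(25):
    p = 1 / (1 + np.exp(-Xd @ beta))
    grad = Xd.T @ (Z - p)
    hess = Xd.T @ (Xd * (p * (1 - p))[:, None])
    beta += np.linalg.solve(hess, grad)
e = 1 / (1 + np.exp(-Xd @ beta))

# Matching weights: min(e, 1-e) over the probability of the treatment
# actually received (assumed definition).
mw = np.minimum(e, 1 - e) / np.where(Z == 1, e, 1 - e)

# Weighted difference in means estimates the treatment effect in the
# matched population.
est = (np.average(Y[Z == 1], weights=mw[Z == 1])
       - np.average(Y[Z == 0], weights=mw[Z == 0]))
```

Note that the weights are bounded by 1, which is what gives the method its resemblance to pair matching while avoiding the extreme weights that inverse probability weighting can produce.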
A Broad Symmetry Criterion For Nonparametric Validity Of Parametrically-Based Tests In Randomized Trials, Russell T. Shinohara, Constantine E. Frangakis, Constantine G. Lyketsos
Johns Hopkins University, Dept. of Biostatistics Working Papers
Summary. Pilot phases of a randomized clinical trial often suggest that a parametric model may be an accurate description of the trial's longitudinal trajectories. However, parametric models are often not used for fear that they may invalidate tests of null hypotheses of equality between the experimental groups. Existing work has shown that when, for some types of data, certain parametric models are used, the validity for testing the null is preserved even if the parametric models are incorrect. Here, we provide a broader and easier to check characterization of parametric models that can be used to (a) preserve nonparametric validity …
Estimation And Testing In Targeted Group Sequential Covariate-Adjusted Randomized Clinical Trials, Antoine Chambaz, Mark J. Van Der Laan
U.C. Berkeley Division of Biostatistics Working Paper Series
This article is devoted to the construction and asymptotic study of adaptive group sequential covariate-adjusted randomized clinical trials analyzed through the prism of the semiparametric methodology of targeted maximum likelihood estimation (TMLE). We show how to build, as the data accrue group-sequentially, a sampling design which targets a user-supplied optimal design. We also show how to carry out a sound TMLE statistical inference based on such an adaptive sampling scheme (therefore extending some results known in the i.i.d. setting only so far), and how group-sequential testing applies on top of it. The procedure is robust (i.e., consistent even if the …
Targeted Maximum Likelihood Estimation For Dynamic Treatment Regimes In Sequential Randomized Controlled Trials, Paul Chaffee, Mark J. Van Der Laan
U.C. Berkeley Division of Biostatistics Working Paper Series
Sequential Randomized Controlled Trials (SRCTs) are rapidly becoming essential tools in the search for optimized treatment regimes in ongoing treatment settings. Analyzing data for multiple time-point treatments with a view toward optimal treatment regimes is of interest in many types of afflictions: HIV infection, Attention Deficit Hyperactivity Disorder in children, leukemia, prostate cancer, renal failure, and many others. Methods for analyzing data from SRCTs exist but they are either inefficient or suffer from the drawbacks of estimating equation methodology. We describe an estimation procedure, targeted maximum likelihood estimation (TMLE), which has been fully developed and implemented in point treatment settings, …
Estimating Subject-Specific Treatment Differences For Risk-Benefit Assessment With Competing Risk Event-Time Data, Brian Claggett, Lihui Zhao, Lu Tian, Davide Castagno, L. J. Wei
Harvard University Biostatistics Working Paper Series
No abstract provided.
Simple Examples Of Estimating Causal Effects Using Targeted Maximum Likelihood Estimation, Michael Rosenblum, Mark J. Van Der Laan
Johns Hopkins University, Dept. of Biostatistics Working Papers
We present a brief overview of targeted maximum likelihood for estimating the causal effect of a single time point treatment and of a two time point treatment. We focus on simple examples demonstrating how to apply the methodology developed in (van der Laan and Rubin, 2006; Moore and van der Laan, 2007; van der Laan, 2010a,b). We include R code for the single time point case.
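The single-time-point case can be sketched as follows. This is a generic NumPy illustration of the TMLE steps for a binary outcome (initial outcome regression, propensity fit, fluctuation along the clever covariate, targeted plug-in), not the authors' R code; the simulated data, model forms, and Newton-based fitting are assumptions made for self-containment.

```python
import numpy as np

rng = np.random.default_rng(3)

def expit(x):
    return 1 / (1 + np.exp(-x))

def logit(p):
    return np.log(p / (1 - p))

def logistic_fit(X, y, iters=25):
    """Plain Newton-Raphson logistic regression (sketch)."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = expit(X @ beta)
        H = X.T @ (X * (p * (1 - p))[:, None]) + 1e-8 * np.eye(X.shape[1])
        beta += np.linalg.solve(H, X.T @ (y - p))
    return beta

# Simulated single-time-point data: confounder W, treatment A, binary Y.
n = 5000
W = rng.normal(size=n)
A = rng.binomial(1, expit(0.5 * W))
Y = rng.binomial(1, expit(-0.5 + A + 0.8 * W))

ones = np.ones(n)
# Step 1: initial outcome regression Qbar(A, W).
bq = logistic_fit(np.column_stack([ones, A, W]), Y)
Q1 = expit(np.column_stack([ones, ones, W]) @ bq)      # Qbar(1, W)
Q0 = expit(np.column_stack([ones, 0 * ones, W]) @ bq)  # Qbar(0, W)
QA = np.where(A == 1, Q1, Q0)

# Step 2: propensity score g(W) = P(A = 1 | W).
g = expit(np.column_stack([ones, W]) @ logistic_fit(np.column_stack([ones, W]), A))

# Step 3: fluctuate the initial fit along the clever covariate H,
# choosing epsilon by maximum likelihood (1-D Newton ascent).
H = A / g - (1 - A) / (1 - g)
eps = 0.0
for _ in range(50):
    p = np.clip(expit(logit(QA) + eps * H), 1e-10, 1 - 1e-10)
    eps += (H @ (Y - p)) / np.sum(H**2 * p * (1 - p))

Q1s = expit(logit(Q1) + eps / g)
Q0s = expit(logit(Q0) - eps / (1 - g))

# Step 4: targeted plug-in estimate of the average treatment effect.
ate_tmle = np.mean(Q1s - Q0s)
```

Because both working models are correctly specified in this simulation, the fluctuation parameter ends up close to zero; the targeting step matters most when the initial outcome regression is misspecified.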
Functional Principal Components Model For High-Dimensional Brain Imaging, Vadim Zipunnikov, Brian S. Caffo, David M. Yousem, Christos Davatzikos, Brian S. Schwartz, Ciprian Crainiceanu
Johns Hopkins University, Dept. of Biostatistics Working Papers
We establish a fundamental equivalence between singular value decomposition (SVD) and functional principal components analysis (FPCA) models. The constructive relationship allows one to deploy the numerical efficiency of SVD to fully estimate the components of FPCA, even for extremely high-dimensional functional objects, such as brain images. As an example, a functional mixed effect model is fitted to high-resolution morphometric (RAVENS) images. The main directions of morphometric variation in brain volumes are identified and discussed.
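The basic equivalence can be checked numerically: the eigenvectors of the sample covariance matrix (the FPCA eigenfunctions) coincide, up to sign, with the right singular vectors of the centered data matrix, and the eigenvalues equal the squared singular values over n. This is a minimal sketch of that relationship on simulated curves, not the paper's full mixed-model machinery; the grid size here is deliberately small so the covariance route is feasible for comparison.

```python
import numpy as np

rng = np.random.default_rng(4)

# n functional observations on a p-point grid; in imaging, p >> n.
n, p = 40, 200
Y = rng.normal(size=(n, p)).cumsum(axis=1)  # smooth-ish random curves
Yc = Y - Y.mean(axis=0)

# Route 1: eigendecomposition of the p x p sample covariance
# (cost grows with p; infeasible when p is the number of voxels).
K = Yc.T @ Yc / n
evals, evecs = np.linalg.eigh(K)
evals, evecs = evals[::-1], evecs[:, ::-1]   # descending order

# Route 2: SVD of the n x p centered data matrix (work scales with n).
U, s, Vt = np.linalg.svd(Yc, full_matrices=False)
svd_evals = s**2 / n   # same eigenvalues
svd_efuns = Vt         # same eigenfunctions (up to sign)

k = 5
# Agreement up to sign on the leading k components.
agree = all(
    np.allclose(svd_efuns[j], evecs[:, j], atol=1e-6)
    or np.allclose(svd_efuns[j], -evecs[:, j], atol=1e-6)
    for j in range(k)
)
```

The practical payoff is that for p in the millions one never forms the p x p covariance at all; the SVD route delivers the same FPCA components from the much smaller data matrix.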
A Generalized Approach For Testing The Association Of A Set Of Predictors With An Outcome: A Gene Based Test, Benjamin A. Goldstein, Alan E. Hubbard, Lisa F. Barcellos
U.C. Berkeley Division of Biostatistics Working Paper Series
In many analyses, one has data on one level but desires to draw inference on another level. For example, in genetic association studies, one observes units of DNA referred to as SNPs, but wants to determine whether genes that are comprised of SNPs are associated with disease. While there are some available approaches for addressing this issue, they usually involve making parametric assumptions and are not easily generalizable. A statistical test is proposed for testing the association of a set of variables with an outcome of interest. No assumptions are made about the functional form relating the variables to the …
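One generic, assumption-light way to test a set of SNPs against an outcome is a permutation test of an omnibus statistic: permuting the outcome breaks any SNP-outcome association while preserving the correlation structure within the set. The sketch below illustrates that idea with a simple sum-of-squared-correlations statistic; it is not the authors' proposed test, and the simulated genotypes, effect size, and statistic are assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)

# Hypothetical gene: a set of SNPs (coded 0/1/2) and a continuous outcome.
n, m = 300, 10
snps = rng.integers(0, 3, size=(n, m)).astype(float)
y = 0.3 * snps[:, 0] + rng.normal(size=n)   # only SNP 0 is associated

def set_stat(snps, y):
    """Omnibus statistic: sum of squared SNP-outcome correlations."""
    Xs = (snps - snps.mean(axis=0)) / snps.std(axis=0)
    ys = (y - y.mean()) / y.std()
    r = Xs.T @ ys / len(y)
    return np.sum(r**2)

obs = set_stat(snps, y)

# Permutation null distribution: shuffle y, recompute the statistic.
B = 999
null = np.array([set_stat(snps, rng.permutation(y)) for _ in range(B)])

# Add-one p-value guards against reporting exactly zero.
pval = (1 + np.sum(null >= obs)) / (B + 1)
```

Because the null distribution is generated by permutation, no parametric model linking the SNPs to the outcome is required, which matches the motivation of avoiding functional-form assumptions.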
Oracle And Multiple Robustness Properties Of Survey Calibration Estimator In Missing Response Problem, Kwun Chuen Gary Chan
UW Biostatistics Working Paper Series
In the presence of missing response, reweighting the complete case subsample by the inverse of the nonmissing probability is both intuitive and easy to implement. However, inverse probability weighting is not efficient in general and is not robust against misspecification of the missing probability model. Calibration was developed by survey statisticians for improving efficiency of inverse probability weighting estimators when population totals of auxiliary variables are known and when inclusion probability is known by design. In the missing data problem, we can calibrate auxiliary variables in the complete case subsample to the full sample. However, the inclusion probability is unknown in general …
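A minimal numerical sketch of the two ideas, inverse probability weighting and survey calibration, is below. It uses simulated data with the true response probability (in practice this would be estimated), and a chi-square-distance (GREG-type) calibration that matches weighted complete-case totals of the auxiliary vector to the full-sample totals; these choices are assumptions for illustration, not the paper's estimator.

```python
import numpy as np

rng = np.random.default_rng(6)

# Missing-response setup: X always observed, Y missing at random given X.
n = 5000
X = rng.normal(size=n)
Y = 2 + X + rng.normal(size=n)             # E[Y] = 2
pi = 1 / (1 + np.exp(-(0.5 + X)))          # true response probability
R = rng.binomial(1, pi)                    # response indicator

# Inverse probability weighting of the complete cases.
ipw = np.sum(R * Y / pi) / np.sum(R / pi)

# Calibration: adjust the design weights d_i = R_i / pi_i so that the
# weighted complete-case totals of (1, X) match the full-sample totals.
# Chi-square-distance calibration gives weights of the form
# w_i = d_i * (1 + lam' x_i), with lam solving a linear system.
x_aux = np.column_stack([np.ones(n), X])
d = R / pi
A = x_aux.T @ (x_aux * d[:, None])
b = x_aux.sum(axis=0) - (x_aux * d[:, None]).sum(axis=0)
lam = np.linalg.solve(A, b)
w = d * (1 + x_aux @ lam)

cal = np.sum(w * Y) / np.sum(w)
```

Because Y depends on X, forcing the weighted X-total to agree with the full sample removes chance imbalance in the complete cases, which is where the efficiency gain of calibration over plain inverse probability weighting comes from.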
Modification And Improvement Of Empirical Likelihood For Missing Response Problem, Kwun Chuen Gary Chan
UW Biostatistics Working Paper Series
An empirical likelihood (EL) estimator was proposed by Qin and Zhang (2007) for a missing response problem under a missing at random assumption. They showed by simulation studies that the finite sample performance of the EL estimator is better than that of some existing estimators. However, the empirical likelihood estimator does not have a uniformly smaller asymptotic variance than other estimators in general. We consider several modifications to the empirical likelihood estimator and show that the proposed estimator dominates the empirical likelihood estimator and several other existing estimators in terms of asymptotic efficiencies. The proposed estimator also attains the minimum asymptotic variance among …
Minimum Description Length Measures Of Evidence For Enrichment, Zhenyu Yang, David R. Bickel
COBRA Preprint Series
In order to functionally interpret differentially expressed genes or other discovered features, researchers seek to detect enrichment in the form of overrepresentation of discovered features associated with a biological process. Most enrichment methods treat the p-value as the measure of evidence using a statistical test such as the binomial test, Fisher's exact test or the hypergeometric test. However, the p-value is not interpretable as a measure of evidence apart from adjustments in light of the sample size. As a measure of evidence supporting one hypothesis over the other, the Bayes factor (BF) overcomes this drawback of the p-value but lacks …
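For reference, the classical over-representation test mentioned above can be computed directly from the hypergeometric distribution; the counts here are hypothetical, and this illustrates only the p-value baseline that the abstract contrasts with evidence measures such as the Bayes factor.

```python
from math import comb

# Over-representation setup: M annotated genes in a universe of N;
# among n discovered genes, k carry the annotation.
N, M, n, k = 20000, 150, 400, 12

# Hypergeometric upper tail P(X >= k) under the null of no enrichment.
# Exact integer arithmetic via math.comb avoids overflow.
denom = comb(N, n)
p_value = sum(
    comb(M, i) * comb(N - M, n - i) for i in range(k, min(M, n) + 1)
) / denom

# Expected overlap under the null, for reference.
expected = n * M / N   # = 3.0 with these counts
```

As the abstract notes, this p-value shrinks with sample size even for a fixed degree of over-representation, which is exactly the interpretability problem that evidence-based measures aim to address.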
Efficient Measurement Error Correction With Spatially Misaligned Data, Adam A. Szpiro, Lianne Sheppard, Thomas Lumley
UW Biostatistics Working Paper Series
Association studies in environmental statistics often involve exposure and outcome data that are misaligned in space. A common strategy is to employ a spatial model such as universal kriging to predict exposures at locations with outcome data and then estimate a regression parameter of interest using the predicted exposures. This results in measurement error because the predicted exposures do not correspond exactly to the true values. We characterize the measurement error by decomposing it into Berkson-like and classical-like components. One correction approach is the parametric bootstrap, which is effective but computationally intensive since it requires solving a nonlinear optimization problem …
Minimum Description Length And Empirical Bayes Methods Of Identifying Snps Associated With Disease, Ye Yang, David R. Bickel
COBRA Preprint Series
The goal of determining which of hundreds of thousands of SNPs are associated with disease poses one of the most challenging multiple testing problems. Using the empirical Bayes approach, the local false discovery rate (LFDR) estimated using popular semiparametric models has enjoyed success in simultaneous inference. However, the estimated LFDR can be biased because the semiparametric approach tends to overestimate the proportion of the non-associated single nucleotide polymorphisms (SNPs). One of the negative consequences is that, like conventional p-values, such LFDR estimates cannot quantify the amount of information in the data that favors the null hypothesis of no disease-association.
We …
Improving The Power Of Chronic Disease Surveillance By Incorporating Residential History, Justin Manjourides, Marcello Pagano
Harvard University Biostatistics Working Paper Series
No abstract provided.
Multilevel Functional Principal Component Analysis For High-Dimensional Data, Vadim Zipunnikov, Brian Caffo, Ciprian Crainiceanu, David M. Yousem, Christos Davatzikos, Brian S. Schwartz
Johns Hopkins University, Dept. of Biostatistics Working Papers
We propose fast and scalable statistical methods for the analysis of hundreds or thousands of high dimensional vectors observed at multiple visits. The proposed inferential methods avoid the difficult task of loading the entire data set at once in the computer memory and use sequential access to data. This allows deployment of our methodology on low-resource computers where computations can be done in minutes on extremely large data sets. Our methods are motivated by and applied to a study where hundreds of subjects were scanned using Magnetic Resonance Imaging (MRI) at two visits roughly five years apart. The original data …
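The sequential-access idea can be illustrated in a few lines: accumulate the small n x n Gram matrix one block of columns at a time, so only an n x block slice is ever in memory, then eigendecompose the Gram matrix to recover the principal components. This is a generic sketch of blockwise computation, not the authors' full multilevel method; the in-memory array here stands in for data that would be streamed from disk.

```python
import numpy as np

rng = np.random.default_rng(7)

# n subjects, p voxels; imagine p far too large to hold Y in memory.
n, p, block = 50, 10000, 1000
Y = rng.normal(size=(n, p))  # stands in for data streamed from disk

# Accumulate the n x n Gram matrix G = Y Y^T one column block at a time;
# each pass touches only an n x block slice.
G = np.zeros((n, n))
for start in range(0, p, block):
    chunk = Y[:, start:start + block]   # in practice: read this slice
    G += chunk @ chunk.T

# Eigendecomposition of the small n x n matrix yields the principal
# component structure without ever forming the p x p covariance.
evals, U = np.linalg.eigh(G)
evals, U = evals[::-1], U[:, ::-1]   # descending order

# Sanity check against the direct SVD (feasible here only because this
# toy Y fits in memory).
s = np.linalg.svd(Y, compute_uv=False)
ok = np.allclose(np.sqrt(np.clip(evals, 0, None)), s, atol=1e-6)
```

Each block pass is embarrassingly simple, which is what makes this style of computation practical on low-resource machines: memory is bounded by n x block regardless of how large p grows.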
Landmark Prediction Of Survival, Layla Parast, Tianxi Cai
Harvard University Biostatistics Working Paper Series
No abstract provided.
Longitudinal Penalized Functional Regression, Jeff Goldsmith, Ciprian M. Crainiceanu, Brian Caffo, Daniel Reich
Johns Hopkins University, Dept. of Biostatistics Working Papers
We propose a new regression model and inferential tools for the case when both the outcome and the functional exposures are observed at multiple visits. This data structure is new but increasingly present in applications where functions or images are recorded at multiple times. This raises new inferential challenges that cannot be addressed with current methods and software. Our proposed model generalizes the Generalized Linear Mixed Effects Model (GLMM) by adding functional predictors. Smoothness of the functional coefficients is ensured using roughness penalties estimated by Restricted Maximum Likelihood (REML) in a corresponding mixed effects model. This method is computationally feasible …