Open Access. Powered by Scholars. Published by Universities.®

Statistical Methodology Commons

2009

Statistical Theory

Articles 1 - 30 of 32

Full-Text Articles in Statistical Methodology

Pragmatic Estimation Of A Spatio-Temporal Air Quality Model With Irregular Monitoring Data, Paul D. Sampson, Adam A. Szpiro, Lianne Sheppard, Johan Lindström, Joel D. Kaufman Nov 2009

UW Biostatistics Working Paper Series

Statistical analyses of the health effects of air pollution have increasingly used GIS-based covariates for prediction of ambient air quality in “land-use” regression models. More recently these regression models have accounted for spatial correlation structure in combining monitoring data with land-use covariates. The current paper builds on these concepts to address spatio-temporal prediction of ambient concentrations of particulate matter with aerodynamic diameter less than 2.5 μm (PM2.5) on the basis of a model representing spatially varying seasonal trends and spatial correlation structures. Our hierarchical methodology provides a pragmatic approach that fully exploits regulatory and other supplemental monitoring data which jointly …


On The Behaviour Of Marginal And Conditional Akaike Information Criteria In Linear Mixed Models, Sonja Greven, Thomas Kneib Nov 2009

Johns Hopkins University, Dept. of Biostatistics Working Papers

In linear mixed models, model selection frequently includes the selection of random effects. Two versions of the Akaike information criterion (AIC) have been used, based either on the marginal or on the conditional distribution. We show that the marginal AIC is no longer an asymptotically unbiased estimator of the Akaike information, and in fact favours smaller models without random effects. For the conditional AIC, we show that ignoring estimation uncertainty in the random effects covariance matrix, as is common practice, induces a bias that leads to the selection of any random effect not predicted to be exactly zero. We derive …


Survival Analysis With Error-Prone Time-Varying Covariates: A Risk Set Calibration Approach, Xiaomei Liao, David M. Zucker, Yi Li, Donna Spiegelman Nov 2009

Harvard University Biostatistics Working Paper Series

No abstract provided.


A Statistical Framework For The Analysis Of ChIP-Seq Data, Pei Fen Kuan, Dongjun Chung, Guangjin Pan, James A. Thomson, Ron Stewart, Sunduz Keles Nov 2009

Sunduz Keles

Chromatin immunoprecipitation followed by sequencing (ChIP-Seq) has revolutionized experiments for genome-wide profiling of DNA-binding proteins, histone modifications, and nucleosome occupancy. As the cost of sequencing is decreasing, many researchers are switching from microarray-based technologies (ChIP-chip) to ChIP-Seq for genome-wide study of transcriptional regulation. Despite its increasing and well-deserved popularity, there is little work that investigates and accounts for sources of biases in the ChIP-Seq technology. These biases typically arise from both the standard pre-processing protocol and the underlying DNA sequence of the generated data.

We study data from a naked DNA sequencing experiment, which sequences non-cross-linked DNA after deproteinizing and …


A New Class Of Minimum Power Divergence Estimators With Applications To Cancer Surveillance, Nirian Martin, Yi Li Nov 2009

Harvard University Biostatistics Working Paper Series

No abstract provided.


Quasi-Least Squares With Mixed Linear Correlation Structures, Jichun Xie, Justine Shults, Jon Peet, Dwight Stambolian, Mary F. Cotch Oct 2009

UPenn Biostatistics Working Papers

Quasi-least squares (QLS) is a two-stage computational approach for estimation of the correlation parameters in the framework of generalized estimating equations (GEE). We prove two general results for the class of mixed linear correlation structures: namely, that the stage one QLS estimate of the correlation parameter always exists and is feasible (yields a positive definite estimated correlation matrix) for any correlation structure, while the stage two estimator exists and is unique (and therefore consistent) with probability one, for the class of mixed linear correlation structures. Our general results justify the implementation of QLS for particular members of the class of …


Readings In Targeted Maximum Likelihood Estimation, Mark J. Van Der Laan, Sherri Rose, Susan Gruber Sep 2009

U.C. Berkeley Division of Biostatistics Working Paper Series

This is a compilation of current and past work on targeted maximum likelihood estimation. It features the original targeted maximum likelihood learning paper as well as chapters on super (machine) learning using cross validation, randomized controlled trials, realistic individualized treatment rules in observational studies, biomarker discovery, case-control studies, and time-to-event outcomes with censored data, among others. We hope this collection is helpful to the interested reader and stimulates additional research in this important area.


Causal Inference For Nested Case-Control Studies Using Targeted Maximum Likelihood Estimation, Sherri Rose, Mark J. Van Der Laan Sep 2009

U.C. Berkeley Division of Biostatistics Working Paper Series

A nested case-control study is conducted within a well-defined cohort arising out of a population of interest. This design is often used in epidemiology to reduce the costs associated with collecting data on the full cohort; however, the case-control sample within the cohort is a biased sample. Methods for analyzing case-control studies have largely focused on logistic regression models that provide conditional and not marginal causal estimates of the odds ratio. We previously developed a Case-Control Weighted Targeted Maximum Likelihood Estimation (TMLE) procedure for case-control study designs, which relies on the prevalence probability q0. We propose the use of …


Targeted Maximum Likelihood Estimation: A Gentle Introduction, Susan Gruber, Mark J. Van Der Laan Aug 2009

U.C. Berkeley Division of Biostatistics Working Paper Series

This paper provides a concise introduction to targeted maximum likelihood estimation (TMLE) of causal effect parameters. The interested analyst should gain sufficient understanding of TMLE from this introductory tutorial to be able to apply the method in practice. A program written in R is provided. This program implements a basic version of TMLE that can be used to estimate the effect of a binary point treatment on a continuous or binary outcome.


Comparing Risk Scoring Systems Beyond The ROC Paradigm In Survival Analysis, Hajime Uno, Lu Tian, Tianxi Cai, Isaac S. Kohane, L. J. Wei Aug 2009

Harvard University Biostatistics Working Paper Series

No abstract provided.


Combinational Mixtures Of Multiparameter Distributions, Valeria Edefonti, Giovanni Parmigiani Aug 2009

Johns Hopkins University, Dept. of Biostatistics Working Papers

We introduce combinatorial mixtures - a flexible class of models for inference on mixture distributions whose components have multidimensional parameters. The key idea is to allow each element of the component-specific parameter vectors to be shared by a subset of other components. This approach allows for mixtures that range from very flexible to very parsimonious, and unifies inference on component-specific parameters with inference on the number of components. We develop Bayesian inference and computation approaches for this class of distributions, and illustrate them in an application. This work was originally motivated by the analysis of cancer subtypes: in terms of …


Shrinkage Estimation Of Expression Fold Change As An Alternative To Testing Hypotheses Of Equivalent Expression, Zahra Montazeri, Corey M. Yanofsky, David R. Bickel Aug 2009

COBRA Preprint Series

Research on analyzing microarray data has focused on the problem of identifying differentially expressed genes to the neglect of the problem of how to integrate evidence that a gene is differentially expressed with information on the extent of its differential expression. Consequently, researchers currently prioritize genes for further study either on the basis of volcano plots or, more commonly, according to simple estimates of the fold change after filtering the genes with an arbitrary statistical significance threshold. While the subjective and informal nature of the former practice precludes quantification of its reliability, the latter practice is equivalent to using a …


The Effect Of Correlation In False Discovery Rate Estimation, Armin Schwartzman, Xihong Lin Jul 2009

Harvard University Biostatistics Working Paper Series

No abstract provided.


Spatial Cluster Detection For Repeatedly Measured Outcomes While Accounting For Residential History, Andrea J. Cook, Diane Gold, Yi Li Jun 2009

Harvard University Biostatistics Working Paper Series

No abstract provided.


Marginalized Frailty Models For Multivariate Survival Data, Megan Othus, Yi Li Jun 2009

Harvard University Biostatistics Working Paper Series

No abstract provided.


Spatial Cluster Detection For Weighted Outcomes Using Cumulative Geographic Residuals, Andrea J. Cook, Yi Li, David Arterburn, Ram C. Tiwari Jun 2009

Harvard University Biostatistics Working Paper Series

No abstract provided.


On The C-Statistics For Evaluating Overall Adequacy Of Risk Prediction Procedures With Censored Survival Data, Hajime Uno, Tianxi Cai, Michael J. Pencina, Ralph B. D'Agostino, L. J. Wei Jun 2009

Harvard University Biostatistics Working Paper Series

No abstract provided.


Estimating Subject-Specific Dependent Competing Risk Profile With Censored Event Time Observations, Yi Li, Lu Tian, L. J. Wei May 2009

Harvard University Biostatistics Working Paper Series

No abstract provided.


Resampling-Based Multiple Hypothesis Testing With Applications To Genomics: New Developments In The R/Bioconductor Package Multtest, Houston N. Gilbert, Katherine S. Pollard, Mark J. Van Der Laan, Sandrine Dudoit Apr 2009

U.C. Berkeley Division of Biostatistics Working Paper Series

The multtest package is a standard Bioconductor package containing a suite of functions useful for executing, summarizing, and displaying the results from a wide variety of multiple testing procedures (MTPs). In addition to many popular MTPs, the central methodological focus of the multtest package is the implementation of powerful joint multiple testing procedures. Joint MTPs are able to account for the dependencies between test statistics by effectively making use of (estimates of) the test statistics joint null distribution. To this end, two additional bootstrap-based estimates of the test statistics joint null distribution have been developed for use in the …


A Class Of Semiparametric Mixture Cure Survival Models With Dependent Censoring, Megan Othus, Yi Li, Ram C. Tiwari Apr 2009

Harvard University Biostatistics Working Paper Series

No abstract provided.


Collaborative Targeted Maximum Likelihood Estimation, Mark J. Van Der Laan, Susan Gruber Apr 2009

U.C. Berkeley Division of Biostatistics Working Paper Series

Collaborative double robust targeted maximum likelihood estimators represent a fundamental further advance over standard targeted maximum likelihood estimators of causal inference and variable importance parameters. The targeted maximum likelihood approach involves fluctuating an initial density estimate, (Q), in order to make a bias/variance tradeoff targeted towards a specific parameter in a semi-parametric model. The fluctuation involves estimation of a nuisance parameter portion of the likelihood, g. TMLE and other double robust estimators have been shown to be consistent and asymptotically normally distributed (CAN) under regularity conditions, when either one of these two factors of the likelihood of the data is …


Joint Multiple Testing Procedures For Graphical Model Selection With Applications To Biological Networks, Houston N. Gilbert, Mark J. Van Der Laan, Sandrine Dudoit Apr 2009

U.C. Berkeley Division of Biostatistics Working Paper Series

Gaussian graphical models have become popular tools for identifying relationships between genes when analyzing microarray expression data. In the classical undirected Gaussian graphical model setting, conditional independence relationships can be inferred from partial correlations obtained from the concentration matrix (= inverse covariance matrix) when the sample size n exceeds the number of parameters p which need to be estimated. In situations where n < p, another approach to graphical model estimation may rely on calculating unconditional (zero-order) and first-order partial correlations. In these settings, the goal is to identify a lower-order conditional independence graph, sometimes referred to as a ‘0-1 graph’. For either choice of graph, model selection may involve a multiple testing problem, in which edges in a graph are drawn only after rejecting hypotheses involving (saturated or lower-order) partial correlation parameters. Most multiple testing procedures applied in previously proposed graphical model selection algorithms rely on standard, marginal testing methods which do not take into account the joint distribution of the test statistics derived from (partial) correlations. We propose and implement a multiple testing framework useful when testing for edge inclusion during graphical model selection. Two features of our methodology include (i) a computationally efficient and asymptotically valid test statistics joint null distribution derived from influence curves for correlation-based parameters, and (ii) the application of empirical Bayes joint multiple testing procedures which can effectively control a variety of popular Type I error rates by incorporating joint null distributions such as those described here (Dudoit and van der Laan, 2008).
Using a dataset from Arabidopsis thaliana, we observe that the use of more sophisticated, modular approaches to multiple testing allows one to identify greater numbers of edges when approximating an undirected graphical model using a 0-1 graph. Our framework may also be extended to edge testing algorithms for other types of graphical models (e.g., for classical undirected, bidirected, and directed acyclic graphs).
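
As a small illustration of the classical setting mentioned in this abstract, the sketch below computes full-order partial correlations from a concentration (inverse covariance) matrix. It is not the authors' procedure and involves no multiple testing; the function name `partial_correlations` and the toy 3-variable precision matrix are assumptions made for illustration only.

```python
import numpy as np

def partial_correlations(cov):
    """Full-order partial correlations from a covariance matrix.

    Uses the standard identity
        rho_ij|rest = -omega_ij / sqrt(omega_ii * omega_jj),
    where omega is the concentration (inverse covariance) matrix.
    """
    omega = np.linalg.inv(cov)            # concentration matrix
    scale = np.sqrt(np.diag(omega))       # sqrt of diagonal entries
    pcor = -omega / np.outer(scale, scale)
    np.fill_diagonal(pcor, 1.0)           # convention: unit diagonal
    return pcor

# Hypothetical toy example: a chain 1 - 2 - 3, encoded by a tridiagonal
# concentration matrix, so variables 1 and 3 are conditionally
# independent given variable 2.
omega = np.array([[ 2.0, -1.0,  0.0],
                  [-1.0,  2.0, -1.0],
                  [ 0.0, -1.0,  2.0]])
cov = np.linalg.inv(omega)
pcor = partial_correlations(cov)
print(np.round(pcor, 3))
```

In this toy case the (1, 3) partial correlation is zero up to rounding error while the (1, 2) entry is about 0.5, so only the edges 1 - 2 and 2 - 3 would appear in the graph. With estimated covariances the entries are never exactly zero, which is why edge selection becomes a testing problem.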


The Importance Of Scale For Spatial-Confounding Bias And Precision Of Spatial Regression Estimators, Christopher J. Paciorek Mar 2009

Harvard University Biostatistics Working Paper Series

Increasingly, regression models are used when residuals are spatially correlated. Prominent examples include studies in environmental epidemiology to understand the chronic health effects of pollutants. I consider the effects of residual spatial structure on the bias and precision of regression coefficients, developing a simple framework in which to understand the key issues and derive informative analytic results. When the spatial residual is induced by an unmeasured confounder, regression models with spatial random effects and closely-related models such as kriging and penalized splines are biased, even when the residual variance components are known. Analytic and simulation results show how the bias …


Analysis Of Randomized Comparative Clinical Trial Data For Personalized Treatment Selections, Tianxi Cai, Lu Tian, Peggy H. Wong, L. J. Wei Mar 2009

Harvard University Biostatistics Working Paper Series

No abstract provided.


Correlated Binary Regression Using Orthogonalized Residuals, Richard C. Zink, Bahjat F. Qaqish Mar 2009

COBRA Preprint Series

This paper focuses on marginal regression models for correlated binary responses when estimation of the association structure is of primary interest. A new estimating function approach based on orthogonalized residuals is proposed. This procedure allows a new representation and addresses some of the difficulties of the conditional-residual formulation of alternating logistic regressions of Carey, Zeger & Diggle (1993). The new method is illustrated with an analysis of data on impaired pulmonary function.


Group Comparison Of Eigenvalues And Eigenvectors Of Diffusion Tensors, Armin Schwartzman, Robert F. Dougherty, Jonathan E. Taylor Mar 2009

Harvard University Biostatistics Working Paper Series

No abstract provided.


Validation Of Differential Gene Expression Algorithms: Application Comparing Fold Change Estimation To Hypothesis Testing, David R. Bickel, Corey M. Yanofsky Feb 2009

COBRA Preprint Series

Sustained research on the problem of determining which genes are differentially expressed on the basis of microarray data has yielded a plethora of statistical algorithms, each justified by theory, simulation, or ad hoc validation and yet differing in practical results from equally justified algorithms. The widespread confusion on which method to use in practice has been exacerbated by the finding that simply ranking genes by their fold changes sometimes outperforms popular statistical tests.

Algorithms may be compared by quantifying each method's error in predicting expression ratios, whether such ratios are defined across microarray channels or between two independent groups. For …


Measures To Summarize And Compare The Predictive Capacity Of Markers, Wen Gu, Margaret Pepe Feb 2009

UW Biostatistics Working Paper Series

The predictive capacity of a marker in a population can be described using the population distribution of risk (Huang et al., 2007; Pepe et al., 2008a; Stern, 2008). Virtually all standard statistical summaries of predictability and discrimination can be derived from it (Gail and Pfeiffer, 2005). The goal of this paper is to develop methods for making inference about risk prediction markers using summary measures derived from the risk distribution. We describe some new clinically motivated summary measures and give new interpretations to some existing statistical measures. Methods for estimating these summary measures are described along with distribution theory that …


Weighting And Prediction In Sample Surveys, Rod Little Feb 2009

The University of Michigan Department of Biostatistics Working Paper Series

A fundamental technique in survey sampling is to weight included units by the inverse of their probability of inclusion, which may be known (as in the case of sampling weights) or estimated (as in the case of nonresponse weights). The technique is closely associated with the design-based approach to survey inference, with the idea that units in the sample are representing a certain number of units in the population. I discuss weighting from a modeling perspective. Some common misconceptions of weighting will be addressed, including the idea that modelers can ignore the sampling weights, or that weighting necessarily reduces bias …
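
The inverse-probability weighting described at the start of this abstract can be sketched as follows. This is a generic illustration (a Horvitz-Thompson style total and the corresponding weighted mean), not the modeling approach the paper goes on to discuss; the function names and the three-unit sample are hypothetical.

```python
def inverse_probability_weights(inclusion_probs):
    """Weight each sampled unit by 1 / (its probability of inclusion)."""
    if any(p <= 0 or p > 1 for p in inclusion_probs):
        raise ValueError("inclusion probabilities must lie in (0, 1]")
    return [1.0 / p for p in inclusion_probs]

def weighted_total(values, inclusion_probs):
    """Estimate the population total: sum of y_i / pi_i over the sample."""
    if len(values) != len(inclusion_probs):
        raise ValueError("values and inclusion_probs must have equal length")
    weights = inverse_probability_weights(inclusion_probs)
    return sum(w * y for w, y in zip(weights, values))

def weighted_mean(values, inclusion_probs):
    """Estimate the population mean as weighted total / sum of weights."""
    weights = inverse_probability_weights(inclusion_probs)
    return weighted_total(values, inclusion_probs) / sum(weights)

# Hypothetical sample of three units with unequal inclusion probabilities.
y = [10.0, 20.0, 30.0]
pi = [0.5, 0.25, 0.1]

weights = inverse_probability_weights(pi)   # each unit "represents" 1/pi units
total = weighted_total(y, pi)
mean = weighted_mean(y, pi)
print(weights, total, mean)
```

Here the unit sampled with probability 0.1 stands in for ten population units, which matches the abstract's description of sampled units representing a certain number of units in the population.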


Multilevel Functional Principal Component Analysis, Chong-Zhi Di, Ciprian M. Crainiceanu, Brian S. Caffo, Naresh M. Punjabi Jan 2009

Chongzhi Di

The Sleep Heart Health Study (SHHS) is a comprehensive landmark study of sleep and its impacts on health outcomes. A primary metric of the SHHS is the in-home polysomnogram, which includes two electroencephalographic (EEG) channels for each subject, at two visits. The volume and importance of these data present enormous challenges for analysis. To address these challenges, we introduce multilevel functional principal component analysis (MFPCA), a novel statistical methodology designed to extract core intra- and inter-subject geometric components of multilevel functional data. Though motivated by the SHHS, the proposed methodology is generally applicable, with potential relevance to many modern scientific …