Open Access. Powered by Scholars. Published by Universities.®

Statistics and Probability Commons

Articles 31 - 46 of 46

Full-Text Articles in Statistics and Probability

Double Robust Estimation In Longitudinal Marginal Structural Models, Zhuo Yu, Mark J. Van Der Laan Jun 2003

U.C. Berkeley Division of Biostatistics Working Paper Series

Consider estimation of causal parameters in a marginal structural model for the discrete intensity of the treatment specific counting process (e.g. hazard of a treatment specific survival time) based on longitudinal observational data on treatment, covariates and survival. We assume the sequential randomization assumption (SRA) on the treatment assignment mechanism and the so-called experimental treatment assignment assumption, which is needed to identify the causal parameters from the observed data distribution. Under SRA, the likelihood of the observed data structure factorizes into the auxiliary treatment mechanism and the partial likelihood consisting of the product over time of conditional distributions of …


A New Confidence Interval For The Difference Between Two Binomial Proportions Of Paired Data, Xiao-Hua Zhou, Gengsheng Qin Jun 2003

UW Biostatistics Working Paper Series

Motivated by a study comparing the sensitivities and specificities of two diagnostic tests in a paired design when the sample size is small, we first derived an Edgeworth expansion for the studentized difference between two binomial proportions of paired data. The Edgeworth expansion can help us understand why the usual Wald interval for the difference has poor coverage when the sample size is small. Based on the Edgeworth expansion, we then derived a transformation-based confidence interval for the difference. The new interval removes the skewness in the Edgeworth expansion; the new interval is easy to compute, and its coverage …
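As a point of reference for the abstract above, the usual Wald interval that it describes as having poor small-sample coverage can be sketched as follows. This is only the baseline interval, not the authors' transformation-based interval. The function name `paired_wald_interval`, the argument names, and the toy counts are assumptions made for illustration: `n10` and `n01` denote the two kinds of discordant pairs and `n` the total number of pairs.

```python
from math import sqrt
from statistics import NormalDist


def paired_wald_interval(n10, n01, n, level=0.95):
    """Wald interval for p1 - p2 from paired binary data (baseline sketch).

    n10: pairs positive on test 1 only; n01: pairs positive on test 2 only;
    n: total number of pairs. All names are illustrative assumptions.
    """
    if n <= 0 or n10 < 0 or n01 < 0 or n10 + n01 > n:
        raise ValueError("counts must be non-negative and sum to at most n")
    # Point estimate of the difference between the two marginal proportions.
    diff = (n10 - n01) / n
    # Estimated variance of the mean of the per-pair differences (+1, 0, -1).
    variance = ((n10 + n01) - (n10 - n01) ** 2 / n) / n ** 2
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sqrt(variance)
    return diff - half_width, diff + half_width


# Toy counts, invented for illustration only.
lower, upper = paired_wald_interval(n10=8, n01=3, n=40)
```

The interval is symmetric around the point estimate by construction, which is one reason it cannot reflect the skewness that the Edgeworth expansion exposes.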


Supervised Detection Of Regulatory Motifs In DNA Sequences, Sunduz Keles, Mark J. Van Der Laan, Sandrine Dudoit, Biao Xing, Michael B. Eisen May 2003

U.C. Berkeley Division of Biostatistics Working Paper Series

Identification of transcription factor binding sites (regulatory motifs) is a major interest in contemporary biology. We propose a new likelihood based method, COMODE, for identifying structural motifs in DNA sequences. Commonly used methods (e.g. MEME, Gibbs sampler) model binding sites as families of sequences described by a position weight matrix (PWM) and identify PWMs that maximize the likelihood of observed sequence data under a simple multinomial mixture model. This model assumes that the positions of the PWM correspond to independent multinomial distributions with four cell probabilities. We address supervising the search for DNA binding sites using the information derived from …
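The position weight matrix model mentioned in the abstract above treats each position of a binding site as an independent multinomial over the four bases, so the log-likelihood of a candidate site is a sum of per-position log probabilities. The sketch below illustrates only that scoring step, not COMODE or the mixture model; `pwm_log_likelihood` and the toy matrix are hypothetical names and values chosen for illustration.

```python
import math


def pwm_log_likelihood(site, pwm):
    """Log-likelihood of a site under independent per-position multinomials.

    pwm is a list with one dict per position, mapping each base in "ACGT"
    to its probability. Illustrative sketch only.
    """
    if len(site) != len(pwm):
        raise ValueError("site length must match the number of PWM positions")
    # Independence across positions turns the product into a sum of logs.
    return sum(math.log(pwm[pos][base]) for pos, base in enumerate(site))


# Toy three-position PWM, invented for illustration; each row sums to 1.
toy_pwm = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
]
score = pwm_log_likelihood("AGC", toy_pwm)
```

A site that matches the high-probability base at every position scores higher than one that matches none, which is what a likelihood-based search exploits.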


Improved Confidence Intervals For The Sensitivity At A Fixed Level Of Specificity Of A Continuous-Scale Diagnostic Test, Xiao-Hua Zhou, Gengsheng Qin May 2003

UW Biostatistics Working Paper Series

For a continuous-scale test, it is of interest to construct a confidence interval for the sensitivity of the diagnostic test at the cut-off that yields a predetermined level of its specificity (e.g. 80%, 90%, or 95%). In this paper we proposed two new intervals for the sensitivity of a continuous-scale diagnostic test at a fixed level of specificity. We then conducted simulation studies to compare the relative performance of these two intervals with the best existing BCa bootstrap interval, proposed by Platt et al. (2000). Our simulation results showed that the newly proposed intervals are better than the BCa bootstrap …


Bootstrap Confidence Intervals For Medical Costs With Censored Observations, Hongyu Jiang, Xiao-Hua Zhou May 2003

UW Biostatistics Working Paper Series

Medical cost data with administratively censored observations often arise in cost-effectiveness studies of treatments for life-threatening diseases. The mean of medical costs incurred from the start of a treatment until death or a certain time point after the implementation of treatment is frequently of interest. In many situations, due to the skewed nature of the cost distribution and the non-uniform rate of cost accumulation over time, the currently available normal approximation confidence interval has poor coverage accuracy. In this paper, we proposed a bootstrap confidence interval for the mean of medical costs with censored observations. In simulation studies, we showed that the proposed …


New Intervals For The Difference Between Two Independent Binomial Proportions, Xiao-Hua Zhou, Min Tsao, Gengsheng Qin May 2003

UW Biostatistics Working Paper Series

In this paper we gave an Edgeworth expansion for the studentized difference of two binomial proportions. We then proposed two new intervals by correcting the skewness in the Edgeworth expansion in a direct and an indirect way. Such bias-corrected confidence intervals are easy to compute, and their coverage probabilities converge to the nominal level at a rate of O(n^{-1/2}), where n is the size of the combined samples. Our simulation results suggest that in finite samples the new interval based on the indirect method has performance similar to that of the two best existing intervals in terms of coverage accuracy …


A Bootstrap Confidence Interval Procedure For The Treatment Effect Using Propensity Score Subclassification, Wanzhu Tu, Xiao-Hua Zhou May 2003

UW Biostatistics Working Paper Series

In the analysis of observational studies, propensity score subclassification has been shown to be a powerful method for adjusting unbalanced covariates for the purpose of causal inferences. One practical difficulty in carrying out such an analysis is to obtain a correct variance estimate for such inferences, while reducing bias in the estimate of the treatment effect due to an imbalance in the measured covariates. In this paper, we propose a bootstrap procedure for the inferences concerning the average treatment effect; our bootstrap method is based on an extension of Efron’s bias-corrected accelerated (BCa) bootstrap confidence interval to a two-sample problem. …


Semiparametric Regression Models With Missing Data: The Mathematics In The Work Of Robins Et Al., Menggang Yu, Bin Nan May 2003

The University of Michigan Department of Biostatistics Working Paper Series

This review is an attempt to understand the landmark papers of Robins, Rotnitzky, and Zhao (1994) and Robins and Rotnitzky (1992). We revisit their main results and corresponding proofs using the theory outlined in the monograph by Bickel, Klaassen, Ritov, and Wellner (1993). We also discuss an illustrative example to show the details of applying these theoretical results.


Penalized Spline Nonparametric Mixed Models For Inference About A Finite Population Mean From Two-Stage Samples, Hui Zheng, Rod Little Mar 2003

The University of Michigan Department of Biostatistics Working Paper Series

Samplers often distrust model-based approaches to survey inference due to concerns about model misspecification when applied to large samples from complex populations. We suggest that the model-based paradigm can work very successfully in survey settings, provided models are chosen that take into account the sample design and avoid strong parametric assumptions. The Horvitz-Thompson (HT) estimator is a simple design-unbiased estimator of the finite population total in probability sampling designs. From a modeling perspective, the HT estimator performs well when the ratios of the outcome values and the inclusion probabilities are exchangeable. When this assumption is not met, the HT estimator …
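The Horvitz-Thompson estimator described in the abstract above weights each sampled outcome by the inverse of its inclusion probability. A minimal sketch follows, assuming the inclusion probabilities are known for every sampled unit; the function name and the toy numbers are assumptions for illustration and do not come from the paper.

```python
def horvitz_thompson_total(values, inclusion_probs):
    """Horvitz-Thompson estimate of a finite population total.

    values: outcomes for the sampled units.
    inclusion_probs: each sampled unit's probability of being included.
    """
    if len(values) != len(inclusion_probs):
        raise ValueError("values and inclusion_probs must have equal length")
    total = 0.0
    for y, pi in zip(values, inclusion_probs):
        if not 0.0 < pi <= 1.0:
            raise ValueError("inclusion probabilities must lie in (0, 1]")
        # Each sampled unit stands in for 1 / pi units of the population.
        total += y / pi
    return total


# Toy sample, invented for illustration only.
sample_y = [10.0, 20.0, 30.0]
sample_pi = [0.5, 0.25, 0.5]
ht_total = horvitz_thompson_total(sample_y, sample_pi)
```

The ratios `y / pi` are the quantities the abstract refers to: the estimator behaves well when they look exchangeable across units, and a unit with a small inclusion probability and a large outcome can dominate the sum otherwise.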


A Semiparametric Model Selection Criterion With Applications To The Marginal Structural Model, M. Alan Brookhart, Mark J. Van Der Laan Mar 2003

U.C. Berkeley Division of Biostatistics Working Paper Series

Estimators for the parameter of interest in semiparametric models often depend on a guessed model for the nuisance parameter. The choice of the model for the nuisance parameter can affect both the finite sample bias and efficiency of the resulting estimator of the parameter of interest. In this paper we propose a finite sample criterion based on cross validation that can be used to select a nuisance parameter model from a list of candidate models. We show that the expected value of this criterion is minimized by the nuisance parameter model that yields the estimator of the parameter of interest with …


IBD Configuration Transition Matrices And Linkage Score Tests For Unilineal Relative Pairs, Sandrine Dudoit Feb 2003

U.C. Berkeley Division of Biostatistics Working Paper Series

Properties of transition matrices between IBD configurations are derived for four general classes of unilineal relative pairs obtained from the grandparent/grandchild, half-sib, avuncular, and cousin relationships. In this setting, IBD configurations are defined as orbits of groups acting on a set of inheritance vectors. Properties of the transition matrix between IBD configurations at two linked loci are derived by relating its infinitesimal generator to the adjacency matrix of a quotient graph. The second largest eigenvalue of the infinitesimal generator and its multiplicity are key in determining the form of the transition matrix and of likelihood-based linkage tests such as …


Asymptotic Optimality Of Likelihood Based Cross-Validation, Mark J. Van Der Laan, Sandrine Dudoit, Sunduz Keles Feb 2003

U.C. Berkeley Division of Biostatistics Working Paper Series

Likelihood-based cross-validation is a statistical tool for selecting a density estimate based on n i.i.d. observations from the true density among a collection of candidate density estimators. General examples are the selection of a model indexing a maximum likelihood estimator, and the selection of a bandwidth indexing a nonparametric (e.g. kernel) density estimator. In this article, we establish asymptotic optimality of a general class of likelihood-based cross-validation procedures (as indexed by the type of sample splitting used, e.g. V-fold cross-validation), in the sense that the cross-validation selector performs asymptotically as well (w.r.t. the Kullback-Leibler distance to the true …
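One of the examples named in the abstract above, choosing a kernel bandwidth by V-fold likelihood-based cross-validation, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the general procedure analyzed in the paper: it assumes a Gaussian kernel, a fixed grid of candidate bandwidths, and an interleaved split into folds, and every function name and the simulated data are hypothetical.

```python
import math
import random


def gaussian_kde(train, h):
    """Return a Gaussian kernel density estimate with bandwidth h."""
    norm = len(train) * h * math.sqrt(2.0 * math.pi)

    def density(x):
        return sum(math.exp(-0.5 * ((x - t) / h) ** 2) for t in train) / norm

    return density


def cv_log_likelihood(data, h, v=5):
    """Average held-out log-likelihood of the KDE over v folds."""
    folds = [data[i::v] for i in range(v)]  # interleaved split (assumption)
    total = 0.0
    for i, validation in enumerate(folds):
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        density = gaussian_kde(train, h)
        # Floor the density so that log() stays finite if it underflows to 0.
        total += sum(math.log(max(density(x), 1e-300)) for x in validation)
    return total / len(data)


def select_bandwidth(data, candidates, v=5):
    """Pick the candidate bandwidth with the largest CV log-likelihood."""
    return max(candidates, key=lambda h: cv_log_likelihood(data, h, v))


# Simulated standard normal data, for illustration only.
random.seed(0)
data = [random.gauss(0.0, 1.0) for _ in range(200)]
candidates = [0.05, 0.1, 0.2, 0.4, 0.8, 1.6]
best_h = select_bandwidth(data, candidates)
```

Maximizing the held-out log-likelihood corresponds to minimizing an empirical version of the Kullback-Leibler distance to the true density, which is the criterion the abstract refers to.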


Asymptotics Of Cross-Validated Risk Estimation In Estimator Selection And Performance Assessment, Sandrine Dudoit, Mark J. Van Der Laan Feb 2003

U.C. Berkeley Division of Biostatistics Working Paper Series

Risk estimation is an important statistical question for the purposes of selecting a good estimator (i.e., model selection) and assessing its performance (i.e., estimating generalization error). This article introduces a general framework for cross-validation and derives distributional properties of cross-validated risk estimators in the context of estimator selection and performance assessment. Arbitrary classes of estimators are considered, including density estimators and predictors for both continuous and polychotomous outcomes. Results are provided for general full data loss functions (e.g., absolute and squared error, indicator, negative log density). A broad definition of cross-validation is used in order to cover leave-one-out cross-validation, V-fold …


Semiparametric Receiver Operating Characteristic Analysis To Evaluate Biomarkers For Disease, Tianxi Cai, Margaret S. Pepe Jan 2003

UW Biostatistics Working Paper Series

The receiver operating characteristic (ROC) curve is a popular method for characterizing the accuracy of diagnostic tests when test results are not binary. Various methodologies for estimating and comparing ROC curves have been developed. One approach, due to Pepe, uses a parametric regression model with the baseline function specified up to a finite-dimensional parameter. In this article we extend the regression models by allowing arbitrary nonparametric baseline functions. We also provide asymptotic distribution theory and procedures for making statistical inference. We illustrate our approach with a dataset from a prostate cancer biomarker study. Simulation studies suggest that the extra flexibility inherent …


Semi-Parametric Regression For The Area Under The Receiver Operating Characteristic Curve, Lori E. Dodd, Margaret S. Pepe Jan 2003

UW Biostatistics Working Paper Series

Medical advances continue to provide new and potentially better means for detecting disease. Such is true in cancer, for example, where biomarkers are sought for early detection and where improvements in imaging methods may pick up the initial functional and molecular changes associated with cancer development. In other binary classification tasks, computational algorithms such as Neural Networks, Support Vector Machines and Evolutionary Algorithms have been applied to areas as diverse as credit scoring, object recognition, and peptide-binding prediction. Before a classifier becomes an accepted technology, it must undergo rigorous evaluation to determine its ability to discriminate between states. Characterization of …


Checking Assumptions In Latent Class Regression Models Via A Markov Chain Monte Carlo Estimation Approach: An Application To Depression And Socio-Economic Status, Elizabeth Garrett, Richard Miech, Pamela Owens, William W. Eaton, Scott L. Zeger Jan 2003

Johns Hopkins University, Dept. of Biostatistics Working Papers

Latent class regression models are useful tools for assessing associations between covariates and latent variables. However, evaluation of key model assumptions cannot be performed using methods from standard regression models due to the unobserved nature of latent outcome variables. This paper presents graphical diagnostic tools to evaluate whether or not latent class regression models adhere to standard assumptions of the model: conditional independence and non-differential measurement. An integral part of these methods is the use of a Markov Chain Monte Carlo estimation procedure. Unlike standard maximum likelihood implementations for latent class regression model estimation, the MCMC approach allows us to …