Statistics and Probability Commons

Articles 1 - 30 of 32

Full-Text Articles in Statistics and Probability

Semiparametric Approaches For Joint Modeling Of Longitudinal And Survival Data With Time Varying Coefficients, Xiao Song, C.Y. Wang Dec 2005

UW Biostatistics Working Paper Series

We study joint modeling of survival and longitudinal data. There are two regression models of interest. The primary model is for survival outcomes, which are assumed to follow a time-varying coefficient proportional hazards model. The second model is for longitudinal data, which are assumed to follow a random effects model. Based on the trajectory of a subject's longitudinal data, some covariates in the survival model are functions of the unobserved random effects. Estimated random effects are generally different from the unobserved random effects, and this difference leads to covariate measurement error. To deal with covariate measurement error, we propose …


Alleviating Linear Ecological Bias And Optimal Design With Subsample Data, Adam Glynn, Jon Wakefield, Mark Handcock, Thomas Richardson Dec 2005

UW Biostatistics Working Paper Series

In this paper, we illustrate that combining ecological data with subsample data in situations in which a linear model is appropriate provides three main benefits. First, by including the individual level subsample data, the biases associated with linear ecological inference can be eliminated. Second, by supplementing the subsample data with ecological data, the information about parameters will be increased. Third, we can use readily available ecological data to design optimal subsampling schemes, so as to further increase the information about parameters. We present an application of this methodology to the classic problem of estimating the effect of a college degree …


Bayesian Analysis Of Cell-Cycle Gene Expression Data, Chuan Zhou, Jon Wakefield, Linda Breeden Dec 2005

UW Biostatistics Working Paper Series

The study of the cell cycle is important for understanding the basic mechanisms of life, yet progress has been slow due to the complexity of the process and our limited ability to study it at high resolution. Recent advances in microarray technology have enabled scientists to study gene expression at the genome scale at manageable cost, and there has been an increasing effort to identify cell-cycle regulated genes. In this chapter, we discuss the analysis of cell-cycle gene expression data, focusing on model-based Bayesian approaches. The majority of the models we describe …


Empirical Likelihood Inference For The Area Under The Roc Curve, Gengsheng Qin, Xiao-Hua Zhou Dec 2005

UW Biostatistics Working Paper Series

For a continuous-scale diagnostic test, the most commonly used summary index of the receiver operating characteristic (ROC) curve is the area under the curve (AUC), which measures the accuracy of the diagnostic test. In this paper we propose an empirical likelihood approach for inference on the AUC. We first define an empirical likelihood ratio for the AUC and show that its limiting distribution is a scaled chi-square distribution. We then obtain an empirical likelihood-based confidence interval for the AUC using the scaled chi-square distribution. This empirical likelihood inference for the AUC can be extended to stratified samples and the resulting limiting distribution …
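
As background for the summary index above, the empirical (nonparametric) AUC is the Mann-Whitney statistic comparing diseased and non-diseased test values. The sketch below, in Python with simulated data of our own choosing, computes only this point estimate; it does not implement the paper's empirical likelihood interval.

    import numpy as np

    def empirical_auc(cases, controls):
        """Nonparametric AUC estimate: P(case value > control value) + 0.5 * P(tie)."""
        cases = np.asarray(cases, dtype=float)
        controls = np.asarray(controls, dtype=float)
        diff = cases[:, None] - controls[None, :]   # every case paired with every control
        return np.mean(diff > 0) + 0.5 * np.mean(diff == 0)

    # Illustrative simulated data, not from the paper
    rng = np.random.default_rng(0)
    controls = rng.normal(0.0, 1.0, size=200)
    cases = rng.normal(1.0, 1.0, size=150)
    print(empirical_auc(cases, controls))   # roughly 0.76 for these settings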


Interval Estimation For The Ratio And Difference Of Two Lognormal Means, Yea-Hung Chen, Xiao-Hua Zhou Dec 2005

UW Biostatistics Working Paper Series

Health research often gives rise to data that follow lognormal distributions. In two-sample situations, researchers are likely to be interested in estimating the difference or ratio of the population means. Several methods have been proposed for providing confidence intervals for these parameters. However, it is not clear which techniques are most appropriate, or how their performance might vary. Additionally, methods for the difference of means have not been adequately explored. In the present article we discuss five methods of analysis. These include two methods based on the log-likelihood ratio statistic and a generalized pivotal approach. Additionally, we provide and …
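
A minimal baseline for the point estimates, assuming lognormal data: the mean of a lognormal variable is exp(mu + sigma^2/2), so the ratio of two means can be estimated by plugging in the sample mean and variance on the log scale. The percentile bootstrap interval below is a simple comparator of our own, not one of the five methods the paper evaluates.

    import numpy as np

    def lognormal_mean(x):
        """Plug-in estimate of E[X] = exp(mu + sigma^2 / 2) for lognormal data."""
        logx = np.log(x)
        return np.exp(logx.mean() + logx.var(ddof=1) / 2.0)

    def ratio_ci_bootstrap(x, y, n_boot=2000, level=0.95, seed=1):
        """Percentile bootstrap CI for E[X]/E[Y]; a naive baseline only."""
        rng = np.random.default_rng(seed)
        ratios = [lognormal_mean(rng.choice(x, len(x), replace=True)) /
                  lognormal_mean(rng.choice(y, len(y), replace=True))
                  for _ in range(n_boot)]
        lo, hi = np.percentile(ratios, [100 * (1 - level) / 2, 100 * (1 + level) / 2])
        return lo, hi

    # Hypothetical simulated samples
    rng = np.random.default_rng(2)
    x = rng.lognormal(mean=1.0, sigma=0.8, size=60)
    y = rng.lognormal(mean=0.7, sigma=1.0, size=60)
    print(lognormal_mean(x) / lognormal_mean(y), ratio_ci_bootstrap(x, y))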


Inferences In Censored Cost Regression Models With Empirical Likelihood, Xiao-Hua Zhou, Gengsheng Qin, Huazhen Lin, Gang Li Dec 2005

UW Biostatistics Working Paper Series

In many studies of health economics, we are interested in the expected total cost over a certain period for a patient with given characteristics. Problems can arise if cost estimation models do not account for distributional aspects of costs. Two such problems are 1) the skewed nature of the data and 2) censored observations. In this paper we propose an empirical likelihood (EL) method for constructing a confidence region for the vector of regression parameters and a confidence interval for the expected total cost of a patient with the given covariates. We show that this new method has good theoretical …


Confidence Intervals For Predictive Values Using Data From A Case Control Study, Nathaniel David Mercaldo, Xiao-Hua Zhou, Kit F. Lau Dec 2005

UW Biostatistics Working Paper Series

The accuracy of a binary-scale diagnostic test can be represented by sensitivity (Se), specificity (Sp) and positive and negative predictive values (PPV and NPV). Although Se and Sp measure the intrinsic accuracy of a diagnostic test that does not depend on the prevalence rate, they do not provide information on the diagnostic accuracy of a particular patient. To obtain this information we need to use PPV and NPV. Since PPV and NPV are functions of both the intrinsic accuracy and the prevalence of the disease, constructing confidence intervals for PPV and NPV for a particular patient in a population with …
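
The link between the case-control quantities (Se, Sp) and the predictive values is Bayes' theorem with an externally supplied prevalence, which is the point the abstract makes. The sketch below gives the point estimates only; the paper's contribution, confidence intervals, is not implemented here.

    def predictive_values(se, sp, prevalence):
        """PPV and NPV from sensitivity, specificity and an external prevalence (Bayes' theorem)."""
        ppv = se * prevalence / (se * prevalence + (1 - sp) * (1 - prevalence))
        npv = sp * (1 - prevalence) / ((1 - se) * prevalence + sp * (1 - prevalence))
        return ppv, npv

    # Hypothetical accuracy values and a 5% disease prevalence
    print(predictive_values(se=0.90, sp=0.95, prevalence=0.05))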


Optimal Feature Selection For Nearest Centroid Classifiers, With Applications To Gene Expression Microarrays, Alan R. Dabney, John D. Storey Nov 2005

UW Biostatistics Working Paper Series

Nearest centroid classifiers have recently been successfully employed in high-dimensional applications. A necessary step when building a classifier for high-dimensional data is feature selection. Feature selection is typically carried out by computing univariate statistics for each feature individually, without consideration for how a subset of features performs as a whole. For subsets of a given size, we characterize the optimal choice of features, corresponding to those yielding the smallest misclassification rate. Furthermore, we propose an algorithm for estimating this optimal subset in practice. Finally, we investigate the applicability of shrinkage ideas to nearest centroid classifiers. We use gene-expression microarrays for …
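
A minimal sketch of the setup being improved upon: a nearest centroid classifier whose features are chosen one at a time by a univariate score (here an absolute two-sample t-like statistic). The data, names and score are our own illustration; the paper's optimal-subset algorithm is not implemented.

    import numpy as np

    def select_features(X, y, k):
        """Rank features by an absolute two-sample t-like score and keep the top k."""
        X0, X1 = X[y == 0], X[y == 1]
        num = np.abs(X0.mean(0) - X1.mean(0))
        den = np.sqrt(X0.var(0, ddof=1) / len(X0) + X1.var(0, ddof=1) / len(X1))
        return np.argsort(num / (den + 1e-12))[::-1][:k]

    def nearest_centroid_predict(X_train, y_train, X_new, features):
        """Assign each new sample to the class with the nearer centroid on the chosen features."""
        c0 = X_train[y_train == 0][:, features].mean(0)
        c1 = X_train[y_train == 1][:, features].mean(0)
        d0 = np.linalg.norm(X_new[:, features] - c0, axis=1)
        d1 = np.linalg.norm(X_new[:, features] - c1, axis=1)
        return (d1 < d0).astype(int)

    # Simulated toy "microarray": 1000 features, of which the first 20 are informative
    rng = np.random.default_rng(3)
    X = rng.normal(size=(100, 1000))
    y = rng.integers(0, 2, size=100)
    X[y == 1, :20] += 1.0
    feats = select_features(X, y, k=20)
    print(nearest_centroid_predict(X, y, X[:5], feats))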


A New Approach To Intensity-Dependent Normalization Of Two-Channel Microarrays, Alan R. Dabney, John D. Storey Nov 2005

UW Biostatistics Working Paper Series

A two-channel microarray measures the relative expression levels of thousands of genes from a pair of biological samples. In order to reliably compare gene expression levels between and within arrays, it is necessary to remove systematic errors that distort the biological signal of interest. The standard for accomplishing this is smoothing "MA-plots" to remove intensity-dependent dye bias and array-specific effects. However, MA methods require strong assumptions. We review these assumptions and derive several practical scenarios in which they fail. The "dye-swap" normalization method has been much less frequently used because it requires two arrays per pair of samples. We show …
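
For orientation, the M and A quantities behind the "MA-plot" are M = log2(R/G) and A = (1/2) log2(RG), and intensity-dependent normalization subtracts a smooth fit of M on A. In the sketch below a low-degree polynomial stands in for the loess smoother commonly used in practice; the data and bias model are invented for illustration.

    import numpy as np

    def ma_normalize(red, green, degree=3):
        """Compute M = log2(R/G) and A = 0.5*log2(R*G), then subtract a smooth trend of M on A.
        A polynomial fit stands in here for the usual loess smoother."""
        M = np.log2(red) - np.log2(green)
        A = 0.5 * (np.log2(red) + np.log2(green))
        trend = np.polyval(np.polyfit(A, M, degree), A)
        return M - trend, A

    # Simulated two-channel intensities with an artificial multiplicative dye bias
    rng = np.random.default_rng(4)
    true_signal = rng.lognormal(8.0, 1.0, size=5000)
    green = true_signal * rng.lognormal(0.0, 0.2, size=5000)
    red = true_signal * rng.lognormal(0.3, 0.2, size=5000)   # biased channel
    M_norm, A = ma_normalize(red, green)
    print(round(M_norm.mean(), 3))   # close to zero after normalization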


Estimating A Treatment Effect With Repeated Measurements Accounting For Varying Effectiveness Duration, Ying Qing Chen, Jingrong Yang, Su-Chun Cheng Nov 2005

UW Biostatistics Working Paper Series

To assess treatment efficacy in clinical trials, certain clinical outcomes are repeatedly measured for the same subject over time. They can be regarded as functions of time. The difference in their mean functions between the treatment arms usually characterises a treatment effect. Due to the potential existence of subject-specific treatment effectiveness lag and saturation times, erosion of the treatment effect in this difference may occur during the observation period. Instead of using ad hoc parametric or purely nonparametric time-varying coefficients in statistical modeling, we first propose to model the treatment effectiveness durations, which are the varying time intervals between the …


Is The Number Of Sick Persons In A Cohort Constant Over Time?, Paula Diehr, Ann Derleth, Anne Newman, Liming Cai Oct 2005

UW Biostatistics Working Paper Series

Objectives: To estimate the number of persons in a cohort who are sick, over time.

Methods: We calculated the number of sick persons in the Cardiovascular Health Study (CHS), a cohort study of older adults followed up to 14 years, using eight definitions of “healthy” and “sick”. We projected the number in each health state over time for a birth cohort.

Results: The number of sick persons in CHS was approximately constant for 14 years, for all definitions of “sick”. The estimated number of sick persons in the birth cohort was approximately constant from ages 55-75, after which it decreased. …


Marginal Regression Modeling Under Irregular, Biased Sampling, Petra Buzkova, Thomas Lumley Sep 2005

UW Biostatistics Working Paper Series

In longitudinal studies, observations are often obtained at continuous subject-specific times. Frequently, the availability of outcome data may be related to the outcome measure or to other covariates that are related to the outcome measure. Under such biased sampling designs, unadjusted regression analyses yield biased estimates. Building on the work of Lin & Ying (2001), which integrates counting process techniques with longitudinal data settings, we propose a class of estimators that can handle biased sampling. We call these estimators "inverse-intensity-rate-ratio-weighted" (IIRR) estimators. Of major focus is a mean-response model in which we examine the marginal effect of the covariate X at time …


Longitudinal Data Analysis For Generalized Linear Models Under Irregular, Biased Sampling: Situations With Follow-Up Dependent On Outcome Or Auxiliary Outcome-Related Variables, Petra Buzkova, Thomas Lumley Sep 2005

UW Biostatistics Working Paper Series

In longitudinal studies, observations are often obtained at subject-specific observation times. These times can be continuous rather than restricted to a set of prespecified times. Frequently, the observation times may be related to the outcome measure or to auxiliary variables that are related to the outcome measure but undesirable to condition upon in the regression model for the outcome. Regression analysis unadjusted for such sampling designs yields biased estimates. Based on estimating equations, we propose a class of estimators in generalized linear regression models that can handle biased sampling under continuous observation times. We call these estimators "inverse-intensity-rate-ratio-weighted" (IIRR) estimators. The …


Semiparametric Loglinear Regression For Longitudinal Measurements Subject To Irregular, Biased Follow-Up, Petra Buzkova, Thomas Lumley Sep 2005

UW Biostatistics Working Paper Series

We propose a method for the analysis of loglinear regression models for longitudinal data that are subject to continuous and irregular follow-up. Frequently, when the follow-up is irregular, the availability of outcome data may be related to the outcome measure or to other covariates that are related to the outcome measure. Under such biased sampling designs, unadjusted regression analyses yield biased estimates. We examine the marginal association between the covariates X at time t and the logarithm of the mean of the response Y at time t. We focus on semiparametric regression with an unspecified baseline function of time. To predict the follow-up times …


The Optimal Discovery Procedure: A New Approach To Simultaneous Significance Testing, John D. Storey Sep 2005

UW Biostatistics Working Paper Series

Significance testing is one of the main objectives of statistics. The Neyman-Pearson lemma provides a simple rule for optimally testing a single hypothesis when the null and alternative distributions are known. This result has played a major role in the development of significance testing strategies that are used in practice. Most of the work extending single testing strategies to multiple tests has focused on formulating and estimating new types of significance measures, such as the false discovery rate. These methods tend to be based on p-values that are calculated from each test individually, ignoring information from the other tests. As …
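
For the single-test case the abstract starts from, the Neyman-Pearson lemma says the most powerful level-alpha test rejects when the likelihood ratio f1(x)/f0(x) exceeds a cutoff chosen to give size alpha. The sketch below uses two normal densities of our own choosing purely to illustrate that rule; it is not the paper's optimal discovery procedure.

    import numpy as np
    from scipy.stats import norm

    # Known null and alternative densities (illustrative choice): N(0,1) vs N(2,1)
    f0, f1 = norm(0.0, 1.0), norm(2.0, 1.0)
    alpha = 0.05

    # The likelihood ratio f1(x)/f0(x) is increasing in x for this pair, so the
    # likelihood-ratio cutoff corresponds to rejecting when x exceeds the null quantile.
    x_cut = f0.ppf(1 - alpha)
    lr_cut = f1.pdf(x_cut) / f0.pdf(x_cut)

    def np_test(x):
        """Most powerful level-alpha test of f0 versus f1 (Neyman-Pearson)."""
        return f1.pdf(x) / f0.pdf(x) > lr_cut

    power = 1 - f1.cdf(x_cut)
    print(x_cut, power, np_test(np.array([1.0, 2.5])))   # cutoff ~1.645, power ~0.64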


The Optimal Discovery Procedure For Large-Scale Significance Testing, With Applications To Comparative Microarray Experiments, John D. Storey, James Y. Dai, Jeffrey T. Leek Sep 2005

UW Biostatistics Working Paper Series

As much of the focus of genetics and molecular biology has shifted toward the systems level, it has become increasingly important to accurately extract biologically relevant signal from thousands of related measurements. The common property among these high-dimensional biological studies is that the measured features have a rich and largely unknown underlying structure. One example of much recent interest is identifying differentially expressed genes in comparative microarray experiments. We propose a new approach aimed at optimally performing many hypothesis tests in a high-dimensional study. This approach estimates the Optimal Discovery Procedure (ODP), which has recently been introduced and theoretically shown …


Linear Regression Of Censored Length-Biased Lifetimes, Ying Qing Chen, Yan Wang Jul 2005

UW Biostatistics Working Paper Series

Length-biased lifetimes may be collected in observational studies or sample surveys due to a biased sampling scheme. In this article, we use a linear regression model, namely the accelerated failure time model, for the population lifetime distributions in regression analysis of length-biased lifetimes. We show that the associated regression parameters are invariant under the length-biased sampling scheme. Based on this result, we propose quasi partial score estimating equations to estimate the population regression parameters. The proposed methodologies are evaluated and demonstrated by simulation studies and an application to an actual data set.


On Additive Regression Of Expectancy, Ying Qing Chen Jun 2005

UW Biostatistics Working Paper Series

Regression models have been important tools for studying the association between outcome variables and their covariates. Traditional linear regression models usually specify this association through the expectations of the outcome variables as functions of the covariates and some parameters. In reality, however, interest often focuses on their expectancies characterized by the conditional means. In this article, a new class of additive regression models is proposed to model these expectancies. The model parameters carry practical implications, which may make the models useful in applications such as treatment assessment, resource planning or short-term forecasting. Moreover, the new model …


An Empirical Process Limit Theorem For Sparsely Correlated Data, Thomas Lumley Jun 2005

UW Biostatistics Working Paper Series

We consider data that are dependent, but where most small sets of observations are independent. By extending Bernstein's inequality, we prove a strong law of large numbers and an empirical process central limit theorem under bracketing entropy conditions.
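
For reference, the classical Bernstein inequality for independent, mean-zero variables bounded by a constant M, which is the starting point the abstract extends to sparsely correlated data, can be written in one standard form as:

    \[
      \Pr\bigl(|S_n| \ge t\bigr) \;\le\; 2\exp\!\left(-\frac{t^2/2}{\sum_{i=1}^{n}\mathbb{E}[X_i^2] + Mt/3}\right),
      \qquad S_n=\sum_{i=1}^{n}X_i,\quad \mathbb{E}[X_i]=0,\quad |X_i|\le M .
    \]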


A Linear Regression Framework For Receiver Operating Characteristic (Roc) Curve Analysis, Zheng Zhang, Margaret S. Pepe May 2005

UW Biostatistics Working Paper Series

In the field of medical diagnostic testing, the receiver operating characteristic (ROC) curve has long been used as a standard statistical tool to assess the accuracy of tests that yield continuous results. Although previous research in this area focused mostly on estimating the ROC curve, it has recently been recognized that the accuracy of a given test may fluctuate depending on certain factors, which motivates modelling covariate effects on the ROC curve. Comparing the corresponding ROC curves between two or more tests is a special case of covariate effect modelling. In this manuscript, we introduce a linear regression framework to model …


Attributable Risk Function In The Proportional Hazards Model, Ying Qing Chen, Chengcheng Hu, Yan Wang May 2005

UW Biostatistics Working Paper Series

As an epidemiological parameter, the population attributable fraction is an important measure for quantifying the public health impact of an exposure on morbidity and mortality. In this article, we extend this parameter to an attributable fraction function in survival analysis of time-to-event outcomes, and further establish estimation and inference procedures based on the widely used proportional hazards model. Numerical examples and simulation studies are presented to validate and demonstrate the proposed methods.
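
As a point of reference for the scalar parameter being generalized, the classical population attributable fraction can be written PAF = p(RR - 1) / (1 + p(RR - 1)), where p is the exposure prevalence and RR the relative risk. The sketch computes only this time-fixed version, not the paper's attributable fraction function under the proportional hazards model.

    def attributable_fraction(exposure_prevalence, relative_risk):
        """Classical (Levin-type) population attributable fraction for a time-fixed exposure."""
        excess = exposure_prevalence * (relative_risk - 1.0)
        return excess / (1.0 + excess)

    # Hypothetical inputs: 30% of the population exposed, relative risk of 2
    print(attributable_fraction(0.30, 2.0))   # about 0.23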


Multiple Imputation For Correcting Verification Bias, Ofer Harel, Xiao-Hua Zhou May 2005

UW Biostatistics Working Paper Series

In the case in which all subjects are screened using a common test, and only a subset of these subjects is tested using a gold standard test, it is well documented that there is a risk of bias, called verification bias. When the test has only two levels (e.g. positive and negative) and we are trying to estimate the sensitivity and specificity of the test, one is actually constructing a confidence interval for a binomial proportion. Since it is well documented that this estimation is not trivial even with complete data, we adopt a multiple imputation (MI) framework for verification bias …
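
The abstract's remark that interval estimation for a binomial proportion is nontrivial even with complete data is often addressed with the Wilson score interval, sketched below for the complete-data case. The counts are hypothetical, and this is a baseline only, not the paper's multiple-imputation correction for verification bias.

    import math

    def wilson_interval(successes, n, z=1.96):
        """Wilson score confidence interval for a binomial proportion (complete data)."""
        p_hat = successes / n
        denom = 1.0 + z**2 / n
        centre = (p_hat + z**2 / (2 * n)) / denom
        half = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
        return centre - half, centre + half

    # e.g. 45 of 50 verified diseased subjects test positive (hypothetical counts)
    print(wilson_interval(45, 50))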


A Comparison Of Parametric And Coarsened Bayesian Interval Estimation In The Presence Of A Known Mean-Variance Relationship, Kent Koprowicz, Scott S. Emerson, Peter Hoff Apr 2005

UW Biostatistics Working Paper Series

While the use of Bayesian methods of analysis has become increasingly common, classical frequentist hypothesis testing still holds sway in medical research, especially clinical trials. One major difference between a standard frequentist approach and the most common Bayesian approaches is that even when a frequentist hypothesis test is derived from parametric models, the interpretation and operating characteristics of the test may be considered in a distribution-free manner. Bayesian inference, on the other hand, is often conducted in a parametric setting where the interpretation of the results depends on the parametric model. Here we consider a Bayesian counterpart to …


Application Of The Time-Dependent Roc Curves For Prognostic Accuracy With Multiple Biomarkers, Yingye Zheng, Tianxi Cai, Ziding Feng Apr 2005

UW Biostatistics Working Paper Series

The rapid advancement of molecular technology has led to the discovery of many markers that have potential applications in disease diagnosis and prognosis. In a prospective cohort study, information on a panel of biomarkers as well as the disease status for a patient is routinely collected over time. Such information is useful for predicting patients' prognosis and selecting patients for targeted therapy. In this paper, we develop procedures for constructing a composite test with optimal discrimination power when there are multiple markers available to assist in prediction and characterize the accuracy of the resulting test by extending the time-dependent receiver …


New Confidence Intervals For The Difference Between Two Sensitivities At A Fixed Level Of Specificity, Gengsheng Qin, Yu-Sheng Hsu, Xiao-Hua Zhou Mar 2005

UW Biostatistics Working Paper Series

For two continuous-scale diagnostic tests, it is of interest to compare their sensitivities at a predetermined level of specificity. In this paper we propose three new intervals for the difference between two sensitivities at a fixed level of specificity. These intervals are easy to compute. We also conduct simulation studies to compare the relative performance of the new intervals with the existing normal approximation based interval proposed by Wieand et al. (1989). Our simulation results show that the newly proposed intervals perform better than the existing normal approximation based interval in terms of coverage accuracy and interval length.
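
The quantity being compared is each test's sensitivity when its positivity threshold is set so that specificity equals the prespecified value; empirically, that threshold is the corresponding quantile of the non-diseased results. The sketch gives point estimates only, on simulated data of our own, and implements none of the three proposed intervals.

    import numpy as np

    def sensitivity_at_specificity(cases, controls, specificity=0.90):
        """Empirical sensitivity when the threshold is set so that `specificity`
        of the non-diseased subjects test negative."""
        threshold = np.quantile(controls, specificity)
        return np.mean(np.asarray(cases) > threshold)

    # Two hypothetical continuous-scale tests (simulated)
    rng = np.random.default_rng(5)
    controls_t1, cases_t1 = rng.normal(0, 1, 300), rng.normal(1.5, 1, 200)
    controls_t2, cases_t2 = rng.normal(0, 1, 300), rng.normal(1.0, 1, 200)
    print(sensitivity_at_specificity(cases_t1, controls_t1)
          - sensitivity_at_specificity(cases_t2, controls_t2))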


Frequentist Evaluation Of Group Sequential Clinical Trial Designs, Scott S. Emerson, John M. Kittelson, Daniel L. Gillen Mar 2005

UW Biostatistics Working Paper Series

Group sequential stopping rules are often used as guidelines in the monitoring of clinical trials in order to address the ethical and efficiency issues inherent in human testing of a new treatment or preventive agent for disease. Such stopping rules have been proposed based on a variety of different criteria, both scientific (e.g., estimates of treatment effect) and statistical (e.g., frequentist type I error, Bayesian posterior probabilities, stochastic curtailment). It is easily shown, however, that a stopping rule based on one of those criteria induces a stopping rule on all other criteria. Thus the basis used to initially define a …
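
One frequentist operating characteristic such an evaluation examines is the type I error a stopping rule actually attains. The short simulation below, which is purely illustrative and implements no particular published boundary, shows why unadjusted monitoring is anticonservative: reusing the fixed-sample 1.96 cutoff at two equally spaced analyses pushes the two-sided type I error to roughly 0.08.

    import numpy as np

    def two_look_type1(crit, n_sim=200_000, seed=6):
        """Monte Carlo two-sided type I error, under the null, of a design with one
        interim and one final analysis that stops whenever |Z| exceeds `crit`."""
        rng = np.random.default_rng(seed)
        z1 = rng.standard_normal(n_sim)                       # standardized statistic at the interim
        z2 = (z1 + rng.standard_normal(n_sim)) / np.sqrt(2)   # cumulative statistic at the final analysis
        return np.mean((np.abs(z1) > crit) | (np.abs(z2) > crit))

    print(two_look_type1(1.96))   # roughly 0.08 rather than the nominal 0.05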


Bayesian Evaluation Of Group Sequential Clinical Trial Designs, Scott S. Emerson, John M. Kittelson, Daniel L. Gillen Mar 2005

UW Biostatistics Working Paper Series

Clinical trial designs often incorporate a sequential stopping rule to serve as a guide in the early termination of a study. When choosing a particular stopping rule, it is most common to examine frequentist operating characteristics such as type I error, statistical power, and precision of confidence intervals (Emerson et al. [1]). Increasingly, however, clinical trials are designed and analyzed in the Bayesian paradigm. In this paper we describe how the Bayesian operating characteristics of a particular stopping rule might be evaluated and communicated to the scientific community. In particular, we consider a choice of probability models and a …


On The Use Of Stochastic Curtailment In Group Sequential Clinical Trials, Scott S. Emerson, John M. Kittelson, Daniel L. Gillen Mar 2005

UW Biostatistics Working Paper Series

Many different criteria have been proposed for the selection of a stopping rule for group sequential trials. These include both scientific (e.g., estimates of treatment effect) and statistical (e.g., frequentist type I error, Bayesian posterior probabilities, stochastic curtailment) measures of the evidence for or against beneficial treatment effects. Because a stopping rule based on one of those criteria induces a stopping rule on all other criteria, the utility of any particular scale relates to the ease with which it allows a clinical trialist to search for sequential sampling plans having desirable operating characteristics. In this paper we examine …


The Clustering Of Regression Models Method With Applications In Gene Expression Data, Li-Xuan Qin, Steven G. Self Jan 2005

UW Biostatistics Working Paper Series

Identification of differentially expressed genes and clustering of genes are two important and complementary objectives addressed with gene expression data. For the differential expression question, many "per-gene" analytic methods have been proposed. These methods can generally be characterized as using a regression function to independently model the observations for each gene; various adjustments for multiplicity are then used to interpret the statistical significance of these per-gene regression models over the collection of genes analyzed. Motivated by this common structure of per-gene models, we propose a new model-based clustering method -- the clustering of regression models method, which groups genes that …


Insights Into Latent Class Analysis, Margaret S. Pepe, Holly Janes Jan 2005

UW Biostatistics Working Paper Series

Latent class analysis is a popular statistical technique for estimating disease prevalence and test sensitivity and specificity. It is used when a gold standard assessment of disease is not available but results of multiple imperfect tests are. We derive analytic expressions for the parameter estimates in terms of the raw data, under the conditional independence assumption. These expressions indicate explicitly how observed two- and three-way associations between test results are used to infer disease prevalence and test operating characteristics. Although reasonable if the conditional independence model holds, the estimators have no basis when it fails. We therefore caution against using …
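
Under the conditional independence assumption the abstract refers to, the probability of a pattern of K binary test results mixes over the latent disease status, which is what links the observed associations among tests to the prevalence pi, sensitivities Se_k and specificities Sp_k (stated here only as background):

    \[
      \Pr(T_1=t_1,\dots,T_K=t_K)
        \;=\; \pi \prod_{k=1}^{K} Se_k^{\,t_k}(1-Se_k)^{1-t_k}
        \;+\; (1-\pi) \prod_{k=1}^{K} (1-Sp_k)^{t_k} Sp_k^{\,1-t_k}.
    \]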