Digital Commons Network
Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics, 2006

Articles 1 - 30 of 101

Full-Text Articles in Entire DC Network

Lehmann Family Of Roc Curves, Mithat Gonen, Glenn Heller Dec 2006

Memorial Sloan-Kettering Cancer Center, Dept. of Epidemiology & Biostatistics Working Paper Series

Receiver operating characteristic (ROC) curves are useful in evaluating the ability of a continuous marker to discriminate between the two states of a binary outcome, such as diseased/not diseased. The most popular parametric model for an ROC curve is the binormal model, which assumes that the marker is normally distributed conditional on the outcome. Here we present an alternative to the binormal model based on the Lehmann family, also known as the proportional hazards specification. The resulting ROC curve and its functionals (such as the area under the curve) have simple analytic forms. We derive closed-form expressions for the asymptotic …
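
For orientation, the simple analytic form mentioned in the abstract follows directly from the proportional hazards (Lehmann) specification; a minimal sketch, with S_1 and S_0 denoting the marker's survivor functions in the diseased and non-diseased groups, is:

```latex
% Lehmann (proportional hazards) specification for the marker distributions:
\[
  S_1(c) = S_0(c)^{\theta}, \qquad \theta > 0 .
\]
% Writing u = S_0(c) for the false-positive rate at threshold c,
\[
  \mathrm{ROC}(u) = S_1(c) = u^{\theta}, \qquad
  \mathrm{AUC} = \int_0^1 u^{\theta}\, du = \frac{1}{\theta + 1}.
\]
```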


Interacting With Local And Remote Data Repositories Using The Stashr Package, Sandrah P. Eckel, Roger Peng Dec 2006

Johns Hopkins University, Dept. of Biostatistics Working Papers

The stashR package (a Set of Tools for Administering SHared Repositories) for R implements a simple key-value style database where character string keys are associated with data values. The key-value databases can be either stored locally on the user's computer or accessed remotely via the Internet. Methods specific to the stashR package allow users to share data repositories or access previously created remote data repositories. In particular, methods are available for the S4 classes localDB and remoteDB to insert, retrieve, or delete data from the database as well as to synchronize local copies of the data to the remote version …
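
To make the key-value data model described above concrete, here is a minimal sketch of a file-backed key-value repository. It is written in Python purely for illustration; the class and method names are hypothetical and are not the stashR S4 API (localDB/remoteDB) in R.

```python
# Conceptual sketch of a file-backed key-value repository (illustration only;
# the actual stashR package exposes S4 classes localDB/remoteDB in R).
import os
import pickle

class LocalRepository:
    """Store arbitrary Python values under character-string keys, one file per key."""

    def __init__(self, directory):
        self.directory = directory
        os.makedirs(directory, exist_ok=True)

    def _path(self, key):
        return os.path.join(self.directory, key)

    def insert(self, key, value):
        with open(self._path(key), "wb") as fh:
            pickle.dump(value, fh)

    def fetch(self, key):
        with open(self._path(key), "rb") as fh:
            return pickle.load(fh)

    def delete(self, key):
        os.remove(self._path(key))

    def keys(self):
        return sorted(os.listdir(self.directory))

repo = LocalRepository("my-repo")
repo.insert("coefficients", [0.2, 1.7, -0.4])
print(repo.fetch("coefficients"), repo.keys())
```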


A Likelihood Based Method For Real Time Estimation Of The Serial Interval And Reproductive Number Of An Epidemic, Laura Forsberg White, Marcello Pagano Dec 2006

Harvard University Biostatistics Working Paper Series

No abstract provided.


A Semiparametric Approach For The Nonparametric Transformation Survival Model With Multiple Covariates, Xiao Song, Shuangge Ma, Jian Huang, Xiao-Hua Zhou Dec 2006

UW Biostatistics Working Paper Series

The nonparametric transformation model for survival time makes no assumptions about the forms of the transformation function or the error distribution, and this flexibility makes it appealing for modeling censored survival data. Current approaches for estimating the regression parameters involve maximizing discontinuous objective functions, which are numerically infeasible to implement with multiple covariates. Based on the partial rank estimator (Khan & Tamer, 2004), we propose a smoothed partial rank estimator which maximizes a …


Gamma Shape Mixtures For Heavy-Tailed Distributions, Sergio Venturini, Francesca Dominici, Giovanni Parmigiani Dec 2006

Johns Hopkins University, Dept. of Biostatistics Working Papers

An important question in health services research is the estimation of the proportion of medical expenditures that exceed a given threshold. Typically, medical expenditures present highly skewed, heavy-tailed distributions, for which a) simple variable transformations are insufficient to achieve a tractable low-dimensional parametric form and b) nonparametric methods are not efficient in estimating exceedance probabilities for large thresholds. Motivated by this context, in this paper we propose a general Bayesian approach for the estimation of tail probabilities of heavy-tailed distributions, based on a mixture of gamma distributions in which the mixing occurs over the shape parameter. This family provides …
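
Under a gamma shape mixture, an exceedance probability is simply a weighted sum of gamma survival functions. The sketch below illustrates this with hypothetical mixing weights, integer shapes, and a common scale chosen only for illustration.

```python
# Exceedance probability under a mixture of gamma densities that share a scale
# parameter but differ in (integer) shape -- illustrative weights and scale only.
import numpy as np
from scipy.stats import gamma

weights = np.array([0.55, 0.25, 0.12, 0.05, 0.03])   # mixing weights over shapes 1..5
shapes = np.arange(1, len(weights) + 1)
scale = 2.0                                           # common scale parameter
threshold = 20.0                                      # e.g., an expenditure threshold

# P(X > t) = sum_k  w_k * P(Gamma(shape=k, scale) > t)
tail_prob = np.sum(weights * gamma.sf(threshold, a=shapes, scale=scale))
print(f"P(X > {threshold}) = {tail_prob:.4f}")
```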


Semiparametric Regression Of Multi-Dimensional Genetic Pathway Data: Least Squares Kernel Machines And Linear Mixed Models, Dawei Liu, Xihong Lin, Debashis Ghosh Nov 2006

The University of Michigan Department of Biostatistics Working Paper Series

We consider a semiparametric regression model that relates a normal outcome to covariates and a genetic pathway, where the covariate effects are modeled parametrically and the pathway effect of multiple gene expressions is modeled parametrically or nonparametrically using least squares kernel machines (LSKMs). This unified framework allows a flexible function for the joint effect of multiple genes within a pathway by specifying a kernel function, and allows for the possibility that each gene expression effect might be nonlinear and that genes within the same pathway may interact with each other in a complicated way. This semiparametric model …
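
The kernel-machine/mixed-model connection referred to in the title can be sketched compactly in generic notation (the symbols below are illustrative and not necessarily the authors' own):

```latex
% Semiparametric model: parametric covariate effects plus a nonparametric pathway effect.
\[
  y_i = x_i^{\top}\beta + h(z_i) + e_i, \qquad e_i \sim N(0, \sigma^2),
\]
% with h(.) lying in the RKHS generated by a kernel K(.,.). The LSKM fit solves
\[
  \min_{\beta,\, h}\; \sum_{i=1}^{n} \bigl\{ y_i - x_i^{\top}\beta - h(z_i) \bigr\}^2
  + \lambda \lVert h \rVert^2_{\mathcal{H}_K},
\]
% which coincides with BLUP estimation in the linear mixed model
%   y = X beta + h + e,   h ~ N(0, tau K),   with tau inversely proportional to lambda,
% so the smoothing and kernel parameters can be treated as variance components (e.g., via REML).
```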


Spatio-Temporal Analysis Of Areal Data And Discovery Of Neighborhood Relationships In Conditionally Autoregressive Models, Subharup Guha, Louise Ryan Nov 2006

Harvard University Biostatistics Working Paper Series

No abstract provided.


Semiparametric Regression Of Multi-Dimensional Genetic Pathway Data: Least Squares Kernel Machines And Linear Mixed Models, Dawei Liu, Xihong Lin, Debashis Ghosh Nov 2006

Harvard University Biostatistics Working Paper Series

No abstract provided.


Analysis Of Case-Control Age-At-Onset Data Using A Modified Case-Cohort Method, Bin Nan, Xihong Lin Nov 2006

The University of Michigan Department of Biostatistics Working Paper Series

Case-control designs are widely used in rare disease studies. In a typical case-control study, data are collected from a sample of all available subjects who have experienced a disease (cases) and a sub-sample of subjects who have not experienced the disease (controls) in a study cohort. Cases are often oversampled in case-control studies. Logistic regression is a common tool for estimating the relative risks of the disease associated with a set of covariates. Very often in such a study, information on the ages at onset of the disease for cases and the ages at survey for controls is known. Standard logistic regression analysis using …


Smoothed Rank Regression With Censored Data, Glenn Heller Nov 2006

Memorial Sloan-Kettering Cancer Center, Dept. of Epidemiology & Biostatistics Working Paper Series

A weighted rank estimating function is proposed to estimate the regression parameter vector in an accelerated failure time model with right censored data. In general, rank estimating functions are discontinuous in the regression parameter, creating difficulties in determining the asymptotic distribution of the estimator. A local distribution function is used to create a rank based estimating function that is continuous and monotone in the regression parameter vector. A weight is included in the estimating function to produce a bounded influence estimate. The asymptotic distribution of the regression estimator is developed and simulations are performed to examine its finite sample properties. …
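
To fix ideas about smoothing a discontinuous rank estimating function, here is a generic Gehan-type illustration in the accelerated failure time setting; the exact weight and smoothing choices of the paper may differ.

```latex
% Discontinuous Gehan-type rank estimating function for the AFT model, with
% residuals e_i(beta) = log(T_i) - x_i' beta and censoring indicators delta_i:
\[
  U(\beta) = \sum_{i}\sum_{j} \delta_i \,(x_i - x_j)\,
             I\{ e_j(\beta) \ge e_i(\beta) \}.
\]
% Smoothed version: replace the indicator with a local (smooth, monotone) distribution
% function Phi(.) and bandwidth h, making the estimating function continuous in beta:
\[
  U_h(\beta) = \sum_{i}\sum_{j} \delta_i \,(x_i - x_j)\,
               \Phi\!\left( \frac{e_j(\beta) - e_i(\beta)}{h} \right).
\]
```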


Properties Of Monotonic Effects, Tyler J. Vanderweele, James M. Robins Nov 2006

COBRA Preprint Series

Various relationships are shown to hold between monotonic effects, weak monotonic effects, and the monotonicity of certain conditional expectations. These relationships are considered for both binary and non-binary variables. Counterexamples are provided to show that the results do not hold under less restrictive conditions. The ideas of monotonic effects are furthermore used to relate signed edges on a directed acyclic graph to qualitative effect modification.


Multiple Testing With An Empirical Alternative Hypothesis, James E. Signorovitch Nov 2006

Harvard University Biostatistics Working Paper Series

An optimal multiple testing procedure is identified for linear hypotheses under the general linear model, maximizing the expected number of false null hypotheses rejected at any significance level. The optimal procedure depends on the unknown data-generating distribution, but can be consistently estimated. Drawing information together across many hypotheses, the estimated optimal procedure provides an empirical alternative hypothesis by adapting to underlying patterns of departure from the null. Proposed multiple testing procedures based on the empirical alternative are evaluated through simulations and an application to gene expression microarray data. Compared to a standard multiple testing procedure, it is not unusual for …


Doubly Penalized Buckley-James Method For Survival Data With High-Dimensional Covariates, Sijian Wang, Bin Nan, Ji Zhu, David G. Beer Nov 2006

The University of Michigan Department of Biostatistics Working Paper Series

Recent interest in cancer research focuses on predicting patients' survival by investigating gene expression profiles based on microarray analysis. We propose a doubly penalized Buckley-James method for the semiparametric accelerated failure time model to relate high-dimensional genomic data to censored survival outcomes, which uses a mixture of L1-norm and L2-norm penalties. Similar to the elastic-net method for the linear regression model with uncensored data, the proposed method performs automatic gene selection and parameter estimation, where highly correlated genes can be selected (or removed) together. The two-dimensional tuning parameter is determined by cross-validation and uniform design. …
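
One way to see the "doubly penalized" structure is as an elastic-net-style criterion applied to Buckley-James imputed responses; the sketch below uses generic notation and is not claimed to be the paper's exact formulation.

```latex
% Buckley-James imputation of censored log survival times (delta_i = 1 if observed):
\[
  y_i^{*}(\beta) = \delta_i\, y_i + (1 - \delta_i)\,
                   E\!\left[\, Y \mid Y > y_i,\; x_i;\; \beta \right],
\]
% followed by a doubly penalized (L1 + L2) least-squares criterion:
\[
  \min_{\beta}\; \sum_{i=1}^{n} \bigl\{ y_i^{*}(\beta) - \beta_0 - x_i^{\top}\beta \bigr\}^2
  + \lambda_1 \sum_{j} |\beta_j| + \lambda_2 \sum_{j} \beta_j^{2},
\]
% with (lambda_1, lambda_2) the two-dimensional tuning parameter chosen by
% cross-validation, as described in the abstract.
```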


A Note On Bias Due To Fitting Prospective Multivariate Generalized Linear Models To Categorical Outcomes Ignoring Retrospective Sampling Schemes, Bhramar Mukherjee, Ivy Liu Nov 2006

The University of Michigan Department of Biostatistics Working Paper Series

Outcome dependent sampling designs are commonly used in economics, market research and epidemiological studies. Case-control sampling design is a classic example of outcome dependent sampling, where exposure information is collected on subjects conditional on their disease status. In many situations, the outcome under consideration may have multiple categories instead of a simple dichotomization. For example, in a case-control study, there may be disease sub-classification among the “cases” based on progression of the disease, or in terms of other histological and morphological characteristics of the disease. In this note, we investigate the issue of fitting prospective multivariate generalized linear models to …


Exploiting Gene-Environment Independence For Analysis Of Case-Control Studies: An Empirical Bayes Approach To Trade Off Between Bias And Efficiency, Bhramar Mukherjee, Nilanjan Chatterjee Nov 2006

The University of Michigan Department of Biostatistics Working Paper Series

Standard prospective logistic regression analysis of case-control data often leads to very imprecise estimates of gene-environment interactions due to small numbers of cases or controls in cells of crossing genotype and exposure. In contrast, under the assumption of gene-environment independence, modern “retrospective” methods, including the “case-only” approach, can estimate the interaction parameters much more precisely, but they can be seriously biased when the underlying assumption of gene-environment independence is violated. In this article, we propose a novel approach to analyze case-control data that can relax the gene-environment independence assumption using an empirical Bayes framework. In the special case, involving a …


Large Cluster Asymptotics For Gee: Working Correlation Models, Hyoju Chung, Thomas Lumley Oct 2006

UW Biostatistics Working Paper Series

This paper presents large cluster asymptotic results for generalized estimating equations. The complexity of the working correlation model is characterized in terms of the number of working correlation components to be estimated. When the cluster size is relatively large, we may encounter a situation where a high-dimensional working correlation matrix is modeled and estimated from the data. In the present asymptotic setting, the cluster size and the complexity of the working correlation model grow with the number of independent clusters. We show the existence, weak consistency and asymptotic normality of marginal regression parameter estimators using the results of empirical process theory and …


Statistical Analysis Of Air Pollution Panel Studies: An Illustration, Holly Janes, Lianne Sheppard, Kristen Shepherd Oct 2006

UW Biostatistics Working Paper Series

The panel study design is commonly used to evaluate the short-term health effects of air pollution. Standard statistical methods for analyzing longitudinal data are available, but the literature reveals that the techniques are not well understood by practitioners. We illustrate these methods using data from the 1999 to 2002 Seattle panel study. Marginal, conditional, and transitional approaches for modeling longitudinal data are reviewed and contrasted with respect to their parameter interpretation and methods for accounting for correlation and dealing with missing data. We also discuss and illustrate techniques for controlling for time-dependent and time-independent confounding, and for exploring and summarizing …


Bayesian Hidden Markov Modeling Of Array Cgh Data, Subharup Guha, Yi Li, Donna Neuberg Oct 2006

Harvard University Biostatistics Working Paper Series

Genomic alterations have been linked to the development and progression of cancer. The technique of Comparative Genomic Hybridization (CGH) yields data consisting of fluorescence intensity ratios of test and reference DNA samples. The intensity ratios provide information about the number of DNA copies. Practical issues such as the contamination of tumor cells in tissue specimens and normalization errors necessitate the use of statistics for learning about the genomic alterations from array-CGH data. As increasing amounts of array CGH data become available, there is a growing need for automated algorithms for characterizing genomic profiles. Specifically, there is a need for …


Exploration Of Distributional Models For A Novel Intensity-Dependent Normalization, Nicola Lama, Patrizia Boracchi, Elia Mario Biganzoli Oct 2006

COBRA Preprint Series

Currently used gene intensity-dependent normalization methods, based on regression smoothing techniques, usually approach the two problems of location bias detrending and data re-scaling without taking into account the censoring characteristic of certain gene expressions produced by experiment measurement constraints or by previous normalization steps. Moreover, the control of the bias-versus-variance balance in normalization procedures is seldom discussed, and is instead left to the user's experience. Here, an approximate maximum likelihood procedure to fit a model smoothing the dependences of log-fold gene expression differences on average gene intensities is presented. Central tendency and scaling factor were modeled by means of B-splines smoothing …


Targeted Maximum Likelihood Learning, Mark J. Van Der Laan, Daniel Rubin Oct 2006

U.C. Berkeley Division of Biostatistics Working Paper Series

Suppose one observes a sample of independent and identically distributed observations from a particular data generating distribution. Suppose that one has available an estimate of the density of the data generating distribution, such as a maximum likelihood estimator according to a given or data-adaptively selected model. Suppose that one is concerned with estimation of a particular pathwise differentiable Euclidean parameter. A substitution estimator, obtained by evaluating the parameter at the density estimator, is typically too biased and might not even converge at the parametric rate: that is, the density estimator was targeted to be a good estimator of the density and …
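
The targeting step can be sketched generically as a one-dimensional fluctuation of the initial density estimator in the direction of the efficient influence curve; this is a schematic of the general targeted-learning recipe, not the paper's full treatment.

```latex
% Let p_n be an initial density estimator and Psi(.) the pathwise differentiable parameter.
% Choose a fluctuation {p_n(eps)} through p_n whose score at eps = 0 equals the
% efficient influence curve D*(p_n):
\[
  p_n(0) = p_n, \qquad
  \left. \frac{\partial}{\partial \varepsilon} \log p_n(\varepsilon) \right|_{\varepsilon = 0}
  = D^{*}(p_n).
\]
% Fit eps by maximum likelihood, update p_n <- p_n(eps_n), iterate until eps_n is
% (essentially) zero, and report the substitution estimator Psi(p_n^*) of the targeted fit.
```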


Crude Cumulative Incidence In The Form Of A Horvitz-Thompson Like And Kaplan-Meier Like Estimator, Laura Antolini, Elia Mario Biganzoli, Patrizia Boracchi Oct 2006

COBRA Preprint Series

The link between the nonparametric estimator of the crude cumulative incidence of a competing risk and the Kaplan-Meier estimator is exploited. The equivalence of the nonparametric crude cumulative incidence to an inverse-probability-of-censoring weighted average of the sub-distribution function is proved. The link between the estimation of crude cumulative incidence curves and Gray's family of nonparametric tests is considered. The crude cumulative incidence is proved to be a Kaplan-Meier like estimator based on the sub-distribution hazard, i.e., the quantity on which Gray's family of tests is based. A standard probabilistic formalism is adopted so that the note remains accessible to applied statisticians.
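
The Horvitz-Thompson-like (inverse-probability-of-censoring weighted) form referred to above can be written as follows; this is a standard representation given for orientation, not necessarily the note's exact notation.

```latex
% Crude cumulative incidence for cause k, with observed times T_i, cause indicators eps_i,
% and \hat G(.) the Kaplan-Meier estimator of the censoring survival function:
\[
  \hat F_k(t) = \frac{1}{n} \sum_{i=1}^{n}
    \frac{ I\{ T_i \le t,\; \varepsilon_i = k \} }{ \hat G(T_i^{-}) },
\]
% i.e., an inverse-probability-of-censoring weighted average of the sub-distribution,
% which also admits a Kaplan-Meier-like product form based on the sub-distribution hazard.
```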


Cox Models With Nonlinear Effect Of Covariates Measured With Error: A Case Study Of Chronic Kidney Disease Incidence, Ciprian M. Crainiceanu, David Ruppert, Josef Coresh Sep 2006

Johns Hopkins University, Dept. of Biostatistics Working Papers

We propose, develop and implement the simulation extrapolation (SIMEX) methodology for Cox regression models when the log hazard function is linear in the model parameters but nonlinear in the variables measured with error (LPNE). The class of LPNE functions contains but is not limited to strata indicators, splines, quadratic and interaction terms. The first order bias correction method proposed here has the advantage that it remains computationally feasible even when the number of observations is very large and multiple models need to be explored. Theoretical and simulation results show that the SIMEX method outperforms the naive method even with small …
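
To fix ideas about the simulation and extrapolation steps, here is a minimal SIMEX sketch for a plain linear regression with one error-prone covariate. It illustrates the general SIMEX recipe only, not the paper's Cox/LPNE implementation; all data and settings are made up.

```python
# A minimal SIMEX sketch for a simple linear regression with one error-prone covariate.
import numpy as np

rng = np.random.default_rng(0)
n, beta, sigma_u = 2000, 1.5, 0.5           # true slope, known measurement-error SD
x = rng.normal(size=n)                      # true covariate (unobserved in practice)
w = x + rng.normal(scale=sigma_u, size=n)   # error-prone surrogate
y = beta * x + rng.normal(size=n)           # outcome

def slope(w, y):
    """Naive least-squares slope of y on the (possibly noisier) covariate."""
    return np.cov(w, y, bias=True)[0, 1] / np.var(w)

lambdas = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
est = []
for lam in lambdas:
    # Simulation step: add extra error with variance lam * sigma_u^2, average over replicates.
    reps = [slope(w + np.sqrt(lam) * sigma_u * rng.normal(size=n), y) for _ in range(50)]
    est.append(np.mean(reps))

# Extrapolation step: fit a quadratic in lambda and extrapolate back to lambda = -1.
coef = np.polyfit(lambdas, est, deg=2)
beta_simex = np.polyval(coef, -1.0)
print(f"naive estimate: {est[0]:.3f}, SIMEX estimate: {beta_simex:.3f} (truth {beta})")
```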


Covariate Specific Roc Curve With Survival Outcome, Xiao Song, Xiao-Hua Zhou Sep 2006

UW Biostatistics Working Paper Series

The receiver operating characteristic (ROC) curve has been extended to survival data recently, including the nonparametric approach by Heagerty, Lumley and Pepe (2000) and the semiparametric approach by Heagerty and Zheng (2005) using standard survival analysis techniques based on two different time-dependent ROC curve definitions. However, neither approach can adjust for the effect of covariates on the accuracy of the biomarker. To account for the covariate effect, we propose semiparametric models for covariate-specific ROC curves corresponding to the two time-dependent ROC curve definitions, respectively. We show that the estimators are consistent and converge to Gaussian processes. In the case …


Spatial Cluster Detection For Censored Outcome Data, Andrea J. Cook, Diane Gold, Yi Li Sep 2006

Harvard University Biostatistics Working Paper Series

No abstract provided.


Diagnosing Bias In The Inverse Probability Of Treatment Weighted Estimator Resulting From Violation Of Experimental Treatment Assignment, Yue Wang, Maya L. Petersen, David Bangsberg, Mark J. Van Der Laan Sep 2006

U.C. Berkeley Division of Biostatistics Working Paper Series

Inverse probability of treatment weighting (IPTW) is frequently used to estimate the causal effects of treatments and interventions. The consistency of the IPTW estimator relies not only on the well-recognized assumption of no unmeasured confounders (Sequential Randomization Assumption or SRA), but also on the assumption of experimentation in the assignment of treatment (Experimental Treatment Assignment or ETA). In finite samples, violations in the ETA assumption can occur due simply to chance; certain treatments become rare or non-existent for certain strata of the population. Such practical violations of the ETA assumption occur frequently in real data, and can result in significant …
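
For orientation, the point-treatment IPTW estimator and the role of the ETA assumption can be sketched as follows, in generic notation.

```latex
% IPTW estimator of the counterfactual mean under treatment level a, with treatment A,
% covariates W, outcome Y, and estimated treatment mechanism g_n(a | W):
\[
  \hat\psi_{a} = \frac{1}{n} \sum_{i=1}^{n}
    \frac{ I\{ A_i = a \} }{ g_n(a \mid W_i) } \, Y_i .
\]
% The ETA assumption requires g(a | W) to be bounded away from zero across strata of W;
% when g_n(a | W_i) is near zero for some subjects, the weights explode and the
% estimator can become severely biased and unstable in finite samples.
```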


Extending Marginal Structural Models Through Local, Penalized, And Additive Learning, Daniel Rubin, Mark J. Van Der Laan Sep 2006

U.C. Berkeley Division of Biostatistics Working Paper Series

Marginal structural models (MSMs) allow one to form causal inferences from data, by specifying a relationship between a treatment and the marginal distribution of a corresponding counterfactual outcome. Following their introduction in Robins (1997), MSMs have typically been fit after assuming a semiparametric model, and then estimating a finite dimensional parameter. van der Laan and Dudoit (2003) proposed to instead view MSM fitting not as a task of semiparametric parameter estimation, but of nonparametric function approximation. They introduced a class of causal effect estimators based on mapping loss functions suitable for the unavailable counterfactual data to those suitable for the …


Conditional Likelihood Methods For Haplotype-Based Association Analysis Using Matched Case-Control Data, Jinbo Chen, Carmen Rodriguez Sep 2006

UPenn Biostatistics Working Papers

Genetic epidemiologists routinely assess disease susceptibility in relation to haplotypes, i.e., combinations of alleles on a single chromosome. We study statistical methods for inferring haplotype-related disease risk using SNP genotype data from matched case-control studies, where controls are individually matched to cases on some selected factors. Assuming a logistic regression model for haplotype-disease association, we propose two conditional likelihood approaches that address the issue that haplotypes cannot be inferred with certainty from SNP genotype data (phase ambiguity). One approach is based on the likelihood of disease status conditioned on the total number of cases, genotypes, and other covariates within each …


Generalized Confidence Intervals For The Ratio Or Difference Of Two Means For Lognormal Populations With Zeros, Yea-Hung Chen, Xiao-Hua Zhou Sep 2006

UW Biostatistics Working Paper Series

We discuss in this article methods for analyzing lognormal data that may include zeros. Specifically, we are interested in interval estimation for the ratio or difference of the population means. We propose here two generalized pivotal (GP) approaches: a "true" GP method and an "approximate" GP method. Additionally, we propose two likelihood-based approaches: a signed log-likelihood ratio (SLLR) method and a modified SLLR method. Our simulation studies suggest that the approximate generalized pivotal approach outperforms all other known methods; it results in highly accurate coverage frequencies and fairly low bias, even in small sample settings.
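
The target parameters can be written explicitly under the usual two-part ("delta") lognormal formulation, with δ denoting the probability of a zero observation; a brief sketch:

```latex
% Two-part (delta-)lognormal model: P(X = 0) = delta, and X | X > 0 ~ Lognormal(mu, sigma^2),
% so the population mean is
\[
  E(X) = (1 - \delta)\, \exp\!\left( \mu + \tfrac{\sigma^2}{2} \right),
\]
% and the quantities of interest are the ratio and difference of two such means:
\[
  \frac{E(X_1)}{E(X_2)}
  = \frac{(1 - \delta_1) \exp\!\left( \mu_1 + \sigma_1^2/2 \right)}
         {(1 - \delta_2) \exp\!\left( \mu_2 + \sigma_2^2/2 \right)},
  \qquad
  E(X_1) - E(X_2).
\]
```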


Multiple Imputation - Review Of Theory, Implementation And Software, Ofer Harel, Xiao-Hua Zhou Sep 2006

UW Biostatistics Working Paper Series

Missing data is a common complication in data analysis. In many medical settings missing data can cause difficulties in estimation, precision and inference. Multiple imputation (MI) (Rubin, 1987) is a simulation-based approach to dealing with incomplete data. Although there are many different methods to deal with incomplete data, MI has become one of the leading methods. Since the late 1980s there has been a steady increase in the use and publication of MI-related research. This tutorial does not attempt to cover all the material concerning MI, but rather provides an overview and combines together the theory behind MI, the implementation …
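
As a concrete anchor for the combining step with which any MI analysis ends, here is a minimal sketch of Rubin's (1987) pooling rules; the per-imputation estimates and variances in the example call are made up.

```python
# Minimal sketch of Rubin's combining rules for m multiply-imputed analyses.
import numpy as np

def pool(estimates, variances):
    """Combine m complete-data estimates and their variances (Rubin, 1987)."""
    estimates, variances = np.asarray(estimates), np.asarray(variances)
    m = len(estimates)
    qbar = estimates.mean()                       # pooled point estimate
    ubar = variances.mean()                       # within-imputation variance
    b = estimates.var(ddof=1)                     # between-imputation variance
    total = ubar + (1 + 1 / m) * b                # total variance
    df = (m - 1) * (1 + ubar / ((1 + 1 / m) * b)) ** 2   # Rubin's degrees of freedom
    return qbar, total, df

print(pool([1.02, 0.95, 1.10, 1.01, 0.97], [0.04, 0.05, 0.04, 0.05, 0.04]))
```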


Multiple Imputation For The Comparison Of Two Screening Tests In Two-Phase Alzheimer Studies, Ofer Harel, Xiao-Hua Zhou Sep 2006

UW Biostatistics Working Paper Series

Two-phase designs are common in epidemiological studies of dementia, and especially in Alzheimer research. In the first phase, all subjects are screened using one or more common screening tests, while in the second phase, only a subset of these subjects is tested using a more definitive verification assessment, i.e., a gold standard test. When comparing the accuracy of two screening tests in a two-phase study of dementia, inferences are commonly made using only the verified sample. It is well documented that in this case there is a risk of bias, known as verification bias. When the two screening tests have only two values (e.g. …