Open Access. Powered by Scholars. Published by Universities.®

Digital Commons Network

COBRA

2006

Articles 1 - 30 of 129

Full-Text Articles in Entire DC Network

Lehmann Family Of Roc Curves, Mithat Gonen, Glenn Heller Dec 2006

Memorial Sloan-Kettering Cancer Center, Dept. of Epidemiology & Biostatistics Working Paper Series

Receiver operating characteristic (ROC) curves are useful in evaluating the ability of a continuous marker in discriminating between the two states of a binary outcome such as diseased/not diseased. The most popular parametric model for an ROC curve is the binormal model which assumes that the marker is normally distributed conditional on the outcome. Here we present an alternative to the binormal model based on the Lehmann family, also known as the proportional hazards specification. The resulting ROC curve and its functionals (such as the area under the curve) have simple analytic forms. We derive closed-form expressions for the asymptotic …
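The closed forms mentioned here are simple enough to state directly: in the commonly cited parameterization of the Lehmann (proportional hazards) family, the ROC curve is ROC(u) = u^theta and the AUC is 1/(1+theta). A minimal Python sketch of these formulas (the working paper's exact notation may differ):

```python
def lehmann_roc(u, theta):
    """ROC curve under the Lehmann (proportional hazards) family: ROC(u) = u**theta.
    u is the false-positive rate; theta < 1 corresponds to AUC > 0.5."""
    return u ** theta

def lehmann_auc(theta):
    """Closed-form area under the curve: integral of u**theta over [0, 1]."""
    return 1.0 / (1.0 + theta)

def numeric_auc(theta, n=100_000):
    """Midpoint-rule integral of the ROC curve, as a check on the closed form."""
    h = 1.0 / n
    return sum(lehmann_roc((i + 0.5) * h, theta) for i in range(n)) * h
```

For example, theta = 1 gives the chance diagonal (AUC = 0.5), and smaller theta gives a more discriminating marker.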


Interacting With Local And Remote Data Repositories Using The Stashr Package, Sandrah P. Eckel, Roger Peng Dec 2006

Johns Hopkins University, Dept. of Biostatistics Working Papers

The stashR package (a Set of Tools for Administering SHared Repositories) for R implements a simple key-value style database where character string keys are associated with data values. The key-value databases can be either stored locally on the user's computer or accessed remotely via the Internet. Methods specific to the stashR package allow users to share data repositories or access previously created remote data repositories. In particular, methods are available for the S4 classes localDB and remoteDB to insert, retrieve, or delete data from the database as well as to synchronize local copies of the data to the remote version …
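The key-value design described here can be sketched in a few lines of Python (illustrative only — stashR itself is an R package whose actual interface is the S4 methods on the localDB and remoteDB classes):

```python
class LocalRepository:
    """Toy key-value repository in the spirit of stashR's localDB (not the real API)."""

    def __init__(self):
        self._data = {}

    def insert(self, key, value):
        self._data[key] = value

    def fetch(self, key):
        return self._data[key]

    def delete(self, key):
        del self._data[key]

    def keys(self):
        return sorted(self._data)

def synchronize(local, remote_snapshot):
    """One-way sync: copy keys present in the remote snapshot but missing locally."""
    for key, value in remote_snapshot.items():
        if key not in local.keys():
            local.insert(key, value)
```

The point of the design is that the same insert/fetch/delete vocabulary works whether the backing store is a local directory or a remote repository reached over the Internet.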


A Likelihood Based Method For Real Time Estimation Of The Serial Interval And Reproductive Number Of An Epidemic, Laura Forsberg White, Marcello Pagano Dec 2006

Harvard University Biostatistics Working Paper Series

No abstract provided.


A Semiparametric Approach For The Nonparametric Transformation Survival Model With Multiple Covariates, Xiao Song, Shuangge Ma, Jian Huang, Xiao-Hua Zhou Dec 2006

UW Biostatistics Working Paper Series

The nonparametric transformation model for survival time makes no parametric assumptions on either the transformation function or the error distribution, which makes it appealingly flexible for modeling censored survival data. Current approaches for estimation of the regression parameters involve maximizing discontinuous objective functions, which are numerically infeasible to implement in the case of multiple covariates. Based on the partial rank estimator (Khan & Tamer, 2004), we propose a smoothed partial rank estimator which maximizes a …


Use Of Hidden Markov Models For Qtl Mapping, Karl W. Broman Dec 2006

Johns Hopkins University, Dept. of Biostatistics Working Papers

An important aspect of the QTL mapping problem is the treatment of missing genotype data. If complete genotype data were available, QTL mapping would reduce to the problem of model selection in linear regression. However, in the consideration of loci in the intervals between the available genetic markers, genotype data is inherently missing. Even at the typed genetic markers, genotype data is seldom complete, as a result of failures in the genotyping assays or for the sake of economy (for example, in the case of selective genotyping, where only individuals with extreme phenotypes are genotyped). We discuss the use of …
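For intuition, in the simplest backcross setting with error-free flanking markers, the probability of the missing genotype at a putative QTL reduces to conditioning on the two nearest typed markers. A toy Python sketch under that assumption (not the paper's full hidden Markov model, which also accommodates genotyping error and other cross types):

```python
def qtl_genotype_probs(g1, g2, r1, r2):
    """P(QTL genotype | flanking backcross marker genotypes).

    g1, g2: observed genotypes (0 or 1) at the two flanking markers.
    r1: recombination fraction between marker 1 and the QTL position.
    r2: recombination fraction between the QTL position and marker 2.
    Assumes no interference and no genotyping error (illustration only)."""

    def trans(a, b, r):
        # Transition probability between adjacent loci: recombine with prob r.
        return 1.0 - r if a == b else r

    unnorm = {g: trans(g1, g, r1) * trans(g, g2, r2) for g in (0, 1)}
    total = sum(unnorm.values())
    return {g: p / total for g, p in unnorm.items()}
```

With concordant flanking markers and small recombination fractions, the intervening genotype is nearly certain; with discordant markers equidistant from the position, the two genotypes are equally likely.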


Gamma Shape Mixtures For Heavy-Tailed Distributions, Sergio Venturini, Francesca Dominici, Giovanni Parmigiani Dec 2006

Johns Hopkins University, Dept. of Biostatistics Working Papers

An important question in health services research is the estimation of the proportion of medical expenditures that exceed a given threshold. Typically, medical expenditures present highly skewed, heavy-tailed distributions, for which a) simple variable transformations are insufficient to achieve a tractable low-dimensional parametric form and b) nonparametric methods are not efficient in estimating exceedance probabilities for large thresholds. Motivated by this context, in this paper we propose a general Bayesian approach for the estimation of tail probabilities of heavy-tailed distributions, based on a mixture of gamma distributions in which the mixing occurs over the shape parameter. This family provides …
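To see why mixing over the shape parameter is convenient, note that with integer shapes and a common scale each component is an Erlang distribution, so exceedance probabilities have closed forms. A sketch under those simplifying assumptions (the paper's Bayesian fitting of the mixture weights is not shown):

```python
import math

def erlang_sf(x, k, scale):
    """Survival function P(X > x) of a gamma with integer shape k (Erlang):
    exp(-x/scale) * sum_{j<k} (x/scale)**j / j!."""
    lam = x / scale
    return math.exp(-lam) * sum(lam ** j / math.factorial(j) for j in range(k))

def mixture_exceedance(x, weights, scale):
    """P(X > x) under a gamma mixture with shapes 1..K and a common scale;
    weights[k-1] is the mixing weight on shape k."""
    return sum(w * erlang_sf(x, k, scale) for k, w in enumerate(weights, start=1))
```

Because each component's tail is available in closed form, the mixture's exceedance probability is just a weighted sum — the quantity of interest for large-threshold expenditures.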


Semiparametric Regression Of Multi-Dimensional Genetic Pathway Data: Least Squares Kernel Machines And Linear Mixed Models, Dawei Liu, Xihong Lin, Debashis Ghosh Nov 2006

The University of Michigan Department of Biostatistics Working Paper Series

We consider a semiparametric regression model that relates a normal outcome to covariates and a genetic pathway, where the covariate effects are modeled parametrically and the pathway effect of multiple gene expressions is modeled parametrically or nonparametrically using least squares kernel machines (LSKMs). This unified framework allows a flexible function for the joint effect of multiple genes within a pathway by specifying a kernel function and allows for the possibility that each gene expression effect might be nonlinear and the genes within the same pathway are likely to interact with each other in a complicated way. This semiparametric model …


Spatio-Temporal Analysis Of Areal Data And Discovery Of Neighborhood Relationships In Conditionally Autoregressive Models, Subharup Guha, Louise Ryan Nov 2006

Harvard University Biostatistics Working Paper Series

No abstract provided.


Semiparametric Regression Of Multi-Dimensional Genetic Pathway Data: Least Squares Kernel Machines And Linear Mixed Models, Dawei Liu, Xihong Lin, Debashis Ghosh Nov 2006

Harvard University Biostatistics Working Paper Series

No abstract provided.


Analysis Of Case-Control Age-At-Onset Data Using A Modified Case-Cohort Method, Bin Nan, Xihong Lin Nov 2006

The University of Michigan Department of Biostatistics Working Paper Series

Case-control designs are widely used in rare disease studies. In a typical case-control study, data are collected from a sample of all available subjects who have experienced a disease (cases) and a sub-sample of subjects who have not experienced the disease (controls) in a study cohort. Cases are often oversampled in case-control studies. Logistic regression is a common tool for estimating the relative risks of the disease associated with a set of covariates. Very often in such a study, the ages at onset of the disease for all cases and the ages at survey for controls are known. Standard logistic regression analysis using …


Smoothed Rank Regression With Censored Data, Glenn Heller Nov 2006

Memorial Sloan-Kettering Cancer Center, Dept. of Epidemiology & Biostatistics Working Paper Series

A weighted rank estimating function is proposed to estimate the regression parameter vector in an accelerated failure time model with right censored data. In general, rank estimating functions are discontinuous in the regression parameter, creating difficulties in determining the asymptotic distribution of the estimator. A local distribution function is used to create a rank based estimating function that is continuous and monotone in the regression parameter vector. A weight is included in the estimating function to produce a bounded influence estimate. The asymptotic distribution of the regression estimator is developed and simulations are performed to examine its finite sample properties. …
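The smoothing device can be illustrated on a toy, uncensored, one-covariate version of a Gehan-type rank score: replacing the indicator by a normal distribution function makes the score continuous and monotone in the regression parameter, so its root is easy to find. A Python sketch (Heller's actual estimator additionally handles censoring and includes a bounded-influence weight):

```python
import math

def smooth_indicator(z, h):
    """Smooth surrogate for the indicator I(z > 0): standard normal CDF at z/h.
    The bandwidth h controls the amount of local smoothing."""
    return 0.5 * (1.0 + math.erf(z / (h * math.sqrt(2.0))))

def smoothed_gehan_score(beta, x, y, h):
    """Smoothed Gehan-type rank score in one covariate (toy, uncensored case):
    sum over pairs (i, j) of (x_i - x_j) * Phi((e_j - e_i)/h), e_k = y_k - beta*x_k.
    The score is continuous and nondecreasing in beta; its root estimates the slope."""
    n = len(x)
    e = [y[k] - beta * x[k] for k in range(n)]
    score = 0.0
    for i in range(n):
        for j in range(n):
            score += (x[i] - x[j]) * smooth_indicator(e[j] - e[i], h)
    return score
```

On noiseless data generated with slope 2, the score is negative below 2, zero at 2, and positive above 2, so a standard root finder recovers the regression parameter.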


Properties Of Monotonic Effects, Tyler J. Vanderweele, James M. Robins Nov 2006

COBRA Preprint Series

Various relationships are shown to hold between monotonic effects and weak monotonic effects and the monotonicity of certain conditional expectations. These relationships are considered for both binary and non-binary variables. Counterexamples are provided to show that the results do not hold under less restrictive conditions. The ideas of monotonic effects are furthermore used to relate signed edges on a directed acyclic graph to qualitative effect modification.


Penalized Likelihood And Bayesian Methods For Sparse Contingency Tables: An Analysis Of Alternative Splicing In Full-Length Cdna Libraries, Corinne Dahinden, Giovanni Parmigiani, Mark C. Emerick, Peter Buhlmann Nov 2006

Johns Hopkins University, Dept. of Biostatistics Working Papers

We develop methods to perform model selection and parameter estimation in loglinear models for the analysis of sparse contingency tables to study the interaction of two or more factors. Typically, datasets arising from so-called full-length cDNA libraries, in the context of alternatively spliced genes, lead to such sparse contingency tables. Maximum Likelihood estimation of log-linear model coefficients fails to work because of zero cell entries. Therefore new methods are required to estimate the coefficients and to perform model selection. Our suggestions include computationally efficient penalization (Lasso-type) approaches as well as Bayesian methods using MCMC. We compare these procedures in a …


Multiple Testing With An Empirical Alternative Hypothesis, James E. Signorovitch Nov 2006

Harvard University Biostatistics Working Paper Series

An optimal multiple testing procedure is identified for linear hypotheses under the general linear model, maximizing the expected number of false null hypotheses rejected at any significance level. The optimal procedure depends on the unknown data-generating distribution, but can be consistently estimated. Drawing information together across many hypotheses, the estimated optimal procedure provides an empirical alternative hypothesis by adapting to underlying patterns of departure from the null. Proposed multiple testing procedures based on the empirical alternative are evaluated through simulations and an application to gene expression microarray data. Compared to a standard multiple testing procedure, it is not unusual for …


Doubly Penalized Buckley-James Method For Survival Data With High-Dimensional Covariates, Sijian Wang, Bin Nan, Ji Zhu, David G. Beer Nov 2006

The University of Michigan Department of Biostatistics Working Paper Series

Recent interest in cancer research focuses on predicting patients' survival by investigating gene expression profiles based on microarray analysis. We propose a doubly penalized Buckley-James method for the semiparametric accelerated failure time model to relate high-dimensional genomic data to censored survival outcomes, which uses a mixture of L1-norm and L2-norm penalties. Similar to the elastic-net method for linear regression model with uncensored data, the proposed method performs automatic gene selection and parameter estimation, where highly correlated genes are able to be selected (or removed) together. The two-dimensional tuning parameter is determined by cross-validation and uniform design. …


A Note On Bias Due To Fitting Prospective Multivariate Generalized Linear Models To Categorical Outcomes Ignoring Retrospective Sampling Schemes, Bhramar Mukherjee, Ivy Liu Nov 2006

The University of Michigan Department of Biostatistics Working Paper Series

Outcome dependent sampling designs are commonly used in economics, market research and epidemiological studies. Case-control sampling design is a classic example of outcome dependent sampling, where exposure information is collected on subjects conditional on their disease status. In many situations, the outcome under consideration may have multiple categories instead of a simple dichotomization. For example, in a case-control study, there may be disease sub-classification among the “cases” based on progression of the disease, or in terms of other histological and morphological characteristics of the disease. In this note, we investigate the issue of fitting prospective multivariate generalized linear models to …


Exploiting Gene-Environment Independence For Analysis Of Case-Control Studies: An Empirical Bayes Approach To Trade Off Between Bias And Efficiency, Bhramar Mukherjee, Nilanjan Chatterjee Nov 2006

The University of Michigan Department of Biostatistics Working Paper Series

Standard prospective logistic regression analysis of case-control data often leads to very imprecise estimates of gene-environment interactions due to small numbers of cases or controls in cells of crossing genotype and exposure. In contrast, under the assumption of gene-environment independence, modern “retrospective” methods, including the “case-only” approach, can estimate the interaction parameters much more precisely, but they can be seriously biased when the underlying assumption of gene-environment independence is violated. In this article, we propose a novel approach to analyze case-control data that can relax the gene-environment independence assumption using an empirical Bayes framework. In the special case, involving a …


Large Cluster Asymptotics For Gee: Working Correlation Models, Hyoju Chung, Thomas Lumley Oct 2006

UW Biostatistics Working Paper Series

This paper presents large cluster asymptotic results for generalized estimating equations. The complexity of the working correlation model is characterized in terms of the number of working correlation components to be estimated. When the cluster size is relatively large, we may encounter a situation where a high-dimensional working correlation matrix is modeled and estimated from the data. In the present asymptotic setting, the cluster size and the complexity of the working correlation model grow with the number of independent clusters. We show the existence, weak consistency and asymptotic normality of marginal regression parameter estimators using the results of empirical process theory and …


Estimating Genome-Wide Copy Number Using Allele Specific Mixture Models, Wenyi Wang, Benilton Caravalho, Nate Miller, Jonathan Pevsner, Aravinda Chakravarti, Rafael A. Irizarry Oct 2006

Johns Hopkins University, Dept. of Biostatistics Working Papers

Genomic changes such as copy number alterations are thought to be one of the major underlying causes of human phenotypic variation among normal and disease subjects [23,11,25,26,5,4,7,18]. These include chromosomal regions with so-called copy number alterations: instead of the expected two copies, a section of the chromosome for a particular individual may have zero copies (homozygous deletion), one copy (hemizygous deletion), or more than two copies (amplification). The canonical example is Down syndrome, which is caused by an extra copy of chromosome 21. Identification of such abnormalities in smaller regions has been of great interest, because it is believed to …


Statistical Analysis Of Air Pollution Panel Studies: An Illustration, Holly Janes, Lianne Sheppard, Kristen Shepherd Oct 2006

UW Biostatistics Working Paper Series

The panel study design is commonly used to evaluate the short-term health effects of air pollution. Standard statistical methods for analyzing longitudinal data are available, but the literature reveals that the techniques are not well understood by practitioners. We illustrate these methods using data from the 1999 to 2002 Seattle panel study. Marginal, conditional, and transitional approaches for modeling longitudinal data are reviewed and contrasted with respect to their parameter interpretation and methods for accounting for correlation and dealing with missing data. We also discuss and illustrate techniques for controlling for time-dependent and time-independent confounding, and for exploring and summarizing …


A Comparative Analysis Of The Chronic Effects Of Fine Particulate Matter, Sorina E. Eftim, Holly Janes, Aidan Mcdermott, Jonathan M. Samet, Francesca Dominici Oct 2006

Johns Hopkins University, Dept. of Biostatistics Working Papers

The American Cancer Society study (ACS) and the Harvard Six Cities study (SCS) are the two landmark cohort studies for estimating the chronic effects of fine particulate matter (PM2.5) on mortality. To date, no comparative analysis of these studies has been carried out using a different study design, study period, data, and modeling approach. In this paper, we estimate the chronic effects of PM2.5 on mortality for the period 2000-2002 by using mortality data from Medicare and PM2.5 levels from the National Air Pollution Monitoring Network for the same counties included in the SCS and the ACS. We use a …


Bayesian Hidden Markov Modeling Of Array Cgh Data, Subharup Guha, Yi Li, Donna Neuberg Oct 2006

Harvard University Biostatistics Working Paper Series

Genomic alterations have been linked to the development and progression of cancer. The technique of Comparative Genomic Hybridization (CGH) yields data consisting of fluorescence intensity ratios of test and reference DNA samples. The intensity ratios provide information about the number of copies in DNA. Practical issues such as the contamination of tumor cells in tissue specimens and normalization errors necessitate the use of statistics for learning about the genomic alterations from array-CGH data. As increasing amounts of array CGH data become available, there is a growing need for automated algorithms for characterizing genomic profiles. Specifically, there is a need for …


Exploration Of Distributional Models For A Novel Intensity-Dependent Normalization, Nicola Lama, Patrizia Boracchi, Elia Mario Biganzoli Oct 2006

COBRA Preprint Series

Currently used gene intensity-dependent normalization methods, based on regression smoothing techniques, usually approach the two problems of location bias detrending and data re-scaling without taking into account the censoring of certain gene expression values introduced by measurement constraints or by previous normalization steps. Moreover, the bias-versus-variance trade-off of normalization procedures is often not discussed explicitly but is left to the user's experience. Here, an approximate maximum likelihood procedure is presented for fitting a model that smooths the dependence of log-fold gene expression differences on average gene intensities. Central tendency and scaling factor were modeled by means of B-splines smoothing …


Targeted Maximum Likelihood Learning, Mark J. Van Der Laan, Daniel Rubin Oct 2006

U.C. Berkeley Division of Biostatistics Working Paper Series

Suppose one observes a sample of independent and identically distributed observations from a particular data generating distribution. Suppose that one has available an estimate of the density of the data generating distribution such as a maximum likelihood estimator according to a given or data adaptively selected model. Suppose that one is concerned with estimation of a particular pathwise differentiable Euclidean parameter. A substitution estimator evaluating the parameter of the density estimator is typically too biased and might not even converge at the parametric rate: that is, the density estimator was targeted to be a good estimator of the density and …


Crude Cumulative Incidence In The Form Of A Horvitz-Thompson Like And Kaplan-Meier Like Estimator, Laura Antolini, Elia Mario Biganzoli, Patrizia Boracchi Oct 2006

COBRA Preprint Series

The link between the nonparametric estimator of the crude cumulative incidence of a competing risk and the Kaplan-Meier estimator is exploited. The equivalence of the nonparametric crude cumulative incidence to an inverse-probability-of-censoring weighted average of the sub-distribution function is proved. The link between the estimation of crude cumulative incidence curves and Gray's family of nonparametric tests is considered. The crude cumulative incidence is proved to be a Kaplan-Meier like estimator based on the sub-distribution hazard, i.e. the quantity on which Gray's family of tests is based. A standard probabilistic formalism is adopted to have a note accessible to applied statisticians.
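The Kaplan-Meier-like form can be made concrete: the crude cumulative incidence at time t sums, over event times up to t, the overall Kaplan-Meier survival just before each event time multiplied by the cause-specific event fraction among those at risk. A minimal Python sketch (cause 0 denotes censoring; censored observations tied with events are removed after the events, as is standard):

```python
def crude_cif(times, causes, t0, cause=1):
    """Nonparametric crude cumulative incidence for one competing risk:
    CIF(t0) = sum over event times t <= t0 of S(t-) * d_cause(t) / n(t),
    where S is the overall (all-cause) Kaplan-Meier survival estimate,
    d_cause(t) counts events of the given cause at t, and n(t) is the risk set."""
    data = sorted(zip(times, causes))
    n = len(data)
    at_risk, surv, cif, i = n, 1.0, 0.0, 0
    while i < n:
        t = data[i][0]
        if t > t0:
            break
        d_all = d_cause = censored = 0
        while i < n and data[i][0] == t:  # collect all observations tied at t
            if data[i][1] == 0:
                censored += 1
            else:
                d_all += 1
                if data[i][1] == cause:
                    d_cause += 1
            i += 1
        cif += surv * d_cause / at_risk          # KM-like increment for this cause
        surv *= 1.0 - d_all / at_risk            # update overall KM survival
        at_risk -= d_all + censored              # shrink the risk set
    return cif
```

With no censoring, this reduces to the empirical proportion of cause-specific events by t0, and the cause-specific incidences sum to one minus the overall survival — the properties the note's probabilistic formalism makes precise.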


Biologic Interactions And Their Identification, Tyler J. Vanderweele, James Robins Sep 2006

COBRA Preprint Series

The definitions of a biologic interaction and causal interdependence are reconsidered in light of a sufficient-component cause framework. Various conditions and statistical tests are derived for the presence of biologic interactions. The conditions derived are sufficient but not necessary for the presence of a biologic interaction. Through a series of examples it is made evident that in the context of monotonic effects, but not in general, the conditions which are derived are closely related but not identical to effect modification on the risk difference scale.


A Theory Of Sufficient Cause Interactions, Tyler J. Vanderweele, James M. Robins Sep 2006

COBRA Preprint Series

Sufficient-component causes are discussed within the potential outcome framework so as to formalize notions of sufficient causes, synergism and sufficient cause interactions. Doing so allows for the derivation of counterfactual conditions and statistical tests for detecting the presence of sufficient cause interactions. Under the assumption of monotonic effects, more powerful statistical tests for sufficient cause interactions can be derived. The statistical tests derived for sufficient cause interactions are compared with and contrasted to interaction terms in standard statistical models.


Cox Models With Nonlinear Effect Of Covariates Measured With Error: A Case Study Of Chronic Kidney Disease Incidence, Ciprian M. Crainiceanu, David Ruppert, Josef Coresh Sep 2006

Johns Hopkins University, Dept. of Biostatistics Working Papers

We propose, develop and implement the simulation extrapolation (SIMEX) methodology for Cox regression models when the log hazard function is linear in the model parameters but nonlinear in the variables measured with error (LPNE). The class of LPNE functions contains but is not limited to strata indicators, splines, quadratic and interaction terms. The first order bias correction method proposed here has the advantage that it remains computationally feasible even when the number of observations is very large and multiple models need to be explored. Theoretical and simulation results show that the SIMEX method outperforms the naive method even with small …


Exploiting Gene-Environment Independence For Analysis Of Case-Control Studies: An Empirical Bayes Approach To Trade Off Between Bias And Efficiency, Bhramar Mukherjee Sep 2006

The University of Michigan Department of Biostatistics Working Paper Series

Standard prospective logistic regression analysis of case-control data often leads to very imprecise estimates of gene-environment interactions due to small numbers of cases or controls in cells of crossing genotype and exposure. In contrast, modern “retrospective” methods, including the celebrated “case-only” approach, can estimate the interaction parameters much more precisely, but they can be seriously biased when the underlying assumption of gene-environment independence is violated. In this article, we propose a novel approach to analyze case-control data that can relax the gene-environment independence assumption using an empirical Bayes (EB) framework. In the special case, involving a binary gene and a …


Covariate Specific Roc Curve With Survival Outcome, Xiao Song, Xiao-Hua Zhou Sep 2006

UW Biostatistics Working Paper Series

The receiver operating characteristic (ROC) curve has been extended to survival data recently, including the nonparametric approach by Heagerty, Lumley and Pepe (2000) and the semiparametric approach by Heagerty and Zheng (2005) using standard survival analysis techniques based on two different time-dependent ROC curve definitions. However, neither approach can adjust for the effect of covariates on the accuracy of the biomarker. To account for the covariate effect, we propose semiparametric models for covariate specific ROC curves corresponding to the two time-dependent ROC curve definitions, respectively. We show that the estimators are consistent and converge to Gaussian processes. In the case …