Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons


2005

COBRA


Articles 1 - 30 of 101

Full-Text Articles in Physical Sciences and Mathematics

Semiparametric Approaches For Joint Modeling Of Longitudinal And Survival Data With Time Varying Coefficients, Xiao Song, C.Y. Wang Dec 2005


UW Biostatistics Working Paper Series

We study joint modeling of survival and longitudinal data. There are two regression models of interest. The primary model is for survival outcomes, which are assumed to follow a time-varying coefficient proportional hazards model. The second model is for longitudinal data, which are assumed to follow a random effects model. Based on the trajectory of a subject's longitudinal data, some covariates in the survival model are functions of the unobserved random effects. Estimated random effects are generally different from the unobserved random effects, and this difference leads to covariate measurement error. To deal with covariate measurement error, we propose …


Alleviating Linear Ecological Bias And Optimal Design With Subsample Data, Adam Glynn, Jon Wakefield, Mark Handcock, Thomas Richardson Dec 2005


UW Biostatistics Working Paper Series

In this paper, we illustrate that combining ecological data with subsample data in situations in which a linear model is appropriate provides three main benefits. First, by including the individual level subsample data, the biases associated with linear ecological inference can be eliminated. Second, by supplementing the subsample data with ecological data, the information about parameters will be increased. Third, we can use readily available ecological data to design optimal subsampling schemes, so as to further increase the information about parameters. We present an application of this methodology to the classic problem of estimating the effect of a college degree …


Bayesian Analysis Of Cell-Cycle Gene Expression Data, Chuan Zhou, Jon Wakefield, Linda Breeden Dec 2005


UW Biostatistics Working Paper Series

The study of the cell-cycle is important in order to aid in our understanding of the basic mechanisms of life, yet progress has been slow due to the complexity of the process and our lack of ability to study it at high resolution. Recent advances in microarray technology have enabled scientists to study gene expression at the genome scale at a manageable cost, and there has been an increasing effort to identify cell-cycle regulated genes. In this chapter, we discuss the analysis of cell-cycle gene expression data, focusing on model-based Bayesian approaches. The majority of the models we describe …


Empirical Likelihood Inference For The Area Under The Roc Curve, Gengsheng Qin, Xiao-Hua Zhou Dec 2005


UW Biostatistics Working Paper Series

For a continuous-scale diagnostic test, the most commonly used summary index of the receiver operating characteristic (ROC) curve is the area under the curve (AUC) that measures the accuracy of the diagnostic test. In this paper we propose an empirical likelihood approach for the inference of AUC. We first define an empirical likelihood ratio for AUC and show that its limiting distribution is a scaled chi-square distribution. We then obtain an empirical likelihood based confidence interval for AUC using the scaled chi-square distribution. This empirical likelihood inference for AUC can be extended to stratified samples and the resulting limiting distribution …
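The estimator at the heart of this inference is the familiar nonparametric (Mann-Whitney) form of the AUC. A minimal illustration, assuming scores from diseased and healthy subjects; the function name and tie handling below are ours, not the paper's:

```python
import numpy as np

def auc_mann_whitney(diseased, healthy):
    """Nonparametric AUC estimate: P(X > Y) + 0.5 * P(X == Y),
    i.e. the Mann-Whitney U statistic scaled to [0, 1]."""
    x = np.asarray(diseased, dtype=float)
    y = np.asarray(healthy, dtype=float)
    # Compare every diseased score against every healthy score.
    greater = (x[:, None] > y[None, :]).sum()
    ties = (x[:, None] == y[None, :]).sum()
    return (greater + 0.5 * ties) / (x.size * y.size)

# Perfectly separated groups give AUC = 1.
print(auc_mann_whitney([3, 4, 5], [0, 1, 2]))  # -> 1.0
```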


Interval Estimation For The Ratio And Difference Of Two Lognormal Means, Yea-Hung Chen, Xiao-Hua Zhou Dec 2005


UW Biostatistics Working Paper Series

Health research often gives rise to data that follow lognormal distributions. In two sample situations, researchers are likely to be interested in estimating the difference or ratio of the population means. Several methods have been proposed for providing confidence intervals for these parameters. However, it is not clear which techniques are most appropriate, or how their performance might vary. Additionally, methods for the difference of means have not been adequately explored. We discuss in the present article five methods of analysis. These include two methods based on the log-likelihood ratio statistic and a generalized pivotal approach. Additionally, we provide and …
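For context, the lognormal mean is exp(mu + sigma^2/2), so inference for the ratio of two lognormal means reduces to inference for a difference on the log scale. A sketch of the simple Wald-type ("naive") interval, which is a common baseline rather than necessarily one of the five methods the paper compares:

```python
import numpy as np

def lognormal_mean_ratio_ci(x, y, z=1.96):
    """Wald-type CI for the ratio of two lognormal means.
    If log X ~ N(mu, s^2), then E[X] = exp(mu + s^2 / 2), so the log
    of the ratio of means is (mu1 + s1^2/2) - (mu2 + s2^2/2); its
    large-sample variance is approximated below."""
    lx, ly = np.log(np.asarray(x)), np.log(np.asarray(y))
    m1, m2 = lx.mean(), ly.mean()
    v1, v2 = lx.var(ddof=1), ly.var(ddof=1)
    n1, n2 = lx.size, ly.size
    est = (m1 + v1 / 2) - (m2 + v2 / 2)
    se = np.sqrt(v1 / n1 + v1**2 / (2 * (n1 - 1))
                 + v2 / n2 + v2**2 / (2 * (n2 - 1)))
    return np.exp(est - z * se), np.exp(est), np.exp(est + z * se)
```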


Inferences In Censored Cost Regression Models With Empirical Likelihood, Xiao-Hua Zhou, Gengsheng Qin, Huazhen Lin, Gang Li Dec 2005


UW Biostatistics Working Paper Series

In many studies of health economics, we are interested in the expected total cost over a certain period for a patient with given characteristics. Problems can arise if cost estimation models do not account for distributional aspects of costs. Two such problems are 1) the skewed nature of the data and 2) censored observations. In this paper we propose an empirical likelihood (EL) method for constructing a confidence region for the vector of regression parameters and a confidence interval for the expected total cost of a patient with the given covariates. We show that this new method has good theoretical …


Confidence Intervals For Predictive Values Using Data From A Case Control Study, Nathaniel David Mercaldo, Xiao-Hua Zhou, Kit F. Lau Dec 2005


UW Biostatistics Working Paper Series

The accuracy of a binary-scale diagnostic test can be represented by sensitivity (Se), specificity (Sp) and positive and negative predictive values (PPV and NPV). Although Se and Sp measure the intrinsic accuracy of a diagnostic test that does not depend on the prevalence rate, they do not provide information on the diagnostic accuracy of a particular patient. To obtain this information we need to use PPV and NPV. Since PPV and NPV are functions of both the intrinsic accuracy and the prevalence of the disease, constructing confidence intervals for PPV and NPV for a particular patient in a population with …
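The identity underlying these intervals is Bayes' theorem, which converts Se, Sp and prevalence into PPV and NPV. A minimal sketch (the function name is ours):

```python
def predictive_values(se, sp, prevalence):
    """PPV and NPV from sensitivity, specificity and prevalence via
    Bayes' theorem -- the standard identity that confidence intervals
    for predictive values are built around."""
    p = prevalence
    ppv = se * p / (se * p + (1 - sp) * (1 - p))
    npv = sp * (1 - p) / (sp * (1 - p) + (1 - se) * p)
    return ppv, npv

# A 90%-sensitive, 95%-specific test in a 1%-prevalence population:
ppv, npv = predictive_values(0.90, 0.95, 0.01)
print(round(ppv, 3))  # -> 0.154: low PPV despite good intrinsic accuracy
```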


Model Checking For Roc Regression Analysis, Tianxi Cai, Yingye Zheng Dec 2005


Harvard University Biostatistics Working Paper Series

The Receiver Operating Characteristic (ROC) curve is a prominent tool for characterizing the accuracy of a continuous diagnostic test. To account for factors that might influence the test accuracy, various ROC regression methods have been proposed. However, as in any regression analysis, when the assumed models do not fit the data well, these methods may yield invalid and misleading results. To date, practical model checking techniques suitable for validating existing ROC regression models have not been available. In this paper, we develop cumulative residual based procedures to graphically and numerically assess the goodness-of-fit for some commonly used ROC regression models, and …


On The Use Of Non-Euclidean Isotropy In Geostatistics, Frank C. Curriero Dec 2005


Johns Hopkins University, Dept. of Biostatistics Working Papers

This paper investigates the use of non-Euclidean distances to characterize isotropic spatial dependence in geostatistical applications. A simple example is provided to demonstrate that there are no guarantees that existing covariogram and variogram functions remain valid (i.e., positive definite or conditionally negative definite) when used with a non-Euclidean distance measure. Furthermore, satisfying the conditions of a metric is not sufficient to ensure the distance measure can be used with existing functions. The current literature is not clear on these topics. There are certain distance measures that remain valid when used with existing covariogram and variogram functions, an issue that is explored. …


Gradient Directed Regularization For Sparse Gaussian Concentration Graphs, With Applications To Inference Of Genetic Networks, Hongzhe Li, Jiang Gui Dec 2005


UPenn Biostatistics Working Papers

Large-scale microarray gene expression data provide the possibility of constructing genetic networks or biological pathways. Gaussian graphical models have been suggested to provide an effective method for constructing such genetic networks. However, most of the available methods for constructing Gaussian graphs do not account for the sparsity of the networks and are computationally more demanding or infeasible, especially in the settings of high-dimension and low sample size. We introduce a threshold gradient descent regularization procedure for estimating the sparse precision matrix in the setting of Gaussian graphical models and demonstrate its application to identifying genetic networks. Such a procedure is …


Issues Of Processing And Multiple Testing Of Seldi-Tof Ms Proteomic Data, Merrill D. Birkner, Alan E. Hubbard, Mark J. Van Der Laan, Christine F. Skibola, Christine M. Hegedus, Martyn T. Smith Dec 2005


U.C. Berkeley Division of Biostatistics Working Paper Series

A new data filtering method for SELDI-TOF MS proteomic spectra is described. We examined technical repeats (2 per subject) of intensity versus m/z (mass/charge) of bone marrow cell lysate for two groups of childhood leukemia patients: acute myeloid leukemia (AML) and acute lymphoblastic leukemia (ALL). As others have noted, the type of data processing as well as experimental variability can have a disproportionate impact on the list of "interesting" proteins (see Baggerly et al. (2004)). We propose a set of processing and multiple testing techniques: 1) correction for background drift; 2) filtering using smooth regression and cross-validated bandwidth …


Quantile-Function Based Null Distribution In Resampling Based Multiple Testing, Mark J. Van Der Laan, Alan E. Hubbard Nov 2005


U.C. Berkeley Division of Biostatistics Working Paper Series

Simultaneously testing a collection of null hypotheses about a data-generating distribution, based on a sample of independent and identically distributed observations, is a fundamental and important statistical problem with many applications. Methods based on marginal null distributions (i.e., marginal p-values) are attractive since the marginal p-values can be based on a user-supplied choice of marginal null distributions and are computationally trivial, but by necessity they are known either to be conservative or to rely on assumptions about the dependence structure between the test statistics. Resampling-based multiple testing (Westfall and Young, 1993) involves sampling from a joint null …
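For contrast with the marginal approach, the Westfall-Young resampling idea can be sketched as a single-step maxT adjustment. This illustrates joint-null resampling in general, not the quantile-function-based null distribution proposed in this paper; the statistic and function name are ours:

```python
import numpy as np

def maxT_adjusted_pvalues(X, labels, n_perm=1000, seed=0):
    """Single-step maxT adjusted p-values in the style of Westfall &
    Young: permute group labels, record the maximum test statistic
    over all hypotheses, and compare each observed statistic to that
    null distribution of maxima. Statistic: absolute difference in
    group means per row of X (rows = tests, columns = samples)."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)

    def stat(lab):
        return np.abs(X[:, lab == 1].mean(axis=1)
                      - X[:, lab == 0].mean(axis=1))

    observed = stat(labels)
    max_null = np.array([stat(rng.permutation(labels)).max()
                         for _ in range(n_perm)])
    # Adjusted p-value: fraction of permutation maxima at least as
    # large as each observed statistic.
    return np.array([(max_null >= t).mean() for t in observed])
```

Because every hypothesis is compared against the permutation distribution of the *maximum* statistic, the dependence between test statistics is accounted for automatically.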


Data Adaptive Pathway Testing, Merrill D. Birkner, Alan E. Hubbard, Mark J. Van Der Laan Nov 2005


U.C. Berkeley Division of Biostatistics Working Paper Series

A majority of diseases are caused by a combination of factors; for example, composite genetic mutation profiles have been found in many cases to predict a deleterious outcome. Several statistical techniques have been used to analyze these types of biological data. This article implements a general strategy that uses data-adaptive regression methods to build a specific pathway model, thus predicting a disease outcome from a combination of biological factors, and assesses the significance of this model, or pathway, by using a permutation-based null distribution. We also provide several simulation comparisons with other techniques. In addition, …


Optimal Feature Selection For Nearest Centroid Classifiers, With Applications To Gene Expression Microarrays, Alan R. Dabney, John D. Storey Nov 2005


UW Biostatistics Working Paper Series

Nearest centroid classifiers have recently been successfully employed in high-dimensional applications. A necessary step when building a classifier for high-dimensional data is feature selection. Feature selection is typically carried out by computing univariate statistics for each feature individually, without consideration for how a subset of features performs as a whole. For subsets of a given size, we characterize the optimal choice of features, corresponding to those yielding the smallest misclassification rate. Furthermore, we propose an algorithm for estimating this optimal subset in practice. Finally, we investigate the applicability of shrinkage ideas to nearest centroid classifiers. We use gene-expression microarrays for …
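A bare-bones nearest centroid classifier, shown without any feature selection step, makes the setting concrete (the class interface below is ours):

```python
import numpy as np

class NearestCentroid:
    """Minimal nearest centroid classifier: each sample is assigned to
    the class whose feature-wise training mean (centroid) is closest
    in Euclidean distance. Shown without the feature selection step
    that is the subject of the paper."""
    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.centroids_ = np.array([X[y == c].mean(axis=0)
                                    for c in self.classes_])
        return self

    def predict(self, X):
        # Squared distances to every centroid: (n_samples, n_classes).
        d = ((X[:, None, :] - self.centroids_[None, :, :]) ** 2).sum(axis=2)
        return self.classes_[d.argmin(axis=1)]

# Two well-separated classes in two dimensions.
clf = NearestCentroid().fit(
    np.array([[0., 0.], [0., 1.], [10., 10.], [10., 11.]]),
    np.array([0, 0, 1, 1]))
print(clf.predict(np.array([[1., 1.], [9., 9.]])))  # -> [0 1]
```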


A New Approach To Intensity-Dependent Normalization Of Two-Channel Microarrays, Alan R. Dabney, John D. Storey Nov 2005


UW Biostatistics Working Paper Series

A two-channel microarray measures the relative expression levels of thousands of genes from a pair of biological samples. In order to reliably compare gene expression levels between and within arrays, it is necessary to remove systematic errors that distort the biological signal of interest. The standard for accomplishing this is smoothing "MA-plots" to remove intensity-dependent dye bias and array-specific effects. However, MA methods require strong assumptions. We review these assumptions and derive several practical scenarios in which they fail. The "dye-swap" normalization method has been much less frequently used because it requires two arrays per pair of samples. We show …
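For readers unfamiliar with MA-plots: M is the log-ratio and A the average log-intensity of the two channels, and normalization subtracts an estimated intensity-dependent trend in M. A rough sketch, using a running mean over A-sorted points as a stand-in for the loess smoother typically used:

```python
import numpy as np

def ma_normalize(red, green, window=101):
    """M = log2(R) - log2(G), A = (log2(R) + log2(G)) / 2.
    The intensity-dependent trend in M is estimated with a running
    mean over A-sorted points (a crude stand-in for the usual loess
    smoother on the MA-plot) and subtracted out."""
    m = np.log2(red) - np.log2(green)
    a = 0.5 * (np.log2(red) + np.log2(green))
    order = np.argsort(a)  # process spots from dim to bright
    trend = np.empty_like(m)
    half = window // 2
    for rank, idx in enumerate(order):
        lo, hi = max(0, rank - half), min(m.size, rank + half + 1)
        trend[idx] = m[order[lo:hi]].mean()
    return m - trend, a
```

With a constant dye bias (e.g. R = 2G for every spot), every normalized log-ratio is driven to zero, which is also the failure scenario the paper's assumptions discussion warns about when the bias is real biological signal.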


Nonparametric Estimation Of Bivariate Failure Time Associations In The Presence Of A Competing Risk, Karen Bandeen-Roche, Jing Ning Nov 2005


Johns Hopkins University, Dept. of Biostatistics Working Papers

There has been much research on associations among paired failure times. Most of it has either assumed time invariance of the association or been based on complex measures or estimators. Little has accommodated failures arising amid competing risks. This paper targets the conditional cause-specific hazard ratio, a recent modification of the conditional hazard ratio to accommodate competing risks data. Estimation is accomplished by an intuitive, nonparametric method that localizes Kendall’s tau. Time variance is accommodated through a partitioning of space into “bins” between which the strength of association may differ. Inferential procedures are developed, small-sample performance is evaluated, and …
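The building block here is ordinary Kendall's tau; the paper localizes it to bins of the failure-time plane and handles censoring, neither of which this global, uncensored sketch attempts:

```python
def kendalls_tau(x, y):
    """Kendall's tau-a for paired observations: the normalized excess
    of concordant over discordant pairs. Global, uncensored version
    only -- a sketch of the quantity the paper localizes to bins."""
    n = len(x)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            s += prod > 0  # concordant pair
            s -= prod < 0  # discordant pair
    return 2 * s / (n * (n - 1))

print(kendalls_tau([1, 2, 3], [3, 2, 1]))  # -> -1.0 (perfect discordance)
```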


Correspondences Between Regression Models For Complex Binary Outcomes And Those For Structured Multivariate Survival Analyses, Nicholas P. Jewell Nov 2005


U.C. Berkeley Division of Biostatistics Working Paper Series

Doksum and Gasko [5] described a one-to-one correspondence between regression models for binary outcomes and those for continuous time survival analyses. This correspondence has been exploited heavily in the analysis of current status data (Jewell and van der Laan [11], Shiboski [18]). Here, we explore similar correspondences for complex survival models and categorical regression models for polytomous data. We include discussion of competing risks and progressive multi-state survival random variables.


Application Of A Variable Importance Measure Method To Hiv-1 Sequence Data, Merrill D. Birkner, Mark J. Van Der Laan Nov 2005


U.C. Berkeley Division of Biostatistics Working Paper Series

van der Laan (2005) proposed a method to construct variable importance measures and provided the respective statistical inference. This technique involves determining the importance of a variable in predicting an outcome. The method can be applied as an inverse probability of treatment weighted (IPTW) or double robust inverse probability of treatment weighted (DR-IPTW) estimator. The significance of the estimator is determined by estimating the influence curve and hence the corresponding variance and p-value. This article applies the van der Laan (2005) variable importance measures and corresponding inference to HIV-1 sequence data. In this data application, protease and reverse …


Causal Mediation Analyses With Structural Mean Models, Thomas R. Tenhave, Marshall Joffe, Kevin Lynch, Greg Brown, Stephen Maisto Nov 2005


UPenn Biostatistics Working Papers

We present a linear structural mean model (SMM) approach for analyzing mediation of a randomized baseline intervention's effect on a univariate follow-up outcome. Unlike standard mediation analyses, our approach does not assume that the mediating factor is randomly assigned to individuals (i.e., sequential ignorability). Hence, a comparison of the results of the proposed and standard approaches with respect to mediation offers a sensitivity analysis of the sequential ignorability assumption. The G-estimation procedure for the proposed SMM represents an extension of the work on direct effects of randomized treatments for survival outcomes by Robins and Greenland (1994) (Section 5.0 and …


A General Imputation Methodology For Nonparametric Regression With Censored Data, Dan Rubin, Mark J. Van Der Laan Nov 2005


U.C. Berkeley Division of Biostatistics Working Paper Series

We consider the random design nonparametric regression problem when the response variable is subject to a general mode of missingness or censoring. A traditional approach to such problems is imputation, in which the missing or censored responses are replaced by well-chosen values, and then the resulting covariate/response data are plugged into algorithms designed for the uncensored setting. We present a general methodology for imputation with the property of double robustness, in that the method works well if either a parameter of the full data distribution (covariate and response distribution) or a parameter of the censoring mechanism is well approximated. These …


Estimating A Treatment Effect With Repeated Measurements Accounting For Varying Effectiveness Duration, Ying Qing Chen, Jingrong Yang, Su-Chun Cheng Nov 2005


UW Biostatistics Working Paper Series

To assess treatment efficacy in clinical trials, certain clinical outcomes are repeatedly measured on the same subject over time and can be regarded as functions of time. The difference in their mean functions between the treatment arms usually characterises a treatment effect. Due to the potential existence of subject-specific treatment effectiveness lag and saturation times, erosion of the treatment effect in this difference may occur during the observation period. Instead of using ad hoc parametric or purely nonparametric time-varying coefficients in statistical modeling, we first propose to model the treatment effectiveness durations, which are the varying time intervals between the …


Modeling Differentiated Treatment Effects For Multiple Outcomes Data, Hongfei Guo, Karen Bandeen-Roche Nov 2005


Johns Hopkins University, Dept. of Biostatistics Working Papers

Multiple outcomes data are commonly used to characterize treatment effects in medical research, for instance, multiple symptoms to characterize potential remission of a psychiatric disorder. Often a global, i.e. symptom-invariant, treatment effect is evaluated. Such a treatment effect may overgeneralize the effect across the outcomes. On the other hand, individual treatment effects, varying across all outcomes, are complicated to interpret, and their estimation may lose precision relative to a global summary. An effective compromise may be to summarize the treatment effect through patterns of the treatment effects, i.e. "differentiated effects." In this paper we propose a two-category model …


Model Evaluation Based On The Distribution Of Estimated Absolute Prediction Error, Lu Tian, Tianxi Cai, Els Goetghebeur, L. J. Wei Nov 2005


Harvard University Biostatistics Working Paper Series

The construction of a reliable, practically useful prediction rule for future response is heavily dependent on the "adequacy" of the fitted regression model. In this article, we consider the absolute prediction error, the expected value of the absolute difference between the future and predicted responses, as the model evaluation criterion. This prediction error is easier to interpret than the average squared error and is equivalent to the mis-classification error for the binary outcome. We show that the distributions of the apparent error and its cross-validation counterparts are approximately normal even under a misspecified fitted model. When the prediction rule is …
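The apparent and cross-validated errors in question are simply |y - y_hat| averaged over held-out folds. A generic K-fold sketch; the `fit`/`predict` callable interface is ours, not the paper's:

```python
import numpy as np

def cv_absolute_error(X, y, fit, predict, k=5, seed=0):
    """K-fold cross-validated absolute prediction errors |y - y_hat|,
    the quantity whose distribution is studied in the paper.
    `fit(X, y)` returns a model; `predict(model, X)` returns y_hat
    (a hypothetical interface for illustration)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    errs = np.empty(len(y))
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)
        model = fit(X[train], y[train])
        errs[fold] = np.abs(y[fold] - predict(model, X[fold]))
    return errs

# With an exactly linear response, every held-out error is ~0.
fit = lambda X, y: np.linalg.lstsq(X, y, rcond=None)[0]
predict = lambda beta, X: X @ beta
X = np.column_stack([np.ones(50), np.arange(50.0)])
y = 2 + 3 * np.arange(50.0)
errs = cv_absolute_error(X, y, fit, predict)
```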


Efficacy Studies Of Malaria Treatments In Africa: Efficient Estimation With Missing Indicators Of Failure, Rhoderick N. Machekano, Grant Dorsey, Alan E. Hubbard Nov 2005


U.C. Berkeley Division of Biostatistics Working Paper Series

Efficacy studies of malaria treatments can be plagued by indeterminate outcomes for some patients. The study motivating this paper defines the outcome of interest (treatment failure) as recrudescence, and for some subjects it is unclear whether a recurrence of malaria is due to recrudescence or to a new infection. This results in a specific kind of missing data. The effect of missing data in causal inference problems is widely recognized. Methods that adjust for possible bias from missing data include a variety of imputation procedures (extreme case analysis, hot-deck, single and multiple imputation), inverse weighting methods, and likelihood based methods (data augmentation, …


Analyzing Panel Count Data With Informative Observation Times, Chiung-Yu Huang, Mei-Cheng Wang, Ying Zhang Oct 2005


Johns Hopkins University, Dept. of Biostatistics Working Papers

In this paper, we study panel count data with informative observation times. We assume nonparametric and semiparametric proportional rate models for the underlying recurrent event process, where the form of the baseline rate function is left unspecified and a subject-specific frailty variable inflates or deflates the rate function multiplicatively. The proposed models allow the recurrent event processes and observation times to be correlated through their connections with the unobserved frailty; moreover, the distributions of both the frailty variable and observation times are considered as nuisance parameters. The baseline rate function and the regression parameters are estimated by maximizing a conditional …


A Fine-Scale Linkage Disequilibrium Measure Based On Length Of Haplotype Sharing, Yan Wang, Lue Ping Zhao, Sandrine Dudoit Oct 2005


U.C. Berkeley Division of Biostatistics Working Paper Series

High-throughput genotyping technologies for single nucleotide polymorphisms (SNP) have enabled the recent completion of the International HapMap Project (Phase I), which has stimulated much interest in studying genome-wide linkage disequilibrium (LD) patterns. Conventional LD measures, such as D' and r-square, are two-point measurements, and their relationship with physical distance is highly noisy. We propose a new LD measure, defined in terms of the correlation coefficient for shared haplotype lengths around two loci, thereby borrowing information from multiple loci. A U-statistic-based estimator of the new LD measure, which takes into consideration the dependence structure of the observed data, is developed and …
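The conventional two-point measures mentioned, D' and r-square, are simple functions of one haplotype frequency and two allele frequencies:

```python
def two_point_ld(pAB, pA, pB):
    """Conventional two-point LD measures from the AB haplotype
    frequency pAB and allele frequencies pA, pB: D = pAB - pA*pB,
    D' (D scaled by its theoretical bound) and r^2 (squared allelic
    correlation). These are the pairwise measures the paper's
    haplotype-sharing LD measure is contrasted with."""
    D = pAB - pA * pB
    if D >= 0:
        Dmax = min(pA * (1 - pB), (1 - pA) * pB)
    else:
        Dmax = min(pA * pB, (1 - pA) * (1 - pB))
    d_prime = abs(D) / Dmax if Dmax > 0 else 0.0
    r2 = D**2 / (pA * (1 - pA) * pB * (1 - pB))
    return d_prime, r2

# Complete LD: the B allele occurs exactly when A does.
print(two_point_ld(0.3, 0.3, 0.3))  # -> (1.0, 1.0)
```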


Population Intervention Models In Causal Inference, Alan E. Hubbard, Mark J. Van Der Laan Oct 2005


U.C. Berkeley Division of Biostatistics Working Paper Series

Marginal structural models (MSM) provide a powerful tool for estimating the causal effect of a treatment variable or risk variable on the distribution of a disease in a population. These models, as originally introduced by Robins (e.g., Robins (2000a), Robins (2000b), van der Laan and Robins (2002)), model the marginal distributions of treatment-specific counterfactual outcomes, possibly conditional on a subset of the baseline covariates, and their dependence on treatment. Marginal structural models are particularly useful in the context of longitudinal data structures, in which each subject's treatment and covariate history are measured over time, and an outcome is recorded at …


Gauss-Seidel Estimation Of Generalized Linear Mixed Models With Application To Poisson Modeling Of Spatially Varying Disease Rates, Subharup Guha, Louise Ryan Oct 2005


Harvard University Biostatistics Working Paper Series

Generalized linear mixed models (GLMMs) provide an elegant framework for the analysis of correlated data. Because the likelihood has no closed form, GLMMs are often fit by computational procedures such as penalized quasi-likelihood (PQL). Special cases of these models are generalized linear models (GLMs), which are often fit using algorithms like iterative weighted least squares (IWLS). High computational costs and memory constraints often make it difficult to apply these iterative procedures to data sets with a very large number of cases.

This paper proposes a computationally efficient strategy based on the Gauss-Seidel algorithm that iteratively fits sub-models of the GLMM …
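The IWLS algorithm mentioned for the GLM special case takes only a few lines; a sketch for a logistic model is shown below (the paper's Gauss-Seidel strategy for GLMMs is not reproduced here):

```python
import numpy as np

def iwls_logistic(X, y, n_iter=25):
    """Iterative weighted least squares (Fisher scoring) for a
    logistic GLM. Each iteration solves a weighted least-squares
    problem with weights w = mu(1-mu) and working response z."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ beta
        mu = 1 / (1 + np.exp(-eta))     # fitted probabilities
        w = mu * (1 - mu)               # IWLS weights
        z = eta + (y - mu) / w          # working response
        WX = X * w[:, None]
        beta = np.linalg.solve(X.T @ WX, X.T @ (w * z))
    return beta
```

With two covariate patterns the model is saturated, so the fitted coefficients match the empirical logits exactly, which gives a quick sanity check.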


Designed Extension Of Survival Studies: Application To Clinical Trials With Unrecognized Heterogeneity, Yi Li, Mei-Chiung Shih, Rebecca A. Betensky Oct 2005


Harvard University Biostatistics Working Paper Series

It is well known that unrecognized heterogeneity among patients, such as is conferred by genetic subtype, can undermine the power of a randomized trial, designed under the assumption of homogeneity, to detect a truly beneficial treatment. We consider the conditional power approach to allow for recovery of power under unexplained heterogeneity. While Proschan and Hunsberger (1995) confined the application of the conditional power design to normally distributed observations, we consider more general and difficult settings in which the data are in the framework of continuous time and are subject to censoring. In particular, we derive a procedure appropriate for the analysis of …


Computational Techniques For Spatial Logistic Regression With Large Datasets, Christopher J. Paciorek, Louise Ryan Oct 2005


Harvard University Biostatistics Working Paper Series

In epidemiological work, outcomes are frequently non-normal, sample sizes may be large, and effects are often small. To relate health outcomes to geographic risk factors, fast and powerful methods for fitting spatial models, particularly for non-normal data, are required. We focus on binary outcomes, with the risk surface a smooth function of space. We compare penalized likelihood models, including the penalized quasi-likelihood (PQL) approach, and Bayesian models based on fit, speed, and ease of implementation.

A Bayesian model using a spectral basis representation of the spatial surface provides the best tradeoff of sensitivity and specificity in simulations, detecting real spatial …