Open Access. Powered by Scholars. Published by Universities.®

Statistical Models Commons

2005

Articles 1 - 30 of 32

Full-Text Articles in Statistical Models

Model Checking For ROC Regression Analysis, Tianxi Cai, Yingye Zheng Dec 2005

Harvard University Biostatistics Working Paper Series

The Receiver Operating Characteristic (ROC) curve is a prominent tool for characterizing the accuracy of a continuous diagnostic test. To account for factors that might influence the test accuracy, various ROC regression methods have been proposed. However, as in any regression analysis, when the assumed models do not fit the data well, these methods may render invalid and misleading results. To date, practical model checking techniques suitable for validating existing ROC regression models are not yet available. In this paper, we develop cumulative residual based procedures to graphically and numerically assess the goodness-of-fit for some commonly used ROC regression models, and …


On The Use Of Non-Euclidean Isotropy In Geostatistics, Frank C. Curriero Dec 2005

Johns Hopkins University, Dept. of Biostatistics Working Papers

This paper investigates the use of non-Euclidean distances to characterize isotropic spatial dependence for geostatistical related applications. A simple example is provided to demonstrate there are no guarantees that existing covariogram and variogram functions remain valid (i.e., positive definite or conditionally negative definite) when used with a non-Euclidean distance measure. Furthermore, satisfying the conditions of a metric is not sufficient to ensure the distance measure can be used with existing functions. Current literature is not clear on these topics. There are certain distance measures that when used with existing covariogram and variogram functions remain valid, an issue that is explored. …


Causal Mediation Analyses With Structural Mean Models, Thomas R. Tenhave, Marshall Joffe, Kevin Lynch, Greg Brown, Stephen Maisto Nov 2005

UPenn Biostatistics Working Papers

We present a linear structural mean model (SMM) approach for analyzing mediation of a randomized baseline intervention's effect on a univariate follow-up outcome. Unlike standard mediation analyses, our approach does not assume that the mediating factor is randomly assigned to individuals (i.e., sequential ignorability). Hence, a comparison of the results of the proposed and standard approaches with respect to mediation offers a sensitivity analysis of the sequential ignorability assumption. The G-estimation procedure for the proposed SMM represents an extension of the work on direct effects of randomized treatment effects for survival outcomes by Robins and Greenland (1994) (Section 5.0 and …


Model Evaluation Based On The Distribution Of Estimated Absolute Prediction Error, Lu Tian, Tianxi Cai, Els Goetghebeur, L. J. Wei Nov 2005

Harvard University Biostatistics Working Paper Series

The construction of a reliable, practically useful prediction rule for future response is heavily dependent on the "adequacy" of the fitted regression model. In this article, we consider the absolute prediction error, the expected value of the absolute difference between the future and predicted responses, as the model evaluation criterion. This prediction error is easier to interpret than the average squared error and is equivalent to the mis-classification error for the binary outcome. We show that the distributions of the apparent error and its cross-validation counterparts are approximately normal even under a misspecified fitted model. When the prediction rule is …
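The equivalence noted in this abstract between absolute prediction error and misclassification error for a binary outcome can be checked with a small sketch. This is not the authors' code: the function names and the toy data are hypothetical, and the predictions are assumed to be hard 0/1 class labels rather than probabilities.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between observed and predicted responses."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def misclassification_rate(y_true, y_pred):
    """Fraction of observations whose predicted class differs from the observed class."""
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical binary outcomes and hard 0/1 predictions (illustration only).
observed = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]

mae = mean_absolute_error(observed, predicted)
err = misclassification_rate(observed, predicted)
```

With 0/1 values each absolute difference is either 0 or 1, so the two averages coincide by construction.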


A Review Of Stata 9.0, Joseph Hilbe Nov 2005

Joseph M Hilbe

No abstract provided.


A Fine-Scale Linkage Disequilibrium Measure Based On Length Of Haplotype Sharing, Yan Wang, Lue Ping Zhao, Sandrine Dudoit Oct 2005

U.C. Berkeley Division of Biostatistics Working Paper Series

High-throughput genotyping technologies for single nucleotide polymorphisms (SNP) have enabled the recent completion of the International HapMap Project (Phase I), which has stimulated much interest in studying genome-wide linkage disequilibrium (LD) patterns. Conventional LD measures, such as D' and r-square, are two-point measurements, and their relationship with physical distance is highly noisy. We propose a new LD measure, defined in terms of the correlation coefficient for shared haplotype lengths around two loci, thereby borrowing information from multiple loci. A U-statistic-based estimator of the new LD measure, which takes into consideration the dependence structure of the observed data, is developed and …
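For readers unfamiliar with the conventional two-point measures mentioned in this abstract, the sketch below computes D, D' and r-square from haplotype and allele frequencies using their standard textbook definitions. It does not implement the new haplotype-sharing measure proposed in the paper. The function name and the example frequencies are hypothetical, D' is reported as an absolute value, and both loci are assumed to be polymorphic.

```python
def ld_measures(p_ab, p_a, p_b):
    """Return (D, D', r^2) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A and allele B
    p_a, p_b: marginal frequencies of allele A and allele B
    Assumes 0 < p_a < 1 and 0 < p_b < 1 (both loci polymorphic).
    """
    d = p_ab - p_a * p_b
    # D' rescales |D| by its largest attainable value given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    # r^2 is the squared correlation between the allele indicators at the two loci.
    r_squared = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r_squared


# Hypothetical frequencies, for illustration only.
d, d_prime, r_squared = ld_measures(p_ab=0.4, p_a=0.5, p_b=0.6)
```

Both measures depend only on the two loci involved, which is the "two-point" limitation the abstract refers to.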


Population Intervention Models In Causal Inference, Alan E. Hubbard, Mark J. Van Der Laan Oct 2005

U.C. Berkeley Division of Biostatistics Working Paper Series

Marginal structural models (MSM) provide a powerful tool for estimating the causal effect of a treatment variable or risk variable on the distribution of a disease in a population. These models, as originally introduced by Robins (e.g., Robins (2000a), Robins (2000b), van der Laan and Robins (2002)), model the marginal distributions of treatment-specific counterfactual outcomes, possibly conditional on a subset of the baseline covariates, and its dependence on treatment. Marginal structural models are particularly useful in the context of longitudinal data structures, in which each subject's treatment and covariate history are measured over time, and an outcome is recorded at …


Gauss-Seidel Estimation Of Generalized Linear Mixed Models With Application To Poisson Modeling Of Spatially Varying Disease Rates, Subharup Guha, Louise Ryan Oct 2005

Harvard University Biostatistics Working Paper Series

Generalized linear mixed models (GLMMs) provide an elegant framework for the analysis of correlated data. Due to the non-closed form of the likelihood, GLMMs are often fit by computational procedures like penalized quasi-likelihood (PQL). Special cases of these models are generalized linear models (GLMs), which are often fit using algorithms like iterative weighted least squares (IWLS). High computational costs and memory space constraints often make it difficult to apply these iterative procedures to data sets with a very large number of cases.

This paper proposes a computationally efficient strategy based on the Gauss-Seidel algorithm that iteratively fits sub-models of the GLMM …


Computational Techniques For Spatial Logistic Regression With Large Datasets, Christopher J. Paciorek, Louise Ryan Oct 2005

Harvard University Biostatistics Working Paper Series

In epidemiological work, outcomes are frequently non-normal, sample sizes may be large, and effects are often small. To relate health outcomes to geographic risk factors, fast and powerful methods for fitting spatial models, particularly for non-normal data, are required. We focus on binary outcomes, with the risk surface a smooth function of space. We compare penalized likelihood models, including the penalized quasi-likelihood (PQL) approach, and Bayesian models based on fit, speed, and ease of implementation.

A Bayesian model using a spectral basis representation of the spatial surface provides the best tradeoff of sensitivity and specificity in simulations, detecting real spatial …


Is The Number Of Sick Persons In A Cohort Constant Over Time?, Paula Diehr, Ann Derleth, Anne Newman, Liming Cai Oct 2005

UW Biostatistics Working Paper Series

Objectives: To estimate the number of persons in a cohort who are sick, over time.

Methods: We calculated the number of sick persons in the Cardiovascular Health Study (CHS), a cohort study of older adults followed up to 14 years, using eight definitions of “healthy” and “sick”. We projected the number in each health state over time for a birth cohort.

Results: The number of sick persons in CHS was approximately constant for 14 years, for all definitions of “sick”. The estimated number of sick persons in the birth cohort was approximately constant from ages 55-75, after which it decreased. …


A Pseudolikelihood Approach For Simultaneous Analysis Of Array Comparative Genomic Hybridizations (aCGH), David A. Engler, Gayatry Mohapatra, David N. Louis, Rebecca Betensky Sep 2005

Harvard University Biostatistics Working Paper Series

DNA sequence copy number has been shown to be associated with cancer development and progression. Array-based Comparative Genomic Hybridization (aCGH) is a recent development that seeks to identify the copy number ratio at large numbers of markers across the genome. Due to experimental and biological variations across chromosomes and across hybridizations, current methods are limited to analyses of single chromosomes. We propose a more powerful approach that borrows strength across chromosomes and across hybridizations. We assume a Gaussian mixture model, with a hidden Markov dependence structure, and with random effects to allow for intertumoral variation, as well as intratumoral clonal …


A Nonstationary Negative Binomial Time Series With Time-Dependent Covariates: Enterococcus Counts In Boston Harbor, E. Andres Houseman, Brent Coull, James P. Shine Sep 2005

Harvard University Biostatistics Working Paper Series

Boston Harbor has had a history of poor water quality, including contamination by enteric pathogens. We conduct a statistical analysis of data collected by the Massachusetts Water Resources Authority (MWRA) between 1996 and 2002 to evaluate the effects of court-mandated improvements in sewage treatment. Motivated by the ineffectiveness of standard Poisson mixture models and their zero-inflated counterparts, we propose a new negative binomial model for time series of Enterococcus counts in Boston Harbor, where nonstationarity and autocorrelation are modeled using a nonparametric smooth function of time in the predictor. Without further restrictions, this function is not identifiable in the presence …


Cross-Validated Bagged Prediction Of Survival, Sandra E. Sinisi, Romain Neugebauer, Mark J. Van Der Laan Sep 2005

U.C. Berkeley Division of Biostatistics Working Paper Series

In this article, we show how to apply our previously proposed Deletion/Substitution/Addition algorithm in the context of right-censoring for the prediction of survival. Furthermore, we introduce how to incorporate bagging into the algorithm to obtain a cross-validated bagged estimator. The method is used for predicting the survival time of patients with diffuse large B-cell lymphoma based on gene expression variables.


Semiparametric Normal Transformation Models For Spatially Correlated Survival Data, Yi Li, Xihong Lin Sep 2005

Harvard University Biostatistics Working Paper Series

There is an emerging interest in modeling spatially correlated survival data in biomedical and epidemiological studies. In this paper, we propose a new class of semiparametric normal transformation models for right censored spatially correlated survival data. This class of models assumes that survival outcomes marginally follow a Cox proportional hazard model with unspecified baseline hazard, and their joint distribution is obtained by transforming survival outcomes to normal random variables, whose joint distribution is assumed to be multivariate normal with a spatial correlation structure. A key feature of the class of semiparametric normal transformation models is that it provides a rich …


Direct Effect Models, Mark J. Van Der Laan, Maya L. Petersen Aug 2005

U.C. Berkeley Division of Biostatistics Working Paper Series

The causal effect of a treatment on an outcome is generally mediated by several intermediate variables. Estimation of the component of the causal effect of a treatment that is mediated by a given intermediate variable (the indirect effect of the treatment), and the component that is not mediated by that intermediate variable (the direct effect of the treatment) is often relevant to mechanistic understanding and to the design of clinical and public health interventions. Under the assumption of no-unmeasured confounders for treatment and the intermediate variable, Robins & Greenland (1992) define an individual direct effect as the counterfactual effect of …


Survival Point Estimate Prediction In Matched And Non-Matched Case-Control Subsample Designed Studies, Annette M. Molinaro, Mark J. Van Der Laan, Dan H. Moore, Karla Kerlikowske Aug 2005

U.C. Berkeley Division of Biostatistics Working Paper Series

Providing information about the risk of disease and clinical factors that may increase or decrease a patient's risk of disease is standard medical practice. Although case-control studies can provide evidence of strong associations between diseases and risk factors, clinicians need to be able to communicate to patients the age-specific risks of disease over a defined time interval for a set of risk factors.

An estimate of absolute risk cannot be determined from case-control studies because cases are generally chosen from a population whose size is not known (necessary for calculation of absolute risk) and where duration of follow-up is not …


Application Of A Multiple Testing Procedure Controlling The Proportion Of False Positives To Protein And Bacterial Data, Merrill D. Birkner, Alan E. Hubbard, Mark J. Van Der Laan Aug 2005

U.C. Berkeley Division of Biostatistics Working Paper Series

Simultaneously testing multiple hypotheses is important in high-dimensional biological studies. In these situations, one is often interested in controlling the Type-I error rate, such as the proportion of false positives to total rejections (TPPFP) at a specific level, alpha. This article will present an application of the E-Bayes/Bootstrap TPPFP procedure, presented in van der Laan et al. (2005), which controls the tail probability of the proportion of false positives (TPPFP), on two biological datasets. The two data applications include firstly, the application to a mass-spectrometry dataset of two leukemia subtypes, AML and ALL. The protein data measurements include intensity and …


Cross-Validating And Bagging Partitioning Algorithms With Variable Importance, Annette M. Molinaro, Mark J. Van Der Laan Aug 2005

U.C. Berkeley Division of Biostatistics Working Paper Series

We present a cross-validated bagging scheme in the context of partitioning algorithms. To explore the benefits of the various bagging schemes, we compare via simulations the predictive ability of a single Classification and Regression Tree (CART) with several previously suggested bagging schemes and with our proposed approach. Additionally, a variable importance measure is explained and illustrated.


Test Statistics Null Distributions In Multiple Testing: Simulation Studies And Applications To Genomics, Katherine S. Pollard, Merrill D. Birkner, Mark J. Van Der Laan, Sandrine Dudoit Jul 2005

U.C. Berkeley Division of Biostatistics Working Paper Series

Multiple hypothesis testing problems arise frequently in biomedical and genomic research, for instance, when identifying differentially expressed or co-expressed genes in microarray experiments. We have developed generally applicable resampling-based single-step and stepwise multiple testing procedures (MTP) for control of a broad class of Type I error rates, defined as tail probabilities and expected values for arbitrary functions of the numbers of false positives and rejected hypotheses (Dudoit and van der Laan, 2005; Dudoit et al., 2004a,b; Pollard and van der Laan, 2004; van der Laan et al., 2005, 2004a,b). As argued in the early article of Pollard and van der …


Does The Effect Of Micronutrient Supplementation On Neonatal Survival Vary With Respect To The Percentiles Of The Birth Weight Distribution?, Francesca Dominici, Scott L. Zeger, Giovanni Parmigiani, Joanne Katz, Parul Christian Jul 2005

Johns Hopkins University, Dept. of Biostatistics Working Papers

Scientific Background: In developing countries, higher infant mortality is partially caused by poor maternal and fetal nutrition. Clinical trials of micronutrient supplementation are aimed at reducing the risk of infant mortality by increasing birth weight. Because infant mortality is greatest among the low birth weight infants (LBW) (less than or equal to 2500 grams), an effective intervention might be needed to increase birth weight among the smallest babies. Although it has been demonstrated that supplementation increases the birth weight in a trial conducted in Nepal, there is inconclusive evidence that the supplementation improves their survival. It has been hypothesized that …


Linear Regression Of Censored Length-Biased Lifetimes, Ying Qing Chen, Yan Wang Jul 2005

UW Biostatistics Working Paper Series

Length-biased lifetimes may be collected in observational studies or sample surveys due to a biased sampling scheme. In this article, we use a linear regression model, namely, the accelerated failure time model, for the population lifetime distributions in regression analysis of the length-biased lifetimes. It is discovered that the associated regression parameters are invariant under the length-biased sampling scheme. According to this discovery, we propose the quasi partial score estimating equations to estimate the population regression parameters. The proposed methodologies are evaluated and demonstrated by simulation studies and an application to an actual data set.


Cross-Validated Bagged Learning, Mark J. Van Der Laan, Sandra E. Sinisi, Maya L. Petersen Jun 2005

U.C. Berkeley Division of Biostatistics Working Paper Series

Many applications aim to learn a high dimensional parameter of a data generating distribution based on a sample of independent and identically distributed observations. For example, the goal might be to estimate the conditional mean of an outcome given a list of input variables. In this prediction context, Breiman (1996a) introduced bootstrap aggregating (bagging) as a method to reduce the variance of a given estimator at little cost to bias. Bagging involves applying the estimator to multiple bootstrap samples, and averaging the result across bootstrap samples. In order to deal with the curse of dimensionality, typical practice has been to …
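The bagging step described in this abstract (apply an estimator to multiple bootstrap samples, then average the results) can be sketched as follows. This is a generic illustration of plain bagging, not the cross-validated procedure the paper proposes. The function name, its parameters, and the toy data are hypothetical.

```python
import random
import statistics


def bagged_estimate(sample, estimator, n_bootstrap=200, seed=0):
    """Average an estimator over bootstrap resamples of the data.

    sample: list of observations
    estimator: function mapping a list of observations to a number
    n_bootstrap, seed: illustrative defaults, not values from the paper
    """
    rng = random.Random(seed)
    n = len(sample)
    estimates = []
    for _ in range(n_bootstrap):
        # Draw n observations with replacement from the original sample.
        resample = [sample[rng.randrange(n)] for _ in range(n)]
        estimates.append(estimator(resample))
    # The bagged estimate is the average across bootstrap samples.
    return statistics.fmean(estimates)


# Hypothetical data, for illustration only.
data = [2.1, 3.4, 1.9, 5.6, 4.2, 3.8, 2.7, 4.9]
bagged_median = bagged_estimate(data, statistics.median)
```

Fixing the seed makes the resampling reproducible, which is convenient when comparing estimators on the same bootstrap draws.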


Spatio-Temporal Point Processes: Methods And Applications, Peter J. Diggle Jun 2005

Johns Hopkins University, Dept. of Biostatistics Working Papers

No abstract provided.


Model Choice In Time Series Studies Of Air Pollution And Mortality, Roger D. Peng, Francesca Dominici, Thomas A. Louis Jun 2005

Johns Hopkins University, Dept. of Biostatistics Working Papers

Multi-city time series studies of particulate matter (PM) and mortality and morbidity have provided evidence that daily variation in air pollution levels is associated with daily variation in mortality counts. These findings served as key epidemiological evidence for the recent review of the United States National Ambient Air Quality Standards (NAAQS) for PM. As a result, methodological issues concerning time series analysis of the relation between air pollution and health have attracted the attention of the scientific community and critics have raised concerns about the adequacy of current model formulations. Time series data on pollution and mortality are generally analyzed …


New Statistical Paradigms Leading To Web-Based Tools For Clinical/Translational Science, Knut M. Wittkowski May 2005

COBRA Preprint Series

As the field of functional genetics and genomics is beginning to mature, we are confronted with new challenges. The constant drop in price for sequencing and gene expression profiling as well as the increasing number of genetic and genomic variables that can be measured make it feasible to address more complex questions. The success with rare diseases caused by single loci or genes has provided us with a proof-of-concept that new therapies can be developed based on functional genomics and genetics.

Common diseases, however, typically involve genetic epistasis, genomic pathways, and proteomic pattern. Moreover, to better understand the underlying biological …


Causal Inference In Longitudinal Studies With History-Restricted Marginal Structural Models, Romain Neugebauer, Mark J. Van Der Laan, Ira B. Tager Apr 2005

U.C. Berkeley Division of Biostatistics Working Paper Series

Causal Inference based on Marginal Structural Models (MSMs) is particularly attractive to subject-matter investigators because MSM parameters provide explicit representations of causal effects. We introduce History-Restricted Marginal Structural Models (HRMSMs) for longitudinal data for the purpose of defining causal parameters which may often be better suited for Public Health research. This new class of MSMs allows investigators to analyze the causal effect of a treatment on an outcome based on a fixed, shorter and user-specified history of exposure compared to MSMs. By default, the latter represents the treatment causal effect of interest based on a treatment history defined by the …


The Sensitivity And Specificity Of Markers For Event Times, Tianxi Cai, Margaret S. Pepe, Thomas Lumley, Yingye Zheng, Nancy Swords Jenny Apr 2005

Harvard University Biostatistics Working Paper Series

No abstract provided.


Survival Ensembles, Torsten Hothorn, Peter Buhlmann, Sandrine Dudoit, Annette M. Molinaro, Mark J. Van Der Laan Apr 2005

U.C. Berkeley Division of Biostatistics Working Paper Series

We propose a unified and flexible framework for ensemble learning in the presence of censoring. For right-censored data, we introduce a random forest algorithm and a generic gradient boosting algorithm for the construction of prognostic models. The methodology is utilized for predicting the survival time of patients suffering from acute myeloid leukemia based on clinical and genetic covariates. Furthermore, we compare the diagnostic capabilities of the proposed censored data random forest and boosting methods applied to the recurrence free survival time of node positive breast cancer patients with previously published findings.


Combining Predictors For Classification Using The Area Under The ROC Curve, Margaret S. Pepe, Tianxi Cai, Zheng Zhang, Gary M. Longton Jan 2005

UW Biostatistics Working Paper Series

No single biomarker for cancer is considered adequately sensitive and specific for cancer screening. It is expected that the results of multiple markers will need to be combined in order to yield adequately accurate classification. Typically the objective function that is optimized for combining markers is the likelihood function. In this paper we consider an alternative objective function -- the area under the empirical receiver operating characteristic curve (AUC). We note that it yields consistent estimates of parameters in a generalized linear model for the risk score but does not require specifying the link function. Like logistic regression it yields …
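The objective function named in this abstract, the area under the empirical ROC curve, can be sketched as the proportion of case-control pairs in which the case has the higher combined score. The code below only evaluates that objective for a given linear combination of two markers. It is not the authors' estimation procedure, and the function names, the coefficient `beta`, and the toy marker values are all hypothetical.

```python
def empirical_auc(case_scores, control_scores):
    """Proportion of (case, control) pairs ranked correctly; ties count one half."""
    total = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                total += 1.0
            elif c == k:
                total += 0.5
    return total / (len(case_scores) * len(control_scores))


def combined_auc(cases, controls, beta):
    """Empirical AUC of the linear score marker1 + beta * marker2."""
    case_scores = [m1 + beta * m2 for m1, m2 in cases]
    control_scores = [m1 + beta * m2 for m1, m2 in controls]
    return empirical_auc(case_scores, control_scores)


# Hypothetical (marker1, marker2) values for diseased and non-diseased subjects.
cases = [(2.0, 1.0), (1.0, 3.0), (3.0, 0.5)]
controls = [(1.5, 0.5), (0.5, 1.0), (2.5, 0.0)]

auc_marker1_only = combined_auc(cases, controls, beta=0.0)
auc_combined = combined_auc(cases, controls, beta=1.0)
```

In this toy data the first marker alone misranks some pairs, while the combined score separates cases from controls. A search over `beta` (for example, on a grid) would be one simple way to maximize the objective, though the paper's own optimization approach is not shown in the truncated abstract.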


Robust Inferences For Covariate Effects On Survival Time With Censored Linear Regression Models, Larry Leon, Tianxi Cai, L. J. Wei Jan 2005

Harvard University Biostatistics Working Paper Series

Various inference procedures for linear regression models with censored failure times have been studied extensively. Recent developments on efficient algorithms to implement these procedures enhance the practical usage of such models in survival analysis. In this article, we present robust inferences for certain covariate effects on the failure time in the presence of "nuisance" confounders under a semiparametric, partial linear regression setting. Specifically, the estimation procedures for the regression coefficients of interest are derived from a working linear model and are valid even when the function of the confounders in the model is not correctly specified. The new proposals are …