Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Articles 1 - 15 of 15

Full-Text Articles in Physical Sciences and Mathematics

Model Checking For ROC Regression Analysis, Tianxi Cai, Yingye Zheng Dec 2005

Harvard University Biostatistics Working Paper Series

The Receiver Operating Characteristic (ROC) curve is a prominent tool for characterizing the accuracy of a continuous diagnostic test. To account for factors that might influence the test accuracy, various ROC regression methods have been proposed. However, as in any regression analysis, when the assumed models do not fit the data well, these methods may render invalid and misleading results. To date, practical model checking techniques suitable for validating existing ROC regression models are not yet available. In this paper, we develop cumulative residual based procedures to graphically and numerically assess the goodness-of-fit for some commonly used ROC regression models, and …


Population Intervention Models In Causal Inference, Alan E. Hubbard, Mark J. Van Der Laan Oct 2005

U.C. Berkeley Division of Biostatistics Working Paper Series

Marginal structural models (MSM) provide a powerful tool for estimating the causal effect of a treatment variable or risk variable on the distribution of a disease in a population. These models, as originally introduced by Robins (e.g., Robins (2000a), Robins (2000b), van der Laan and Robins (2002)), model the marginal distributions of treatment-specific counterfactual outcomes, possibly conditional on a subset of the baseline covariates, and its dependence on treatment. Marginal structural models are particularly useful in the context of longitudinal data structures, in which each subject's treatment and covariate history are measured over time, and an outcome is recorded at …


Gauss-Seidel Estimation Of Generalized Linear Mixed Models With Application To Poisson Modeling Of Spatially Varying Disease Rates, Subharup Guha, Louise Ryan Oct 2005

Harvard University Biostatistics Working Paper Series

Generalized linear mixed models (GLMMs) provide an elegant framework for the analysis of correlated data. Due to the non-closed form of the likelihood, GLMMs are often fit by computational procedures like penalized quasi-likelihood (PQL). Special cases of these models are generalized linear models (GLMs), which are often fit using algorithms like iterative weighted least squares (IWLS). High computational costs and memory space constraints often make it difficult to apply these iterative procedures to data sets with a very large number of cases.

This paper proposes a computationally efficient strategy based on the Gauss-Seidel algorithm that iteratively fits sub-models of the GLMM …
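
For readers unfamiliar with the Gauss-Seidel idea named above, the following is a minimal sketch of the classical Gauss-Seidel iteration for a linear system, where each unknown is updated in turn while the others are held at their latest values. It is not the paper's GLMM procedure; the function name `gauss_seidel` and the small example system are assumptions made purely for illustration.

```python
def gauss_seidel(A, b, iterations=50):
    """Classical Gauss-Seidel sweep for A x = b (illustrative only).

    Each pass updates one coordinate at a time, immediately reusing
    the most recent values of the other coordinates.
    """
    n = len(b)
    x = [0.0] * n
    for _ in range(iterations):
        for i in range(n):
            # Sum the contributions of all other coordinates at their current values.
            s = sum(A[i][j] * x[j] for j in range(n) if j != i)
            x[i] = (b[i] - s) / A[i][i]
    return x


# Hypothetical, diagonally dominant 2x2 system: 4x + y = 1, 2x + 3y = 2.
A = [[4.0, 1.0], [2.0, 3.0]]
b = [1.0, 2.0]
solution = gauss_seidel(A, b)
```

The same "update one block, hold the rest fixed" pattern is presumably what makes the approach attractive when the full problem is too large to fit at once.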


Computational Techniques For Spatial Logistic Regression With Large Datasets, Christopher J. Paciorek, Louise Ryan Oct 2005

Harvard University Biostatistics Working Paper Series

In epidemiological work, outcomes are frequently non-normal, sample sizes may be large, and effects are often small. To relate health outcomes to geographic risk factors, fast and powerful methods for fitting spatial models, particularly for non-normal data, are required. We focus on binary outcomes, with the risk surface a smooth function of space. We compare penalized likelihood models, including the penalized quasi-likelihood (PQL) approach, and Bayesian models based on fit, speed, and ease of implementation.

A Bayesian model using a spectral basis representation of the spatial surface provides the best tradeoff of sensitivity and specificity in simulations, detecting real spatial …


Is The Number Of Sick Persons In A Cohort Constant Over Time?, Paula Diehr, Ann Derleth, Anne Newman, Liming Cai Oct 2005

UW Biostatistics Working Paper Series

Objectives: To estimate the number of persons in a cohort who are sick, over time.

Methods: We calculated the number of sick persons in the Cardiovascular Health Study (CHS), a cohort study of older adults followed up to 14 years, using eight definitions of “healthy” and “sick”. We projected the number in each health state over time for a birth cohort.

Results: The number of sick persons in CHS was approximately constant for 14 years, for all definitions of “sick”. The estimated number of sick persons in the birth cohort was approximately constant from ages 55-75, after which it decreased. …


A Nonstationary Negative Binomial Time Series With Time-Dependent Covariates: Enterococcus Counts In Boston Harbor, E. Andres Houseman, Brent Coull, James P. Shine Sep 2005

Harvard University Biostatistics Working Paper Series

Boston Harbor has had a history of poor water quality, including contamination by enteric pathogens. We conduct a statistical analysis of data collected by the Massachusetts Water Resources Authority (MWRA) between 1996 and 2002 to evaluate the effects of court-mandated improvements in sewage treatment. Motivated by the ineffectiveness of standard Poisson mixture models and their zero-inflated counterparts, we propose a new negative binomial model for time series of Enterococcus counts in Boston Harbor, where nonstationarity and autocorrelation are modeled using a nonparametric smooth function of time in the predictor. Without further restrictions, this function is not identifiable in the presence …


Direct Effect Models, Mark J. Van Der Laan, Maya L. Petersen Aug 2005

U.C. Berkeley Division of Biostatistics Working Paper Series

The causal effect of a treatment on an outcome is generally mediated by several intermediate variables. Estimation of the component of the causal effect of a treatment that is mediated by a given intermediate variable (the indirect effect of the treatment), and the component that is not mediated by that intermediate variable (the direct effect of the treatment) is often relevant to mechanistic understanding and to the design of clinical and public health interventions. Under the assumption of no unmeasured confounders for treatment and the intermediate variable, Robins & Greenland (1992) define an individual direct effect as the counterfactual effect of …


Attributable Risk Function In The Proportional Hazards Model, Ying Qing Chen, Chengcheng Hu, Yan Wang May 2005

UW Biostatistics Working Paper Series

As an epidemiological parameter, the population attributable fraction is an important measure to quantify the public health attributable risk of an exposure to morbidity and mortality. In this article, we extend this parameter to the attributable fraction function in survival analysis of time-to-event outcomes, and further establish its estimation and inference procedures based on the widely used proportional hazards models. Numerical examples and simulation studies are presented to validate and demonstrate the proposed methods.
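
As background for the scalar parameter being extended here, the sketch below computes the classical population attributable fraction using Levin's formula, PAF = p(RR - 1) / (1 + p(RR - 1)), where p is the exposure prevalence and RR the relative risk. This is only the familiar time-fixed quantity, not the time-dependent attributable fraction function developed in the paper; the function name and the input values are illustrative assumptions.

```python
def population_attributable_fraction(prevalence, relative_risk):
    """Levin's formula for the (time-fixed) population attributable fraction.

    prevalence    -- proportion of the population exposed (0 to 1)
    relative_risk -- risk in the exposed divided by risk in the unexposed
    """
    excess = prevalence * (relative_risk - 1.0)
    return excess / (1.0 + excess)


# Hypothetical inputs: half the population exposed, exposure triples the risk.
paf = population_attributable_fraction(prevalence=0.5, relative_risk=3.0)
```

With these made-up inputs, half of the disease burden would be attributed to the exposure.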


Prognosis Of Stage II Colon Cancer By Non-Neoplastic Mucosa Gene Expression Profiling, Alain Barrier, Sandrine Dudoit, Et Al. May 2005

U.C. Berkeley Division of Biostatistics Working Paper Series

Aims. This study assessed the possibility of building a prognosis predictor, based on non-neoplastic mucosa microarray gene expression measures, in stage II colon cancer patients. Materials and Methods. Non-neoplastic colonic mucosa mRNA samples from 24 patients (10 with a metachronous metastasis, 14 with no recurrence) were profiled using the Affymetrix HGU133A GeneChip. The k-nearest neighbor method was used for prognosis prediction using microarray gene expression measures. Leave-one-out cross-validation was used to select the number of neighbors and number of informative genes to include in the predictor. Based on this information, a prognosis predictor was proposed and its accuracy estimated by …


Colon Cancer Prognosis Prediction By Gene Expression Profiling, Alain Barrier, Sandrine Dudoit, Et Al. May 2005

U.C. Berkeley Division of Biostatistics Working Paper Series

Aims. This study assessed the possibility of building a prognosis predictor, based on microarray gene expression measures, in stage II and III colon cancer patients. Materials and Methods. Tumour (T) and non-neoplastic mucosa (NM) mRNA samples from 18 patients (9 with a recurrence, 9 with no recurrence) were profiled using the Affymetrix HGU133A GeneChip. The k-nearest neighbour method was used for prognosis prediction using T and NM gene expression measures. Six-fold cross-validation was applied to select the number of neighbours and the number of informative genes to include in the predictors. Based on this information, one T-based and one NM-based …


Causal Inference In Longitudinal Studies With History-Restricted Marginal Structural Models, Romain Neugebauer, Mark J. Van Der Laan, Ira B. Tager Apr 2005

U.C. Berkeley Division of Biostatistics Working Paper Series

Causal Inference based on Marginal Structural Models (MSMs) is particularly attractive to subject-matter investigators because MSM parameters provide explicit representations of causal effects. We introduce History-Restricted Marginal Structural Models (HRMSMs) for longitudinal data for the purpose of defining causal parameters which may often be better suited for Public Health research. This new class of MSMs allows investigators to analyze the causal effect of a treatment on an outcome based on a fixed, shorter and user-specified history of exposure compared to MSMs. By default, the latter represents the treatment causal effect of interest based on a treatment history defined by the …


The Sensitivity And Specificity Of Markers For Event Times, Tianxi Cai, Margaret S. Pepe, Thomas Lumley, Yingye Zheng, Nancy Swords Jenny Apr 2005

Harvard University Biostatistics Working Paper Series

No abstract provided.


Insights Into Latent Class Analysis, Margaret S. Pepe, Holly Janes Jan 2005

UW Biostatistics Working Paper Series

Latent class analysis is a popular statistical technique for estimating disease prevalence and test sensitivity and specificity. It is used when a gold standard assessment of disease is not available but results of multiple imperfect tests are. We derive analytic expressions for the parameter estimates in terms of the raw data, under the conditional independence assumption. These expressions indicate explicitly how observed two- and three-way associations between test results are used to infer disease prevalence and test operating characteristics. Although reasonable if the conditional independence model holds, the estimators have no basis when it fails. We therefore caution against using …


Standardizing Markers To Evaluate And Compare Their Performances, Margaret S. Pepe, Gary M. Longton Jan 2005

UW Biostatistics Working Paper Series

Introduction: Markers that purport to distinguish subjects with a condition from those without a condition must be evaluated rigorously for their classification accuracy. A single approach to statistically evaluating and comparing markers is not yet established.

Methods: We suggest a standardization that uses the marker distribution in unaffected subjects as a reference. For an affected subject with marker value Y, the standardized placement value is the proportion of unaffected subjects with marker values that exceed Y.

Results: We apply the standardization to two illustrative datasets. In patients with pancreatic cancer, placement values calculated for the CA 19-9 marker are smaller …
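
The placement value defined in the Methods paragraph can be sketched in a few lines. The example below follows that definition directly (the proportion of unaffected subjects whose marker value exceeds Y); the function name and the marker values are made up for illustration and are not taken from the paper's datasets.

```python
def placement_values(affected, unaffected):
    """Standardize each affected subject's marker value Y against the
    unaffected reference distribution: the proportion of unaffected
    subjects whose marker value exceeds Y."""
    n = len(unaffected)
    return [sum(1 for x in unaffected if x > y) / n for y in affected]


# Hypothetical marker values, for illustration only.
unaffected = [1.0, 2.0, 3.0, 4.0]
affected = [2.5, 5.0]
pv = placement_values(affected, unaffected)
```

A small placement value means the affected subject's marker lies high in the unaffected distribution, which is what one hopes to see for a marker that is elevated in disease.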


Combining Predictors For Classification Using The Area Under The ROC Curve, Margaret S. Pepe, Tianxi Cai, Zheng Zhang, Gary M. Longton Jan 2005

UW Biostatistics Working Paper Series

No single biomarker for cancer is considered adequately sensitive and specific for cancer screening. It is expected that the results of multiple markers will need to be combined in order to yield adequately accurate classification. Typically the objective function that is optimized for combining markers is the likelihood function. In this paper we consider an alternative objective function -- the area under the empirical receiver operating characteristic curve (AUC). We note that it yields consistent estimates of parameters in a generalized linear model for the risk score but does not require specifying the link function. Like logistic regression, it yields …
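
To make the AUC objective concrete, here is a minimal sketch that computes the empirical AUC as the proportion of case-control pairs ranked correctly (ties counted as one half) and uses it to choose a weight for a two-marker linear score. The score form `x1 + beta * x2`, the grid of candidate weights, and the data are all assumptions for illustration; the paper's actual optimization procedure is not shown in the abstract.

```python
def empirical_auc(case_scores, control_scores):
    """Proportion of (case, control) pairs in which the case scores
    higher than the control; ties contribute one half."""
    total = 0.0
    for c in case_scores:
        for d in control_scores:
            if c > d:
                total += 1.0
            elif c == d:
                total += 0.5
    return total / (len(case_scores) * len(control_scores))


def combined_scores(subjects, beta):
    """Hypothetical linear score x1 + beta * x2 for two markers."""
    return [x1 + beta * x2 for x1, x2 in subjects]


# Made-up two-marker data, for illustration only.
cases = [(2.0, 1.0), (3.0, 0.5), (1.0, 3.0)]
controls = [(1.5, 0.0), (0.5, 1.0), (2.5, 0.2)]

# Crude grid search: keep the first beta with the highest empirical AUC.
best_auc, best_beta = -1.0, None
for beta in [0.0, 0.5, 1.0, 2.0]:
    auc = empirical_auc(combined_scores(cases, beta),
                        combined_scores(controls, beta))
    if auc > best_auc:
        best_auc, best_beta = auc, beta
```

Because the empirical AUC depends only on the ranking of scores, no link function has to be specified, which matches the property noted in the abstract.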