Statistical Models Commons

Articles 1 - 29 of 29

Full-Text Articles in Statistical Models

HPCNMF: A High-Performance Toolbox For Non-Negative Matrix Factorization, Karthik Devarajan, Guoli Wang Feb 2016

COBRA Preprint Series

Non-negative matrix factorization (NMF) is a widely used machine learning algorithm for dimension reduction of large-scale data. It has found successful applications in a variety of fields such as computational biology, neuroscience, natural language processing, information retrieval, image processing and speech recognition. In bioinformatics, for example, it has been used to extract patterns and profiles from genomic and text-mining data as well as in protein sequence and structure analysis. While the scientific performance of NMF is very promising in dealing with high dimensional data sets and complex data structures, its computational cost is high and sometimes could be critical for …
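
The factorization itself is easy to state: given a non-negative data matrix V, NMF seeks non-negative factors W and H with V ≈ WH. The snippet below is a minimal NumPy sketch of the classical multiplicative-update algorithm (Lee-Seung, Frobenius loss); it is a generic illustration, not the HPCNMF toolbox, and the matrix sizes and rank are placeholders.

```python
import numpy as np

def nmf_multiplicative(V, rank, n_iter=200, eps=1e-10, seed=0):
    """Basic NMF via Lee-Seung multiplicative updates (Frobenius loss).

    V    : (m, n) non-negative data matrix
    rank : number of latent factors (e.g., metagenes)
    Returns W (m, rank) and H (rank, n) with V ~= W @ H.
    """
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, rank)) + eps
    H = rng.random((rank, n)) + eps
    for _ in range(n_iter):
        # Multiplicative updates keep both factors non-negative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Example: factor a small random non-negative matrix into 3 components.
V = np.random.default_rng(1).random((50, 20))
W, H = nmf_multiplicative(V, rank=3)
print(np.linalg.norm(V - W @ H))   # reconstruction error
```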


Flexible Distributed Lag Models Using Random Functions With Application To Estimating Mortality Displacement From Heat-Related Deaths, Roger D. Peng Dec 2011

Johns Hopkins University, Dept. of Biostatistics Working Papers

No abstract provided.


Shrinkage Estimation Of Expression Fold Change As An Alternative To Testing Hypotheses Of Equivalent Expression, Zahra Montazeri, Corey M. Yanofsky, David R. Bickel Aug 2009

COBRA Preprint Series

Research on analyzing microarray data has focused on the problem of identifying differentially expressed genes to the neglect of the problem of how to integrate evidence that a gene is differentially expressed with information on the extent of its differential expression. Consequently, researchers currently prioritize genes for further study either on the basis of volcano plots or, more commonly, according to simple estimates of the fold change after filtering the genes with an arbitrary statistical significance threshold. While the subjective and informal nature of the former practice precludes quantification of its reliability, the latter practice is equivalent to using a …


A Method For Visualizing Multivariate Time Series Data, Roger D. Peng Feb 2008

Johns Hopkins University, Dept. of Biostatistics Working Papers

Visualization and exploratory analysis are an important part of any data analysis and are made more challenging when the data are voluminous and high-dimensional. One such example is environmental monitoring data, which are often collected over time and at multiple locations, resulting in a geographically indexed multivariate time series. Financial data, although not necessarily containing a geographic component, present another source of high-volume multivariate time series data. We present the mvtsplot function, which provides a method for visualizing multivariate time series data. We outline the basic design concepts and provide some examples of its usage by applying it to a …
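
The general idea behind such a display can be sketched briefly. The snippet below is not the authors' R function mvtsplot; it is a hypothetical matplotlib analogue of one common design, in which each series is discretized into terciles and drawn as a row of colored cells over time so that many series can be scanned at once.

```python
import numpy as np
import matplotlib.pyplot as plt

def simple_mvtsplot(data, labels=None):
    """Heatmap-style display of a multivariate time series.

    data : (n_series, n_time) array; each row is one series.
    Each series is binned into low/medium/high terciles so rows are
    comparable even when the series are on different scales.
    """
    terciles = np.array([np.digitize(row, np.quantile(row, [1/3, 2/3]))
                         for row in data])
    fig, ax = plt.subplots(figsize=(8, 0.3 * len(data) + 1))
    ax.imshow(terciles, aspect="auto", cmap="viridis", interpolation="nearest")
    ax.set_xlabel("time")
    ax.set_yticks(range(len(data)))
    ax.set_yticklabels(labels if labels is not None else range(len(data)))
    plt.show()

# Example with simulated monitoring data: 10 locations, 200 time points.
rng = np.random.default_rng(0)
series = rng.normal(size=(10, 200)).cumsum(axis=1)
simple_mvtsplot(series)
```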


Bayesian Analysis For Penalized Spline Regression Using WinBUGS, Ciprian M. Crainiceanu, David Ruppert, M.P. Wand Dec 2007

Johns Hopkins University, Dept. of Biostatistics Working Papers

Penalized splines can be viewed as BLUPs in a mixed model framework, which allows the use of mixed model software for smoothing. Thus, software originally developed for Bayesian analysis of mixed models can be used for penalized spline regression. Bayesian inference for nonparametric models enjoys the flexibility of nonparametric models and the exact inference provided by the Bayesian inferential machinery. This paper provides a simple, yet comprehensive, set of programs for the implementation of nonparametric Bayesian analysis in WinBUGS. MCMC mixing is substantially improved over previous versions by using low-rank thin-plate splines instead of a truncated polynomial basis. Simulation time …
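
For readers unfamiliar with the mixed-model formulation, the connection can be written in a single display (a truncated-line basis is used here purely for illustration; the paper itself favors low-rank thin-plate splines):

\[
y_i = \beta_0 + \beta_1 x_i + \sum_{k=1}^{K} u_k (x_i - \kappa_k)_+ + \varepsilon_i,
\qquad u_k \sim N(0, \sigma_u^2), \quad \varepsilon_i \sim N(0, \sigma_\varepsilon^2),
\]

where the \kappa_k are fixed knots. Treating the spline coefficients u_k as random effects makes the fitted curve the BLUP from this mixed model, with effective smoothing parameter \lambda = \sigma_\varepsilon^2 / \sigma_u^2, which is why mixed-model and Bayesian MCMC software can be used directly for the smoothing.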


Spatio-Temporal Analysis Of Areal Data And Discovery Of Neighborhood Relationships In Conditionally Autoregressive Models, Subharup Guha, Louise Ryan Nov 2006

Harvard University Biostatistics Working Paper Series

No abstract provided.


Bayesian Smoothing Of Irregularly-Spaced Data Using Fourier Basis Functions, Christopher J. Paciorek Aug 2006

Harvard University Biostatistics Working Paper Series

No abstract provided.


Gauss-Seidel Estimation Of Generalized Linear Mixed Models With Application To Poisson Modeling Of Spatially Varying Disease Rates, Subharup Guha, Louise Ryan Oct 2005

Harvard University Biostatistics Working Paper Series

Generalized linear mixed models (GLMMs) provide an elegant framework for the analysis of correlated data. Because the likelihood has no closed form, GLMMs are often fit by computational procedures like penalized quasi-likelihood (PQL). Special cases of these models are generalized linear models (GLMs), which are often fit using algorithms like iterative weighted least squares (IWLS). High computational costs and memory space constraints often make it difficult to apply these iterative procedures to data sets with a very large number of cases.

This paper proposes a computationally efficient strategy based on the Gauss-Seidel algorithm that iteratively fits sub-models of the GLMM …
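
The Gauss-Seidel idea of cycling through blocks of parameters, refitting each block while the others are held fixed, is easy to sketch for a fixed-effects Poisson GLM. The code below is a schematic illustration of that idea (block-wise IWLS with the remaining blocks entering as an offset); it is not the authors' GLMM procedure, and the function and variable names are placeholders.

```python
import numpy as np

def gauss_seidel_poisson(X_blocks, y, n_sweeps=50):
    """Schematic Gauss-Seidel fit of a Poisson log-linear model.

    X_blocks : list of design-matrix blocks [X1, X2, ...], each (n, p_j)
    y        : (n,) counts
    Each sweep refits one block of coefficients by IWLS while the other
    blocks' contributions to the linear predictor are held fixed.
    """
    betas = [np.zeros(X.shape[1]) for X in X_blocks]
    for _ in range(n_sweeps):
        eta = sum(X @ b for X, b in zip(X_blocks, betas))
        for j, X in enumerate(X_blocks):
            offset = eta - X @ betas[j]          # contribution of other blocks
            for _ in range(5):                   # inner IWLS iterations
                mu = np.exp(offset + X @ betas[j])
                z = offset + X @ betas[j] + (y - mu) / mu   # working response
                W = mu                                       # IWLS weights
                XtW = X.T * W
                betas[j] = np.linalg.solve(XtW @ X, XtW @ (z - offset))
            eta = offset + X @ betas[j]
    return betas

# Example: two covariate blocks for simulated Poisson counts.
rng = np.random.default_rng(0)
X1, X2 = rng.normal(size=(500, 3)), rng.normal(size=(500, 2))
y = rng.poisson(np.exp(0.2 * X1[:, 0] - 0.3 * X2[:, 1]))
print([b.round(2) for b in gauss_seidel_poisson([X1, X2], y)])
```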


Computational Techniques For Spatial Logistic Regression With Large Datasets, Christopher J. Paciorek, Louise Ryan Oct 2005

Harvard University Biostatistics Working Paper Series

In epidemiological work, outcomes are frequently non-normal, sample sizes may be large, and effects are often small. To relate health outcomes to geographic risk factors, fast and powerful methods for fitting spatial models, particularly for non-normal data, are required. We focus on binary outcomes, with the risk surface a smooth function of space. We compare penalized likelihood models, including the penalized quasi-likelihood (PQL) approach, and Bayesian models based on fit, speed, and ease of implementation.

A Bayesian model using a spectral basis representation of the spatial surface provides the best tradeoff of sensitivity and specificity in simulations, detecting real spatial …
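
In generic notation (illustrative, not taken verbatim from the paper), the model is

\[
Y(s_i) \sim \mathrm{Bernoulli}\{p(s_i)\}, \qquad
\operatorname{logit} p(s_i) = x(s_i)^\top \beta + g(s_i),
\]

where g(\cdot) is a smooth spatial surface; representing g in a spectral (Fourier) basis is what keeps the Bayesian computations feasible for large datasets.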


Robust Inferences For Covariate Effects On Survival Time With Censored Linear Regression Models, Larry Leon, Tianxi Cai, L. J. Wei Jan 2005

Harvard University Biostatistics Working Paper Series

Various inference procedures for linear regression models with censored failure times have been studied extensively. Recent developments on efficient algorithms to implement these procedures enhance the practical usage of such models in survival analysis. In this article, we present robust inferences for certain covariate effects on the failure time in the presence of "nuisance" confounders under a semiparametric, partial linear regression setting. Specifically, the estimation procedures for the regression coefficients of interest are derived from a working linear model and are valid even when the function of the confounders in the model is not correctly specified. The new proposals are …
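
A generic way to write the setting (notation assumed here for orientation, not quoted from the paper) is a partial linear model for the log failure time,

\[
\log T = \beta^\top Z + h(W) + \varepsilon,
\]

with Z the covariates of interest, W the nuisance confounders, and h(\cdot) an unspecified function; the point of the robust procedure is that inference on \beta from the working linear model remains valid even when the form assumed for h is wrong.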


A Hybrid Newton-Type Method For The Linear Regression In Case-Cohort Studies, Menggang Yu, Bin Nan Dec 2004

The University of Michigan Department of Biostatistics Working Paper Series

Case-cohort designs are increasingly common in large epidemiological cohort studies. Nan, Yu, and Kalbfleisch (2004) provided the asymptotic results for censored linear regression models in case-cohort studies. In this article, we consider computational aspects of their proposed rank-based estimating methods. We show that the rank-based discontinuous estimating functions for case-cohort studies are monotone, a property established for cohort data in the literature, when generalized Gehan-type weights are used. Though the estimating problem can be formulated as a linear programming problem, as for cohort data, its scale easily becomes uncontrollable even for a …
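
For cohort data, the Gehan-weighted rank estimating function in question takes the standard form

\[
U_G(\beta) = \sum_{i=1}^{n}\sum_{j=1}^{n} \Delta_i\,(Z_i - Z_j)\,
I\{e_j(\beta) \ge e_i(\beta)\}, \qquad e_i(\beta) = \log T_i - \beta^\top Z_i,
\]

with \Delta_i the censoring indicator; each component of U_G is monotone in \beta, which is the property the authors establish for the case-cohort version with its additional sampling weights (the display is shown for orientation only and omits those weights).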


Semiparametric Regression In Capture-Recapture Modelling, O. Gimenez, C. Barbraud, Ciprian M. Crainiceanu, S. Jenouvrier, B.T. Morgan Dec 2004

Johns Hopkins University, Dept. of Biostatistics Working Papers

Capture-recapture models were developed to estimate survival using data arising from marking and monitoring wild animals over time. Variation in the survival process may be explained by incorporating relevant covariates. We develop nonparametric and semiparametric regression models for estimating survival in capture-recapture models. A fully Bayesian approach using MCMC simulations was employed to estimate the model parameters. The work is illustrated by a study of Snow petrels, in which survival probabilities are expressed as nonlinear functions of a climate covariate, using data from a 40-year study on marked individuals, nesting at Petrels Island, Terre Adelie.


A Bayesian Mixture Model Relating Dose To Critical Organs And Functional Complication In 3D Conformal Radiation Therapy, Tim Johnson, Jeremy Taylor, Randall K. Ten Haken, Avraham Eisbruch Nov 2004

The University of Michigan Department of Biostatistics Working Paper Series

A goal of radiation therapy is to deliver maximum dose to the target tumor while minimizing complications due to irradiation of critical organs. Technological advances in 3D conformal radiation therapy have allowed great strides in realizing this goal; however, complications may still arise. Critical organs may be adjacent to tumors or in the path of the radiation beam. Several mathematical models have been proposed that describe a relationship between dose and observed functional complication; however, only a few published studies have successfully fit these models to data using modern statistical methods which make efficient use of the data. One complication …


A Bayesian Method For Finding Interactions In Genomic Studies, Wei Chen, Debashis Ghosh, Trivellore E. Raghunathan, Sharon Kardia Nov 2004

The University of Michigan Department of Biostatistics Working Paper Series

An important step in building a multiple regression model is the selection of predictors. In genomic and epidemiologic studies, datasets with a small sample size and a large number of predictors are common. In such settings, most standard methods for identifying a good subset of predictors are unstable. Furthermore, there is an increasing emphasis on the identification of interactions, which has not been studied much in the statistical literature. We propose a method, called BSI (Bayesian Selection of Interactions), for selecting predictors in a regression setting when the number of predictors is considerably larger than the sample size with a focus …


Spatially Adaptive Bayesian P-Splines With Heteroscedastic Errors, Ciprian M. Crainiceanu, David Ruppert, Raymond J. Carroll Nov 2004

Johns Hopkins University, Dept. of Biostatistics Working Papers

Penalized splines (P-splines), which use low-rank spline bases to make computations tractable while maintaining accuracy comparable to smoothing splines, are an increasingly popular tool for nonparametric smoothing. This paper extends penalized spline methodology by both modeling the variance function nonparametrically and using a spatially adaptive smoothing parameter. These extensions have been studied before, but never together and never in the multivariate case. This combination is needed for satisfactory inference and can be implemented effectively by Bayesian MCMC. The variance process controlling the spatially-adaptive shrinkage of the mean and the variance of the heteroscedastic error process are modeled as log-penalized …
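
In outline (illustrative notation), the model is

\[
y_i = f(x_i) + \sigma(x_i)\,\varepsilon_i, \qquad \varepsilon_i \sim N(0, 1),
\]

with the mean f a penalized spline whose amount of shrinkage is allowed to vary over x, and \log \sigma^2(x) itself modeled as a penalized spline, so that both the signal and the noise level adapt locally.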


GLLAMM Manual, Sophia Rabe-Hesketh, Anders Skrondal, Andrew Pickles Oct 2004

U.C. Berkeley Division of Biostatistics Working Paper Series

This manual describes a Stata program gllamm that can estimate Generalized Linear Latent and Mixed Models (GLLAMMs). GLLAMMs are a class of multilevel latent variable models for (multivariate) responses of mixed type including continuous responses, counts, duration/survival data, dichotomous, ordered and unordered categorical responses and rankings. The latent variables (common factors or random effects) can be assumed to be discrete or to have a multivariate normal distribution. Examples of models in this class are multilevel generalized linear models or generalized linear mixed models, multilevel factor or latent trait models, item response models, latent class models and multilevel structural equation models. …


Data Adaptive Estimation Of The Treatment Specific Mean, Yue Wang, Oliver Bembom, Mark J. Van Der Laan Oct 2004

U.C. Berkeley Division of Biostatistics Working Paper Series

An important problem in epidemiology and medical research is the estimation of the causal effect of a treatment action at a single point in time on the mean of an outcome, possibly within strata of the target population defined by a subset of the baseline covariates. Current approaches to this problem are based on marginal structural models, i.e., parametric models for the marginal distribution of counterfactual outcomes as a function of treatment and effect modifiers. The various estimators developed in this context furthermore each depend on a high-dimensional nuisance parameter whose estimation currently also relies on parametric models. Since misspecification …
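
The target parameter can be written compactly in counterfactual notation as

\[
\psi(a, V) = E\,[\,Y_a \mid V\,],
\]

the mean of the counterfactual outcome Y_a under treatment level a within strata defined by baseline covariates V; a marginal structural model posits a parametric form m(a, V \mid \beta) for this quantity, and the nuisance parameter referred to above is typically the treatment mechanism g(a \mid W) used to weight or adjust the observed data.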


Finding Cancer Subtypes In Microarray Data Using Random Projections, Debashis Ghosh Oct 2004

The University of Michigan Department of Biostatistics Working Paper Series

One of the benefits of profiling of cancer samples using microarrays is the generation of molecular fingerprints that will define subtypes of disease. Such subgroups have typically been found in microarray data using hierarchical clustering. A major problem in interpretation of the output is determining the number of clusters. We approach the problem of determining disease subtypes using mixture models. A novel estimation procedure of the parameters in the mixture model is developed based on a combination of random projections and the expectation-maximization algorithm. Because the approach is probabilistic, it provides a measure for the number of true clusters …
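
The basic recipe of combining a random projection with mixture-model clustering is easy to illustrate. The scikit-learn sketch below is only a schematic of the general idea (project the high-dimensional expression profiles to a low dimension, then fit a Gaussian mixture by EM and read off cluster memberships); it is not the authors' estimator, and the dimensions, component counts, and use of BIC are placeholders.

```python
import numpy as np
from sklearn.random_projection import GaussianRandomProjection
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Simulated "microarray" data: 60 samples x 2000 genes, two hidden subtypes.
n, p = 60, 2000
labels_true = rng.integers(0, 2, size=n)
X = rng.normal(size=(n, p)) + 1.5 * labels_true[:, None] * (rng.random(p) < 0.05)

# Step 1: random projection to a handful of dimensions.
Z = GaussianRandomProjection(n_components=10, random_state=0).fit_transform(X)

# Step 2: EM-fitted Gaussian mixtures; BIC gives a rough guide to the
# number of subtypes (clusters).
bics = {k: GaussianMixture(n_components=k, random_state=0).fit(Z).bic(Z)
        for k in range(1, 6)}
best_k = min(bics, key=bics.get)
clusters = GaussianMixture(n_components=best_k, random_state=0).fit_predict(Z)
print(best_k, clusters)
```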


History-Adjusted Marginal Structural Models And Statically-Optimal Dynamic Treatment Regimes, Mark J. Van Der Laan, Maya L. Petersen Sep 2004

U.C. Berkeley Division of Biostatistics Working Paper Series

Marginal structural models (MSM) provide a powerful tool for estimating the causal effect of a treatment. These models, introduced by Robins, model the marginal distributions of treatment-specific counterfactual outcomes, possibly conditional on a subset of the baseline covariates. Marginal structural models are particularly useful in the context of longitudinal data structures, in which each subject's treatment and covariate history are measured over time, and an outcome is recorded at a final time point. However, the utility of these models for some applications has been limited by their inability to incorporate modification of the causal effect of treatment by time-varying covariates. …


A Hierarchical Multivariate Two-Part Model For Profiling Providers' Effects On Healthcare Charges, John W. Robinson, Scott L. Zeger, Christopher B. Forrest Aug 2004

Johns Hopkins University, Dept. of Biostatistics Working Papers

Procedures for analyzing and comparing healthcare providers' effects on health services delivery and outcomes have been referred to as provider profiling. In a typical profiling procedure, patient-level responses are measured for clusters of patients treated by providers that, in turn, can be regarded as statistically exchangeable. Thus, a hierarchical model naturally represents the structure of the data. When provider effects on multiple responses are profiled, a multivariate model, rather than a series of univariate models, can capture associations among responses at both the provider and patient levels. When responses are in the form of charges for healthcare services and sampled …
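
A single-response version of the two-part structure (shown only for orientation; the model in the paper is multivariate and hierarchical) separates whether any charge is incurred from the size of the charge when it is:

\[
\operatorname{logit} \Pr(Y_{ij} > 0) = x_{ij}^\top \alpha + a_j, \qquad
E\,[\,\log Y_{ij} \mid Y_{ij} > 0\,] = x_{ij}^\top \beta + b_j,
\]

where i indexes patients, j indexes providers, and the provider random effects (a_j, b_j) carry the provider-level associations between the two parts.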


Loss-Based Cross-Validated Deletion/Substitution/Addition Algorithms In Estimation, Sandra E. Sinisi, Mark J. Van Der Laan Mar 2004

U.C. Berkeley Division of Biostatistics Working Paper Series

In van der Laan and Dudoit (2003) we propose and theoretically study a unified loss-function-based statistical methodology, which provides a road map for estimation and performance assessment. Given a parameter of interest which can be described as the minimizer of the population mean of a loss function, the road map involves, as important ingredients, cross-validation for estimator selection and minimization, over subsets of basis functions, of the empirical risk of the subset-specific estimator of the parameter of interest, where the basis functions correspond to a parameterization of a specified subspace of the complete parameter space. In this article we …


Unified Cross-Validation Methodology For Selection Among Estimators And A General Cross-Validated Adaptive Epsilon-Net Estimator: Finite Sample Oracle Inequalities And Examples, Mark J. Van Der Laan, Sandrine Dudoit Nov 2003

U.C. Berkeley Division of Biostatistics Working Paper Series

In Part I of this article we propose a general cross-validation criterion for selecting among a collection of estimators of a particular parameter of interest based on n i.i.d. observations. It is assumed that the parameter of interest minimizes the expectation (w.r.t. the distribution of the observed data structure) of a particular loss function of a candidate parameter value and the observed data structure, possibly indexed by a nuisance parameter. The proposed cross-validation criterion is defined as the empirical mean over the validation sample of the loss function at the parameter estimate based on the training sample, averaged over …
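
In symbols, with B_n a random split of the n observations into a training set (B_n(i) = 0) and a validation set (B_n(i) = 1), the criterion described verbally above is (generic notation, not necessarily the paper's)

\[
\hat{\theta}_k \;=\; E_{B_n}\!\left[\,\frac{1}{n_1}\sum_{i:\,B_n(i)=1}
L\bigl(O_i,\ \hat{\psi}_k(P^0_{n,B_n})\bigr)\right],
\]

the loss L of the k-th candidate estimator \hat{\psi}_k, trained on the empirical distribution P^0_{n,B_n} of the training sample, evaluated on the n_1 validation observations O_i and averaged over splits; the cross-validation selector picks the k that minimizes \hat{\theta}_k.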


Semi-Parametric Box-Cox Power Transformation Models For Censored Survival Observations, Tianxi Cai, Lu Tian, L. J. Wei Oct 2003

Harvard University Biostatistics Working Paper Series

No abstract provided.


Statistical Inferences Based On Non-Smooth Estimating Functions, Lu Tian, Jun S. Liu, Mary Zhao, L. J. Wei Oct 2003

Harvard University Biostatistics Working Paper Series

No abstract provided.


A Population Pharmacokinetic Model With Time-Dependent Covariates Measured With Errors, Lang Lil, Xihong Lin, Mort B. Brown, Suneel Gupta, Kyung-Hoon Lee Oct 2003

The University of Michigan Department of Biostatistics Working Paper Series

We propose a population pharmacokinetic (PK) model with time-dependent covariates measured with errors. This model is used to model S-oxybutynin's kinetics following an oral administration of Ditropan, and allows the distribution rate to depend on the time-dependent covariates blood pressure and heart rate, which are measured with errors. We propose two two-step estimation methods: the second-order two-step method with numerical solutions of differential equations (2orderND), and the second-order two-step method with closed-form approximate solutions of differential equations (2orderAD). The proposed methods are computationally easy and require fitting a linear mixed model at the first step and a nonlinear …


An Extended General Location Model For Causal Inference From Data Subject To Noncompliance And Missing Values, Yahong Peng, Rod Little, Trivellore E. Raghunathan Aug 2003

The University of Michigan Department of Biostatistics Working Paper Series

Noncompliance is a common problem in experiments involving randomized assignment of treatments, and standard analyses based on intention-to-treat or treatment received have limitations. An attractive alternative is to estimate the Complier-Average Causal Effect (CACE), which is the average treatment effect for the subpopulation of subjects who would comply under either treatment (Angrist, Imbens and Rubin, 1996, henceforth AIR). We propose an Extended General Location Model to estimate the CACE from data with noncompliance and missing data in the outcome and in baseline covariates. Models for both continuous and categorical outcomes and ignorable and latent ignorable (Frangakis and Rubin, 1999) …
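
In potential-outcome notation, writing Y(z) for the outcome under assignment z,

\[
\mathrm{CACE} \;=\; E\,[\,Y(1) - Y(0) \mid \text{complier}\,],
\]

the average effect of assignment among the (partly latent) subgroup of subjects who would take the treatment if and only if assigned to it.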


Estimating The Accuracy Of Polymerase Chain Reaction-Based Tests Using Endpoint Dilution, Jim Hughes, Patricia Totten Mar 2003

UW Biostatistics Working Paper Series

PCR-based tests for various microorganisms or target DNA sequences are generally acknowledged to be highly "sensitive" yet the concept of sensitivity is ill-defined in the literature on these tests. We propose that sensitivity should be expressed as a function of the number of target DNA molecules in the sample (or specificity when the target number is 0). However, estimating this "sensitivity curve" is problematic since it is difficult to construct samples with a fixed number of targets. Nonetheless, using serially diluted replicate aliquots of a known concentration of the target DNA sequence, we show that it is possible to disentangle …
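
One simple parametric form for such a sensitivity curve, given here purely as an illustration of the idea (an assumption on our part, not necessarily the model the authors fit), is the single-hit model

\[
S(n) = 1 - (1 - p)^{n},
\]

where n is the number of target molecules in the aliquot and p is the per-molecule probability of successful amplification and detection; sensitivity then rises with n, while the behavior at n = 0 is governed by a separate false-positive (specificity) parameter.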


Checking Assumptions In Latent Class Regression Models Via A Markov Chain Monte Carlo Estimation Approach: An Application To Depression And Socio-Economic Status, Elizabeth Garrett, Richard Miech, Pamela Owens, William W. Eaton, Scott L. Zeger Jan 2003

Johns Hopkins University, Dept. of Biostatistics Working Papers

Latent class regression models are useful tools for assessing associations between covariates and latent variables. However, evaluation of key model assumptions cannot be performed using methods from standard regression models due to the unobserved nature of latent outcome variables. This paper presents graphical diagnostic tools to evaluate whether or not latent class regression models adhere to standard assumptions of the model: conditional independence and non-differential measurement. An integral part of these methods is the use of a Markov Chain Monte Carlo estimation procedure. Unlike standard maximum likelihood implementations for latent class regression model estimation, the MCMC approach allows us to …


An Empirical Study Of Marginal Structural Models For Time-Independent Treatment, Tanya A. Henneman, Mark J. Van Der Laan Oct 2002

U.C. Berkeley Division of Biostatistics Working Paper Series

In non-randomized treatment studies a significant problem for statisticians is determining how best to adjust for confounders. Marginal structural models (MSMs) and inverse probability of treatment weighted (IPTW) estimators are useful in analyzing the causal effect of treatment in observational studies. Given an IPTW estimator, a doubly robust augmented IPTW (AIPTW) estimator orthogonalizes it, resulting in a more efficient estimator than the IPTW estimator. One purpose of this paper is to make a practical comparison between the IPTW estimator and the doubly robust AIPTW estimator via a series of Monte Carlo simulations. We also consider the selection of the optimal …
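
For a point-treatment setting, the IPTW estimator of the treatment-specific mean takes the familiar form (standard notation)

\[
\hat{E}\,[\,Y_a\,] = \frac{1}{n}\sum_{i=1}^{n}
\frac{I(A_i = a)\,Y_i}{\hat{g}(a \mid W_i)},
\]

where \hat{g}(a \mid W) estimates the treatment mechanism P(A = a \mid W); the AIPTW estimator adds an augmentation term built from an outcome regression and is doubly robust, remaining consistent if either the treatment mechanism or the outcome regression is correctly specified.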