Open Access. Powered by Scholars. Published by Universities.®

Statistical Models Commons


Articles 1 - 30 of 75

Full-Text Articles in Statistical Models

Addition To Pglr Chap 6, Joseph M. Hilbe Aug 2016

Joseph M Hilbe

Addition to Chapter 6 in Practical Guide to Logistic Regression. Added section on Bayesian logistic regression using Stata.


Online Variational Bayes Inference For High-Dimensional Correlated Data, Sylvie T. Kabisa, Jeffrey S. Morris, David Dunson Jan 2016

Jeffrey S. Morris

High-dimensional data with hundreds of thousands of observations are becoming commonplace in many disciplines. The analysis of such data poses many computational challenges, especially when the observations are correlated over time and/or across space. In this paper we propose flexible hierarchical regression models for analyzing such data that accommodate serial and/or spatial correlation. We address the computational challenges involved in fitting these models by adopting an approximate inference framework. We develop an online variational Bayes algorithm that works by incrementally reading the data into memory one portion at a time. The performance of the method is assessed through simulation studies. …


Functional Car Models For Spatially Correlated Functional Datasets, Lin Zhang, Veerabhadran Baladandayuthapani, Hongxiao Zhu, Keith A. Baggerly, Tadeusz Majewski, Bogdan Czerniak, Jeffrey S. Morris Jan 2016

Jeffrey S. Morris

We develop a functional conditional autoregressive (CAR) model for spatially correlated data for which functions are collected on areal units of a lattice. Our model performs functional response regression while accounting for spatial correlations with potentially nonseparable and nonstationary covariance structure, in both the space and functional domains. We show theoretically that our construction leads to a CAR model at each functional location, with spatial covariance parameters varying and borrowing strength across the functional domain. Using basis transformation strategies, the nonseparable spatial-functional model is computationally scalable to enormous functional datasets, generalizable to different basis functions, and can be used on …


Negative Binomial Regression, 2nd Ed, 2nd Print, Errata And Comments, Joseph Hilbe Jan 2015

Joseph M Hilbe

Errata and comments for the 2nd printing of NBR2, 2nd edition. All errata from the first printing have been corrected. Some new text has been added as well.


Bayesian Function-On-Function Regression For Multi-Level Functional Data, Mark J. Meyer, Brent A. Coull, Francesco Versace, Paul Cinciripini, Jeffrey S. Morris Jan 2015

Jeffrey S. Morris

Medical and public health research increasingly involves the collection of complex and high-dimensional data. In particular, functional data, where the unit of observation is a curve or set of curves finely sampled over a grid, are frequently obtained. Moreover, researchers often sample multiple curves per person, resulting in repeated functional measures. A common question is how to analyze the relationship between two functional variables. We propose a general function-on-function regression model for repeatedly sampled functional data, presenting a simple model as well as a more extensive mixed model framework, along with multiple functional posterior …


Functional Regression, Jeffrey S. Morris Jan 2015

Jeffrey S. Morris

Functional data analysis (FDA) involves the analysis of data whose ideal units of observation are functions defined on some continuous domain, and the observed data consist of a sample of functions taken from some population, sampled on a discrete grid. Ramsay and Silverman's 1997 textbook sparked the development of this field, which has accelerated in the past 10 years to become one of the fastest growing areas of statistics, fueled by the growing number of applications yielding this type of data. One unique characteristic of FDA is the need to combine information both across and within functions, which Ramsay and …


Ordinal Probit Wavelet-Based Functional Models For Eqtl Analysis, Mark J. Meyer, Jeffrey S. Morris, Craig P. Hersh, Jarret D. Morrow, Christoph Lange, Brent A. Coull Jan 2015

Jeffrey S. Morris

Current methods for conducting expression quantitative trait loci (eQTL) analysis are limited in scope to pairwise association testing between a single nucleotide polymorphism (SNP) and an expression probe set in a region around a gene of interest, thus ignoring the inherent between-SNP correlation. To determine association, p-values are then typically adjusted using the plug-in false discovery rate. As many SNPs are interrogated in the region and multiple probe sets are taken, the current approach requires fitting a large number of models. We propose to remedy this by introducing a flexible function-on-scalar regression that models the genome as a functional outcome. The …


Asymmetry Of The Systematic Risk Of American Real Estate Securities: New Econometric Evidence (Asimmetria Del Rischio Sistematico Dei Titoli Immobiliari Americani: Nuove Evidenze Econometriche), Paola De Santis, Carlo Drago Jul 2014

Carlo Drago

In this work we find an increase in the systematic risk of American real estate securities in 2007, followed by a return to initial values in 2009, and we point to the possible presence of structural breaks. To assess this systematic risk we chose the Fama-French three-factor model and studied the relationship between the excess return of the REIT index, used as a proxy for the performance of American real estate securities, and the excess return of the S&P 500 index, representing the return of the market portfolio. The results confirm the presence of an "Asymmetric REIT Beta Puzzle," consistent with some previous studies …


Errata - Logistic Regression Models, Joseph Hilbe May 2014

Joseph M Hilbe

Errata for Logistic Regression Models, 4th Printing


Interpretation And Prediction Of A Logistic Model, Joseph M. Hilbe Mar 2014

Joseph M Hilbe

A basic overview of how to model and interpret a logistic regression model, as well as how to obtain the predicted probability or fit of the model and calculate its confidence intervals. R code is used for all examples; some Stata code is provided for contrast.
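The workflow this overview describes (fit the model, exponentiate a coefficient into an odds ratio, then build a confidence interval on the linear predictor and map it to the probability scale) can be sketched in Python with NumPy alone; the monograph itself uses R and Stata, and the data below are simulated purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000
x = rng.normal(size=n)
expit = lambda t: 1.0 / (1.0 + np.exp(-t))
y = rng.binomial(1, expit(-0.5 + 1.2 * x))   # true intercept -0.5, slope 1.2

X = np.column_stack([np.ones(n), x])         # design matrix with intercept

# Newton-Raphson for the logistic log-likelihood
beta = np.zeros(2)
for _ in range(25):
    p = expit(X @ beta)
    W = p * (1 - p)
    grad = X.T @ (y - p)
    hess = X.T @ (X * W[:, None])
    beta = beta + np.linalg.solve(hess, grad)

cov = np.linalg.inv(hess)                    # asymptotic covariance of beta-hat
odds_ratio = np.exp(beta[1])                 # interpretation: OR per unit of x

# predicted probability at x = 1, with a 95% CI formed on the linear
# predictor and then passed through the inverse logit
xnew = np.array([1.0, 1.0])
eta = xnew @ beta
se = np.sqrt(xnew @ cov @ xnew)
ci = expit(np.array([eta - 1.96 * se, eta + 1.96 * se]))
print(round(odds_ratio, 2), round(float(expit(eta)), 3), np.round(ci, 3))
```

Forming the interval on the linear predictor before transforming keeps it inside (0, 1), which is the usual practice for predicted probabilities.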


An Asymptotically Minimax Kernel Machine, Debashis Ghosh Jan 2014

Debashis Ghosh

Recently, a class of machine learning-inspired procedures, termed kernel machine methods, has been extensively developed in the statistical literature. It has been shown to have large power for a wide class of problems and applications in genomics and brain imaging. Many authors have exploited an equivalence between kernel machines and mixed effects models and used attendant estimation and inferential procedures. In this note, we construct a so-called 'adaptively minimax' kernel machine. Such a construction highlights the role of thresholding in the observation space and limits on the interpretability of such kernel machines.


On Likelihood Ratio Tests When Nuisance Parameters Are Present Only Under The Alternative, Cz Di, K-Y Liang Jan 2014

Chongzhi Di

In parametric models, when one or more parameters disappear under the null hypothesis, the likelihood ratio test statistic does not converge to a chi-square distribution. Rather, its limiting distribution is shown to be equivalent to that of the supremum of a squared Gaussian process. However, the limiting distribution is analytically intractable for most examples, and approximation or simulation based methods must be used to calculate p-values. In this article, we investigate conditions under which the asymptotic distributions have analytically tractable forms, based on the principal component decomposition of Gaussian processes. When these conditions are not satisfied, the principal …
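A minimal Python simulation of the situation the abstract describes: testing for a mean shift at an unknown changepoint theta, a nuisance parameter that exists only under the alternative. The profile likelihood ratio statistic is the supremum of squared z-scores over a grid of candidate changepoints, and its null distribution (the sup of a squared Gaussian process) is approximated by Monte Carlo rather than by a chi-square table. The changepoint setup, the grid, and the known unit variance are illustrative assumptions, not an example from the article:

```python
import numpy as np

rng = np.random.default_rng(7)
n, B = 200, 2000
x = np.sort(rng.uniform(size=n))
grid = np.quantile(x, np.linspace(0.1, 0.9, 17))   # candidate changepoints

def sup_lr(y):
    # Profile over theta: with known unit variance, the LR statistic at a
    # fixed theta is a squared z-score, so the test statistic is their sup.
    stats = []
    for t in grid:
        g = (x > t).astype(float)
        g = g - g.mean()
        z = (g @ y) / np.sqrt(g @ g)
        stats.append(z * z)
    return max(stats)

y_obs = rng.normal(size=n) + 1.0 * (x > 0.5)   # a real mean shift at 0.5
t_obs = sup_lr(y_obs)

# Monte Carlo null distribution: sup of a squared Gaussian process
null = np.array([sup_lr(rng.normal(size=n)) for _ in range(B)])
pval = (1 + np.sum(null >= t_obs)) / (B + 1)
print(round(t_obs, 1), pval)
```

Comparing `t_obs` to a chi-square(1) table here would be anti-conservative, since the sup over the grid inflates the null distribution.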


Beta Binomial Regression, Joseph M. Hilbe Oct 2013

Joseph M Hilbe

Monograph on how to construct, interpret and evaluate beta, beta binomial, and zero inflated beta-binomial regression models. Stata and R code used for examples.
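A hedged sketch of what fitting such a model involves, in Python rather than the monograph's Stata and R: maximum likelihood for a beta-binomial regression with a logit link on the mean and a log link on the precision, run on simulated data (the covariate, trial counts, and true parameters below are invented for illustration):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import betaln, expit, gammaln

rng = np.random.default_rng(3)
n, m = 400, 20                          # n groups, m Bernoulli trials each
x = rng.normal(size=n)
mu = expit(-0.2 + 0.8 * x)              # mean on the logit scale
phi = 10.0                              # precision = alpha + beta
y = rng.binomial(m, rng.beta(mu * phi, (1 - mu) * phi))

def negll(par):
    b0, b1, logphi = par
    mu = expit(b0 + b1 * x)
    phi = np.exp(logphi)
    a, b = mu * phi, (1 - mu) * phi
    # beta-binomial log-likelihood written with log-beta functions
    ll = (gammaln(m + 1) - gammaln(y + 1) - gammaln(m - y + 1)
          + betaln(y + a, m - y + b) - betaln(a, b))
    return -ll.sum()

fit = minimize(negll, x0=np.zeros(3), method="BFGS")
b0_hat, b1_hat, logphi_hat = fit.x
print(np.round([b0_hat, b1_hat, np.exp(logphi_hat)], 2))
```

The extra precision parameter is what lets the model absorb the overdispersion that a plain binomial GLM would miss.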


Recent Data Editing Practices Abroad: Evaluating A Multivariate Outlier Detection Method Based On A Normal Mixture Model (Masayoshi Takahashi, Selective Editing), Masayoshi Takahashi Aug 2013

Masayoshi Takahashi

No abstract provided.


Caimans - Semantic Platform For Advance Content Mining (Sketch Wp), Salvo Reina Jul 2013

Salvo Reina

A middleware software platform was created for the automatic classification of textual content. The requirements worksheet and the original flow sketches are published.


Global Quantitative Assessment Of The Colorectal Polyp Burden In Familial Adenomatous Polyposis Using A Web-Based Tool, Patrick M. Lynch, Jeffrey S. Morris, William A. Ross, Miguel A. Rodriguez-Bigas, Juan Posadas, Rossa Khalaf, Diane M. Weber, Valerie O. Sepeda, Bernard Levin, Imad Shureiqi Jan 2013

Jeffrey S. Morris

Background: Accurate measures of the total polyp burden in familial adenomatous polyposis (FAP) are lacking. Current assessment tools include polyp quantitation in limited-field photographs and qualitative total colorectal polyp burden by video.

Objective: To develop global quantitative tools of the FAP colorectal adenoma burden.

Design: A single-arm, phase II trial.

Patients: Twenty-seven patients with FAP.

Intervention: Treatment with celecoxib for 6 months, with before-treatment and after-treatment videos posted to an intranet with an interactive site for scoring.

Main Outcome Measurements: Global adenoma counts and sizes (grouped into categories: less than 2 mm, 2-4 mm, and greater than 4 mm) were …


Bayesian Nonparametric Regression And Density Estimation Using Integrated Nested Laplace Approximations, Xiaofeng Wang Jan 2013

Xiaofeng Wang

Integrated nested Laplace approximation (INLA) is a recently proposed approximate Bayesian approach for fitting structured additive regression models with a latent Gaussian field. As an alternative to Markov chain Monte Carlo techniques, INLA provides accurate approximations to posterior marginals and avoids time-consuming sampling. We show here that two classical nonparametric smoothing problems, nonparametric regression and density estimation, can be addressed using INLA. Simulated examples and R functions are presented to illustrate the use of the methods. Some discussion of potential applications of INLA is given in the paper.
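The Laplace approximation at the heart of INLA can be illustrated on a one-parameter toy problem where the exact posterior is known; this is a sketch of the principle only, not of the INLA software. A Poisson rate with a Gamma(2, 1) prior is approximated by a Gaussian at the posterior mode on the log scale and compared with the exact Gamma posterior:

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import gamma, norm

rng = np.random.default_rng(0)
y = rng.poisson(4.0, size=50)

def neg_log_post(t):                     # t = log(lambda)
    lam = np.exp(t)
    # Poisson log-likelihood + Gamma(2, 1) log-prior + log-Jacobian
    return -(y.sum() * t - len(y) * lam + 2 * t - lam)

mode = minimize_scalar(neg_log_post).x

# curvature at the mode gives the variance of the Gaussian approximation
h = 1e-4
curv = (neg_log_post(mode + h) - 2 * neg_log_post(mode)
        + neg_log_post(mode - h)) / h**2
approx = norm(mode, 1.0 / np.sqrt(curv))

# exact posterior: Gamma(sum(y) + 2, rate = n + 1); compare P(lambda < 4)
exact = gamma(a=y.sum() + 2, scale=1.0 / (len(y) + 1))
print(round(approx.cdf(np.log(4.0)), 3), round(exact.cdf(4.0), 3))
```

No sampling is involved: one optimization and one curvature evaluation replace an entire MCMC run, which is the efficiency argument the abstract makes for INLA.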


Using Methods From The Data-Mining And Machine-Learning Literature For Disease Classification And Prediction: A Case Study Examining Classification Of Heart Failure Subtypes, Peter C. Austin Jan 2013

Peter Austin

OBJECTIVE: Physicians classify patients into those with or without a specific disease. Furthermore, there is often interest in classifying patients according to disease etiology or subtype. Classification trees are frequently used to classify patients according to the presence or absence of a disease. However, classification trees can suffer from limited accuracy. In the data-mining and machine-learning literature, alternate classification schemes have been developed. These include bootstrap aggregation (bagging), boosting, random forests, and support vector machines.

STUDY DESIGN AND SETTING: We compared the performance of these classification methods with that of conventional classification trees to classify patients with heart failure (HF) …


Predictive Accuracy Of Risk Factors And Markers: A Simulation Study Of The Effect Of Novel Markers On Different Performance Measures For Logistic Regression Models, Peter C. Austin Jan 2013

Peter Austin

The change in c-statistic is frequently used to summarize the change in predictive accuracy when a novel risk factor is added to an existing logistic regression model. We explored the relationship between the absolute change in the c-statistic, Brier score, generalized R-squared, and the discrimination slope when a risk factor was added to an existing model in an extensive set of Monte Carlo simulations. The increase in model accuracy due to the inclusion of a novel marker was proportional to both the prevalence of the marker and to the odds ratio relating the marker to the outcome but inversely …
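The qualitative pattern can be reproduced in a few lines of Python. The sketch below scores simulated patients with the true linear predictors rather than fitted models, and computes the c-statistic as the Mann-Whitney rank estimate; the prevalence, coefficients, and sample size are arbitrary choices for illustration, not values from the study:

```python
import numpy as np

def auc(score, y):
    # c-statistic as the Mann-Whitney (rank-sum) estimate
    order = np.argsort(score)
    ranks = np.empty(len(score))
    ranks[order] = np.arange(1, len(score) + 1)
    n1 = y.sum()
    return (ranks[y == 1].sum() - n1 * (n1 + 1) / 2) / (n1 * (len(y) - n1))

rng = np.random.default_rng(5)
n = 50_000
expit = lambda t: 1.0 / (1.0 + np.exp(-t))
x = rng.normal(size=n)                   # established risk factor

deltas = []
for log_or in (0.5, 1.0, 1.5):           # strength of the novel marker
    marker = rng.normal(size=n)
    y = rng.binomial(1, expit(-2.0 + x + log_or * marker))
    deltas.append(auc(x + log_or * marker, y) - auc(x, y))
print(np.round(deltas, 3))               # change in c-statistic grows with OR
```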


Nbr2 Errata And Comments, Joseph Hilbe Dec 2012

Joseph M Hilbe

Errata and Comments for Negative Binomial Regression, 2nd edition


International Astrostatistics Association, Joseph Hilbe Sep 2012

Joseph M Hilbe

Overview of the history, purpose, Council and officers of the International Astrostatistics Association (IAA)


A Study Of Data Editing Abroad And Multivariate Outlier Detection Based On A Normal Mixture Model (Masayoshi Takahashi, Selective Editing), Masayoshi Takahashi Aug 2012

Masayoshi Takahashi

No abstract provided.


Glme3_Ado_Do_Files, Joseph Hilbe May 2012

Joseph M Hilbe

GLME3 ado and do files (116 in total)


Glme3 Data And Ado/Do Files, Joseph Hilbe May 2012

Joseph M Hilbe

A listing of Data Sets and Stata software commands and do files in GLME3 book


Statistical Methods For Proteomic Biomarker Discovery Based On Feature Extraction Or Functional Modeling Approaches, Jeffrey S. Morris Jan 2012

Jeffrey S. Morris

In recent years, developments in molecular biotechnology have led to the increased promise of detecting and validating biomarkers, or molecular markers that relate to various biological or medical outcomes. Proteomics, the direct study of proteins in biological samples, plays an important role in the biomarker discovery process. These technologies produce complex, high dimensional functional and image data that present many analytical challenges that must be addressed properly for effective comparative proteomics studies that can yield potential biomarkers. Specific challenges include experimental design, preprocessing, feature extraction, and statistical analysis accounting for the inherent multiple testing issues. This paper reviews various computational …


Integrative Bayesian Analysis Of High-Dimensional Multi-Platform Genomics Data, Wenting Wang, Veerabhadran Baladandayuthapani, Jeffrey S. Morris, Bradley M. Broom, Ganiraju C. Manyam, Kim-Anh Do Jan 2012

Jeffrey S. Morris

Motivation: Analyzing data from multi-platform genomics experiments combined with patients' clinical outcomes helps us understand the complex biological processes that characterize a disease, as well as how these processes relate to the development of the disease. Current integration approaches are limited in that they do not consider the fundamental biological relationships that exist among the data obtained from the different platforms.

Statistical Model: We propose an integrative Bayesian analysis of genomics data (iBAG) framework for identifying important genes/biomarkers that are associated with clinical outcome. This framework uses a hierarchical modeling technique to combine the data obtained from multiple platforms …


Proportional Mean Residual Life Model For Right-Censored Length-Biased Data, Gary Kwun Chuen Chan, Ying Qing Chen, Chongzhi Di Jan 2012

Chongzhi Di

To study disease association with risk factors in epidemiologic studies, cross-sectional sampling is often more focused and less costly for recruiting study subjects who have already experienced initiating events. For time-to-event outcome, however, such a sampling strategy may be length-biased. Coupled with censoring, analysis of length-biased data can be quite challenging, due to the so-called “induced informative censoring” in which the survival time and censoring time are correlated through a common backward recurrence time. We propose to use the proportional mean residual life model of Oakes and Dasu (1990) for analysis of censored length-biased survival data. Several nonstandard data structures, …


Comparing The Cohort Design And The Nested Case-Control Design In The Presence Of Both Time-Invariant And Time-Dependent Treatment And Competing Risks: Bias And Precision, Peter C. Austin Jan 2012

Peter Austin

Purpose: Observational studies using electronic administrative health care databases are often used to estimate the effects of treatments and exposures. Traditionally, a cohort design has been used to estimate these effects, but increasingly studies are using a nested case-control (NCC) design. The relative statistical efficiency of these two designs has not been examined in detail.

Methods: We used Monte Carlo simulations to compare these two designs in terms of the bias and precision of effect estimates. We examined three different settings: (A): treatment occurred at baseline and there was a single outcome of interest; (B): treatment was time-varying and there …


Using Ensemble-Based Methods For Directly Estimating Causal Effects: An Investigation Of Tree-Based G-Computation, Peter C. Austin Jan 2012

Peter Austin

Researchers are increasingly using observational or nonrandomized data to estimate causal treatment effects. Essential to the production of high-quality evidence is the ability to reduce or minimize the confounding that frequently occurs in observational studies. When using the potential outcome framework to define causal treatment effects, one requires the potential outcome under each possible treatment. However, only the outcome under the actual treatment received is observed, whereas the potential outcomes under the other treatments are considered missing data. Some authors have proposed that parametric regression models be used to estimate potential outcomes. In this study, we examined the use of …
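A sketch of tree-based G-computation in Python, with scikit-learn's RandomForestRegressor standing in for the ensemble learners examined in the study; the data-generating process (one confounder, a constant treatment effect of 2) is invented here for illustration:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(11)
n = 4000
x = rng.normal(size=(n, 3))
a = rng.binomial(1, 1.0 / (1.0 + np.exp(-x[:, 0])))         # confounded treatment
y = 2.0 * a + x[:, 0] + 0.5 * x[:, 1] + rng.normal(size=n)  # true effect = 2

# G-computation: model E[Y | A, X], then average the predicted potential
# outcomes with treatment set to 1 and to 0 for every subject
model = RandomForestRegressor(n_estimators=200, min_samples_leaf=10,
                              random_state=0)
model.fit(np.column_stack([a, x]), y)
y1 = model.predict(np.column_stack([np.ones(n), x]))
y0 = model.predict(np.column_stack([np.zeros(n), x]))
ate = (y1 - y0).mean()

naive = y[a == 1].mean() - y[a == 0].mean()   # confounded raw contrast
print(round(ate, 2), round(naive, 2))
```

The point of the construction is visible in the output: the raw treated-versus-untreated contrast is biased upward by the confounder, while averaging the model's two predicted potential outcomes recovers something close to the true effect.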


Regression Trees For Predicting Mortality In Patients With Cardiovascular Disease: What Improvement Is Achieved By Using Ensemble-Based Methods?, Peter C. Austin Jan 2012

Peter Austin

In biomedical research, the logistic regression model is the most commonly used method for predicting the probability of a binary outcome. While many clinical researchers have expressed an enthusiasm for regression trees, this method may have limited accuracy for predicting health outcomes. We aimed to evaluate the improvement that is achieved by using ensemble-based methods, including bootstrap aggregation (bagging) of regression trees, random forests, and boosted regression trees. We analyzed 30-day mortality in two large cohorts of patients hospitalized with either acute myocardial infarction (N = 16,230) or congestive heart failure (N = 15,848) in two distinct eras (1991-2001 and …