Open Access. Powered by Scholars. Published by Universities.®

Statistics and Probability Commons

Statistical Models

Articles 1 - 28 of 28

Full-Text Articles in Statistics and Probability

Brief Review: Low Frequency Event Charts (G-Charts) In Healthcare, James Espinosa, David Ho, Alan Lucerna, Henry Schuitema May 2023

Rowan-Virtua Research Day

The ability to determine if a change in a system is actually an improvement—or worsening in function—is one of the essential desiderata of quality improvement efforts. There are many ways to look at the issue. A special problem occurs when the event being studied is low frequency by nature. By way of example, patient falls in a given hospital or division of a hospital may occur in a way that is low frequency—yet each event is important. Process engineering has developed an approach to low frequency events. Part of this approach may involve specialized charts that look at the “time-between-events”—as …
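
As a rough illustration of the time-between-events idea described in this abstract, the sketch below computes limits for a g-chart from the number of cases (or days) between successive events. It assumes one common textbook formulation for geometric counts (center line at the mean, limits at three times sqrt(mean * (mean + 1))); the review itself may use a different convention. The function name and the sample data are hypothetical.

```python
import math

def g_chart_limits(counts_between_events):
    """Center line and 3-sigma limits for a g-chart (assumed textbook form)."""
    g_bar = sum(counts_between_events) / len(counts_between_events)
    # For geometric-type counts, the spread is tied to the mean.
    sigma = math.sqrt(g_bar * (g_bar + 1))
    ucl = g_bar + 3 * sigma
    # The lower limit is truncated at zero because counts cannot be negative.
    lcl = max(0.0, g_bar - 3 * sigma)
    return g_bar, lcl, ucl

# Hypothetical data: days between successive patient falls.
gaps = [12, 30, 7, 45, 20, 6]
center, lcl, ucl = g_chart_limits(gaps)
# A gap above the upper limit would suggest events have become rarer.
signals = [g for g in gaps if g > ucl]
```

With this formulation the lower limit is usually zero, so in practice the chart is mostly read for unusually long gaps between events.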


Statistical And Machine Learning Approaches To Depressive Disorders Among Adults In The United States: From Factor Discovery To Prediction Evaluation, Minhwa Lee Jan 2021

Senior Independent Study Theses

According to the National Institutes of Mental Health (NIMH), depressive disorders (or major depression) are considered one of the most common and serious health risks in the United States. Our study focuses on extracting non-medical factors of depressive disorders diagnosis, such as overall health states, health risk behaviors, demography, and healthcare access, using the Behavioral Risk Factor Surveillance System (BRFSS) data set collected by the Centers for Disease Control and Prevention (CDC) in 2018.

We set the two objectives of our study about depressive disorders diagnosis in the United States as follows. First, we aim to utilize machine learning algorithms …


Statistical Models To Predict Popularity Of News Articles On Social Networks, Ziyi Liu May 2017

Arts & Sciences Electronic Theses and Dissertations

Social networks have changed the way that we obtain information. Content creators, and specifically news article authors, have an interest in predicting the popularity of content, in terms of the number of shares, likes, and comments across various social media platforms. In this thesis, I employ several statistical learning methods for prediction. Both regression-based and classification-based methods are compared according to their predictive ability, using a database from the UCI Machine Learning Repository.


Online Variational Bayes Inference For High-Dimensional Correlated Data, Sylvie T. Kabisa, Jeffrey S. Morris, David Dunson Jan 2016

Jeffrey S. Morris

High-dimensional data with hundreds of thousands of observations are becoming commonplace in many disciplines. The analysis of such data poses many computational challenges, especially when the observations are correlated over time and/or across space. In this paper we propose flexible hierarchical regression models for analyzing such data that accommodate serial and/or spatial correlation. We address the computational challenges involved in fitting these models by adopting an approximate inference framework. We develop an online variational Bayes algorithm that works by incrementally reading the data into memory one portion at a time. The performance of the method is assessed through simulation studies. …


Functional Car Models For Spatially Correlated Functional Datasets, Lin Zhang, Veerabhadran Baladandayuthapani, Hongxiao Zhu, Keith A. Baggerly, Tadeusz Majewski, Bogdan Czerniak, Jeffrey S. Morris Jan 2016

Jeffrey S. Morris

We develop a functional conditional autoregressive (CAR) model for spatially correlated data for which functions are collected on areal units of a lattice. Our model performs functional response regression while accounting for spatial correlations with potentially nonseparable and nonstationary covariance structure, in both the space and functional domains. We show theoretically that our construction leads to a CAR model at each functional location, with spatial covariance parameters varying and borrowing strength across the functional domain. Using basis transformation strategies, the nonseparable spatial-functional model is computationally scalable to enormous functional datasets, generalizable to different basis functions, and can be used on …


Functional Regression, Jeffrey S. Morris Jan 2015

Jeffrey S. Morris

Functional data analysis (FDA) involves the analysis of data whose ideal units of observation are functions defined on some continuous domain, and the observed data consist of a sample of functions taken from some population, sampled on a discrete grid. Ramsay and Silverman's 1997 textbook sparked the development of this field, which has accelerated in the past 10 years to become one of the fastest growing areas of statistics, fueled by the growing number of applications yielding this type of data. One unique characteristic of FDA is the need to combine information both across and within functions, which Ramsay and …


Ordinal Probit Wavelet-Based Functional Models For Eqtl Analysis, Mark J. Meyer, Jeffrey S. Morris, Craig P. Hersh, Jarret D. Morrow, Christoph Lange, Brent A. Coull Jan 2015

Jeffrey S. Morris

Current methods for conducting expression Quantitative Trait Loci (eQTL) analysis are limited in scope to pairwise association testing between a single nucleotide polymorphism (SNP) and an expression probe set in a region around a gene of interest, thus ignoring the inherent between-SNP correlation. To determine association, p-values are then typically adjusted using Plug-in False Discovery Rate. As many SNPs are interrogated in the region and multiple probe sets taken, the current approach requires the fitting of a large number of models. We propose to remedy this by introducing a flexible function-on-scalar regression that models the genome as a functional outcome. The …


Interpretation And Prediction Of A Logistic Model, Joseph M. Hilbe Mar 2014

Joseph M Hilbe

A basic overview of how to model and interpret a logistic regression model, as well as how to obtain the predicted probability or fit of the model and calculate its confidence intervals. R code used for all examples; some Stata is provided as a contrast.
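
To make the idea of a predicted probability with a confidence interval concrete, here is a minimal sketch in Python (the monograph itself uses R and Stata). It assumes a Wald-type interval built on the logit scale and then back-transformed, which is one common approach and not necessarily the one the author uses. The function name, coefficients, and covariance matrix are hypothetical values chosen for illustration.

```python
import math

def predict_with_ci(beta, cov, x, z=1.96):
    """Predicted probability and Wald interval for a logistic model at x."""
    # Linear predictor: eta = x' beta.
    eta = sum(b * xi for b, xi in zip(beta, x))
    # Variance of the linear predictor: x' Cov(beta) x.
    n = len(x)
    var = sum(x[i] * cov[i][j] * x[j] for i in range(n) for j in range(n))
    se = math.sqrt(var)

    def inv_logit(v):
        return 1.0 / (1.0 + math.exp(-v))

    # Build the interval on the logit scale, then map back to probabilities.
    return inv_logit(eta), inv_logit(eta - z * se), inv_logit(eta + z * se)

# Hypothetical fitted values: intercept and one slope, with their covariance.
beta = [-1.0, 0.5]
cov = [[0.04, -0.01], [-0.01, 0.01]]
p, lo, hi = predict_with_ci(beta, cov, [1.0, 2.0])
```

Building the interval on the logit scale keeps both limits inside (0, 1), which an interval computed directly on the probability scale does not guarantee.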


Statistical Models For Predicting College Success, Yelen Nunez Nov 2013

FIU Electronic Theses and Dissertations

Colleges base their admission decisions on a number of factors to determine which applicants have the potential to succeed. This study utilized data for students that graduated from Florida International University between 2006 and 2012. Two models were developed (one using SAT as the principal explanatory variable and the other using ACT as the principal explanatory variable) to predict college success, measured using the student’s college grade point average at graduation. Some of the other factors that were used to make these predictions were high school performance, socioeconomic status, major, gender, and ethnicity. The model using ACT had a higher …


Beta Binomial Regression, Joseph M. Hilbe Oct 2013

Joseph M Hilbe

Monograph on how to construct, interpret and evaluate beta, beta binomial, and zero inflated beta-binomial regression models. Stata and R code used for examples.


Cosine Directions Using Rao-Blackwell Theorem And Hausdorff Metric In Quasars, Byron E. Bell Aug 2013

Byron E. Bell

This analysis will determine the equations of the Cosine Directions for all flux of the Optical Spectrum in quasars. Studies on the Hausdorff metric will greatly enhance our understanding of quasar distances. The essential work of J. Bovy and D. Mortlock in the probabilities of quasars will set the methods/process of probability theory in the research along with Fokker-Planck probability theory. This study will complete steps in the classification of quasars by finding the minimum variance of flux by using the Rao–Blackwell Theorem. The papers of C. R. Rao and D. Blackwell will be examined to clarify more of the above …


A Study Of Mexican Free-Tailed Bat Chirp Syllables: Bayesian Functional Mixed Modeling Of Nonstationary Time Series Data With Time-Dependent Spectra, Josue G. Martinez, Kirsten M. Bohn, Raymond J. Carroll, Jeffrey S. Morris Feb 2013

Jeffrey S. Morris

We describe a new approach to analyze chirp syllables of free-tailed bats from two regions of Texas in which they are predominant: Austin and College Station. Our goal is to characterize any systematic regional differences in the mating chirps and assess whether individual bats have signature chirps. The data are analyzed by modeling spectrograms of the chirps as responses in a Bayesian functional mixed model. Given the variable chirp lengths, we compute the spectrograms on a relative time scale interpretable as the relative chirp position, using a variable window overlap based on chirp length. We use 2D wavelet transforms to …


Loss Function Based Ranking In Two-Stage, Hierarchical Models, Rongheng Lin, Thomas A. Louis, Susan M. Paddock, Greg Ridgeway Mar 2012

Rongheng Lin

Several authors have studied the performance of optimal, squared error loss (SEL) estimated ranks. Though these are effective, in many applications interest focuses on identifying the relatively good (e.g., in the upper 10%) or relatively poor performers. We construct loss functions that address this goal and evaluate candidate rank estimates, some of which optimize specific loss functions. We study performance for a fully parametric hierarchical model with a Gaussian prior and Gaussian sampling distributions, evaluating performance for several loss functions. Results show that though SEL-optimal ranks and percentiles do not specifically focus on classifying with respect to a percentile cut …


Ranking Usrds Provider-Specific Smrs From 1998-2001, Rongheng Lin, Thomas A. Louis, Susan M. Paddock, Greg Ridgeway Mar 2012

Rongheng Lin

Provider profiling (ranking, "league tables") is prevalent in health services research. Similarly, comparing educational institutions and identifying differentially expressed genes depend on ranking. Effective ranking procedures must be structured by a hierarchical (Bayesian) model and guided by a ranking-specific loss function; however, even optimal methods can perform poorly, and estimates must be accompanied by uncertainty assessments. We use the 1998-2001 Standardized Mortality Ratio (SMR) data from the United States Renal Data System (USRDS) as a platform to identify issues and approaches. Our analyses extend Liu et al. (2004) by combining evidence over multiple years via an AR(1) model; by considering estimates …


Statistical Methods For Proteomic Biomarker Discovery Based On Feature Extraction Or Functional Modeling Approaches, Jeffrey S. Morris Jan 2012

Jeffrey S. Morris

In recent years, developments in molecular biotechnology have led to the increased promise of detecting and validating biomarkers, or molecular markers that relate to various biological or medical outcomes. Proteomics, the direct study of proteins in biological samples, plays an important role in the biomarker discovery process. These technologies produce complex, high dimensional functional and image data that present many analytical challenges that must be addressed properly for effective comparative proteomics studies that can yield potential biomarkers. Specific challenges include experimental design, preprocessing, feature extraction, and statistical analysis accounting for the inherent multiple testing issues. This paper reviews various computational …


Integrative Bayesian Analysis Of High-Dimensional Multi-Platform Genomics Data, Wenting Wang, Veerabhadran Baladandayuthapani, Jeffrey S. Morris, Bradley M. Broom, Ganiraju C. Manyam, Kim-Anh Do Jan 2012

Jeffrey S. Morris

Motivation: Analyzing data from multi-platform genomics experiments combined with patients’ clinical outcomes helps us understand the complex biological processes that characterize a disease, as well as how these processes relate to the development of the disease. Current integration approaches that treat the data are limited in that they do not consider the fundamental biological relationships that exist among the data from platforms.

Statistical Model: We propose an integrative Bayesian analysis of genomics data (iBAG) framework for identifying important genes/biomarkers that are associated with clinical outcome. This framework uses a hierarchical modeling technique to combine the data obtained from multiple platforms …


Testing For Improvement In Prediction Model Performance, Margaret S. Pepe Phd, Kathleen F. Kerr Phd, Gary Longton, Zheyu Wang Phd Nov 2011

Margaret S Pepe PhD

New methodology has been proposed in recent years for evaluating the improvement in prediction performance gained by adding a new predictor, Y, to a risk model containing a set of baseline predictors, X, for a binary outcome D. We prove theoretically that null hypotheses concerning no improvement in performance are equivalent to the simple null hypothesis that the coefficient for Y is zero in the risk model, P(D=1|X,Y). Therefore, testing for improvement in prediction performance is redundant if Y has already been shown to be a risk factor. We investigate properties of tests through simulation studies, focusing on the change …


Clustering With Exclusion Zones: Genomic Applications, Mark Segal, Yuanyuan Xiao, Fred Huffer Dec 2010

Mark R Segal

Methods for formally evaluating the clustering of events in space or time, notably the scan statistic, have been richly developed and widely applied. In order to utilize the scan statistic and related approaches, it is necessary to know the extent of the spatial or temporal domains wherein the events arise. Implicit in their usage is that these domains have no “holes”—hereafter “exclusion zones”—regions in which events a priori cannot occur. However, in many contexts, this requirement is not met. When the exclusion zones are known, it is straightforward to correct the scan statistic for their occurrence by simply adjusting the …


Semiparametric Analysis Of Recurrent Events: Artificial Censoring, Truncation, Pairwise Estimation And Inference, Debashis Ghosh Dec 2009

Debashis Ghosh

The analysis of recurrent failure time data from longitudinal studies can be complicated by the presence of dependent censoring. There has been a substantive literature that has developed based on an artificial censoring device. We explore in this article the connection between this class of methods with truncated data structures. In addition, a new procedure is developed for estimation and inference in a joint model for recurrent events and dependent censoring. Estimation proceeds using a mixed U-statistic based estimating function approach. New resampling-based methods for variance estimation and model checking are also described. The methods are illustrated by application to …


A Statistical Framework For The Analysis Of Chip-Seq Data, Pei Fen Kuan, Dongjun Chung, Guangjin Pan, James A. Thomson, Ron Stewart, Sunduz Keles Nov 2009

Sunduz Keles

Chromatin immunoprecipitation followed by sequencing (ChIP-Seq) has revolutionized experiments for genome-wide profiling of DNA-binding proteins, histone modifications, and nucleosome occupancy. As the cost of sequencing is decreasing, many researchers are switching from microarray-based technologies (ChIP-chip) to ChIP-Seq for genome-wide study of transcriptional regulation. Despite its increasing and well-deserved popularity, there is little work that investigates and accounts for sources of biases in the ChIP-Seq technology. These biases typically arise from both the standard pre-processing protocol and the underlying DNA sequence of the generated data.

We study data from a naked DNA sequencing experiment, which sequences non-cross-linked DNA after deproteinizing and …


Identification Of Yeast Transcriptional Regulation Networks Using Multivariate Random Forests, Yuanyuan Xiao, Mark Segal Dec 2008

Mark R Segal

The recent availability of whole-genome scale data sets that investigate complementary and diverse aspects of transcriptional regulation has spawned an increased need for new and effective computational approaches to analyze and integrate these large scale assays. Here, we propose a novel algorithm, based on random forest methodology, to relate gene expression (as derived from expression microarrays) to sequence features residing in gene promoters (as derived from DNA motif data) and transcription factor binding to gene promoters (as derived from tiling microarrays). We extend the random forest approach to model a multivariate response as represented, for example, by time-course gene expression …


An Improved Model Averaging Scheme For Logistic Regression, Debashis Ghosh, Zheng Yuan Jan 2008

Debashis Ghosh

Recently, penalized regression methods have attracted much attention in the statistical literature. In this article, we argue that such methods can be improved for the purposes of prediction by utilizing model averaging ideas. We propose a new algorithm that combines penalized regression with model averaging for improved prediction. We also discuss the issue of model selection versus model averaging and propose a diagnostic based on the notion of generalized degrees of freedom. The proposed methods are studied using both simulated and real data.


Chess, Chance And Conspiracy, Mark Segal Dec 2006

Mark R Segal

Chess and chance are seemingly strange bedfellows. Luck and/or randomness have no apparent role in move selection when the game is played at the highest levels. However, when competition is at the ultimate level, that of the World Chess Championship (WCC), chess and conspiracy are not strange bedfellows, there being a long and colorful history of accusations levied between participants. One such accusation, frequently repeated, was that all the games in the 1985 WCC (Karpov vs Kasparov) were fixed and prearranged move by move. That this claim was advanced by a former World Champion, Bobby Fischer, argues that it ought …


A Note On Empirical Likelihood Inference Of Residual Life Regression, Ying Qing Chen, Yichuan Zhao Dec 2006

Yichuan Zhao

Mean residual life function, or life expectancy, is an important function to characterize distribution of residual life. The proportional mean residual life model by Oakes and Dasu (1990) is a regression tool to study the association between life expectancy and its associated covariates. Although semiparametric inference procedures have been proposed in the literature, the accuracy of such procedures may be low when the censoring proportion is relatively large. In this paper, the semiparametric inference procedures are studied with an empirical likelihood ratio method. An empirical likelihood confidence region is constructed for the regression parameters. The proposed method is further compared …


A Mathematical Regression Of The U.S. Gross Private Domestic Investment 1959-2001, Byron E. Bell Sep 2006

Byron E. Bell

Summary of project: a study of the role the U.S. stock markets and money markets have possibly played in the Gross Private Domestic Investment (GPDI) of the United States from 1959 to 2001, for which I created a Multiple Linear Regression Model (MLRM).


Derivation Of A Scaled Binomial As An Instance Of A General Discrete Exponential Distribution, Joseph Hilbe Jan 1994

Joseph M Hilbe

No abstract provided.


Generalized Linear Models: Software Implementation And The Structure Of A General Power-Link Based Glm Algorithm, Joseph Hilbe Apr 1993

Joseph M Hilbe

Generalized linear modeling (GLM) is currently undergoing a renaissance. The number of software packages offering GLM capability grows each year and as a partial consequence one finds an increased number of research endeavors being modeled using GLM methodology. On the other hand, there have likewise been an increasing number of requests to vendors by users of statistical packages to include GLM facilities amid other offerings. The overall effect has been a near 300 percent increase in GLM programs over the past four years.

I shall discuss the nature of generalized linear models followed by an examination of how they have …


Log-Negative Binomial Regression As A Generalized Linear Model, Joseph Hilbe Dec 1992

Joseph M Hilbe

The negative binomial (NB) is a member of the exponential family of discrete probability distributions. The nature of the distribution is itself well understood, but its contribution to regression modeling, in particular as a generalized linear model (GLM), has not been appreciated. The mathematical properties of the negative binomial are derived and GLM algorithms are developed for both the canonical and log form. Geometric regression is seen as an instance of the NB. The log forms of both may be effectively used to model types of Poisson-overdispersed count data. A GLM-type algorithm is created for a general log-negative binomial regression …
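
The sketch below illustrates why the log form of the negative binomial suits overdispersed counts. It assumes the widely used mean-dispersion ("NB2") parameterization, in which the variance is mu + alpha * mu^2 and therefore exceeds the Poisson variance mu whenever alpha > 0. This parameterization, the function names, and the numeric values are assumptions made for illustration and are not taken from the paper.

```python
import math

def nb2_logpmf(y, mu, alpha):
    """Log-probability of count y under NB2 with mean mu and dispersion alpha."""
    r = 1.0 / alpha  # shape parameter implied by the dispersion
    return (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
            + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu)))

def log_link_mean(beta, x):
    """Log link: the mean is the exponential of the linear predictor."""
    return math.exp(sum(b * xi for b, xi in zip(beta, x)))

# Hypothetical coefficients and covariates.
mu = log_link_mean([0.5, 0.2], [1.0, 3.0])
alpha = 0.5

# Check the moments numerically by summing over a long range of counts.
probs = [math.exp(nb2_logpmf(y, mu, alpha)) for y in range(500)]
mean = sum(y * p for y, p in enumerate(probs))
var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
```

Setting alpha to 1 in this parameterization gives the geometric distribution, consistent with the abstract's remark that geometric regression is an instance of the NB.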