Open Access. Powered by Scholars. Published by Universities.®

Statistical Models Commons

2016

Articles 31 - 52 of 52

Full-Text Articles in Statistical Models

Analysis And Modeling Of U.S. Army Recruiting Markets, Joshua L. Mcdonald Mar 2016

Theses and Dissertations

The United States Army Recruiting Command (USAREC) is charged with finding, engaging, and ultimately enlisting young Americans for service as Soldiers in the U.S. Army. USAREC must decide how to allocate monthly enlistment goals, by aptitude and education level, across its 38 subordinate recruiting battalions in order to maximize the number of enlistment contracts produced each year. In our research, we model the production of enlistment contracts as a function of recruiting supply and demand factors that vary across the recruiting battalion areas of responsibility. Using county-level data for recruiting years RY2010 through RY2013 mapped to recruiting …


Predicting Financial Distress: A Comparison Of Survival Analysis And Decision Tree Techniques, Adrian Gepp, Kuldeep Kumar Feb 2016

Adrian Gepp

Financial distress and the consequent failure of a business are usually extremely costly and disruptive events. Statistical financial distress prediction models attempt to predict whether a business will experience financial distress in the future. Discriminant analysis and logistic regression have been the most popular approaches, but a large number of alternative, cutting-edge data mining techniques can also be used. In this paper, a semi-parametric Cox survival analysis model and non-parametric CART decision trees are applied to financial distress prediction and compared with each other as well as with the most popular approaches. This …
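CART grows a tree by choosing, at each node, the split that most reduces class impurity. The sketch below is not the authors' code; it assumes the Gini index as the impurity measure and a single numeric predictor (for example, a hypothetical financial ratio with 1 = distressed, 0 = healthy).

```python
def gini(labels):
    """Gini impurity 1 - sum_k p_k^2 of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split(x, y):
    """Return (threshold, weighted impurity) of the best binary split x <= t.

    threshold is None when no split improves on the unsplit node.
    """
    pairs = sorted(zip(x, y))
    n = len(pairs)
    best = (None, gini(y))
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot separate tied predictor values
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        t = (pairs[i - 1][0] + pairs[i][0]) / 2  # midpoint threshold
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (t, score)
    return best
```

A full CART implementation applies this search recursively over all predictors and then prunes the tree; only the single split step is shown here.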


HPCNMF: A High-Performance Toolbox For Non-Negative Matrix Factorization, Karthik Devarajan, Guoli Wang Feb 2016

COBRA Preprint Series

Non-negative matrix factorization (NMF) is a widely used machine learning algorithm for dimension reduction of large-scale data. It has found successful applications in a variety of fields such as computational biology, neuroscience, natural language processing, information retrieval, image processing and speech recognition. In bioinformatics, for example, it has been used to extract patterns and profiles from genomic and text-mining data as well as in protein sequence and structure analysis. While the scientific performance of NMF is very promising in dealing with high-dimensional data sets and complex data structures, its computational cost is high and can sometimes be critical for …
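To make the computational cost concrete, here is a plain-Python sketch of the classic Lee-Seung multiplicative updates for Frobenius-norm NMF, V ≈ WH. This is a generic textbook algorithm and not HPCNMF's API or necessarily one of its algorithms; the function names are hypothetical.

```python
import random


def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(r) for r in zip(*A)]


def frob_err(V, W, H):
    """Squared Frobenius norm ||V - WH||^2."""
    WH = matmul(W, H)
    return sum((v - x) ** 2 for rv, rx in zip(V, WH) for v, x in zip(rv, rx))


def nmf(V, k, iters=200, seed=0, eps=1e-9):
    """Lee-Seung multiplicative updates; entries of W and H stay non-negative."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H), elementwise; eps avoids division by zero
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[h * a / (b + eps) for h, a, b in zip(rh, ra, rb)]
             for rh, ra, rb in zip(H, num, den)]
        # W <- W * (V H^T) / (W H H^T), elementwise
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[w * a / (b + eps) for w, a, b in zip(rw, ra, rb)]
             for rw, ra, rb in zip(W, num, den)]
    return W, H
```

Every iteration requires several dense matrix products, which is why large genomic matrices call for high-performance implementations.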


Binomial Regression With A Misclassified Covariate And Outcome, Sheng Luo, Wenyaw Chan, Michelle A Detry, Paul J Massman, R S. Doody Feb 2016

Faculty Publications

Misclassification in outcome variables, categorical covariates, or both is a common issue in medical science. It leads to biased results and distorted disease-exposure relationships. Moreover, it is often of clinical interest to estimate the sensitivity and specificity of a diagnostic method even when neither a gold standard nor prior knowledge about these parameters exists. We present a novel Bayesian approach to binomial regression when both the outcome variable and one binary covariate are subject to misclassification. Extensive simulation results under various scenarios and a real clinical example are given to illustrate the proposed approach. This approach …
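The bias arises from a basic identity: with sensitivity Se and specificity Sp, the observed positive rate is Se·p + (1 − Sp)(1 − p) rather than the true rate p. The sketch below shows this distortion and its inversion (the Rogan-Gladen correction) for a single binary variable with known Se and Sp. It only illustrates the problem; the authors' Bayesian method instead treats these parameters as unknown.

```python
def observed_prob(p, se, sp):
    """P(observed positive) given true probability p, sensitivity se, specificity sp."""
    return se * p + (1.0 - sp) * (1.0 - p)


def corrected_prob(p_obs, se, sp):
    """Invert observed_prob (Rogan-Gladen correction); requires se + sp > 1."""
    if se + sp <= 1.0:
        raise ValueError("se + sp must exceed 1")
    return (p_obs + sp - 1.0) / (se + sp - 1.0)
```

For example, a true rate of 0.3 measured with Se = 0.9 and Sp = 0.8 appears as 0.41, a substantial upward bias.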


Models For HSV Shedding Must Account For Two Levels Of Overdispersion, Amalia Magaret Jan 2016

UW Biostatistics Working Paper Series

We have frequently conducted crossover studies to evaluate new therapeutic interventions for genital herpes simplex virus (HSV) infection. The outcome used to assess the efficacy of interventions on herpes disease severity is the viral shedding rate, defined as the frequency of detection of HSV on the genital skin and mucosa. We performed a simulation study to determine whether our standard model, which we have used previously, appropriately accounted for all the features of the shedding data needed to provide correct inference. We simulated shedding data under our standard, validated assumptions and assessed the ability of five different models to reproduce the …


Spatiotemporal Meta-Analysis: Reviewing Health Psychology Phenomena Over Space And Time, Blair T. Johnson Jan 2016

CHIP Documents

This supplemental material is meant to support this article:

Johnson, B. T., Crowley, E., & Marrouch, N. Spatiotemporal meta-analysis: Reviewing health psychology phenomena over space and time. Health Psychology Review.

Specifically, it is a database of GDP per capita for the nations of the world between 1800 and 2015. It is archived here to support an online supplement to this article.

GDP per capita


Design & Analysis Of A Computer Experiment For An Aerospace Conformance Simulation Study, Ryan W. Gryder Jan 2016

Theses and Dissertations

Within NASA's Air Traffic Management Technology Demonstration #1 (ATD-1), Interval Management (IM) is a flight deck tool that enables pilots to achieve or maintain a precise in-trail spacing behind a target aircraft. Previous research has shown that violations of aircraft spacing requirements can occur between an IM aircraft and its surrounding non-IM aircraft when it is following a target on a separate route. This research focused on the experimental design and analysis of a deterministic computer simulation that models our airspace configuration of interest. Using an original space-filling design and Gaussian process modeling, we found that aircraft delay assignments …
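The abstract does not describe how its original space-filling design is built. As a point of reference, the sketch below generates a standard Latin hypercube sample, a common baseline space-filling design for deterministic computer experiments; it is not the thesis's design.

```python
import random


def latin_hypercube(n, d, seed=None):
    """Return n points in [0, 1)^d with exactly one point in each of the
    n equal-width strata along every dimension."""
    rng = random.Random(seed)
    columns = []
    for _ in range(d):
        strata = list(range(n))
        rng.shuffle(strata)  # random pairing of strata across dimensions
        # jitter each point uniformly within its stratum
        columns.append([(s + rng.random()) / n for s in strata])
    return [list(pt) for pt in zip(*columns)]
```

Each unit-cube coordinate would then be rescaled to the range of a simulation input (for example, a hypothetical delay assignment in seconds) before running the simulator and fitting a Gaussian process to the outputs.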


Topics In Logistic Regression Analysis, Zhiheng Xie Jan 2016

Theses and Dissertations--Statistics

Discrete-time Markov chains have been used to analyze the transition of subjects from intact cognition to dementia, with mild cognitive impairment and global impairment as intervening transient states and death as a competing risk. A multinomial logistic regression model is used to estimate the probability distribution in each row of the one-step transition matrix that corresponds to a transient state. We investigate goodness-of-fit tests for a multinomial distribution with covariates to assess the fit of this model to the data. We propose a modified chi-square test statistic and a score test statistic for the multinomial assumption in each …
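A minimal sketch of the model structure described above: each transient-state row of the one-step transition matrix is a multinomial-logit (softmax) function of covariates, and the state distribution is propagated one step at a time. The covariates, coefficients, and category ordering are hypothetical.

```python
import math


def softmax_row(etas):
    """Multinomial-logit probabilities from linear predictors (numerically stable)."""
    m = max(etas)
    exps = [math.exp(e - m) for e in etas]
    total = sum(exps)
    return [e / total for e in exps]


def transition_row(x, betas):
    """One row of the transition matrix for a transient state.

    Category 0 is the reference (linear predictor fixed at 0);
    betas[k - 1] is the coefficient vector for category k >= 1.
    """
    etas = [0.0] + [sum(b * xi for b, xi in zip(beta, x)) for beta in betas]
    return softmax_row(etas)


def step(dist, P):
    """One step of the chain: new_j = sum_i dist_i * P[i][j]."""
    return [sum(dist[i] * P[i][j] for i in range(len(dist))) for j in range(len(P[0]))]
```

Absorbing states such as dementia or death would enter P as rows with a single 1 on the diagonal.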


Online Variational Bayes Inference For High-Dimensional Correlated Data, Sylvie T. Kabisa, Jeffrey S. Morris, David Dunson Jan 2016

Jeffrey S. Morris

High-dimensional data with hundreds of thousands of observations are becoming commonplace in many disciplines. The analysis of such data poses many computational challenges, especially when the observations are correlated over time and/or across space. In this paper we propose flexible hierarchical regression models for analyzing such data that accommodate serial and/or spatial correlation. We address the computational challenges involved in fitting these models by adopting an approximate inference framework. We develop an online variational Bayes algorithm that works by incrementally reading the data into memory one portion at a time. The performance of the method is assessed through simulation studies. …


Functional CAR Models For Spatially Correlated Functional Datasets, Lin Zhang, Veerabhadran Baladandayuthapani, Hongxiao Zhu, Keith A. Baggerly, Tadeusz Majewski, Bogdan Czerniak, Jeffrey S. Morris Jan 2016

Jeffrey S. Morris

We develop a functional conditional autoregressive (CAR) model for spatially correlated data for which functions are collected on areal units of a lattice. Our model performs functional response regression while accounting for spatial correlations with potentially nonseparable and nonstationary covariance structure, in both the space and functional domains. We show theoretically that our construction leads to a CAR model at each functional location, with spatial covariance parameters varying and borrowing strength across the functional domain. Using basis transformation strategies, the nonseparable spatial-functional model is computationally scalable to enormous functional datasets, generalizable to different basis functions, and can be used on …
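The scalar building block at each functional location is a CAR model. The sketch below uses one common "proper CAR" parameterization, precision Q = τ(D − ρW), with W a binary adjacency matrix on a rectangular lattice and D its diagonal matrix of neighbor counts. The paper's exact parameterization may differ, and in the functional model ρ and τ would vary across the functional domain.

```python
def rook_adjacency(nrow, ncol):
    """Binary adjacency W for an nrow x ncol lattice with rook (shared-edge) neighbours."""
    n = nrow * ncol
    W = [[0] * n for _ in range(n)]
    for r in range(nrow):
        for c in range(ncol):
            i = r * ncol + c
            for dr, dc in ((1, 0), (0, 1)):
                rr, cc = r + dr, c + dc
                if rr < nrow and cc < ncol:
                    j = rr * ncol + cc
                    W[i][j] = W[j][i] = 1
    return W


def car_precision(W, rho, tau=1.0):
    """Proper CAR precision Q = tau * (D - rho * W), D = diag(row sums of W)."""
    n = len(W)
    return [[tau * ((sum(W[i]) if i == j else 0) - rho * W[i][j]) for j in range(n)]
            for i in range(n)]
```

Under this form, each area's conditional mean given the others is ρ times the average of its neighbors, with conditional precision τ times its neighbor count; Q is positive definite for |ρ| < 1.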


Multi-State Models With Missing Covariates, Wenjie Lou Jan 2016

Theses and Dissertations--Statistics

Multi-state models have been widely used to analyze longitudinal event history data obtained in medical studies. The tools and methods developed recently in this area require completely observed datasets. In many applications, however, measurements on certain components of the covariate vector are missing for some study subjects. In this dissertation, several likelihood-based methodologies are proposed to deal efficiently with datasets containing different types of missing covariates when applying multi-state models.

First, a maximum observed-data likelihood method was proposed for the case where the data have a univariate missing pattern and the missing covariate is a categorical variable. The construction of the …
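The core idea behind an observed-data likelihood with a missing categorical covariate is to sum the complete-data likelihood over the covariate's possible values. A generic sketch, assuming the covariate is missing at random and has a known marginal distribution; the multi-state likelihood itself is abstracted into a hypothetical callable `lik_given_x`.

```python
import math


def observed_loglik(records, lik_given_x, p_x):
    """Observed-data log-likelihood when a categorical covariate may be missing.

    records: list of (data, x) pairs, with x = None when the covariate is missing.
    lik_given_x(data, x): likelihood contribution of `data` given covariate value x.
    p_x: dict mapping each category to its marginal probability P(X = x).
    """
    total = 0.0
    for data, x in records:
        if x is not None:
            total += math.log(lik_given_x(data, x) * p_x[x])
        else:
            # sum the missing covariate out over all its categories
            total += math.log(sum(lik_given_x(data, c) * p for c, p in p_x.items()))
    return total
```

In practice, the parameters of both `lik_given_x` and `p_x` would be estimated jointly by maximizing this function.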


Developing An Alternative Way To Analyze NanoString Data, Shu Shen Jan 2016

Theses and Dissertations--Statistics

NanoString technology provides a new method for measuring gene expression. It is more sensitive than microarrays and can measure more genes than RT-PCR with similar sensitivity. The system produces counts for each target gene and tabulates them. Counts can be normalized before analysis using either an Excel macro or nSolver; both approaches rely on normalizing the data prior to the statistical analysis that identifies differentially expressed genes. Alternatively, we propose to model gene expression as a function of positive control and reference gene measurements. Simulations and examples are used to compare this model with the NanoString normalization methods. The results show that …
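For contrast with the proposed model, here is a sketch of a common form of the positive-control normalization step that the thesis proposes to replace: each lane is rescaled so that the geometric mean of its positive-control counts matches the average across lanes. The exact defaults used by nSolver or the Excel macro may differ; the function names are hypothetical.

```python
import math


def geomean(xs):
    """Geometric mean of positive numbers."""
    return math.exp(sum(math.log(x) for x in xs) / len(xs))


def positive_control_normalize(counts, pos_controls):
    """Scale each lane so its positive-control geometric mean equals the
    average of those geometric means across lanes.

    counts: list of lanes, each a list of endogenous-gene counts.
    pos_controls: list of lanes, each a list of positive-control counts (> 0).
    """
    gms = [geomean(pc) for pc in pos_controls]
    target = sum(gms) / len(gms)
    return [[c * target / gm for c in lane] for lane, gm in zip(counts, gms)]
```

The modeling alternative instead enters the control measurements as covariates, so their uncertainty is carried into the differential-expression analysis rather than fixed beforehand.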


Resolving Gnetum Evolutionary History, Angela Mcfadden Jan 2016

All Master's Theses

Gnetum are non-flowering seed plants of the tropics, indigenous to South America, Africa, and Asia. This group of about 40 species fascinates botanists because it shares distinctive morphological characteristics with flowering plants, such as broad leaves, woody stems, and flower-like strobili. Questions remain about the relationships within the genus Gnetum. With that in mind, I focused my work on generating phylogenetic hypotheses using two molecular data sets: a concatenation of over 60 different chloroplast genes (66,815 base pairs) and the whole chloroplast genome (128,772 base pairs). This allowed me to compare the two phylogenies …
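The abstract does not name the tree-inference method used. To illustrate what molecular data sets contribute to a phylogenetic hypothesis, the sketch computes the Jukes-Cantor (JC69) evolutionary distance between two aligned sequences, the input to distance-based methods such as neighbor joining. It is a generic example, not the thesis's pipeline.

```python
import math


def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences (gap sites skipped)."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def jukes_cantor(a, b):
    """JC69 distance d = -3/4 * ln(1 - 4p/3); undefined when p >= 0.75."""
    p = p_distance(a, b)
    if p >= 0.75:
        raise ValueError("sequences too divergent for JC69")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)
```

The correction makes the distance larger than the raw proportion p, accounting for multiple substitutions at the same site.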


Space-Time Modelling Of Emerging Infectious Diseases: Assessing Leptospirosis Risk In Sri Lanka, Cameron C F Plouffe Jan 2016

Theses and Dissertations (Comprehensive)

In this research, models were developed to analyze leptospirosis incidence in Sri Lanka and its relation to rainfall. Before any leptospirosis risk models were developed, rainfall data were evaluated from an agro-ecological monitoring network for producing maps of total monthly rainfall in Sri Lanka. Four spatial interpolation techniques were compared: inverse distance weighting, thin-plate splines, ordinary kriging, and Bayesian kriging. Error metrics were used to validate interpolations against independent data. Satellite data were used to assess the spatial pattern of rainfall. Results indicated that Bayesian kriging and splines performed best in low and high rainfall, respectively. Rainfall maps generated from …
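Of the four interpolation techniques compared, inverse distance weighting is the simplest to state: an unsampled location receives a weighted average of station values, with weights 1/d^power. A minimal sketch, assuming planar coordinates and the common default power of 2; station data are hypothetical.

```python
import math


def idw(stations, x, y, power=2.0):
    """Inverse distance weighted estimate at (x, y).

    stations: list of (sx, sy, value) tuples, e.g. monthly rainfall totals at gauges.
    """
    num = den = 0.0
    for sx, sy, v in stations:
        d = math.hypot(x - sx, y - sy)
        if d == 0.0:
            return v  # exact interpolator: reproduce the observed value at a station
        w = 1.0 / d ** power
        num += w * v
        den += w
    return num / den
```

Unlike kriging, IDW has no model for spatial covariance and provides no prediction variance, one reason it is often used as a baseline.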


Dimension Reduction And Variable Selection, Hossein Moradi Rekabdarkolaee Jan 2016

Theses and Dissertations

High-dimensional data are becoming increasingly available as data collection technology advances. Over the last decade, significant developments have taken place in high-dimensional data analysis, driven primarily by a wide range of applications in fields such as genomics, signal processing, and environmental studies. Statistical techniques such as dimension reduction and variable selection play important roles in high-dimensional data analysis. Sufficient dimension reduction provides a way to find a reduced subspace of the original predictor space without assuming a parametric model. This method has been widely applied in many scientific fields, such as genetics, brain imaging analysis, econometrics, and environmental sciences. …


Black Cloud Randomization Test, Nicholas S. Vanni Jan 2016

Williams Honors College, Honors Research Projects

The Black Cloud Randomization Test examines a nontraditional question and attempts to answer it using unique statistics. The purpose of this paper is to apply what has been learned over the years to a final project. The data for this project follow an emergency room’s on-call schedule, along with the number of traumas that came in during each day shift. The project builds on what has already been learned and helps to open a different way of working with statistics. The project was coded in R. With different restrictions, there are …
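The project was coded in R and its exact restrictions are not given here. The sketch below, written in Python, shows a generic one-sided randomization (permutation) test under an assumed setup: do one clinician's on-call day shifts see more traumas on average than everyone else's? All inputs are hypothetical.

```python
import random


def permutation_test(group, others, n_perm=10000, seed=0):
    """One-sided permutation p-value for mean(group) > mean(others)."""
    rng = random.Random(seed)
    observed = sum(group) / len(group) - sum(others) / len(others)
    pooled = list(group) + list(others)
    k = len(group)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign shifts to clinicians at random
        diff = sum(pooled[:k]) / k - sum(pooled[k:]) / (len(pooled) - k)
        if diff >= observed:
            hits += 1
    # add-one correction counts the observed labelling as one permutation
    return (hits + 1) / (n_perm + 1)
```

A small p-value suggests the clinician's shifts are busier than random assignment of shifts would explain. Restrictions on which shifts can be exchanged would change the set of permutations.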


Development In Normal Mixture And Mixture Of Experts Modeling, Meng Qi Jan 2016

Theses and Dissertations--Statistics

In this dissertation, we first consider the problem of testing homogeneity and order in a contaminated normal model when the data are correlated under some known covariance structure. To address this problem, we developed a moment-based homogeneity and order test and designed weights for the test statistics to increase the power of the homogeneity test. We applied our test to microarray data on Down’s syndrome. This dissertation also studies a singular Bayesian information criterion (sBIC) for a bivariate hierarchical mixture model with varying weights and develops a new data-dependent information criterion (sFLIC). We apply our model and criteria to birth-weight and gestational …


Continuous Time Multi-State Models For Interval Censored Data, Lijie Wan Jan 2016

Theses and Dissertations--Statistics

Continuous-time multi-state models are widely used in modeling longitudinal data of disease processes with multiple transient states, yet the analysis is complex when subjects are observed periodically, resulting in interval-censored data. Most recent studies have focused on modeling the true disease progression as a discrete-time stationary Markov chain, and only a few have considered non-homogeneous multi-state models in the presence of interval-censored data. In this dissertation, several likelihood-based methodologies are proposed to deal with interval-censored data in multi-state models.

First, a continuous-time version of a homogeneous Markov multi-state model with backward transitions was …
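For a time-homogeneous continuous-time Markov model with transition intensity matrix Q, the probability of moving from state i to state j over an interval of length t is entry (i, j) of P(t) = exp(Qt). Under interval censoring, this is the likelihood contribution of observing state i and then state j, t time units apart. A plain-Python sketch using scaling and squaring with a truncated Taylor series; the intensities are hypothetical, and non-homogeneous models need more than this.

```python
def mat_mul(A, B):
    """Product of two square matrices stored as lists of rows."""
    n = len(B)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(len(B[0]))]
            for i in range(len(A))]


def expm(M, terms=20, squarings=10):
    """exp(M) via scaling and squaring with a truncated Taylor series."""
    n = len(M)
    scale = 2 ** squarings
    A = [[x / scale for x in row] for row in M]  # shrink the norm first
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = [[x / k for x in row] for row in mat_mul(term, A)]  # A^k / k!
        result = [[r + t for r, t in zip(rr, tr)] for rr, tr in zip(result, term)]
    for _ in range(squarings):
        result = mat_mul(result, result)  # exp(A)^(2^s) = exp(M)
    return result


def transition_matrix(Q, t):
    """P(t) = exp(Q t) for a time-homogeneous continuous-time Markov chain."""
    return expm([[q * t for q in row] for row in Q])
```

Here a backward transition is simply a positive intensity below the diagonal, such as a return from state 1 to state 0.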


Statistical Inference On Dynamical Systems, Hongyuan Wang Jan 2016

Theses and Dissertations--Statistics

Ordinary differential equations (ODEs) are a representative and popular tool for modeling dynamical systems, which arise widely in physics, biology, economics, chemistry, and the biomedical sciences. Because of the importance of dynamical systems in scientific studies, they are the main focus of my dissertation.

The first chapter of the dissertation is an introduction and literature review, focusing mainly on numerical integration algorithms for ODEs that are difficult to solve analytically, as well as derivative-free optimization algorithms for the so-called inverse problem.

The second chapter presents an estimation method based on numerical solvers of differential equations. We start …
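Estimation methods built on numerical solvers repeatedly integrate the ODE for candidate parameter values and compare the solution with data. As an illustration of such a solver, here is the classical fourth-order Runge-Kutta method for a scalar ODE, checked against the logistic growth model. Both the solver and the test equation are assumptions, not necessarily those used in the dissertation.

```python
import math


def rk4(f, y0, t0, t1, n_steps):
    """Integrate dy/dt = f(t, y) for scalar y from t0 to t1 with classical RK4."""
    h = (t1 - t0) / n_steps
    t, y = t0, y0
    for _ in range(n_steps):
        k1 = f(t, y)
        k2 = f(t + h / 2, y + h * k1 / 2)
        k3 = f(t + h / 2, y + h * k2 / 2)
        k4 = f(t + h, y + h * k3)
        y += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += h
    return y


# Logistic growth dy/dt = r y (1 - y/K), whose exact solution is known.
r, K, y0 = 0.5, 10.0, 1.0
approx = rk4(lambda t, y: r * y * (1 - y / K), y0, 0.0, 10.0, 1000)
exact = K / (1 + (K / y0 - 1) * math.exp(-r * 10.0))
```

For the inverse problem, a derivative-free optimizer would search over (r, K) to minimize the squared difference between `rk4` solutions and observed data.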


Statistical Methods For Environmental Exposure Data Subject To Detection Limits, Yuchen Yang Jan 2016

Theses and Dissertations--Statistics

In this dissertation, we develop unified and efficient nonparametric statistical methods for estimating and comparing environmental exposure distributions in the presence of detection limits. In the first part, we propose a kernel-smoothed nonparametric estimator for the exposure distribution without imposing any independence assumption between the exposure level and detection limit. We show that the proposed estimator is consistent and asymptotically normal. Simulation studies demonstrate that the proposed estimator performs well in practical situations. A colon cancer study is provided for illustration. In the second part, we develop a class of test statistics to compare exposure distributions between two groups by using …


Improved Models For Differential Analysis For Genomic Data, Hong Wang Jan 2016

Theses and Dissertations--Statistics

This paper intends to develop novel statistical methods to improve genomic data analysis, especially differential analysis. We consider two different data types: NanoString nCounter data and somatic mutation data. For NanoString nCounter data, we develop a novel differential expression detection method. The method uses a generalized linear model of the negative binomial family to characterize count data and allows for multi-factor designs. Data normalization is incorporated into the model framework through normalization parameters, which are estimated from control genes embedded in the nCounter system. For somatic mutation data, we develop beta-binomial model-based approaches to identify highly or lowly …
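A sketch of the likelihood behind a negative binomial GLM with a log link, in which normalization enters as a per-sample offset. It assumes the normalization factors are already known (the thesis estimates them from control genes) and the common variance parameterization μ + φμ²; the function names are hypothetical.

```python
import math


def nb_logpmf(y, mu, phi):
    """Negative binomial log-pmf with mean mu > 0 and variance mu + phi * mu**2."""
    r = 1.0 / phi  # size parameter
    return (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
            + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu)))


def nb_glm_loglik(counts, design, beta, log_norm, phi):
    """Log-likelihood of one gene under an NB GLM with log link and offsets.

    counts[i]: the gene's count in sample i; design[i]: covariate row for sample i;
    log_norm[i]: log normalization factor for sample i, used as an offset.
    """
    total = 0.0
    for y, x, off in zip(counts, design, log_norm):
        mu = math.exp(off + sum(b * xi for b, xi in zip(beta, x)))
        total += nb_logpmf(y, mu, phi)
    return total
```

Maximizing this over `beta` (and `phi`) for each gene, then testing the coefficient of the condition indicator, yields a differential-expression test with multi-factor designs handled through `design`.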


Collective Action And Decision Making: An Analysis Of Economic Modeling And Environmental Free-Riding, Thomas Miller Jan 2016

Honor Scholar Theses

No abstract provided.