Open Access. Powered by Scholars. Published by Universities.®

Statistical Models Commons

Open Access. Powered by Scholars. Published by Universities.®

2019

Discipline
Institution
Keyword
Publication
Publication Type

Articles 1 - 30 of 111

Full-Text Articles in Statistical Models

Identifying Customer Churn In After-Market Operations Using Machine Learning Algorithms, Vitaly Briker, Richard Farrow, William Trevino, Brent Allen Dec 2019

Identifying Customer Churn In After-Market Operations Using Machine Learning Algorithms, Vitaly Briker, Richard Farrow, William Trevino, Brent Allen

SMU Data Science Review

This paper presents a comparative study on machine learning methods as they are applied to product associations, future purchase predictions, and predictions of customer churn in aftermarket operations. Association rules are used help to identify patterns across products and find correlations in customer purchase behaviour. Studying customer behaviour as it pertains to Recency, Frequency, and Monetary Value (RFM) helps inform customer segmentation and identifies customers with propensity to churn. Lastly, Flowserve’s customer purchase history enables the establishment of churn thresholds for each customer group and assists in constructing a model to predict future churners. The aim of this model is …


Personalized Detection Of Anxiety Provoking News Events Using Semantic Network Analysis, Jacquelyn Cheun Phd, Luay Dajani, Quentin B. Thomas Dec 2019

Personalized Detection Of Anxiety Provoking News Events Using Semantic Network Analysis, Jacquelyn Cheun Phd, Luay Dajani, Quentin B. Thomas

SMU Data Science Review

In the age of hyper-connectivity, 24/7 news cycles, and instant news alerts via social media, mental health researchers don't have a way to automatically detect news content which is associated with triggering anxiety or depression in mental health patients. Using the Associated Press news wire, a semantic network was built with 1,056 news articles containing over 500,000 connections across multiple topics to provide a personalized algorithm which detects problematic news content for a given reader. We make use of Semantic Network Analysis to surface the relationship between news article text and anxiety in readers who struggle with mental health disorders. …


Ordinal Hyperplane Loss, Bob Vanderheyden Dec 2019

Ordinal Hyperplane Loss, Bob Vanderheyden

Doctor of Data Science and Analytics Dissertations

This research presents the development of a new framework for analyzing ordered class data, commonly called “ordinal class” data. The focus of the work is the development of classifiers (predictive models) that predict classes from available data. Ratings scales, medical classification scales, socio-economic scales, meaningful groupings of continuous data, facial emotional intensity and facial age estimation are examples of ordinal data for which data scientists may be asked to develop predictive classifiers. It is possible to treat ordinal classification like any other classification problem that has more than two classes. Specifying a model with this strategy does not fully utilize …


Evaluation Of Modern Missing Data Handling Methods For Coefficient Alpha, Katerina Matysova Dec 2019

Evaluation Of Modern Missing Data Handling Methods For Coefficient Alpha, Katerina Matysova

College of Education and Human Sciences: Dissertations, Theses, and Student Research

When assessing a certain characteristic or trait using a multiple item measure, quality of that measure can be assessed by examining the reliability. To avoid multiple time points, reliability can be represented by internal consistency, which is most commonly calculated using Cronbach’s coefficient alpha. Almost every time human participants are involved in research, there is missing data involved. Missing data means that even though complete data were expected to be collected, some data are missing. Missing data can follow different patterns as well as be the result of different mechanisms. One traditional way to deal with missing data is listwise …


Seasonal Time Series Models With Application To Weather And Lake Level Data, Mengqing Qin Dec 2019

Seasonal Time Series Models With Application To Weather And Lake Level Data, Mengqing Qin

MSU Graduate Theses

This work studies seasonal time series models with application to lake level and weather data. The thesis includes related time series concepts, integrated autoregressive moving average models (abbreviated as ARIMA), parameter estimation, model diagnostics, and forecasting. The studied time series models are applied to the data of daily lake level in Beaver Lake (1988-2017) and the data of daily maximum temperature in New York Central Park (1870-2017). Due to seasonality of the data, three different approaches are proposed to the modeling: regression method, functional ARIMA method and multiplicative seasonal ARIMA method. The forecasted values of the year 2018 are compared …


Conditional Survival Analysis For Concrete Bridge Decks, Azam Nabizadeh, Habib Tabatabai, Mohammad A. Tabatabai Nov 2019

Conditional Survival Analysis For Concrete Bridge Decks, Azam Nabizadeh, Habib Tabatabai, Mohammad A. Tabatabai

Civil and Environmental Engineering Faculty Articles

Bridge decks are a significant factor in the deterioration of bridges, and substantially affect long-term bridge maintenance decisions. In this study, conditional survival (reliability) analysis techniques are applied to bridge decks to evaluate the age at the end of service life using the National Bridge Inventory records. As bridge decks age, the probability of survival and the expected service life would change. The additional knowledge gained from the fact that a bridge deck has already survived a specific number of years alters (increases) the original probability of survival at subsequent years based on the conditional probability theory. The conditional expected …


Habitat Associations And Reproduction Of Fishes On The Northwestern Gulf Of Mexico Shelf Edge, Elizabeth Marie Keller Nov 2019

Habitat Associations And Reproduction Of Fishes On The Northwestern Gulf Of Mexico Shelf Edge, Elizabeth Marie Keller

LSU Doctoral Dissertations

Several of the northwestern Gulf of Mexico (GOM) shelf-edge banks provide critical hard bottom habitat for coral and fish communities, supporting a wide diversity of ecologically and economically important species. These sites may be fish aggregation and spawning sites and provide important habitat for fish growth and reproduction. Already designated as habitat areas of particular concern, many of these banks are also under consideration for inclusion in the expansion of the Flower Garden Banks National Marine Sanctuary. This project aimed to gain a more comprehensive understanding of the communities and fish species on shelf-edge banks by way of gonad histology, …


Inferring A Consensus Problem List Using Penalized Multistage Models For Ordered Data, Philip S. Boonstra, John C. Krauss Oct 2019

Inferring A Consensus Problem List Using Penalized Multistage Models For Ordered Data, Philip S. Boonstra, John C. Krauss

The University of Michigan Department of Biostatistics Working Paper Series

A patient's medical problem list describes his or her current health status and aids in the coordination and transfer of care between providers, among other things. Because a problem list is generated once and then subsequently modified or updated, what is not usually observable is the provider-effect. That is, to what extent does a patient's problem in the electronic medical record actually reflect a consensus communication of that patient's current health status? To that end, we report on and analyze a unique interview-based design in which multiple medical providers independently generate problem lists for each of three patient case abstracts …


An Agent-Based Modeling Approach For Predicting The Behavior Of Bighead Carp (Hypophthalmichthys Nobilis) Under The Influence Of Acoustic Deterrence, Joey Gaudy, Craig Garzella Oct 2019

An Agent-Based Modeling Approach For Predicting The Behavior Of Bighead Carp (Hypophthalmichthys Nobilis) Under The Influence Of Acoustic Deterrence, Joey Gaudy, Craig Garzella

Annual Symposium on Biomathematics and Ecology Education and Research

No abstract provided.


Integrating Mathematics And Biology In The Classroom: A Compendium Of Case Studies And Labs, Becky Sanft, Anne Walter Oct 2019

Integrating Mathematics And Biology In The Classroom: A Compendium Of Case Studies And Labs, Becky Sanft, Anne Walter

Annual Symposium on Biomathematics and Ecology Education and Research

No abstract provided.


Classification Of Coronary Artery Disease In Non-Diabetic Patients Using Artificial Neural Networks, Demond Handley Oct 2019

Classification Of Coronary Artery Disease In Non-Diabetic Patients Using Artificial Neural Networks, Demond Handley

Annual Symposium on Biomathematics and Ecology Education and Research

No abstract provided.


Statistical Modeling And Characterization Of Induced Seismicity Within The Western Canada Sedimentary Basin, Sid Kothari Oct 2019

Statistical Modeling And Characterization Of Induced Seismicity Within The Western Canada Sedimentary Basin, Sid Kothari

Electronic Thesis and Dissertation Repository

In western Canada, there has been an increase in seismic activity linked to anthropogenic energy-related operations including conventional hydrocarbon production, wastewater fluid injection and more recently hydraulic fracturing (HF). Statistical modeling and characterization of the space, time and magnitude distributions of the seismicity clusters is vital for a better understanding of induced earthquake processes and development of predictive models. In this work, a statistical analysis of the seismicity in the Western Canada Sedimentary Basin was performed across past and present time periods by utilizing a compiled earthquake catalogue for Alberta and eastern British Columbia. Specifically, the frequency-magnitude statistics were analyzed …


Statistical L-Moment And L-Moment Ratio Estimation And Their Applicability In Network Analysis, Timothy S. Anderson Sep 2019

Statistical L-Moment And L-Moment Ratio Estimation And Their Applicability In Network Analysis, Timothy S. Anderson

Theses and Dissertations

This research centers on finding the statistical moments, network measures, and statistical tests that are most sensitive to various node degradations for the Barabási-Albert, Erdös-Rényi, and Watts-Strogratz network models. Thirty-five different graph structures were simulated for each of the random graph generation algorithms, and sensitivity analysis was undertaken on three different network measures: degree, betweenness, and closeness. In an effort to find the statistical moments that are the most sensitive to degradation within each network, four traditional moments: mean, variance, skewness, and kurtosis as well as three non-traditional moments: L-variance, L-skewness, and L-kurtosis were examined. Each of these moments were …


Semi-Supervised Regression With Generative Adversarial Networks Using Minimal Labeled Data, Greg Olmschenk Sep 2019

Semi-Supervised Regression With Generative Adversarial Networks Using Minimal Labeled Data, Greg Olmschenk

Dissertations, Theses, and Capstone Projects

This work studies the generalization of semi-supervised generative adversarial networks (GANs) to regression tasks. A novel feature layer contrasting optimization function, in conjunction with a feature matching optimization, allows the adversarial network to learn from unannotated data and thereby reduce the number of labels required to train a predictive network. An analysis of simulated training conditions is performed to explore the capabilities and limitations of the method. In concert with the semi-supervised regression GANs, an improved label topology and upsampling technique for multi-target regression tasks are shown to reduce data requirements. Improvements are demonstrated on a wide variety of vision …


Sample Size Requirements And Considerations For Models To Assess Human-Machine System Performance, Jennifer S. G. Lopez Sep 2019

Sample Size Requirements And Considerations For Models To Assess Human-Machine System Performance, Jennifer S. G. Lopez

Theses and Dissertations

Hierarchical Linear Models (HLMs), also known as multi-level models, are an extension of multiple regression analysis and can aid in the understanding of human and machine workloads of a system. These models allow for prediction and testing in systems with hierarchies of two or more levels. The complex interrelated variability of these multi-level models exists in operational settings, such as the Air Force Distributed Common Ground System Full Motion Video (AF DCGS FMV) community which is composed of individuals (Level-1), groups (Level-2), units (Level-3), and organizations (Level-4). Through the development of sample size requirements and considerations for multi-level models, this …


Machine Learning In Support Of Electric Distribution Asset Failure Prediction, Robert D. Flamenbaum, Thomas Pompo, Christopher Havenstein, Jade Thiemsuwan Aug 2019

Machine Learning In Support Of Electric Distribution Asset Failure Prediction, Robert D. Flamenbaum, Thomas Pompo, Christopher Havenstein, Jade Thiemsuwan

SMU Data Science Review

In this paper, we present novel approaches to predicting as- set failure in the electric distribution system. Failures in overhead power lines and their associated equipment in particular, pose significant finan- cial and environmental threats to electric utilities. Electric device failure furthermore poses a burden on customers and can pose serious risk to life and livelihood. Working with asset data acquired from an electric utility in Southern California, and incorporating environmental and geospatial data from around the region, we applied a Random Forest methodology to predict which overhead distribution lines are most vulnerable to fail- ure. Our results provide evidence …


Identifying Undervalued Players In Fantasy Football, Christopher D. Morgan, Caroll Rodriguez, Korey Macvittie, Robert Slater, Daniel W. Engels Aug 2019

Identifying Undervalued Players In Fantasy Football, Christopher D. Morgan, Caroll Rodriguez, Korey Macvittie, Robert Slater, Daniel W. Engels

SMU Data Science Review

In this paper we present a model to predict player performance in fantasy football. In particular, identifying high-performance players can prove to be a difficult problem, as there are on occasion players capable of high performance whose past metrics give no indication of this capacity. These "sleepers"' are often undervalued, and the acquisition of such players can have notable impact on a fantasy football team's overall performance. We constructed a regression model that accounts for players' past performance and athletic metrics to predict their future performance. The model we built performs favorably in predicting athlete performance in relation to other …


Machine Learning Predicts Aperiodic Laboratory Earthquakes, Olha Tanyuk, Daniel Davieau, Charles South, Daniel W. Engels Aug 2019

Machine Learning Predicts Aperiodic Laboratory Earthquakes, Olha Tanyuk, Daniel Davieau, Charles South, Daniel W. Engels

SMU Data Science Review

In this paper we find a pattern of aperiodic seismic signals that precede earthquakes at any time in a laboratory earthquake’s cycle using a small window of time. We use a data set that comes from a classic laboratory experiment having several stick-slip displacements (earthquakes), a type of experiment which has been studied as a simulation of seismologic faults for decades. This data exhibits similar behavior to natural earthquakes, so the same approach may work in predicting the timing of them. Here we show that by applying random forest machine learning technique to the acoustic signal emitted by a laboratory …


Texture-Based Deep Neural Network For Histopathology Cancer Whole Slide Image (Wsi) Classification, Nelson Zange Tsaku Aug 2019

Texture-Based Deep Neural Network For Histopathology Cancer Whole Slide Image (Wsi) Classification, Nelson Zange Tsaku

Master of Science in Computer Science Theses

Automatic histopathological Whole Slide Image (WSI) analysis for cancer classification has been highlighted along with the advancements in microscopic imaging techniques. However, manual examination and diagnosis with WSIs is time-consuming and tiresome. Recently, deep convolutional neural networks have succeeded in histopathological image analysis. In this paper, we propose a novel cancer texture-based deep neural network (CAT-Net) that learns scalable texture features from histopathological WSIs. The innovation of CAT-Net is twofold: (1) capturing invariant spatial patterns by dilated convolutional layers and (2) Reducing model complexity while improving performance. Moreover, CAT-Net can provide discriminative texture patterns formed on cancerous regions of histopathological …


Identifying Risk Factors Related To Premature Birth Through Binary Logistic And Proportional Odds Ordinal Logistic Regression, Clayton Elwood Aug 2019

Identifying Risk Factors Related To Premature Birth Through Binary Logistic And Proportional Odds Ordinal Logistic Regression, Clayton Elwood

Electronic Theses and Dissertations

Premature birth has been identified as the single greatest cause of death worldwide in children under the age of five. This thesis will implement binary logistic regression and proportional odds ordinal logistic regression to predict different levels of premature birth and identify associated risk factors. The models will be built from the Center for Disease Control and Prevention's 2014 Vital Statistics Natality Birth Data containing nearly 4 million live births within the United States. Odds ratios and confidence intervals on risk factors were produced utilizing binary logistic regression.


Garch Modeling Of Value At Risk And Expected Shortfall Using Bayesian Model Averaging, Ismail Kheir Aug 2019

Garch Modeling Of Value At Risk And Expected Shortfall Using Bayesian Model Averaging, Ismail Kheir

Theses and Dissertations

This thesis conducts Value at Risk (VaR) and Expected Shortfall (ES) estimation using GARCH modeling and Bayesian Model Averaging (BMA). BMA considers multiple models weighted by some information criterion. Through BMA, this thesis finds that VaR and ES estimates can be improved through enhanced modeling of the data generation process.


Statistical And Machine Learning Methods Evaluated For Incorporating Soil And Weather Into Corn Nitrogen Recommendations, Curtis J. Ransom, Newell R. Kitchen, James J. Camberato, Paul R. Carter, Richard B. Ferguson, Fabián G. Fernández, David W. Franzen, Carrie A. M. Laboski, D. Brenton Myers, Emerson D. Nafziger, John E. Sawyer, John F. Shanahan Aug 2019

Statistical And Machine Learning Methods Evaluated For Incorporating Soil And Weather Into Corn Nitrogen Recommendations, Curtis J. Ransom, Newell R. Kitchen, James J. Camberato, Paul R. Carter, Richard B. Ferguson, Fabián G. Fernández, David W. Franzen, Carrie A. M. Laboski, D. Brenton Myers, Emerson D. Nafziger, John E. Sawyer, John F. Shanahan

John E. Sawyer

Nitrogen (N) fertilizer recommendation tools could be improved for estimating corn (Zea mays L.) N needs by incorporating site-specific soil and weather information. However, an evaluation of analytical methods is needed to determine the success of incorporating this information. The objectives of this research were to evaluate statistical and machine learning (ML) algorithms for utilizing soil and weather information for improving corn N recommendation tools. Eight algorithms [stepwise, ridge regression, least absolute shrinkage and selection operator (Lasso), elastic net regression, principal component regression (PCR), partial least squares regression (PLSR), decision tree, and random forest] were evaluated using a dataset …


Effective Statistical Energy Function Based Protein Un/Structure Prediction, Avdesh Mishra Aug 2019

Effective Statistical Energy Function Based Protein Un/Structure Prediction, Avdesh Mishra

University of New Orleans Theses and Dissertations

Proteins are an important component of living organisms, composed of one or more polypeptide chains, each containing hundreds or even thousands of amino acids of 20 standard types. The structure of a protein from the sequence determines crucial functions of proteins such as initiating metabolic reactions, DNA replication, cell signaling, and transporting molecules. In the past, proteins were considered to always have a well-defined stable shape (structured proteins), however, it has recently been shown that there exist intrinsically disordered proteins (IDPs), which lack a fixed or ordered 3D structure, have dynamic characteristics and therefore, exist in multiple states. Based on …


Exploring The Estimability Of Mark-Recapture Models With Individual, Time-Varying Covariates Using The Scaled Logit Link Function, Jiaqi Mu Aug 2019

Exploring The Estimability Of Mark-Recapture Models With Individual, Time-Varying Covariates Using The Scaled Logit Link Function, Jiaqi Mu

Electronic Thesis and Dissertation Repository

Mark-recapture studies are often used to estimate the survival of individuals in a population and identify factors that affect survival in order to understand how the population might be affected by changing conditions. Factors that vary between individuals and over time, like body mass, present a challenge because they can only be observed when an individual is captured. Several models have been proposed to deal with the missing-covariate problem and commonly impose a logit link function which implies that the survival probability varies between 0 and 1. In this thesis I explore the estimability of four possible models when survival …


Split Credibility: A Two-Dimensional Semi-Linear Credibility Model, Jingbing Qiu Aug 2019

Split Credibility: A Two-Dimensional Semi-Linear Credibility Model, Jingbing Qiu

Electronic Thesis and Dissertation Repository

In the thesis, we introduce a two-dimensional semi-linear credibility model, which is an extension of the classical credibility or split credibility models used by practicing actuaries. Our model predicts the future expected losses of a policyholder by considering its historical primary and excess losses. The optimal split point is derived based on the mean squared error criterion. We show when and why splitting a policyholder’s historical losses into primary and excess parts work analytically. In addition, we derived formulas for estimating our model parameters nonparametrically. Finally, we show the application of our model through three examples.


Successful Shot Locations And Shot Types Used In Ncaa Men’S Division I Basketball, Olivia D. Perrin Aug 2019

Successful Shot Locations And Shot Types Used In Ncaa Men’S Division I Basketball, Olivia D. Perrin

All NMU Master's Theses

The primary purpose of the current study was to investigate the effect of court location (distance and angle from basket) and shot types used on shot success in NCAA Men’s DI basketball during the 2017-18 season. A secondary purpose was to further expand the analysis based on two additional factors: player position (guard, forward, or center) and team ranking. All statistical analyses were completed in RStudio and three binomial logistic regression analyses were performed to evaluate factors that influence shot success; one for all two and three point shot attempts, one for only two point attempts, and one for only …


Spatio-Temporal Prediction Of Arkansas Gubernatorial Election, Michael Harris Aug 2019

Spatio-Temporal Prediction Of Arkansas Gubernatorial Election, Michael Harris

Graduate Theses and Dissertations

Our goal is to create spatio-temporal models for predicting future gubernatorial elections. For a concrete example of how well our models work we use past data to predict the 2018 Arkansas gubernatorial election and use the existing 2018 election data to check our models predictive accuracy. Gubernatorial election data was collected from the Arkansas Secretary of State website while related covariate data was collected from the website for the Federal Reserve Bank of St. Louis. The data we collect is on the county level. For predictive purposes we fit multiple models to the data using Markov chain Monte Carlo and …


Robustness Of Semi-Parametric Survival Model: Simulation Studies And Application To Clinical Data, Isaac Nwi-Mozu Aug 2019

Robustness Of Semi-Parametric Survival Model: Simulation Studies And Application To Clinical Data, Isaac Nwi-Mozu

Electronic Theses and Dissertations

An efficient way of analyzing survival clinical data such as cancer data is a great concern to health experts. In this study, we investigate and propose an efficient way of handling survival clinical data. Simulation studies were conducted to compare performances of various forms of survival model techniques using an R package ``survsim". Models performance was conducted with varying sample sizes as small ($n5000$). For small and mild samples, the performance of the semi-parametric outperform or approximate the performance of the parametric model. However, for large samples, the parametric model outperforms the semi-parametric model. We compared the effectiveness and reliability …


Development Of A Statistical Shape-Function Model Of The Implanted Knee For Real-Time Prediction Of Joint Mechanics, Kalin Gibbons Aug 2019

Development Of A Statistical Shape-Function Model Of The Implanted Knee For Real-Time Prediction Of Joint Mechanics, Kalin Gibbons

Boise State University Theses and Dissertations

Outcomes of total knee arthroplasty (TKA) are dependent on surgical technique, patient variability, and implant design. Non-optimal design or alignment choices may result in undesirable contact mechanics and joint kinematics, including poor joint alignment, instability, and reduced range of motion. Implant design and surgical alignment are modifiable factors with potential to improve patient outcomes, and there is a need for robust implant designs that can accommodate patient variability. Our objective was to develop a statistical shape-function model (SFM) of a posterior stabilized implant knee to instantaneously predict output mechanics in an efficient manner. Finite element methods were combined with Latin …


Hierarchical Modeling And Differential Expression Analysis For Rna-Seq Experiments With Inbred And Hybrid Genotypes, Andrew Lithio, Dan Nettleton Jul 2019

Hierarchical Modeling And Differential Expression Analysis For Rna-Seq Experiments With Inbred And Hybrid Genotypes, Andrew Lithio, Dan Nettleton

Dan Nettleton

The performance of inbred and hybrid genotypes is of interest in plant breeding and genetics. High-throughput sequencing of RNA (RNA-seq) has proven to be a useful tool in the study of the molecular genetic responses of inbreds and hybrids to environmental stresses. Commonly used experimental designs and sequencing methods lead to complex data structures that require careful attention in data analysis. We demonstrate an analysis of RNA-seq data from a split-plot design involving drought stress applied to two inbred genotypes and two hybrids formed by crosses between the inbreds. Our generalized linear modeling strategy incorporates random effects for whole-plot experimental …