Open Access. Powered by Scholars. Published by Universities.®

Statistical Models Commons

Theses/Dissertations

2019

Articles 1 - 30 of 56

Full-Text Articles in Statistical Models

Ordinal Hyperplane Loss, Bob Vanderheyden Dec 2019

Doctor of Data Science and Analytics Dissertations

This research presents the development of a new framework for analyzing ordered class data, commonly called “ordinal class” data. The focus of the work is the development of classifiers (predictive models) that predict classes from available data. Ratings scales, medical classification scales, socio-economic scales, meaningful groupings of continuous data, facial emotional intensity and facial age estimation are examples of ordinal data for which data scientists may be asked to develop predictive classifiers. It is possible to treat ordinal classification like any other classification problem that has more than two classes. Specifying a model with this strategy does not fully utilize …


Seasonal Time Series Models With Application To Weather And Lake Level Data, Mengqing Qin Dec 2019

MSU Graduate Theses

This work studies seasonal time series models with application to lake level and weather data. The thesis includes related time series concepts, autoregressive integrated moving average (ARIMA) models, parameter estimation, model diagnostics, and forecasting. The studied time series models are applied to the data of daily lake level in Beaver Lake (1988-2017) and the data of daily maximum temperature in New York Central Park (1870-2017). Due to the seasonality of the data, three different modeling approaches are proposed: the regression method, the functional ARIMA method, and the multiplicative seasonal ARIMA method. The forecasted values of the year 2018 are compared …


Habitat Associations And Reproduction Of Fishes On The Northwestern Gulf Of Mexico Shelf Edge, Elizabeth Marie Keller Nov 2019

LSU Doctoral Dissertations

Several of the northwestern Gulf of Mexico (GOM) shelf-edge banks provide critical hard bottom habitat for coral and fish communities, supporting a wide diversity of ecologically and economically important species. These sites may be fish aggregation and spawning sites and provide important habitat for fish growth and reproduction. Already designated as habitat areas of particular concern, many of these banks are also under consideration for inclusion in the expansion of the Flower Garden Banks National Marine Sanctuary. This project aimed to gain a more comprehensive understanding of the communities and fish species on shelf-edge banks by way of gonad histology, …


Statistical Modeling And Characterization Of Induced Seismicity Within The Western Canada Sedimentary Basin, Sid Kothari Oct 2019

Electronic Thesis and Dissertation Repository

In western Canada, there has been an increase in seismic activity linked to anthropogenic energy-related operations including conventional hydrocarbon production, wastewater fluid injection and more recently hydraulic fracturing (HF). Statistical modeling and characterization of the space, time and magnitude distributions of the seismicity clusters is vital for a better understanding of induced earthquake processes and development of predictive models. In this work, a statistical analysis of the seismicity in the Western Canada Sedimentary Basin was performed across past and present time periods by utilizing a compiled earthquake catalogue for Alberta and eastern British Columbia. Specifically, the frequency-magnitude statistics were analyzed …


Statistical L-Moment And L-Moment Ratio Estimation And Their Applicability In Network Analysis, Timothy S. Anderson Sep 2019

Theses and Dissertations

This research centers on finding the statistical moments, network measures, and statistical tests that are most sensitive to various node degradations for the Barabási-Albert, Erdős-Rényi, and Watts-Strogatz network models. Thirty-five different graph structures were simulated for each of the random graph generation algorithms, and sensitivity analysis was undertaken on three different network measures: degree, betweenness, and closeness. In an effort to find the statistical moments that are the most sensitive to degradation within each network, four traditional moments (mean, variance, skewness, and kurtosis) as well as three non-traditional moments (L-variance, L-skewness, and L-kurtosis) were examined. Each of these moments were …


Sample Size Requirements And Considerations For Models To Assess Human-Machine System Performance, Jennifer S. G. Lopez Sep 2019

Theses and Dissertations

Hierarchical Linear Models (HLMs), also known as multi-level models, are an extension of multiple regression analysis and can aid in the understanding of human and machine workloads of a system. These models allow for prediction and testing in systems with hierarchies of two or more levels. The complex interrelated variability of these multi-level models exists in operational settings, such as the Air Force Distributed Common Ground System Full Motion Video (AF DCGS FMV) community which is composed of individuals (Level-1), groups (Level-2), units (Level-3), and organizations (Level-4). Through the development of sample size requirements and considerations for multi-level models, this …


Semi-Supervised Regression With Generative Adversarial Networks Using Minimal Labeled Data, Greg Olmschenk Sep 2019

Dissertations, Theses, and Capstone Projects

This work studies the generalization of semi-supervised generative adversarial networks (GANs) to regression tasks. A novel feature layer contrasting optimization function, in conjunction with a feature matching optimization, allows the adversarial network to learn from unannotated data and thereby reduce the number of labels required to train a predictive network. An analysis of simulated training conditions is performed to explore the capabilities and limitations of the method. In concert with the semi-supervised regression GANs, an improved label topology and upsampling technique for multi-target regression tasks are shown to reduce data requirements. Improvements are demonstrated on a wide variety of vision …


Texture-Based Deep Neural Network For Histopathology Cancer Whole Slide Image (WSI) Classification, Nelson Zange Tsaku Aug 2019

Master of Science in Computer Science Theses

Automatic histopathological Whole Slide Image (WSI) analysis for cancer classification has been highlighted along with the advancements in microscopic imaging techniques. However, manual examination and diagnosis with WSIs is time-consuming and tiresome. Recently, deep convolutional neural networks have succeeded in histopathological image analysis. In this paper, we propose a novel cancer texture-based deep neural network (CAT-Net) that learns scalable texture features from histopathological WSIs. The innovation of CAT-Net is twofold: (1) capturing invariant spatial patterns by dilated convolutional layers and (2) reducing model complexity while improving performance. Moreover, CAT-Net can provide discriminative texture patterns formed on cancerous regions of histopathological …


Identifying Risk Factors Related To Premature Birth Through Binary Logistic And Proportional Odds Ordinal Logistic Regression, Clayton Elwood Aug 2019

Electronic Theses and Dissertations

Premature birth has been identified as the single greatest cause of death worldwide in children under the age of five. This thesis will implement binary logistic regression and proportional odds ordinal logistic regression to predict different levels of premature birth and identify associated risk factors. The models will be built from the Centers for Disease Control and Prevention's 2014 Vital Statistics Natality Birth Data containing nearly 4 million live births within the United States. Odds ratios and confidence intervals on risk factors were produced utilizing binary logistic regression.


GARCH Modeling Of Value At Risk And Expected Shortfall Using Bayesian Model Averaging, Ismail Kheir Aug 2019

Theses and Dissertations

This thesis conducts Value at Risk (VaR) and Expected Shortfall (ES) estimation using GARCH modeling and Bayesian Model Averaging (BMA). BMA considers multiple models weighted by some information criterion. Through BMA, this thesis finds that VaR and ES estimates can be improved through enhanced modeling of the data generation process.
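
The weighting idea mentioned in this abstract can be illustrated with a minimal sketch. It is not the thesis's method: it assumes the information criterion is BIC, uses the common approximation that a model's weight is proportional to exp(-BIC/2), and all BIC values and VaR figures below are hypothetical numbers chosen only for illustration.

```python
import math


def bic_weights(bics):
    """Turn BIC values into normalized model weights (lower BIC gives a higher weight)."""
    best = min(bics)
    # Subtract the best BIC before exponentiating to avoid numerical underflow.
    raw = [math.exp(-0.5 * (b - best)) for b in bics]
    total = sum(raw)
    return [r / total for r in raw]


def averaged_estimate(estimates, weights):
    """Weighted average of per-model risk estimates (for example VaR or ES)."""
    return sum(w * e for w, e in zip(weights, estimates))


# Hypothetical BIC values and 1-day 99% VaR estimates from three candidate models.
bics = [1002.3, 1000.0, 1005.1]
var_estimates = [0.031, 0.028, 0.035]

weights = bic_weights(bics)
bma_var = averaged_estimate(var_estimates, weights)
print(weights, bma_var)
```

The averaged estimate always lies between the smallest and largest per-model estimates, and the model with the lowest BIC dominates as the BIC gaps grow.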


Effective Statistical Energy Function Based Protein Un/Structure Prediction, Avdesh Mishra Aug 2019

University of New Orleans Theses and Dissertations

Proteins are an important component of living organisms, composed of one or more polypeptide chains, each containing hundreds or even thousands of amino acids of 20 standard types. The structure of a protein from the sequence determines crucial functions of proteins such as initiating metabolic reactions, DNA replication, cell signaling, and transporting molecules. In the past, proteins were considered to always have a well-defined stable shape (structured proteins); however, it has recently been shown that there exist intrinsically disordered proteins (IDPs), which lack a fixed or ordered 3D structure, have dynamic characteristics, and therefore exist in multiple states. Based on …


Exploring The Estimability Of Mark-Recapture Models With Individual, Time-Varying Covariates Using The Scaled Logit Link Function, Jiaqi Mu Aug 2019

Electronic Thesis and Dissertation Repository

Mark-recapture studies are often used to estimate the survival of individuals in a population and identify factors that affect survival in order to understand how the population might be affected by changing conditions. Factors that vary between individuals and over time, like body mass, present a challenge because they can only be observed when an individual is captured. Several models have been proposed to deal with the missing-covariate problem and commonly impose a logit link function which implies that the survival probability varies between 0 and 1. In this thesis I explore the estimability of four possible models when survival …


Split Credibility: A Two-Dimensional Semi-Linear Credibility Model, Jingbing Qiu Aug 2019

Electronic Thesis and Dissertation Repository

In this thesis, we introduce a two-dimensional semi-linear credibility model, which is an extension of the classical credibility or split credibility models used by practicing actuaries. Our model predicts the future expected losses of a policyholder by considering its historical primary and excess losses. The optimal split point is derived based on the mean squared error criterion. We show analytically when and why splitting a policyholder’s historical losses into primary and excess parts works. In addition, we derive formulas for estimating our model parameters nonparametrically. Finally, we show the application of our model through three examples.


Successful Shot Locations And Shot Types Used In NCAA Men’s Division I Basketball, Olivia D. Perrin Aug 2019

All NMU Master's Theses

The primary purpose of the current study was to investigate the effect of court location (distance and angle from basket) and shot types used on shot success in NCAA Men’s DI basketball during the 2017-18 season. A secondary purpose was to further expand the analysis based on two additional factors: player position (guard, forward, or center) and team ranking. All statistical analyses were completed in RStudio and three binomial logistic regression analyses were performed to evaluate factors that influence shot success; one for all two and three point shot attempts, one for only two point attempts, and one for only …


Spatio-Temporal Prediction Of Arkansas Gubernatorial Election, Michael Harris Aug 2019

Graduate Theses and Dissertations

Our goal is to create spatio-temporal models for predicting future gubernatorial elections. For a concrete example of how well our models work, we use past data to predict the 2018 Arkansas gubernatorial election and use the existing 2018 election data to check our models' predictive accuracy. Gubernatorial election data was collected from the Arkansas Secretary of State website, while related covariate data was collected from the website for the Federal Reserve Bank of St. Louis. The data we collect is on the county level. For predictive purposes we fit multiple models to the data using Markov chain Monte Carlo and …


Development Of A Statistical Shape-Function Model Of The Implanted Knee For Real-Time Prediction Of Joint Mechanics, Kalin Gibbons Aug 2019

Boise State University Theses and Dissertations

Outcomes of total knee arthroplasty (TKA) are dependent on surgical technique, patient variability, and implant design. Non-optimal design or alignment choices may result in undesirable contact mechanics and joint kinematics, including poor joint alignment, instability, and reduced range of motion. Implant design and surgical alignment are modifiable factors with potential to improve patient outcomes, and there is a need for robust implant designs that can accommodate patient variability. Our objective was to develop a statistical shape-function model (SFM) of a posterior stabilized implant knee to instantaneously predict output mechanics in an efficient manner. Finite element methods were combined with Latin …


Robustness Of Semi-Parametric Survival Model: Simulation Studies And Application To Clinical Data, Isaac Nwi-Mozu Aug 2019

Electronic Theses and Dissertations

An efficient way of analyzing survival clinical data such as cancer data is a great concern to health experts. In this study, we investigate and propose an efficient way of handling survival clinical data. Simulation studies were conducted to compare performances of various forms of survival model techniques using the R package “survsim”. Model performance was assessed with varying sample sizes as small ($n5000$). For small and mild samples, the semi-parametric model outperforms or approximates the performance of the parametric model. However, for large samples, the parametric model outperforms the semi-parametric model. We compared the effectiveness and reliability …


Allocative Poisson Factorization For Computational Social Science, Aaron Schein Jul 2019

Doctoral Dissertations

Social science data often comes in the form of high-dimensional discrete data such as categorical survey responses, social interaction records, or text. These data sets exhibit high degrees of sparsity, missingness, overdispersion, and burstiness, all of which present challenges to traditional statistical modeling techniques. The framework of Poisson factorization (PF) has emerged in recent years as a natural way to model high-dimensional discrete data sets. This framework assumes that each observed count in a data set is a Poisson random variable $y \sim \mathrm{Pois}(\mu)$ whose rate parameter $\mu$ is a function of shared model parameters. This thesis examines a specific …


Detection Of Sand Boils From Images Using Machine Learning Approaches, Aditi S. Kuchi May 2019

University of New Orleans Theses and Dissertations

Levees provide protection for vast amounts of commercial and residential properties. However, these structures degrade over time, due to the impact of severe weather, sand boils, subsidence of land, seepage, etc. In this research, we focus on detecting sand boils. Sand boils occur when water under pressure wells up to the surface through a bed of sand. These make levees especially vulnerable. Object detection is a good approach to confirm the presence of sand boils from satellite or drone imagery, which can be utilized to assist in the automated levee monitoring methodology. Since sand boils have distinct features, applying object …


Advances In Measurement Error Modeling, Linh Nghiem May 2019

Statistical Science Theses and Dissertations

Measurement error in observations is widely known to cause bias and a loss of power when fitting statistical models, particularly when studying distribution shape or the relationship between an outcome and a variable of interest. Most existing correction methods in the literature require strong assumptions about the distribution of the measurement error, or rely on ancillary data which is not always available. This limits the applicability of these methods in many situations. Furthermore, new correction approaches are also needed for high-dimensional settings, where the presence of measurement error in the covariates adds another level of complexity to the desirable structure …


Decision Trees And Their Application For Classification And Regression Problems, Obinna Chilezie Njoku May 2019

MSU Graduate Theses

Tree methods are some of the best and most commonly used methods in the field of statistical learning. They are widely used in classification and regression modeling. This thesis introduces the concept and focuses more on decision trees such as Classification and Regression Trees (CART) used for classification and regression predictive modeling problems. We also introduced some ensemble methods such as bagging, random forest and boosting. These methods were introduced to improve the performance and accuracy of the models constructed by classification and regression tree models. This work also provides an in-depth understanding of how the CART models are constructed, …


Paper Structure Formation Simulation, Tyler R. Seekins May 2019

Electronic Theses and Dissertations

On the surface, paper appears simple, but closer inspection yields a rich collection of chaotic dynamics and random variables. Predictive simulation of paper product properties is desirable for screening candidate experiments and optimizing recipes but existing models are inadequate for practical use. We present a novel structure simulation and generation system designed to narrow the gap between mathematical model and practical prediction. Realistic inputs to the system are preserved as randomly distributed variables. Rapid fiber placement (~1 second/fiber) is achieved with probabilistic approximation of chaotic fluid dynamics and minimization of potential energy to determine flexible fiber conformations. Resulting digital packed …


Analyzing Two-Year College Student Success Using Structural Equation Modeling, Jessica Taylor May 2019

Graduate Theses, Dissertations, and Capstones

The goal of this study is to more fully understand the scope of community college student success using the principles of mindset, engagement, and college readiness. Using structural equation modeling ensures this study is able to measure the combined effects these concepts have on student success, group differences, and the combined model of student success. Findings suggest student success can be significantly impacted by self-belief and mindset behaviors that can outweigh the initial effect of academically under-prepared students. Groups included in this study are non-traditional students, minority populations, first generation students, and Pell eligible students.


Bias Reduction In Machine Learning Classifiers For Spatiotemporal Analysis Of Coral Reefs Using Remote Sensing Images, Justin J. Gapper May 2019

Computational and Data Sciences (PhD) Dissertations

This dissertation is an evaluation of the generalization characteristics of machine learning classifiers as applied to the detection of coral reefs using remote sensing images. Three scientific studies have been conducted as part of this research: 1) Evaluation of Spatial Generalization Characteristics of a Robust Classifier as Applied to Coral Reef Habitats in Remote Islands of the Pacific Ocean 2) Coral Reef Change Detection in Remote Pacific Islands using Support Vector Machine Classifiers 3) A Generalized Machine Learning Classifier for Spatiotemporal Analysis of Coral Reefs in the Red Sea. The aim of this dissertation is to propose and evaluate a …


Predictive Distributions Via Filtered Historical Simulation For Financial Risk Management, Tyson Clark May 2019

All Graduate Plan B and other Reports, Spring 1920 to Spring 2023

Filtered historical simulation with an underlying GARCH process can be used as a valuable tool in VaR analysis, as it derives risk estimates that are sensitive to the distributional properties of the historical data of the produced predictive density. I examine the applications to risk analysis that filtered historical simulation can provide, as well as an interpretation of the predictive density as a poor man’s Bayesian posterior distribution. The predictive density allows us to make associated probabilistic statements regarding the results for VaR analysis, giving greater measurement of risk and the ability to maintain the optimal level of risk per …


A Bayesian Framework For Estimating Seismic Wave Arrival Time, Hua Zhong May 2019

Graduate Theses and Dissertations

Because earthquakes have a large impact on human society, statistical methods for better studying earthquakes are required. One characteristic of earthquakes is the arrival time of seismic waves at a seismic signal sensor. Once we can estimate the earthquake arrival time accurately, the earthquake location can be triangulated, and assistance can be sent to that area correctly. This study presents a Bayesian framework to predict the arrival time of seismic waves with associated uncertainty. We use a change point framework to model the different conditions before and after the seismic wave arrives. To evaluate the performance of the model, we …


Effects Of Perioperative Hyperglycemia In Patients With Diabetes Compared To Patients Without Diabetes: A Retrospective Study Of Treatment And Outcomes, Matthew Anderson May 2019

Capstone Experience

The main goal of this project was to examine the differences in perioperative hyperglycemia treatment received by patients with a diagnosis of diabetes mellitus (DM) and patients without a diagnosis of diabetes (NDM); and how these treatment differences can affect the length of hospital stay. Studies have revealed that, when comparing DM and NDM patients with the same degree of perioperative hyperglycemia, NDM patients suffer worse outcomes. It has been suggested in previous research that this may be because NDM patients receive treatment that does not measure up to the standard of care treatment that DM patients receive. In this …


Comparing Elo, Glicko, IRT, And Bayesian IRT Statistical Models For Educational And Gaming Data, Breanna Morrison May 2019

Graduate Theses and Dissertations

Statistical models used for estimating skill or ability levels often vary by field; however, their underlying mathematical models can be very similar. Differences in the underlying models can be due to the need to accommodate data with different underlying formats and structure. As the models from varying fields increase in complexity, their applicability to different types of data may increase. Models that are applied to educational or psychological data have advanced to accommodate a wide range of data formats, including increased estimation accuracy with sparsely populated data matrices. Conversely, the field of online …


A Data-Driven Approach For Modeling Agents, Hamdi Kavak Apr 2019

Computational Modeling & Simulation Engineering Theses & Dissertations

Agents are commonly created on a set of simple rules driven by theories, hypotheses, and assumptions. Such a modeling premise makes limited use of real-world data and is challenged when modeling real-world systems due to the lack of empirical grounding. Simultaneously, the last decade has witnessed the production and availability of large-scale data from various sensors that carry behavioral signals. These data sources have the potential to change the way we create agent-based models, from simple rules to data-driven behaviors. Despite this opportunity, the literature has neglected to offer a modeling approach to generate granular agent behaviors from data, creating …


Assessment And Correction Of Lidar-Derived DEMs In The Coastal Marshes Of Louisiana, William M. Lauve Mar 2019

LSU Master's Theses

The onset of airborne light detection and ranging (lidar) has resulted in expansive, precise digital elevation models (DEMs). DEMs are essential for modeling complex systems, such as the coastal land margin of Louisiana. They are used for many applications (e.g. tide, storm surge, and ecological modeling) and by diverse groups (e.g. state and federal agencies, NGOs, and academia). However, in a marsh environment, it is difficult for airborne lidar to produce accurate bare-earth measurements and even accurate elevations are rarely verified by ground truth data. The accuracy of lidar in marshes is limited by the sensor’s resolution …