Open Access. Powered by Scholars. Published by Universities.®

Statistical Methodology Commons

Articles 1 - 8 of 8

Full-Text Articles in Statistical Methodology

Session 6: Model-Based Clustering Analysis On The Spatial-Temporal And Intensity Patterns Of Tornadoes, Yana Melnykov, Yingying Zhang, Rong Zheng Feb 2024

SDSU Data Science Symposium

Tornadoes are among nature's most violent windstorms and can occur all over the world except Antarctica. Previous scientific efforts have studied this natural hazard from facets such as genesis, dynamics, detection, forecasting, warning, measuring, and assessing. Here, we model tornado datasets using modern, sophisticated statistical and computational techniques. The goal of this paper is to develop novel finite mixture models and perform clustering analysis on the spatial-temporal and intensity patterns of tornadoes. To analyze the tornado dataset, we first try a Gaussian distribution with the mean vector and variance-covariance matrix represented as …
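To make the model-based clustering idea concrete, here is a minimal sketch that fits Gaussian mixtures to synthetic tornado-like records and picks the number of clusters by BIC. The four features (longitude, latitude, day of year, intensity) and all numbers are placeholders, and the plain scikit-learn GMM stands in for the authors' novel mixture models.

```python
# Minimal sketch: Gaussian-mixture clustering of tornado records, assuming
# each row holds (longitude, latitude, day-of-year, intensity). A plain GMM
# via scikit-learn, not the paper's novel finite mixture models.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Synthetic stand-in data: two regional/seasonal tornado regimes.
X = np.vstack([
    rng.normal([-97.0, 35.0, 130.0, 2.0], [1.5, 1.0, 15.0, 0.8], size=(200, 4)),
    rng.normal([-88.0, 33.0, 320.0, 1.0], [1.5, 1.0, 15.0, 0.6], size=(150, 4)),
])

# Choose the number of components by BIC, as is common in model-based clustering.
fits = [GaussianMixture(n_components=k, covariance_type="full",
                        random_state=0).fit(X) for k in range(1, 6)]
best = min(fits, key=lambda m: m.bic(X))
labels = best.predict(X)
print("chosen K:", best.n_components, "cluster sizes:", np.bincount(labels))
```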


Session 6: The Size-Biased Lognormal Mixture With The Entropy Regularized Algorithm, Tatjana Miljkovic, Taehan Bae Feb 2024

SDSU Data Science Symposium

A size-biased left-truncated Lognormal (SB-ltLN) mixture is proposed as a robust alternative to the Erlang mixture for modeling left-truncated insurance losses with a heavy tail. The weak denseness property of the weighted Lognormal mixture is studied along with the tail behavior. Explicit analytical solutions are derived for moments and Tail Value at Risk based on the proposed model. An extension of the regularized expectation–maximization (REM) algorithm with Shannon's entropy weights (ewREM) is introduced for parameter estimation and variability assessment. The left-truncated internal fraud data set from the Operational Riskdata eXchange is used to illustrate applications of the proposed model. Finally, …
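As a rough illustration of the mixture machinery behind the proposal, the sketch below runs a plain EM algorithm for a two-component lognormal mixture (equivalently, a Gaussian mixture on the log scale). It deliberately omits the size-biasing, left truncation, and entropy-regularized weights of the ewREM algorithm, and all data are synthetic.

```python
# Minimal sketch: EM for a plain two-component lognormal mixture, fit on the
# log scale. The size-biasing, left truncation, and entropy regularization
# (ewREM) of the paper are omitted; this only shows the basic EM mechanics.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
# Synthetic losses: a mixture of two lognormals, analyzed on the log scale.
y = np.log(np.concatenate([rng.lognormal(1.0, 0.5, 400),
                           rng.lognormal(3.0, 0.8, 100)]))

# Initialize weights, means, and standard deviations on the log scale.
w, mu, sd = np.array([0.5, 0.5]), np.array([0.5, 2.5]), np.array([1.0, 1.0])
for _ in range(200):
    # E-step: posterior membership probabilities for each observation.
    dens = w * norm.pdf(y[:, None], mu, sd)          # shape (n, 2)
    r = dens / dens.sum(axis=1, keepdims=True)
    # M-step: weighted updates of the component parameters.
    nk = r.sum(axis=0)
    w = nk / len(y)
    mu = (r * y[:, None]).sum(axis=0) / nk
    sd = np.sqrt((r * (y[:, None] - mu) ** 2).sum(axis=0) / nk)

print("weights:", w.round(3), "log-means:", mu.round(3), "log-sds:", sd.round(3))
```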


Finite Mixture Modeling For Hierarchically Structured Data With Application To Keystroke Dynamics, Andrew Simpson, Semhar Michael Feb 2023

SDSU Data Science Symposium

Keystroke dynamics has been used both to authenticate users of computer systems and to detect unauthorized users who attempt to access the system. Monitoring keystroke dynamics adds another level to computer security, as passwords are often compromised. For added security, keystrokes can also be monitored continuously long after a password has been entered, while the user is accessing the system. Many of the methods proposed to date are supervised, in that they assume the true user of each keystroke is known a priori. This is not always true, for example, with businesses and government agencies, which have internal …
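As an unsupervised illustration of this setting, the sketch below extracts hold and flight times from hypothetical key press/release timestamps and clusters them with a flat Gaussian mixture. The event format, timing values, and the flat (non-hierarchical) mixture are all assumptions standing in for the paper's hierarchically structured model.

```python
# Minimal sketch: turning raw key events into hold/flight-time features and
# clustering them without user labels. A flat GMM stands in for the paper's
# hierarchical finite mixture; the event format below is an assumption.
import numpy as np
from sklearn.mixture import GaussianMixture

def features(presses, releases):
    """Hold time of each key and flight time to the next press (seconds)."""
    presses, releases = np.asarray(presses), np.asarray(releases)
    hold = releases - presses
    flight = presses[1:] - releases[:-1]
    return np.column_stack([hold[:-1], flight])

rng = np.random.default_rng(2)
# Two synthetic "typists" with different timing profiles (values hypothetical).
def session(hold_mu, flight_mu, n=300):
    p = np.cumsum(rng.normal(flight_mu + hold_mu, 0.02, n))   # press times
    return features(p, p + rng.normal(hold_mu, 0.01, n))      # releases = press + hold

X = np.vstack([session(0.09, 0.15), session(0.14, 0.25)])
gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
print("cluster sizes:", np.bincount(gmm.predict(X)))
```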


Session 13: On Statistical Estimates Of The Inverted Kumaraswamy Distribution Under Adaptive Type-I Progressive Hybrid Censoring, Qingqing Li, Yuhlong Lio Feb 2022

SDSU Data Science Symposium

Probability distribution modeling is investigated via the maximum likelihood estimation method based on adaptive type-I progressive hybrid censored samples from the inverted Kumaraswamy distribution. Point estimates of the model parameters, reliability, hazard rate, and quantiles are obtained, and confidence intervals are developed by using the asymptotic distribution as well as the bootstrap method. A Monte Carlo simulation has been performed to evaluate the accuracy of the estimates. Finally, a real data set is given to illustrate the application.
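For a sense of the likelihood machinery, here is a minimal sketch of maximum likelihood estimation for the inverted Kumaraswamy distribution on a complete (uncensored) sample, using the standard density f(x; α, β) = αβ(1+x)^(-(β+1))[1-(1+x)^(-β)]^(α-1), x > 0. The adaptive type-I progressive hybrid censoring scheme and the interval methods of the paper are omitted.

```python
# Minimal sketch: MLE for the inverted Kumaraswamy distribution on a complete
# (uncensored) sample. The paper's adaptive type-I progressive hybrid
# censoring is omitted; this only illustrates the likelihood optimization.
import numpy as np
from scipy.optimize import minimize

def neg_loglik(theta, x):
    a, b = np.exp(theta)                      # optimize on log scale for positivity
    u = (1.0 + x) ** (-b)                     # (1+x)^(-b)
    return -(len(x) * np.log(a * b)
             - (b + 1) * np.log1p(x).sum()
             + (a - 1) * np.log1p(-u).sum())

# Simulate via inverse CDF: F(x) = [1-(1+x)^(-b)]^a  =>  x = (1-U^(1/a))^(-1/b) - 1
rng = np.random.default_rng(3)
a_true, b_true = 2.0, 3.0
u = rng.uniform(size=500)
x = (1.0 - u ** (1.0 / a_true)) ** (-1.0 / b_true) - 1.0

res = minimize(neg_loglik, x0=np.log([1.0, 1.0]), args=(x,), method="Nelder-Mead")
a_hat, b_hat = np.exp(res.x)
print(f"alpha_hat={a_hat:.3f}, beta_hat={b_hat:.3f}")
```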


Session 11 - Methods: Bootstrap Control Chart For Pareto Percentiles, Ruth Burkhalter Feb 2020

SDSU Data Science Symposium

Lifetime percentile is an important indicator of product reliability. However, the sampling distribution of a percentile estimator for any lifetime distribution is not bell-shaped. As a result, the well-known Shewhart-type control chart cannot be applied to monitor product lifetime percentiles. In this presentation, bootstrap control charts based on the maximum likelihood estimator (MLE) are proposed for monitoring Pareto percentiles. An intensive simulation study is conducted to compare the performance of the proposed MLE bootstrap control chart with that of the Shewhart-type control chart.
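A minimal sketch of the bootstrap control chart idea is given below, assuming a classical Pareto with known scale and shape estimated by MLE. The percentile level, control limits, and sample sizes are illustrative choices, not the presentation's settings.

```python
# Minimal sketch: a parametric bootstrap control chart for a Pareto percentile.
# Assumes a classical Pareto with known scale xm and shape estimated by MLE:
#   alpha_hat = n / sum(log(x / xm)),  percentile  x_p = xm * (1-p)^(-1/alpha).
import numpy as np

rng = np.random.default_rng(4)
xm, alpha_true, p, n, B = 1.0, 2.5, 0.10, 50, 5000

def mle_percentile(x):
    alpha_hat = len(x) / np.log(x / xm).sum()
    return xm * (1.0 - p) ** (-1.0 / alpha_hat)

# Phase I: estimate the shape from an in-control reference sample.
# (numpy's pareto draws a Lomax; shifting by xm gives the classical Pareto.)
ref = xm * rng.pareto(alpha_true, n) + xm
alpha0 = n / np.log(ref / xm).sum()

# Bootstrap the sampling distribution of the MLE percentile estimator.
boot = np.array([mle_percentile(xm * rng.pareto(alpha0, n) + xm) for _ in range(B)])
lcl, ucl = np.quantile(boot, [0.00135, 0.99865])   # ~3-sigma-equivalent limits
print(f"LCL={lcl:.4f}, UCL={ucl:.4f}")

# Phase II: check whether a new sample's percentile estimate stays in control.
new_est = mle_percentile(xm * rng.pareto(alpha_true, n) + xm)
print("new estimate:", round(new_est, 4), "in control:", lcl <= new_est <= ucl)
```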


Neural Shrubs: Using Neural Networks To Improve Decision Trees, Kyle Caudle, Randy Hoover, Aaron Alphonsus Feb 2019

SDSU Data Science Symposium

Decision trees are a method commonly used in machine learning to predict either a categorical or a continuous response variable. Once the tree partitions the space, the response is determined either by majority vote (classification trees) or by averaging the response values (regression trees). This research builds a standard regression tree and then, instead of averaging the responses, trains a neural network to determine the response value. We have found that our approach typically increases the predictive capability of the decision tree. We have two demonstrations of this approach that we wish to present as …
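The leaf-level idea can be sketched directly: a shallow regression tree partitions the feature space, and a small neural network is trained on each leaf's data in place of the leaf mean. The architecture and synthetic data below are illustrative, not the authors' configuration.

```python
# Minimal sketch of the "neural shrub" idea: a shallow regression tree
# partitions the space, then a small neural network (rather than the leaf
# mean) predicts within each leaf. Architecture choices are illustrative.
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(5)
X = rng.uniform(-3, 3, size=(1000, 2))
y = np.sin(X[:, 0]) * X[:, 1] + rng.normal(0, 0.1, 1000)

# Shallow tree -> few leaves, each with enough data to train a small net.
tree = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X, y)
leaf_of = tree.apply(X)
nets = {leaf: MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000,
                           random_state=0).fit(X[leaf_of == leaf], y[leaf_of == leaf])
        for leaf in np.unique(leaf_of)}

def predict(Xnew):
    """Route each point to its leaf, then apply that leaf's network."""
    leaves = tree.apply(Xnew)
    out = np.empty(len(Xnew))
    for leaf in np.unique(leaves):
        mask = leaves == leaf
        out[mask] = nets[leaf].predict(Xnew[mask])
    return out

Xtest = rng.uniform(-3, 3, size=(200, 2))
ytest = np.sin(Xtest[:, 0]) * Xtest[:, 1]
print("tree-only RMSE:", round(np.sqrt(np.mean((tree.predict(Xtest) - ytest) ** 2)), 3))
print("neural-shrub RMSE:", round(np.sqrt(np.mean((predict(Xtest) - ytest) ** 2)), 3))
```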


Session 4: Multilinear Subspace Learning And Its Applications To Machine Learning, Randy Hoover, Dr. Kyle Caudle, Dr. Karen Braman Feb 2019

SDSU Data Science Symposium

Multi-dimensional data analysis has seen increased interest in recent years. With more and more data arriving as 2-dimensional arrays (images) as opposed to 1-dimensional arrays (signals), new methods for dimensionality reduction, data analysis, and machine learning have been pursued. Most notable have been the Canonical Decomposition/Parallel Factors (commonly referred to as CP) and Tucker decompositions (the latter commonly regarded as a higher-order SVD: HOSVD). In the current research we present an alternate method for computing singular value and eigenvalue decompositions on multi-way data through an algebra of circulants and illustrate their application to two well-known machine learning methods: Multi-Linear Principal Component …
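For intuition, here is a minimal sketch of a tensor SVD under the algebra of circulants (the t-SVD): the t-product diagonalizes under the DFT along the third mode, so one FFTs the tensor, takes an ordinary matrix SVD of each frontal slice, and inverts the FFT. This is the standard t-SVD construction and is shown as an illustration rather than the authors' exact formulation.

```python
# Minimal sketch: a tensor SVD (t-SVD) built on the algebra of circulants.
# FFT along mode 3, SVD each frontal slice in Fourier space, then invert.
import numpy as np

def t_product(A, B):
    """t-product: slice-wise matrix products in Fourier space."""
    Ah, Bh = np.fft.fft(A, axis=2), np.fft.fft(B, axis=2)
    return np.real(np.fft.ifft(np.einsum('ijk,jlk->ilk', Ah, Bh), axis=2))

def t_transpose(A):
    """Tensor transpose: transpose each slice, reverse slices 2..n3."""
    At = np.transpose(A, (1, 0, 2)).copy()
    At[:, :, 1:] = At[:, :, 1:][:, :, ::-1]
    return At

def t_svd(A):
    """t-SVD: A = U * S * t_transpose(V) under the t-product."""
    n1, n2, n3 = A.shape
    Ahat = np.fft.fft(A, axis=2)
    Uh = np.empty((n1, n1, n3), dtype=complex)
    Sh = np.zeros((n1, n2, n3), dtype=complex)
    Vh = np.empty((n2, n2, n3), dtype=complex)
    d = np.arange(min(n1, n2))
    for k in range(n3 // 2 + 1):              # remaining slices follow by symmetry
        u, s, vt = np.linalg.svd(Ahat[:, :, k])
        Uh[:, :, k], Sh[d, d, k], Vh[:, :, k] = u, s, vt.conj().T
        if 0 < k != n3 - k:                   # conjugate symmetry keeps ifft real
            Uh[:, :, n3 - k], Sh[d, d, n3 - k], Vh[:, :, n3 - k] = u.conj(), s, vt.T
    f = lambda T: np.real(np.fft.ifft(T, axis=2))
    return f(Uh), f(Sh), f(Vh)

rng = np.random.default_rng(6)
A = rng.normal(size=(4, 3, 5))
U, S, V = t_svd(A)
print("reconstruction error:",
      np.linalg.norm(t_product(t_product(U, S), t_transpose(V)) - A))
```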


Approximate Bayesian Computation In Forensic Science, Jessie H. Hendricks Jan 2017

The Journal of Undergraduate Research

Forensic evidence is often an important factor in criminal investigations. Analyzing evidence in an objective way involves the use of statistics. However, many evidence types (e.g., glass fragments, fingerprints, shoe impressions) are very complex. This makes the use of statistical methods, such as model selection in Bayesian inference, extremely difficult.

Approximate Bayesian Computation is an algorithmic method in Bayesian analysis that can be used for model selection. It is especially useful because it can be used to assign a Bayes factor without the need to directly evaluate the exact likelihood function, a difficult task for complex data. Several criticisms …
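A minimal sketch of ABC-based model choice follows: two hypothetical candidate models are simulated from their priors, simulations whose summary statistic lands within a tolerance ε of the observed value are accepted, and with equal model priors the ratio of acceptance counts approximates the Bayes factor. The models, priors, summary statistic, and tolerance are all placeholders, not this article's application.

```python
# Minimal sketch: ABC rejection sampling for model selection on toy data.
# No likelihood is ever evaluated; models and priors below are placeholders.
import numpy as np

rng = np.random.default_rng(7)
obs_summary = 1.8                               # observed summary statistic (toy value)
eps, n_sims = 0.05, 100_000

def simulate(model, n=20):
    """Draw a parameter from its prior, simulate data, return the summary."""
    theta = rng.uniform(0, 5)
    data = rng.normal(theta, 1.0, n) if model == 1 else rng.exponential(theta, n)
    return data.mean()                          # M1: Normal(theta,1); M2: Exp(mean=theta)

accept = {1: 0, 2: 0}
for _ in range(n_sims):
    m = 1 if rng.random() < 0.5 else 2          # equal prior model probabilities
    if abs(simulate(m) - obs_summary) < eps:    # accept if summary is close enough
        accept[m] += 1

# With equal model priors, the ratio of acceptance counts approximates
# the Bayes factor of model 1 versus model 2.
bf_12 = accept[1] / max(accept[2], 1)
print("acceptances:", accept, "approximate BF(M1 vs M2):", round(bf_12, 3))
```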