Time Series Analysis Of Longitudinally Collected Standard Autoperimetry Data In Glaucoma Patients, 2023 Murray State University

#### Time Series Analysis Of Longitudinally Collected Standard Autoperimetry Data In Glaucoma Patients, Carlyn Childress

*Honors College Theses*

Glaucoma is a group of eye diseases in which damage gradually occurs to the optic nerve, often leading to partial or complete loss of vision. Glaucoma is the second leading cause of blindness, and there is no cure. Early detection and tracking of its progression are key to managing its effects. Ordinary Least Squares Regression (OLSR), the most commonly used methodology for tracking glaucoma progression, is inappropriate because the longitudinally collected perimetry data from glaucoma patients appear to be temporally correlated. Time series models, which account for temporal correlation, are better methods to analyze Mean …
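A minimal sketch of how temporal correlation in the residuals of an OLS trend might be checked, using the Durbin-Watson statistic. The statistic is a standard diagnostic chosen here for illustration, not the thesis's stated method, and the series values and function names are hypothetical.

```python
def ols_trend(y):
    """Fit y_t = a + b*t by ordinary least squares; return (a, b, residuals)."""
    n = len(y)
    t = list(range(n))
    t_mean = sum(t) / n
    y_mean = sum(y) / n
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    sxy = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    b = sxy / sxx
    a = y_mean - b * t_mean
    resid = [yi - (a + b * ti) for ti, yi in zip(t, y)]
    return a, b, resid


def durbin_watson(resid):
    """Durbin-Watson statistic: values near 2 suggest no lag-1 autocorrelation,
    values toward 0 suggest positive and values toward 4 negative autocorrelation."""
    num = sum((resid[i] - resid[i - 1]) ** 2 for i in range(1, len(resid)))
    den = sum(r * r for r in resid)
    return num / den


# Hypothetical perimetry summary values from successive visits (illustrative only).
series = [-2.0, -2.1, -2.3, -2.2, -2.6, -2.9, -3.0, -3.4, -3.3, -3.8]
a, b, resid = ols_trend(series)
dw = durbin_watson(resid)
print(f"slope per visit: {b:.3f}, Durbin-Watson: {dw:.3f}")
```

A statistic far from 2 would be one sign that the independence assumption behind OLS standard errors is doubtful for such a series.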

Employee Attrition: Analyzing Factors Influencing Job Satisfaction Of IBM Data Scientists, 2023 Kennesaw State University

#### Employee Attrition: Analyzing Factors Influencing Job Satisfaction Of IBM Data Scientists, Graham Nash

*Symposium of Student Scholars*

Employee attrition is an issue that every employer must consider when gauging the effectiveness of their employees. Whether or not an employee chooses to leave their job can depend on a multitude of factors. As a result, employers need methods for measuring attrition from the attributes of their employees. Factors such as age, years with the company, department, level of education, job role, and even marital status are all considered by employers to assist in predicting employee attrition. This project will be analyzing a …

Using A Distributive Approach To Model Insurance Loss, 2023 University of Mary Washington

#### Using A Distributive Approach To Model Insurance Loss, Kayla Kippes

*Student Research Submissions*

Insurance loss is an unpredicted event that stands at the forefront of the insurance industry. Loss in insurance represents the costs or expenses incurred due to a claim. An insurance claim is a request for the insurance company to pay for damage caused to an individual’s property. Loss can be measured by how much money (the dollar amount) has been paid out by the insurance company to repair the damage or it can be measured by the number of claims (claim count) made to the insurance company. Insured events include property damage due to fire, theft, flood, a car accident, …

Statistical Approach To Quantifying Interceptability Of Interaction Scenarios For Testing Autonomous Surface Vessels, 2023 Old Dominion University

#### Statistical Approach To Quantifying Interceptability Of Interaction Scenarios For Testing Autonomous Surface Vessels, Benjamin E. Hargis, Yiannis E. Papelis

*Modeling, Simulation and Visualization Student Capstone Conference*

This paper presents a probabilistic approach to quantifying interceptability of an interaction scenario designed to test collision avoidance of autonomous navigation algorithms. Interceptability is one of many measures to determine the complexity or difficulty of an interaction scenario. This approach uses a combined probability model of capability and intent to create a predicted position probability map for the system under test. Then, interceptability is quantified by determining the overlap between the system under test probability map and the intruder’s capability model. The approach is general; however, a demonstration is provided using kinematic capability models and an odometry-based intent model.

Bridging The Chasm Between Fundamental, Momentum, And Quantitative Investing, 2023 Southern Methodist University

#### Bridging The Chasm Between Fundamental, Momentum, And Quantitative Investing, Allen Hoskins, Jeff Reed, Robert Slater

*SMU Data Science Review*

A chasm exists between the active public equity investment management industry's fundamental, momentum, and quantitative styles. In this study, the researchers explore ways to bridge this gap by leveraging domain knowledge, fundamental analysis, momentum, crowdsourcing, and data science methods. This research also seeks to test the developed tools and strategies during the volatile time period of 2020 and 2021.

Comparison Of Sampling Methods For Predicting Wine Quality Based On Physicochemical Properties, 2023 Southern Methodist University

#### Comparison Of Sampling Methods For Predicting Wine Quality Based On Physicochemical Properties, Robert Burigo, Scott Frazier, Eli Kravez, Nibhrat Lohia

*SMU Data Science Review*

Using the physicochemical properties of wine to predict quality has been done in numerous studies. Given the nature of these properties, the data is inherently skewed. Previous works have focused on a handful of sampling techniques to balance the data. This research compares multiple sampling techniques in predicting the target with limited data. For this purpose, an ensemble model is used to evaluate the different techniques. This research found no evidence that any specific oversampling method improves a random forest classifier for a multi-class problem.
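As an illustration of what a sampling technique for balancing classes can look like, here is a minimal sketch of random oversampling, one of the simplest such techniques. Whether it is among the methods the authors compared is not stated; the function name, feature rows, and quality labels below are hypothetical.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until every class
    has as many rows as the largest class."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(members) for members in by_class.values())

    out_rows, out_labels = [], []
    for label, members in by_class.items():
        # Sample with replacement to fill the gap up to the target size.
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for row in members + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# Hypothetical (fixed acidity, volatile acidity) rows with skewed quality labels.
rows = [[7.4, 0.70], [7.8, 0.88], [6.9, 0.40], [7.1, 0.35], [8.0, 0.30], [6.5, 0.28]]
labels = [5, 5, 5, 5, 6, 7]
bal_rows, bal_labels = random_oversample(rows, labels)
print(Counter(bal_labels))
```

In practice, oversampling would be applied to the training split only, so that duplicated rows do not leak into the evaluation data.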

A New Generalized Gamma-Weibull Distribution And Its Applications, 2023 Department of Mathematics and Statistics, Kwara State University, Malete P.M.B. 1530, Ilorin, Nigeria

#### A New Generalized Gamma-Weibull Distribution And Its Applications, Nihimat Iyebuhola Aleshinloye, Samuel Adewale Aderoju, Alfred Adewole Abiodun, Bako Lukmon Taiwo

*Al-Bahir Journal for Engineering and Pure Sciences*

In this paper, a New Generalized Gamma-Weibull (NGGW) distribution is developed by compounding Weibull and generalized gamma distribution. Some mathematical properties such as moments, Rényi entropy and order statistics are derived and discussed. The maximum likelihood estimation (MLE) method is used to estimate the model parameters. The proposed model is applied to two real-life datasets to illustrate its performance and flexibility as compared to some other competing distributions. The results obtained show that the new distribution fits each of the data better than the other competing distributions.

That’s My Deity: An Examination Of Online Lokean Cultures Through Log-Linear Modeling, 2023 University of South Carolina - Columbia

#### That’s My Deity: An Examination Of Online Lokean Cultures Through Log-Linear Modeling, Mary Bernstein

*Senior Theses*

A rise in online religious communities and the growth of so-called ‘Old World’ religions are reflected in the internet’s subcultures of Neopaganism, a growing religious movement that has been documented in America since the 1960s. The religions under this umbrella movement vary drastically and include belief systems such as Wicca, Druidry, and deity worship. Belief systems under this movement lack the traditional hierarchy found in structured religion and lack a singular sacred text. As such, believers usually find and support one another not through a physical sacred place of meeting, but through an online community that acts as sacred space. …

Beyond Machine Learning: An fMRI Domain Adaptation Model For Multi-Study Integration, 2023 Louisiana State University and Agricultural and Mechanical College

#### Beyond Machine Learning: An fMRI Domain Adaptation Model For Multi-Study Integration, Lauryn Michelle Burleigh

*LSU Doctoral Dissertations*

Traditional machine learning analyses are challenging with functional magnetic resonance imaging (fMRI) data, not only because of the amount of data that needs to be collected, adding a particular challenge for human fMRI research, but also due to the change in hypothesis being addressed with various analytical techniques. Domain adaptation, a type of transfer learning and a step beyond machine learning that allows multiple related, but not identical, datasets to contribute to a model, can help overcome the limitation of data needed but may address different hypothesis questions than anticipated given the analysis computation. This dissertation assesses …

Self-Learning Algorithms For Intrusion Detection And Prevention Systems (IDPS), 2023 Southern Methodist University

#### Self-Learning Algorithms For Intrusion Detection And Prevention Systems (IDPS), Juan E. Nunez, Roger W. Tchegui Donfack, Rohit Rohit, Hayley Horn

*SMU Data Science Review*

Today, there is an increased risk to data privacy and information security due to cyberattacks that compromise data reliability and accessibility. New machine learning models are needed to detect and prevent these cyberattacks. One application of these models is cybersecurity threat detection and prevention systems that can create a baseline of a network's traffic patterns to detect anomalies without needing pre-labeled data, thus enabling the identification of abnormal network events as threats. This research explored algorithms that can help automate anomaly detection on an enterprise network using Canadian Institute for Cybersecurity data. This study demonstrates that Neural Networks with Bayesian …

A Characterization Of Bias Introduced Into Forensic Source Identification When There Is A Subpopulation Structure In The Relevant Source Population, 2023 South Dakota State University

#### A Characterization Of Bias Introduced Into Forensic Source Identification When There Is A Subpopulation Structure In The Relevant Source Population, Dylan Borchert, Semhar Michael, Christopher Saunders

*SDSU Data Science Symposium*

In forensic source identification the forensic expert is responsible for providing a summary of the evidence that allows for a decision maker to make a logical and coherent decision concerning the source of some trace evidence of interest. The academic consensus is usually that this summary should take the form of a likelihood ratio (LR) that summarizes the likelihood of the trace evidence arising under two competing propositions. These competing propositions are usually referred to as the prosecution’s proposition, that the specified source is the actual source of the trace evidence, and the defense’s proposition, that another source in a …

Analyzing Relationships With Machine Learning, 2023 The Graduate Center, City University of New York

#### Analyzing Relationships With Machine Learning, Oscar Ko

*Dissertations, Theses, and Capstone Projects*

Procedurally, this project aims to take a dataset, analyze it, and offer insights to the audience in an easy-to-digest format. Conceptually, this project will seek to explore questions like: “Do couples that meet through online dating or dating apps have higher or lower quality relationships?”, “Can any features in this dataset help predict how a subject would rate their relationship quality?”, and “What other insights can I derive from using machine learning for exploratory analysis?” The intended audience for this project is anyone interested in romantic relationships or machine learning.

The dataset is from a Stanford University survey, “How Couples …

Biasing Estimator To Mitigate Multicollinearity In Linear Regression Model, 2023 Department of Mathematics and Statistics, Federal University Wukari, Wukari, Nigeria

#### Biasing Estimator To Mitigate Multicollinearity In Linear Regression Model, Abdulrasheed Bello Badawaire, Issam Dawoud, Adewale Folaranmi Lukman, Victoria Laoye, Arowolo Olatunji

*Al-Bahir Journal for Engineering and Pure Sciences*

A new two-parameter estimator was developed to combat the threat of multicollinearity for the linear regression model. Some necessary and sufficient conditions for the dominance of the proposed estimator over ordinary least squares (OLS) estimator, ridge regression estimator, Liu estimator, KL estimator, and some two-parameter estimators are obtained in the matrix mean square error sense. Theory and simulation results show that, under some conditions, the proposed two-parameter estimator consistently dominates other estimators considered in this study. The real-life application result follows suit.

On Partially Observed Tensor Regression, 2023 University of Windsor

#### On Partially Observed Tensor Regression, Dinara Miftyakhetdinova

*Major Papers*

Tensor data is widely used in modern data science. The interest lies in identifying and characterizing the relationship between tensor datasets and external covariates. These datasets, though, are often incomplete. An efficient nonconvex alternating updating algorithm proposed by J. Zhou et al. in the paper "Partially Observed Dynamic Tensor Response Regression" provides a novel approach. The algorithm handles the problem of unobserved entries by solving an optimization problem of a loss function under the low-rankness, sparsity, and fusion constraints. This analysis aims to understand in detail the proposed algorithms and their theoretical proofs with, potentially, dropping some of the assumptions …

Uniformity Test Based On The Empirical Bernstein Distribution, 2023 University of Windsor

#### Uniformity Test Based On The Empirical Bernstein Distribution, Ran Sun

*Major Papers*

In this paper, we first review the origin of the Bernstein polynomial and its various applications. Then we review the importance of goodness-of-fit tests, especially the uniformity test, and examine many of the test statistics proposed so far. After that, we suggest two new statistics for testing uniformity. These two statistics are of the Kolmogorov-Smirnov type and the Cramér-von Mises type, respectively. We also embed the Bernstein polynomial into those test types to take advantage of its strong approximation performance. Finally, we run a Monte Carlo simulation to compare the performance of our statistics to those …
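For orientation, the two classical test types named above can be sketched for the null hypothesis of a Uniform(0, 1) sample. These are the standard Kolmogorov-Smirnov and Cramér-von Mises statistics, not the Bernstein-based statistics the paper proposes; the function names and the simulated sample are illustrative assumptions.

```python
import random


def ks_uniform_statistic(sample):
    """Kolmogorov-Smirnov statistic against Uniform(0, 1):
    the largest gap between the empirical CDF and F(x) = x."""
    xs = sorted(sample)
    n = len(xs)
    d_plus = max((i + 1) / n - x for i, x in enumerate(xs))
    d_minus = max(x - i / n for i, x in enumerate(xs))
    return max(d_plus, d_minus)


def cvm_uniform_statistic(sample):
    """Cramér-von Mises statistic against Uniform(0, 1):
    1/(12n) plus squared distances of order statistics from (2i-1)/(2n)."""
    xs = sorted(sample)
    n = len(xs)
    # enumerate is zero-based, so (2i - 1)/(2n) becomes (2i + 1)/(2n).
    return 1.0 / (12 * n) + sum(
        (x - (2 * i + 1) / (2 * n)) ** 2 for i, x in enumerate(xs)
    )


# Simulated sample for illustration only.
rng = random.Random(0)
sample = [rng.random() for _ in range(50)]
ks = ks_uniform_statistic(sample)
cvm = cvm_uniform_statistic(sample)
print(f"KS: {ks:.4f}, CvM: {cvm:.4f}")
```

Large values of either statistic count as evidence against uniformity; critical values would come from tables or from a Monte Carlo simulation under the null.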

Finite Mixtures Of Mean-Parameterized Conway-Maxwell-Poisson Models, 2023 University of Kentucky

#### Finite Mixtures Of Mean-Parameterized Conway-Maxwell-Poisson Models, Dongying Zhan

*Theses and Dissertations--Statistics*

For modeling count data, the Conway-Maxwell-Poisson (CMP) distribution is a popular generalization of the Poisson distribution due to its ability to characterize data over- or under-dispersion. While the classic parameterization of the CMP has been well-studied, its main drawback is that it does not directly model the mean of the counts. This is mitigated by using a mean-parameterized version of the CMP distribution. In this work, we are concerned with the setting where count data may be composed of subpopulations, each possibly having varying degrees of data dispersion. Thus, we propose a finite mixture of mean-parameterized CMP distributions. An …
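A minimal sketch of the classic CMP parameterization may help show why it captures dispersion but not the mean directly. It uses the standard form P(X = x) = λ^x / (x!)^ν / Z(λ, ν) with a truncated normalizing series; the function names and the truncation length are assumptions for illustration, and the mean-parameterized version and the mixture model from the dissertation are not implemented here.

```python
import math


def _cmp_log_z(lam, nu, terms=200):
    """Log of the normalizing constant Z = sum_j lam^j / (j!)^nu, truncated
    at `terms` terms (assumed adequate for moderate lam and nu not near 0)."""
    log_terms = [j * math.log(lam) - nu * math.lgamma(j + 1) for j in range(terms)]
    m = max(log_terms)
    return m + math.log(sum(math.exp(t - m) for t in log_terms))


def cmp_pmf(x, lam, nu, terms=200):
    """P(X = x) under the classic CMP(lam, nu) parameterization."""
    log_p = x * math.log(lam) - nu * math.lgamma(x + 1) - _cmp_log_z(lam, nu, terms)
    return math.exp(log_p)


def cmp_mean_var(lam, nu, terms=200):
    """Mean and variance computed numerically from the truncated pmf."""
    log_z = _cmp_log_z(lam, nu, terms)
    probs = [
        math.exp(x * math.log(lam) - nu * math.lgamma(x + 1) - log_z)
        for x in range(terms)
    ]
    mean = sum(x * p for x, p in enumerate(probs))
    var = sum(x * x * p for x, p in enumerate(probs)) - mean ** 2
    return mean, var


for nu in (0.5, 1.0, 2.0):
    mean, var = cmp_mean_var(2.0, nu)
    print(f"nu={nu}: mean={mean:.3f}, variance={var:.3f}")
```

With ν = 1 the distribution reduces to the Poisson, ν < 1 gives over-dispersion, and ν > 1 gives under-dispersion. The mean changes with ν even when λ is held fixed, which is the drawback the mean parameterization addresses.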

Carnivore And Ungulate Occurrence In A Fire-Prone Region, 2023 California State Polytechnic University Humboldt

#### Carnivore And Ungulate Occurrence In A Fire-Prone Region, Sara J. Moriarty-Graves

*Cal Poly Humboldt theses and projects*

Increasing fire size and severity in the western United States causes changes to ecosystems, species’ habitat use, and interspecific interactions. Wide-ranging carnivore and ungulate mammalian species and their interactions may be influenced by an increase in fire activity in northern California. Depending on the fire characteristics, ungulates may benefit from burned habitat due to an increase in forage availability, while carnivore species may be differentially impacted, but ultimately driven by bottom-up processes from a shift in prey availability. I used a three-step approach to estimate the single-species occupancy of four large mammal species: mountain lion (*Puma concolor*), coyote …

Classification Of Adult Income Using Decision Tree, 2023 University of Central Florida

#### Classification Of Adult Income Using Decision Tree, Roland Fiagbe

*Data Science and Data Mining*

The decision tree is a commonly used data mining methodology for performing classification tasks. It is a tree-based supervised machine learning algorithm that classifies or makes predictions by following a path determined by how previous questions are answered. Generally, the decision tree algorithm categorizes data into branch-like segments that develop into a tree that contains a root, nodes, and leaves. This project seeks to explore the decision tree methodology and apply it to the Adult Income dataset from the UCI Machine Learning Repository, to determine whether a person makes over 50K per year and determine the necessary factors that improve …
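To make the branching idea concrete, here is a minimal sketch of how a single split might be chosen on one numeric feature using Gini impurity. The splitting criterion is an assumption (the abstract does not name one), and the feature values, labels, and function names are hypothetical rather than taken from the Adult Income dataset.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_threshold(values, labels):
    """Return (threshold, weighted impurity) for the split `value <= threshold`
    that minimizes the size-weighted Gini impurity of the two branches."""
    best = (None, float("inf"))
    # The largest value is skipped because it would leave the right branch empty.
    for thr in sorted(set(values))[:-1]:
        left = [lab for v, lab in zip(values, labels) if v <= thr]
        right = [lab for v, lab in zip(values, labels) if v > thr]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (thr, score)
    return best


# Hypothetical weekly working hours and income classes (illustrative only).
hours = [20, 25, 38, 40, 45, 50, 55, 60]
income = ["<=50K", "<=50K", "<=50K", "<=50K", ">50K", "<=50K", ">50K", ">50K"]
threshold, impurity = best_threshold(hours, income)
print(f"split at hours <= {threshold}, weighted Gini = {impurity:.3f}")
```

A full tree would apply the same search recursively to each branch, over all features, until a stopping rule such as a minimum leaf size is met.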

Statistical Intervals For Neural Network And Its Relationship With Generalized Linear Model, 2023 University of Kentucky

#### Statistical Intervals For Neural Network And Its Relationship With Generalized Linear Model, Sheng Yuan

*Theses and Dissertations--Statistics*

Neural networks have experienced widespread adoption and have become integral in cutting-edge domains like computer vision, natural language processing, and various contemporary fields. However, addressing the statistical aspects of neural networks has been a persistent challenge, with limited satisfactory results. In my research, I focused on exploring statistical intervals applied to neural networks, specifically confidence intervals and tolerance intervals. I employed variance estimation methods, such as direct estimation and resampling, to assess neural networks and their performance under outlier scenarios. Remarkably, when outliers were present, the resampling method with infinitesimal jackknife estimation yielded confidence intervals that closely aligned with nominal …

Forecasting Remission Time Of A Treatment Method For Leukemia As An Application To Statistical Inference Approach, 2023 Mansoura University

#### Forecasting Remission Time Of A Treatment Method For Leukemia As An Application To Statistical Inference Approach, Ahmed Galal Atia, Mahmoud Mansour, Rashad Mohamed El-Sagheer, B. S. El-Desouky

*Basic Science Engineering*

In this paper, the Weibull-Linear Exponential distribution (WLED) is investigated to determine whether it fits a real clinical dataset well. These data represent the duration of remission achieved by a certain drug used in the treatment of leukemia for a group of patients. The statistical inference approach is used to estimate the parameters of the WLED from the fitted data. The estimated parameters are used to evaluate the survival and hazard functions and hence to assess the treatment method by forecasting the remission times of patients. A two-sample prediction approach has been applied to …