Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Articles 1 - 15 of 15

Full-Text Articles in Physical Sciences and Mathematics

Modified-Half-Normal Distribution And Different Methods To Estimate Average Treatment Effect., Jingchao Sun Dec 2020

Electronic Theses and Dissertations

This dissertation consists of three projects related to the Modified-Half-Normal distribution and causal inference. In my first project, a new distribution called the Modified-Half-Normal distribution was introduced. I explored a few of its distributional properties, the procedures for generating random samples based on Bayesian approaches, and the parameter estimation based on the method of moments. The second project deals with the selection bias that arises when the average treatment effect (ATE) is estimated from observational data. I combined the propensity-score-based inverse probability of treatment weighting (IPTW) method and the directed acyclic graph (DAG) to solve this problem. The third project …
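As a rough illustration of the IPTW idea mentioned in this abstract, the sketch below computes a normalized inverse-probability-weighted estimate of the ATE. It is not the author's method: the function name `iptw_ate`, the toy data, and the assumption that propensity scores are already known are all hypothetical, and the DAG-based step is omitted entirely.

```python
def iptw_ate(treated, outcome, propensity):
    """Normalized IPTW estimate of the average treatment effect.

    treated    -- list of 0/1 treatment indicators
    outcome    -- list of observed outcomes
    propensity -- list of P(treated = 1 | covariates), assumed known here
    """
    # Treated units are weighted by 1/e, control units by 1/(1 - e).
    w1 = [t / e for t, e in zip(treated, propensity)]
    w0 = [(1 - t) / (1 - e) for t, e in zip(treated, propensity)]
    # Weighted outcome means in the treated and control pseudo-populations.
    mean1 = sum(w * y for w, y in zip(w1, outcome)) / sum(w1)
    mean0 = sum(w * y for w, y in zip(w0, outcome)) / sum(w0)
    return mean1 - mean0


# Hypothetical toy data, for illustration only.
treated = [1, 1, 0, 0, 1, 0]
outcome = [5.0, 6.0, 3.0, 2.0, 7.0, 4.0]
propensity = [0.8, 0.6, 0.4, 0.2, 0.5, 0.5]

ate_estimate = iptw_ate(treated, outcome, propensity)
```

When every propensity score equals 0.5, the weights are constant and the estimate reduces to the plain difference in group means, which is a quick sanity check on the weighting.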


Multimodal Data Fusion And Attack Detection In Recommender Systems, Mehmet Aktukmak Nov 2020

USF Tampa Graduate Theses and Dissertations

The commercial platforms that use recommender systems can collect relevant information to produce useful recommendations for the platform users. However, these sources usually contain missing values, imbalanced and heterogeneous data, and noisy observations. Such characteristics render the process of exploiting the information nontrivial, as one should carefully address them during the data fusion process. In addition to the degenerative characteristics, some entries can be fake, i.e., they can be the outcomes of malicious intent to manipulate the system. These entries should be eliminated before incorporation into any recommendation task. Detecting such malicious attacks quickly and accurately and then mitigating them …


Bayesian Topological Machine Learning, Christopher A. Oballe Aug 2020

Doctoral Dissertations

Topological data analysis encompasses a broad set of ideas and techniques that address 1) how to rigorously define and summarize the shape of data, and 2) how to use these constructs for inference. This dissertation addresses the second problem by developing new inferential tools for topological data analysis and applying them to solve real-world data problems. First, a Bayesian framework to approximate probability distributions of persistence diagrams is established. The key insight underpinning this framework is that persistence diagrams may be viewed as Poisson point processes with prior intensities. With this assumption in hand, one may compute posterior intensities by adopting techniques …


Analyzing The Fractal Dimension Of Various Musical Pieces, Nathan Clark Aug 2020

Industrial Engineering Undergraduate Honors Theses

One of the most common tools for evaluating data is regression. This technique, widely used by industrial engineers, explores linear relationships between predictors and the response. Each observation of the response is a fixed linear combination of the predictors with an added error element. The method is built on the assumption that this error is normally distributed across all observations and has a mean of zero. In some cases, it has been found that the inherent variation is not the result of a random variable, but is instead the result of self-symmetric properties of the observations. For data with these …


The Limits Of Location Privacy In Mobile Devices, Keen Yuun Sung Jul 2020

Doctoral Dissertations

Mobile phones are widely adopted by users across the world today. However, the privacy implications of persistent connectivity are not well understood. This dissertation focuses on one important concern of mobile phone users: location privacy. I approach this problem from the perspective of three adversaries that users are exposed to via smartphone apps: the mobile advertiser, the app developer, and the cellular service provider. First, I quantify the proportion of mobile users who use location permissive apps and are able to be tracked through their advertising identifier, and demonstrate a mark and recapture attack that allows continued tracking of users …


At The Interface Of Algebra And Statistics, Tai-Danae Bradley Jun 2020

Dissertations, Theses, and Capstone Projects

This thesis takes inspiration from quantum physics to investigate mathematical structure that lies at the interface of algebra and statistics. The starting point is a passage from classical probability theory to quantum probability theory. The quantum version of a probability distribution is a density operator, the quantum version of marginalizing is an operation called the partial trace, and the quantum version of a marginal probability distribution is a reduced density operator. Every joint probability distribution on a finite set can be modeled as a rank one density operator. By applying the partial trace, we obtain reduced density operators whose diagonals …
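The passage from a joint distribution to a reduced density operator can be sketched numerically. The example below assumes, as one standard construction not spelled out in the abstract above, that the rank one density operator is the projection onto the unit vector whose entries are the square roots of the joint probabilities. The function name `reduced_density_x` and the toy distribution are hypothetical.

```python
import math


def reduced_density_x(joint):
    """Reduced density matrix on X for a joint distribution on X x Y.

    joint[x][y] holds p(x, y). The assumed rank one operator is
    rho = |psi><psi| with psi[x][y] = sqrt(p(x, y)). Tracing out Y gives
    rho_X[x][x2] = sum over y of psi[x][y] * psi[x2][y].
    """
    psi = [[math.sqrt(p) for p in row] for row in joint]
    n = len(psi)
    return [
        [sum(a * b for a, b in zip(psi[i], psi[j])) for j in range(n)]
        for i in range(n)
    ]


# Hypothetical joint distribution on a 2 x 2 set (entries sum to 1).
joint = [[0.25, 0.25],
         [0.0, 0.5]]

rho_x = reduced_density_x(joint)
diagonal = [rho_x[i][i] for i in range(len(rho_x))]
marginal_x = [sum(row) for row in joint]  # classical marginal on X
```

Under this construction, the diagonal of `rho_x` agrees with the classical marginal `marginal_x` up to floating-point rounding, while the off-diagonal entries carry additional information that the classical marginal discards.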


Analysis Of Gameplay Strategies In Hearthstone: A Data Science Approach, Connor W. Watson May 2020

Theses

In recent years, games have been a popular test bed for AI research, and the presence of Collectible Card Games (CCGs) in that space is still increasing. One such CCG for both competitive/casual play and AI research is Hearthstone, a two-player adversarial game where players seek to implement one of several gameplay strategies to defeat their opponent by reducing the opponent's Health points to zero. Although some open source simulators exist, some of their methodologies for simulated agents create opponents with a relatively low skill level. Using evolutionary algorithms, this thesis seeks to evolve agents with a higher skill …


Applications Of Machine Learning In High-Frequency Trade Direction Classification, Jared E. Hansen May 2020

All Graduate Theses and Dissertations, Spring 1920 to Summer 2023

The correct assignment of trades as buyer-initiated or seller-initiated is paramount in many quantitative finance studies. Simple decision rule methods have been used for signing trades since many data sets available to researchers do not include the sign of each trade executed. By utilizing these decision rule methods, as well as engineering new variables from available data, we have demonstrated that machine learning models outperform prior methods for accurately signing trades as buys and sells, achieving state-of-the-art results. The best model developed was 4.5 percentage points more accurate than older methods when predicting on unseen data. Since finance and economics …


Data-Driven Investment Decisions In P2P Lending: Strategies Of Integrating Credit Scoring And Profit Scoring, Yan Wang Apr 2020

Doctor of Data Science and Analytics Dissertations

In this dissertation, we develop and discuss several loan evaluation methods to guide investment decisions for peer-to-peer (P2P) lending. In evaluating loans, credit scoring and profit scoring are the two widely utilized approaches. Credit scoring aims at minimizing the risk while profit scoring aims at maximizing the profit. This dissertation addresses the strengths and weaknesses of each scoring method by integrating them in various ways in order to provide the optimal investment suggestions for different investors. Before developing the methods for loan evaluation at the individual level, we applied the state-of-the-art method called Long Short-Term Memory (LSTM) …


Developing And Improving Risk Models Using Machine-Learning Based Algorithms, Yan Wang, Sherry Ni Jan 2020

Published and Grey Literature from PhD Candidates

The objective of this study is to develop a good risk model for classifying business delinquency by simultaneously exploring several machine learning-based methods including regularization, hyperparameter optimization, and model ensembling algorithms. The rationale behind the analyses is first to obtain good base binary classifiers (including Logistic Regression (LR), K-Nearest Neighbors (KNN), Decision Tree (DT), and Artificial Neural Networks (ANN)) via regularization and appropriate settings of hyperparameters. Then two model ensembling algorithms, bagging and boosting, are performed on the good base classifiers for further model improvement. The models are evaluated using accuracy, Area Under the Receiver Operating Characteristic …


Gradient Boosting For Survival Analysis With Applications In Oncology, Nam Phuong Nguyen Jan 2020

USF Tampa Graduate Theses and Dissertations

Cancer is one of the deadliest diseases, and the world has been fighting it for decades. An enormous amount of research has been conducted through a wide variety of approaches, ranging from genetic analysis to mathematical modeling. Survival analysis is a well-performing methodology frequently used to estimate the survival probability of a patient. Although a large number of methods exist for survival analysis, efficient exploration of a high-dimensional feature space has been challenging due to its computational cost and complexity. This thesis adapts the component-wise gradient boosting algorithms for cancer survival analysis, and also proposes a new …


Evaluating An Ordinal Output Using Data Modeling, Algorithmic Modeling, And Numerical Analysis, Martin Keagan Wynne Brown Jan 2020

Murray State Theses and Dissertations

Data and algorithmic modeling are two different approaches used in predictive analytics. The models discussed from these two approaches include the proportional odds logit model (POLR), the vector generalized linear model (VGLM), the classification and regression tree model (CART), and the random forests model (RF). Patterns in the data were analyzed using trigonometric polynomial approximations and Fast Fourier Transforms. Predictive modeling is used frequently in statistics and data science to find the relationship between the explanatory (input) variables and a response (output) variable. Both approaches prove advantageous in different cases depending on the data set. In our case, the data …


An Analysis Of The Success Of Farmers Markets In Kentucky Using Logistic Regression And Support Vector Machines, Jeron Russell Jan 2020

Mahurin Honors College Capstone Experience/Thesis Projects

The purpose of this research is to look at the relationship that market-specific, economic, and demographic variables have with the success of farmers markets in Kentucky. It additionally seeks to build a tool for predicting farmers market success that could be used by policy makers to aid in decision-making processes concerning farmers markets. Logistic regression and Support Vector Machines (SVMs) are used on data acquired from the Kentucky Department of Agriculture and the American Community Survey in order to analyze the data in a traditional statistical approach as well as a machine learning approach. The results included an SVM model …


Causal Effect Random Forest Of Interaction Trees For Learning Individualized Treatment Regimes In Observational Studies: With Applications To Education Study Data, Luo Li Jan 2020

CGU Theses & Dissertations

Learning individualized treatment regimes (ITR) using observational data holds great interest in various fields, as treatment recommendations based on individual characteristics may improve individual treatment benefits at a reduced cost. It has long been observed that different individuals may respond to a certain treatment with significant heterogeneity. An ITR can be defined as a mapping from individual characteristics to a treatment assignment. The optimal ITR is the treatment assignment that maximizes expected individual treatment effects. Rooted in personalized medicine, many studies and applications of ITR are in medical fields and clinical practice. Heterogeneous responses are also well documented in educational interventions. …


AI And Machine Learning Usage In Actuarial Science, Joanna Riley Jan 2020

Williams Honors College, Honors Research Projects

Some people in the world work hard and do whatever it takes in order to get a job that they love. There are others that don’t care about their jobs and solely perform them in order to make money. So, there are individuals or groups that wouldn’t care if a machine or computer were to replace them in their job, but others would be devastated. The question for this paper is: Can actuaries be completely replaced by computers, or do we need the human mind in order to make proper decisions and judgements?

Key words and phrases: actuarial science, artificial …