Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Full-Text Articles in Physical Sciences and Mathematics

Classifying Imbalanced Financial Fraud Data Utilizing Enhanced Random Forest Algorithm, Charles Gardner, Dec 2020

Master of Science in Computer Science Theses

Imbalanced datasets have been a unique challenge for machine learning, requiring specialized approaches to correctly classify the minority class. Financial fraud detection involves highly imbalanced datasets, with a class imbalance of up to 0.01% fraudulent to 99.99% regular transactions. It is essential to identify all frauds in financial fraud detection, even if the precision of some classifications is low. I developed a random forest assembly that separates fraudulent transactions into tiers of precision. With this approach, 96% of fraudulent transactions are identified, an 8% increase in recall compared to standard approaches. 59% of fraud classifications' precision increases by 10% …
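
To make the idea of "tiers of precision" concrete, here is a minimal sketch and not the thesis's actual method. It assumes each transaction already has a fraud score (for example, the fraction of trees in a forest voting "fraud"), and it uses hypothetical tier names and thresholds (`high`, `medium`, `low`) chosen purely for illustration. The function `tier_report` is likewise an invented helper.

```python
from typing import Dict, List, Tuple


def tier_report(
    scores: List[float],
    labels: List[int],
    tiers: List[Tuple[str, float]],
) -> Dict[str, Dict[str, float]]:
    """Group flagged transactions into score tiers and summarize each tier.

    scores: assumed fraud scores in [0, 1], one per transaction.
    labels: 1 for fraud, 0 for a regular transaction.
    tiers:  (name, lower_bound) pairs, sorted by descending lower bound.
    """
    total_fraud = sum(labels)
    report: Dict[str, Dict[str, float]] = {}
    caught = 0
    upper = float("inf")
    for name, lower in tiers:
        # Labels of transactions whose score falls in [lower, upper).
        in_tier = [y for s, y in zip(scores, labels) if lower <= s < upper]
        true_positives = sum(in_tier)
        caught += true_positives
        report[name] = {
            "flagged": len(in_tier),
            "precision": true_positives / len(in_tier) if in_tier else 0.0,
            "cumulative_recall": caught / total_fraud if total_fraud else 0.0,
        }
        upper = lower
    return report


# Toy data for illustration only; not drawn from any real dataset.
scores = [0.95, 0.90, 0.70, 0.60, 0.40, 0.35, 0.10, 0.05]
labels = [1, 1, 1, 0, 1, 0, 0, 0]
tiers = [("high", 0.8), ("medium", 0.5), ("low", 0.3)]
report = tier_report(scores, labels, tiers)
for name, stats in report.items():
    print(name, stats)
```

Lower tiers trade precision for recall: each additional tier flags more transactions, so more frauds are caught overall while the share of true frauds within that tier may drop.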


Data Mining And Image Classification Using Genetic Programming, Mahsa Shokri Varniab, Jul 2020

Master of Science in Computer Science Theses

Genetic programming (GP), a capable machine learning and search method motivated by Darwinian evolution, is an evolutionary learning algorithm that automatically evolves computer programs in the form of trees to solve problems. This thesis studies the application of GP to data mining and image processing. Knowledge discovery and data mining have been widely used in business, healthcare, and scientific fields. In data mining, classification is a supervised learning task that identifies new patterns and maps the data to predefined targets. A GP-based classifier is developed to perform these mappings. GP has been investigated in a series of studies to classify …


Quantitatively Motivated Model Development Framework: Downstream Analysis Effects Of Normalization Strategies, Jessica M. Rudd, Jul 2020

Doctor of Data Science and Analytics Dissertations

Through a review of epistemological frameworks in the social sciences, the history of frameworks in statistics, and the current state of research, we establish that there appears to be no consistent, quantitatively motivated model development framework in data science, and that the downstream analysis effects of various modeling choices are not uniformly documented. Examples are provided that illustrate how analytic choices, even if justifiable and statistically valid, have a downstream analysis effect on model results. This study proposes a unified model development framework that allows researchers to make statistically motivated modeling choices within the development pipeline. Additionally, a simulation study is …


Fusion-Net: Integration Of Dimension Reduction And Deep Learning Neural Network For Image Classification, Mohammad Masum, Philippe Laval, Jan 2020

Published and Grey Literature from PhD Candidates

Building a deep network using original digital images requires learning many parameters, which may reduce accuracy. The images can be compressed using dimension reduction methods, and the extracted reduced features can be fed into a deep network for classification. Hence, in the training phase of the network, the number of parameters is decreased. Principal Component Analysis (PCA) is a well-known dimension reduction technique that leverages an orthogonal linear transformation of the original data. In this paper, we propose a neural network-based framework, named Fusion-Net, which implements PCA on an image dataset (CIFAR-10) and then a neural network applies on …
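
The dimension reduction step can be illustrated with a small sketch. This is not the Fusion-Net implementation: it is a standard-library-only illustration of PCA that finds just the first principal component by power iteration on the covariance matrix, then projects the data onto it. The helper names `first_principal_component` and `project` are hypothetical, and the toy 2-D data stands in for flattened image vectors.

```python
import math
from typing import List, Tuple


def first_principal_component(
    rows: List[List[float]], iters: int = 200
) -> Tuple[List[float], List[float]]:
    """Return (column means, unit vector of the first principal component)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix of the centered data.
    cov = [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]
    # Power iteration: repeatedly apply the covariance matrix and renormalize.
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            raise ValueError("data has no variance along the starting direction")
        v = [x / norm for x in w]
    return means, v


def project(rows: List[List[float]], means: List[float], v: List[float]) -> List[float]:
    """Project each centered row onto the component, giving one feature per row."""
    return [sum((r[j] - means[j]) * v[j] for j in range(len(v))) for r in rows]


# Toy data lying exactly on the line y = 2x (illustration only).
data = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
means, component = first_principal_component(data)
reduced = project(data, means, component)
print(component)  # direction proportional to (1, 2)
print(reduced)    # one reduced feature per input row
```

In a pipeline of the kind the abstract describes, the reduced features (typically several components rather than one) would replace the raw pixels as the network's input, which shrinks the first layer's parameter count.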