Physical Sciences and Mathematics | Open Access Articles

Quantitatively Motivated Model Development Framework: Downstream Analysis Effects Of Normalization Strategies, Jessica M. Rudd Jul 2020

Doctor of Data Science and Analytics Dissertations

Through a review of epistemological frameworks in social sciences, history of frameworks in statistics, as well as the current state of research, we establish that there appears to be no consistent, quantitatively motivated model development framework in data science, and the downstream analysis effects of various modeling choices are not uniformly documented. Examples are provided which illustrate that analytic choices, even if justifiable and statistically valid, have a downstream analysis effect on model results. This study proposes a unified model development framework that allows researchers to make statistically motivated modeling choices within the development pipeline. Additionally, a simulation study is …

Attack And Defense In Security Analytics, Yiyun Zhou May 2020

Doctor of Data Science and Analytics Dissertations

Security has gained increasing attention because of the many kinds of global threats. Security analytics is the process of using streaming data acquisition, collection, and artificial intelligence algorithms for security monitoring and threat disclosure. In this dissertation work, we use practical data-driven security analytics to identify potential threats and explore the robustness of machine learning models. We focus on two aspects: (1) Security Analytics: use machine learning and statistical analytics tools to identify and resolve real-life threats, such as cybersecurity incidents and abnormal activities. (2) Analytic Security: explore the security issues of the machine learning …

Data-Driven Investment Decisions In P2p Lending: Strategies Of Integrating Credit Scoring And Profit Scoring, Yan Wang Apr 2020

Doctor of Data Science and Analytics Dissertations

In this dissertation, we develop and discuss several loan evaluation methods to guide the investment decisions for peer-to-peer (P2P) lending. In evaluating loans, credit scoring and profit scoring are the two widely utilized approaches. Credit scoring aims at minimizing the risk while profit scoring aims at maximizing the profit. This dissertation addresses the strengths and weaknesses of each scoring method by integrating them in various ways in order to provide the optimal investment suggestions for different investors. Before developing the methods for loan evaluation at the individual level, we applied the state-of-the-art method called the Long Short Term Memory (LSTM) …

A Credit Analysis Of The Unbanked And Underbanked: An Argument For Alternative Data, Edwin Baidoo Apr 2020

Doctor of Data Science and Analytics Dissertations

The purpose of this study is to ascertain the statistical and economic significance of non-traditional credit data for individuals who do not have sufficient economic data, collectively known as the unbanked and underbanked. The consequences of not having sufficient economic information often determine whether unbanked and underbanked individuals will pay a higher price for credit or be denied credit entirely. In terms of regulation, there is a strong interest in credit models that will inform policies on how to gradually move sections of the unbanked and underbanked population into the general financial network.

In Chapter 2 of the dissertation, I establish the …

The Expanded View Of Individualism And Collectivism: One, Two, Or Four Dimensions?, Jennifer L. Priestley, Kamal Fatehi, Gita Taasoobshirazi Apr 2020

Faculty and Research Publications

Recent research analyzing cultural differences has employed a combination of five major dimensions: individualism–collectivism, power distance, uncertainty avoidance, femininity–masculinity (gender role differentiation), and long-term orientation. Among these dimensions, individualism–collectivism has received the most attention. Over time, this cultural attribute has been regarded as one, then two, and more recently four dimensions of horizontal and vertical individualism and collectivism. However, research on this issue has not been conclusive, and some have argued against this expansion. The current study attempts to explain and clarify this discussion by using a shortened version of the scale developed by Singelis et …

A Novel Penalized Log-Likelihood Function For Class Imbalance Problem, Lili Zhang Mar 2020

Doctor of Data Science and Analytics Dissertations

The log-likelihood function is the optimization objective in the maximum likelihood method for estimating models (e.g., logistic regression, neural networks). However, its formulation is based on the assumptions that the target classes are equally distributed and that overall accuracy is maximized, which do not apply to class imbalance problems (e.g., fraud detection, rare disease diagnosis, customer conversion prediction, cybersecurity, predictive maintenance). When trained on imbalanced data, the resulting models tend to be biased towards the majority class (i.e., the non-event class), which can lead to substantial losses in practice. One strategy for mitigating such bias is to penalize the misclassification costs of observations differently …
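
As a rough illustration of penalizing observations differently, the sketch below computes a generic class-weighted Bernoulli log-likelihood. It is not the penalized function proposed in the dissertation, whose form is not given in this abstract; the function name and the weights `w_pos` and `w_neg` are hypothetical choices made for this example.

```python
import math


def weighted_log_likelihood(y_true, p_pred, w_pos=1.0, w_neg=1.0, eps=1e-12):
    """Class-weighted Bernoulli log-likelihood (higher is better).

    y_true: iterable of 0/1 labels; p_pred: predicted P(y = 1).
    w_pos and w_neg are illustrative per-class weights (assumptions).
    """
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        if y == 1:
            total += w_pos * math.log(p)
        else:
            total += w_neg * math.log(1.0 - p)
    return total


# One event among four observations, all scored at 0.5.
y = [1, 0, 0, 0]
p = [0.5, 0.5, 0.5, 0.5]
plain = weighted_log_likelihood(y, p)
weighted = weighted_log_likelihood(y, p, w_pos=3.0)
```

With `w_pos=3.0`, the single event contributes as much to the objective as the three non-events combined, so a model fitted to this objective is pushed less strongly towards the majority class.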

Developing And Improving Risk Models Using Machine-Learning Based Algorithms, Yan Wang, Sherry Ni Jan 2020

Published and Grey Literature from PhD Candidates

The objective of this study is to develop a good risk model for classifying business delinquency by simultaneously exploring several machine learning-based methods, including regularization, hyperparameter optimization, and model ensembling algorithms. The rationale behind the analyses is first to obtain good base binary classifiers (including Logistic Regression (LR), K-Nearest Neighbors (KNN), Decision Tree (DT), and Artificial Neural Networks (ANN)) via regularization and appropriate hyperparameter settings. Then two model ensembling algorithms, bagging and boosting, are applied to the good base classifiers for further model improvement. The models are evaluated using accuracy, Area Under the Receiver Operating Characteristic …

An Automatic Interaction Detection Hybrid Model For Bankcard Response Classification, Yan Wang, Sherry Ni, Brian Stone Jan 2020

Published and Grey Literature from PhD Candidates

Data mining techniques have numerous applications in bankcard response modeling. Logistic regression has been used as the standard modeling tool in the financial industry because of its almost always desirable performance and its interpretability. In this paper, we propose a hybrid bankcard response model, which integrates decision tree-based chi-square automatic interaction detection (CHAID) into logistic regression. In the first stage of the hybrid model, CHAID analysis is used to detect potential variable interactions. Then, in the second stage, these potential interactions serve as additional input variables in logistic regression. The motivation of the proposed hybrid model …

A Two-Stage Hybrid Model By Using Artificial Neural Networks As Feature Construction Algorithms, Yan Wang, Sherry Ni, Brian Stone Jan 2020

Published and Grey Literature from PhD Candidates

We propose a two-stage hybrid approach that uses neural networks as feature construction algorithms for bankcard response classification. The hybrid model uses a very simple neural network structure as the feature construction tool in the first stage; the newly created features are then used as additional input variables in logistic regression in the second stage. The model is compared with the traditional one-stage model in credit customer response classification. It is observed that the proposed two-stage model outperforms the one-stage model in terms of accuracy, the area under the ROC curve, and KS statistic. By creating new …
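
For readers unfamiliar with the KS statistic mentioned above, the sketch below shows one common way to compute it for a binary classifier: the largest gap between the empirical score distributions of the two classes. The abstract does not state how the authors computed it, so the function name `ks_statistic` and the toy data are assumptions for illustration only.

```python
def ks_statistic(y_true, scores):
    """Largest gap between the empirical score CDFs of the two classes.

    y_true: iterable of 0/1 labels; scores: model scores for each row.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one observation in each class")
    best = 0.0
    # The gap can only change at observed score values, so check each one.
    for t in sorted(set(scores)):
        cdf_pos = sum(1 for s in pos if s <= t) / len(pos)
        cdf_neg = sum(1 for s in neg if s <= t) / len(neg)
        best = max(best, abs(cdf_pos - cdf_neg))
    return best


# Toy scores: responders (1) score higher than non-responders (0).
separated = ks_statistic([0, 0, 1, 1], [0.1, 0.2, 0.8, 0.9])
overlapping = ks_statistic([0, 1, 0, 1], [0.3, 0.3, 0.7, 0.7])
```

A value near 1 means the scores separate the classes almost perfectly, while a value near 0 means the two score distributions are nearly indistinguishable.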

Predicting Class-Imbalanced Business Risk Using Resampling, Regularization, And Model Ensembling Algorithms, Yan Wang, Sherry Ni Jan 2020

Published and Grey Literature from PhD Candidates

We aim to develop and improve imbalanced business risk modeling by jointly using proper evaluation criteria, resampling, cross-validation, classifier regularization, and ensembling techniques. Area Under the Receiver Operating Characteristic Curve (AUC of ROC) is used for model comparison based on 10-fold cross-validation. Two undersampling strategies, random undersampling (RUS) and cluster centroid undersampling (CCUS), as well as two oversampling methods, random oversampling (ROS) and Synthetic Minority Oversampling Technique (SMOTE), are applied. Three highly interpretable classifiers, including logistic regression without regularization (LR), L1-regularized LR (L1LR), and decision tree (DT), are implemented. Two ensembling techniques, Bagging and Boosting, are …
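
The two simplest resampling strategies named above, RUS and ROS, can be sketched in a few lines of standard-library Python. This is a minimal illustration under assumed names (`random_undersample`, `random_oversample`, `seed`), not the authors' implementation; CCUS and SMOTE are not covered here.

```python
import random


def _group_by_class(X, y):
    """Collect rows of X under their class label."""
    by_class = {}
    for xi, yi in zip(X, y):
        by_class.setdefault(yi, []).append(xi)
    return by_class


def random_undersample(X, y, seed=0):
    """RUS: sample every class down to the size of the smallest class."""
    rng = random.Random(seed)
    by_class = _group_by_class(X, y)
    n = min(len(rows) for rows in by_class.values())
    out = [(xi, label) for label, rows in by_class.items()
           for xi in rng.sample(rows, n)]  # without replacement
    rng.shuffle(out)
    return [xi for xi, _ in out], [label for _, label in out]


def random_oversample(X, y, seed=0):
    """ROS: duplicate random rows until every class matches the largest."""
    rng = random.Random(seed)
    by_class = _group_by_class(X, y)
    n = max(len(rows) for rows in by_class.values())
    out = []
    for label, rows in by_class.items():
        extra = [rng.choice(rows) for _ in range(n - len(rows))]
        out.extend((xi, label) for xi in rows + extra)
    rng.shuffle(out)
    return [xi for xi, _ in out], [label for _, label in out]


# Toy data: 2 events and 8 non-events.
X = list(range(10))
y = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
X_rus, y_rus = random_undersample(X, y)
X_ros, y_ros = random_oversample(X, y)
```

In a cross-validation setting such as the 10-fold scheme described above, resampling is usually applied to the training folds only, so that the evaluation folds keep the original class ratio.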

A Xgboost Risk Model Via Feature Selection And Bayesian Hyper-Parameter Optimization, Yan Wang, Sherry Ni Jan 2020

Published and Grey Literature from PhD Candidates

This paper aims to explore models based on the extreme gradient boosting (XGBoost) approach for business risk classification. Feature selection (FS) algorithms and hyper-parameter optimizations are simultaneously considered during model training. The five most commonly used FS methods, including weight by Gini, weight by Chi-square, hierarchical variable clustering, weight by correlation, and weight by information, are applied to alleviate the effect of redundant features. Two hyper-parameter optimization approaches, random search (RS) and Bayesian Tree-structured Parzen Estimator (TPE), are applied in XGBoost. The effects of different FS and hyper-parameter optimization methods on model performance are investigated by the Wilcoxon Signed Rank …

Genetic Algorithm Guidance Of A Constraint Programming Solver For The Multiple Traveling Salesman Problem, Jessica M. Rudd, Andrew M. Henshaw, Lauren Staples, Sanjoosh Akkineni, Lin Li, Joe Demaio Jan 2020

Published and Grey Literature from PhD Candidates

This project developed a metaheuristic approach to the Multiple Traveling Salesman Problem that pairs a custom genetic algorithm with a conventional combinatorial optimization solver. This combined approach was used to build an optimal route for two popular radio show hosts to visit each of the 37 Atlanta area Jersey Mike's Subs in one day. This supported a fundraising effort to send children with chronic and terminal illnesses to Disney World through an organization called Bert's Big Adventure. Atlanta-area Jersey Mike's locations donated 100% of proceeds earned on this Day of Giving to Bert's Big Adventure. With the suggested route developed …

Fusion-Net: Integration Of Dimension Reduction And Deep Learning Neural Network For Image Classification, Mohammad Masum, Philippe Laval Jan 2020

Published and Grey Literature from PhD Candidates

Building a deep network on original digital images requires learning many parameters, which may reduce accuracy. The images can be compressed using dimension reduction methods, and the extracted reduced features can be fed into a deep network for classification. Hence, in the training phase of the network, the number of parameters is decreased. Principal Component Analysis (PCA) is a well-known dimension reduction technique that leverages an orthogonal linear transformation of the original data. In this paper, we propose a neural network-based framework, named Fusion-Net, which implements PCA on an image dataset (CIFAR-10) and then a neural network applies on …
