Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Full-Text Articles in Physical Sciences and Mathematics

Quantitatively Motivated Model Development Framework: Downstream Analysis Effects Of Normalization Strategies, Jessica M. Rudd Jul 2020

Doctor of Data Science and Analytics Dissertations

Through a review of epistemological frameworks in the social sciences, the history of frameworks in statistics, and the current state of research, we establish that there appears to be no consistent, quantitatively motivated model development framework in data science, and that the downstream analysis effects of various modeling choices are not uniformly documented. Examples are provided that illustrate how analytic choices, even if justifiable and statistically valid, have a downstream analysis effect on model results. This study proposes a unified model development framework that allows researchers to make statistically motivated modeling choices within the development pipeline. Additionally, a simulation study is …


Attack And Defense In Security Analytics, Yiyun Zhou May 2020

Doctor of Data Science and Analytics Dissertations

Security problems have drawn increasing attention because of the various kinds of global threats. Security analytics is the process of using streaming data acquisition, collection, and artificial intelligence algorithms for security monitoring and threat disclosure. In this dissertation work, we utilize practical data-driven security analytics to identify potential threats and explore the robustness of machine learning models. We focus on two aspects: (1) Security Analytics: Utilize machine learning and statistical analytics tools to identify and resolve real-life threats, such as those in cybersecurity and abnormal activities. (2) Analytic Security: Explore the security issues of the machine learning …


Data-Driven Investment Decisions In P2p Lending: Strategies Of Integrating Credit Scoring And Profit Scoring, Yan Wang Apr 2020

Doctor of Data Science and Analytics Dissertations

In this dissertation, we develop and discuss several loan evaluation methods to guide investment decisions for peer-to-peer (P2P) lending. In evaluating loans, credit scoring and profit scoring are the two widely utilized approaches. Credit scoring aims at minimizing the risk, while profit scoring aims at maximizing the profit. This dissertation addresses the strengths and weaknesses of each scoring method by integrating them in various ways in order to provide the optimal investment suggestions for different investors. Before developing the methods for loan evaluation at the individual level, we applied the state-of-the-art method called the Long Short-Term Memory (LSTM) …
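The contrast between the two scoring approaches can be illustrated with a minimal sketch. It is not the dissertation's method: the loans, the one-period expected-return formula, and the names `loans` and `expected_return` are hypothetical and chosen only to show that ranking by risk and ranking by profit can disagree.

```python
# Hypothetical loans: (name, probability of default, interest rate).
# All numbers are made up for illustration.
loans = [
    ("A", 0.02, 0.06),
    ("B", 0.10, 0.25),
    ("C", 0.05, 0.08),
]


def expected_return(prob_default, rate):
    """One-period expected return per unit lent.

    Assumption: the lender earns `rate` if the loan is repaid and
    loses the entire principal if the borrower defaults.
    """
    return (1.0 - prob_default) * rate - prob_default


# Credit scoring view: prefer the loans least likely to default.
credit_rank = [name for name, pd, _ in sorted(loans, key=lambda l: l[1])]

# Profit scoring view: prefer the loans with the highest expected return.
profit_rank = [
    name
    for name, pd, rate in sorted(
        loans, key=lambda l: expected_return(l[1], l[2]), reverse=True
    )
]

print("credit ranking:", credit_rank)
print("profit ranking:", profit_rank)
```

Under these assumed numbers, the riskiest loan ranks last by default probability but first by expected return, which is the kind of tension that motivates combining the two views.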


A Credit Analysis Of The Unbanked And Underbanked: An Argument For Alternative Data, Edwin Baidoo Apr 2020

Doctor of Data Science and Analytics Dissertations

The purpose of this study is to ascertain the statistical and economic significance of non-traditional credit data for individuals who do not have sufficient economic data, collectively known as the unbanked and underbanked. The consequences of not having sufficient economic information often determine whether unbanked and underbanked individuals will pay a higher price for credit or be denied credit entirely. In terms of regulation, there is a strong interest in credit models that will inform policies on how to gradually move sections of the unbanked and underbanked population into the general financial network.

In Chapter 2 of the dissertation, I establish the …


A Novel Penalized Log-Likelihood Function For Class Imbalance Problem, Lili Zhang Mar 2020

Doctor of Data Science and Analytics Dissertations

The log-likelihood function is the optimization objective in the maximum likelihood method for estimating models (e.g., logistic regression, neural networks). However, its formulation is based on the assumptions that the target classes are equally distributed and that overall accuracy is maximized, which do not apply to class imbalance problems (e.g., fraud detection, rare disease diagnoses, customer conversion prediction, cybersecurity, predictive maintenance). When trained on imbalanced data, the resulting models tend to be biased towards the majority class (i.e., non-event), which can cause substantial losses in practice. One strategy for mitigating such bias is to penalize the misclassification costs of observations differently …
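The general idea of penalizing misclassification costs differently can be sketched with a standard class-weighted Bernoulli log-likelihood. This is a generic textbook-style weighting, not the novel penalized function proposed in the dissertation; the function name `weighted_log_likelihood` and the weights `w_event` and `w_nonevent` are assumptions made for illustration.

```python
import math


def weighted_log_likelihood(y_true, p_pred, w_event=1.0, w_nonevent=1.0):
    """Class-weighted Bernoulli log-likelihood.

    y_true: iterable of 0/1 labels (1 = event, the minority class).
    p_pred: iterable of predicted event probabilities.
    With both weights equal to 1 this is the ordinary log-likelihood.
    """
    eps = 1e-12  # keeps log() away from zero
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)
        if y == 1:
            total += w_event * math.log(p)
        else:
            total += w_nonevent * math.log(1.0 - p)
    return total


# Toy data (made up): one event among four observations, and a model
# that assigns a low event probability to every observation.
y = [1, 0, 0, 0]
p = [0.2, 0.1, 0.1, 0.1]

plain = weighted_log_likelihood(y, p)
weighted = weighted_log_likelihood(y, p, w_event=3.0)

print("unweighted:", plain)
print("event weight 3:", weighted)
```

Raising `w_event` makes a poorly predicted event lower the objective more, so maximizing the weighted objective pushes the model toward the minority class. Setting the weight inversely to class frequency is one common choice, used here only as an example.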