Open Access. Powered by Scholars. Published by Universities.®
- Discipline
Articles 1 - 2 of 2
Full-Text Articles in Statistical Models
Predicting Class-Imbalanced Business Risk Using Resampling, Regularization, And Model Ensembling Algorithms, Yan Wang, Sherry Ni
Predicting Class-Imbalanced Business Risk Using Resampling, Regularization, And Model Ensembling Algorithms, Yan Wang, Sherry Ni
Published and Grey Literature from PhD Candidates
We aim at developing and improving the imbalanced business risk modeling via jointly using proper evaluation criteria, resampling, cross-validation, classifier regularization, and ensembling techniques. Area Under the Receiver Operating Characteristic Curve (AUC of ROC) is used for model comparison based on 10-fold cross-validation. Two undersampling strategies including random undersampling (RUS) and cluster centroid undersampling (CCUS), as well as two oversampling methods including random oversampling (ROS) and Synthetic Minority Oversampling Technique (SMOTE), are applied. Three highly interpretable classifiers, including logistic regression without regularization (LR), L1-regularized LR (L1LR), and decision tree (DT) are implemented. Two ensembling techniques, including Bagging and Boosting, are …
A Xgboost Risk Model Via Feature Selection And Bayesian Hyper-Parameter Optimization, Yan Wang, Sherry Ni
A Xgboost Risk Model Via Feature Selection And Bayesian Hyper-Parameter Optimization, Yan Wang, Sherry Ni
Published and Grey Literature from PhD Candidates
This paper aims to explore models based on the extreme gradient boosting (XGBoost) approach for business risk classification. Feature selection (FS) algorithms and hyper-parameter optimizations are simultaneously considered during model training. The five most commonly used FS methods including weight by Gini, weight by Chi-square, hierarchical variable clustering, weight by correlation, and weight by information are applied to alleviate the effect of redundant features. Two hyper-parameter optimization approaches, random search (RS) and Bayesian tree-structuredParzen Estimator (TPE), are applied in XGBoost. The effect of different FS and hyper-parameter optimization methods on the model performance are investigated by the Wilcoxon Signed Rank …