Open Access. Powered by Scholars. Published by Universities.®
Articles 1 - 13 of 13
Full-Text Articles in Statistical Models
Employee Attrition: Analyzing Factors Influencing Job Satisfaction Of IBM Data Scientists, Graham Nash
Symposium of Student Scholars
Employee attrition is an issue that every employer must consider when gauging the effectiveness of its employees. An employee's decision to leave a job can stem from a multitude of factors. As a result, employers need methods for measuring attrition from several characteristics of their employees. Factors such as age, years with the company, department, level of education, job role, and even marital status are all considered by employers to assist in predicting employee attrition. This project will be analyzing a …
Determining Malignancy: Can Mammogram Results Help Predict The Diagnosis Of Breast Tumors?, Taylor Behrens
Symposium of Student Scholars
Even with advancements in treatment and preventative care, breast cancer remains an epidemic claiming more than 40,000 American male and female lives each year. The mammogram dataset that I am analyzing was initially compiled in the early 1990s by a team from the University of Wisconsin-Madison. Past research diagnoses breast cancer from fine-needle aspirates. My research focuses on whether we can determine breast cancer diagnoses without the use of invasive procedures and, in particular, whether we can predict breast cancer based on mammogram data. Do measures of gray-scale texture, radius, concavity, perimeter, compactness, area, and smoothness of …
Data-Driven Investment Decisions In P2P Lending: Strategies Of Integrating Credit Scoring And Profit Scoring, Yan Wang
Doctor of Data Science and Analytics Dissertations
In this dissertation, we develop and discuss several loan evaluation methods to guide investment decisions in peer-to-peer (P2P) lending. In evaluating loans, credit scoring and profit scoring are the two most widely used approaches. Credit scoring aims at minimizing risk, while profit scoring aims at maximizing profit. This dissertation addresses the strengths and weaknesses of each scoring method by integrating them in various ways in order to provide optimal investment suggestions for different investors. Before developing the methods for loan evaluation at the individual level, we applied the state-of-the-art method called the Long Short-Term Memory (LSTM) …
An Automatic Interaction Detection Hybrid Model For Bankcard Response Classification, Yan Wang, Sherry Ni, Brian Stone
Published and Grey Literature from PhD Candidates
Data mining techniques have numerous applications in bankcard response modeling. Logistic regression has been used as the standard modeling tool in the financial industry because of its almost always desirable performance and its interpretability. In this paper, we propose a hybrid bankcard response model, which integrates decision tree-based chi-square automatic interaction detection (CHAID) into logistic regression. In the first stage of the hybrid model, CHAID analysis is used to detect potential variable interactions. In the second stage, these potential interactions serve as additional input variables in logistic regression. The motivation of the proposed hybrid model …
A Two-Stage Hybrid Model By Using Artificial Neural Networks As Feature Construction Algorithms, Yan Wang, Sherry Ni, Brian Stone
Published and Grey Literature from PhD Candidates
We propose a two-stage hybrid approach that uses neural networks as feature construction algorithms for bankcard response classification. In the first stage, the hybrid model uses a very simple neural network structure as the feature construction tool; in the second stage, the newly created features are used as additional input variables in logistic regression. The model is compared with the traditional one-stage model in credit customer response classification. It is observed that the proposed two-stage model outperforms the one-stage model in terms of accuracy, the area under the ROC curve, and the KS statistic. By creating new …
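The abstract above compares models by the area under the ROC curve and the KS statistic. As a minimal sketch of how those two metrics are commonly computed from predicted scores (this is not the authors' code; the function names `ks_statistic` and `auc` and the toy scores are illustrative assumptions):

```python
from bisect import bisect_right


def ks_statistic(scores, labels):
    """Largest gap between the empirical score CDFs of the two classes."""
    pos = sorted(s for s, y in zip(scores, labels) if y == 1)
    neg = sorted(s for s, y in zip(scores, labels) if y == 0)
    if not pos or not neg:
        raise ValueError("both classes must be present")
    best = 0.0
    for t in sorted(set(scores)):
        # Fraction of each class scoring at or below threshold t.
        cdf_pos = bisect_right(pos, t) / len(pos)
        cdf_neg = bisect_right(neg, t) / len(neg)
        best = max(best, abs(cdf_pos - cdf_neg))
    return best


def auc(scores, labels):
    """Probability that a random positive outscores a random negative
    (ties count one half). Quadratic in sample size; fine for a sketch."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical scores from some fitted response model.
    scores = [0.1, 0.4, 0.35, 0.8]
    labels = [0, 0, 1, 1]
    print(ks_statistic(scores, labels), auc(scores, labels))
```

Both metrics depend only on how the scores rank the two classes, so they can compare a one-stage and a two-stage model on the same held-out data regardless of how each model's scores are calibrated.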
Predicting Class-Imbalanced Business Risk Using Resampling, Regularization, And Model Ensembling Algorithms, Yan Wang, Sherry Ni
Published and Grey Literature from PhD Candidates
We aim to develop and improve imbalanced business risk modeling by jointly using proper evaluation criteria, resampling, cross-validation, classifier regularization, and ensembling techniques. Area Under the Receiver Operating Characteristic Curve (AUC of ROC) is used for model comparison based on 10-fold cross-validation. Two undersampling strategies, random undersampling (RUS) and cluster centroid undersampling (CCUS), as well as two oversampling methods, random oversampling (ROS) and Synthetic Minority Oversampling Technique (SMOTE), are applied. Three highly interpretable classifiers, including logistic regression without regularization (LR), L1-regularized LR (L1LR), and decision tree (DT), are implemented. Two ensembling techniques, including Bagging and Boosting, are …
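Of the resampling strategies named above, random undersampling (RUS) and random oversampling (ROS) are the simplest to illustrate. The following is a minimal standard-library sketch, not the authors' implementation; the function names, the `seed` parameter, and the toy data are assumptions made for the example. CCUS and SMOTE are not shown.

```python
import random


def _group_by_class(X, y):
    """Collect the rows of X under their class label."""
    by_class = {}
    for row, label in zip(X, y):
        by_class.setdefault(label, []).append(row)
    return by_class


def random_undersample(X, y, seed=0):
    """RUS: keep a random subset of each class, sized to the smallest class."""
    rng = random.Random(seed)
    by_class = _group_by_class(X, y)
    n_min = min(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        for row in rng.sample(rows, n_min):  # sampling without replacement
            X_out.append(row)
            y_out.append(label)
    return X_out, y_out


def random_oversample(X, y, seed=0):
    """ROS: duplicate random rows of each class up to the largest class size."""
    rng = random.Random(seed)
    by_class = _group_by_class(X, y)
    n_max = max(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        extra = [rng.choice(rows) for _ in range(n_max - len(rows))]
        for row in rows + extra:
            X_out.append(row)
            y_out.append(label)
    return X_out, y_out


if __name__ == "__main__":
    # Hypothetical imbalanced sample: 2 risky cases, 8 non-risky cases.
    X = [[i] for i in range(10)]
    y = [1, 1] + [0] * 8
    print(len(random_undersample(X, y)[0]), len(random_oversample(X, y)[0]))
```

When resampling is combined with cross-validation, it is usually applied to the training folds only, so that each validation fold keeps the original class ratio.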
A XGBoost Risk Model Via Feature Selection And Bayesian Hyper-Parameter Optimization, Yan Wang, Sherry Ni
Published and Grey Literature from PhD Candidates
This paper aims to explore models based on the extreme gradient boosting (XGBoost) approach for business risk classification. Feature selection (FS) algorithms and hyper-parameter optimizations are simultaneously considered during model training. The five most commonly used FS methods, including weight by Gini, weight by Chi-square, hierarchical variable clustering, weight by correlation, and weight by information, are applied to alleviate the effect of redundant features. Two hyper-parameter optimization approaches, random search (RS) and Bayesian tree-structured Parzen Estimator (TPE), are applied in XGBoost. The effects of different FS and hyper-parameter optimization methods on the model performance are investigated by the Wilcoxon Signed Rank …
Ordinal Hyperplane Loss, Bob Vanderheyden
Doctor of Data Science and Analytics Dissertations
This research presents the development of a new framework for analyzing ordered class data, commonly called “ordinal class” data. The focus of the work is the development of classifiers (predictive models) that predict classes from available data. Rating scales, medical classification scales, socio-economic scales, meaningful groupings of continuous data, facial emotional intensity, and facial age estimation are examples of ordinal data for which data scientists may be asked to develop predictive classifiers. It is possible to treat ordinal classification like any other classification problem that has more than two classes. Specifying a model with this strategy does not fully utilize …
Texture-Based Deep Neural Network For Histopathology Cancer Whole Slide Image (WSI) Classification, Nelson Zange Tsaku
Master of Science in Computer Science Theses
Automatic histopathological Whole Slide Image (WSI) analysis for cancer classification has been highlighted along with the advancements in microscopic imaging techniques. However, manual examination and diagnosis with WSIs is time-consuming and tiresome. Recently, deep convolutional neural networks have succeeded in histopathological image analysis. In this paper, we propose a novel cancer texture-based deep neural network (CAT-Net) that learns scalable texture features from histopathological WSIs. The innovation of CAT-Net is twofold: (1) capturing invariant spatial patterns with dilated convolutional layers and (2) reducing model complexity while improving performance. Moreover, CAT-Net can provide discriminative texture patterns formed on cancerous regions of histopathological …
Application Of Support Vector Machine Modeling And Graph Theory Metrics For Disease Classification, Jessica M. Rudd
Published and Grey Literature from PhD Candidates
Disease classification is a crucial element of biomedical research. Recent studies have demonstrated that machine learning techniques, such as Support Vector Machine (SVM) modeling, produce similar or improved predictive capabilities in comparison to the traditional method of Logistic Regression. In addition, it has been found that social network metrics can provide useful predictive information for disease modeling. In this study, we combine simulated social network metrics with SVM to predict diabetes in a sample of data from the Behavioral Risk Factor Surveillance System. In this dataset, Logistic Regression outperformed SVM with ROC index of 81.8 and 81.7 for models with …
Modeling Traffic At An Intersection, Kaleigh L. Mulkey, Saniita K. Fasenntao
Symposium of Student Scholars
The main purpose of this project is to build a mathematical model for traffic at a busy intersection. We use elements of Queueing Theory to build our model: the vehicles driving into the intersection are the “arrival process” and the stop light in the intersection is the “server.”
We collected traffic data on the number of vehicles arriving at the intersection, the duration of green and red lights, and the number of vehicles going through the intersection during a green light. We built a SAS macro to simulate traffic based on parameters derived from the data.
In our program …
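The queueing setup described in this abstract (vehicles as the arrival process, the stop light as the server) can be illustrated with a small discrete-time simulation. The authors' program is a SAS macro and is not reproduced here; the Python sketch below is an independent illustration, and its one-second time step, Bernoulli arrival model, function name, and all parameter values are assumptions rather than figures from the project.

```python
import random


def simulate_intersection(arrival_prob, green_s, red_s,
                          departures_per_green_s, total_s, seed=0):
    """Simulate one approach lane in one-second steps.

    arrival_prob: chance that a vehicle arrives in a given second (assumed model).
    green_s, red_s: signal phase lengths in seconds; each cycle starts on green.
    departures_per_green_s: vehicles that can clear the light per green second.
    """
    if green_s + red_s <= 0:
        raise ValueError("signal cycle must have positive length")
    rng = random.Random(seed)
    cycle = green_s + red_s
    queue = arrived = served = max_queue = 0
    for t in range(total_s):
        # Arrival process: at most one vehicle joins the queue each second.
        if rng.random() < arrival_prob:
            queue += 1
            arrived += 1
        # Server: the light lets vehicles through only during the green phase.
        if t % cycle < green_s:
            leaving = min(queue, departures_per_green_s)
            queue -= leaving
            served += leaving
        max_queue = max(max_queue, queue)
    return {"arrived": arrived, "served": served,
            "remaining": queue, "max_queue": max_queue}


if __name__ == "__main__":
    # Hypothetical parameters: 30 s green, 60 s red, one hour of traffic.
    print(simulate_intersection(0.2, 30, 60, 1, 3600))
```

In a sketch like this, the arrival probability and the departure rate would be estimated from counts such as the ones the project collected, and the queue-length outputs indicate whether the green phase is long enough to clear the vehicles that build up during red.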
The Location Decisions Of Foreign Investors In China: Untangling The Effect Of Wages Using A Control Function Approach, Xuepeng Liu, Mary E. Lovely, Jan Ondrich
Faculty and Research Publications
Firm-level studies offer almost no support for the proposition that capital is attracted to low wages. We examine the location choices of 2,884 firms investing in China between 1993 and 1996 to offer two main contributions. First, we find that the location of labor-intensive activities is highly elastic to provincial wage differences. Generally, investors' wage sensitivity declines as the skill intensity of the industry increases. Second, we find that unobserved location-specific attributes exert a downward bias on estimated wage sensitivity. Using a control function approach, we estimate a downward bias of 50% to 90% in wage coefficients estimated …
Using Paired Comparison Matrices To Estimate Parameters Of The Partial Credit Rasch Measurement Model For Rater-Mediated Assessments, Mary Garner, George Engelhard Jr.
Faculty and Research Publications
The purpose of this paper is to describe a technique for estimating the parameters of a Rasch model that accommodates ordered categories and rater severity. The technique builds on the conditional pairwise algorithm described by Choppin (1968, 1985) and represents an extension of a conditional algorithm described by Garner and Engelhard (2000, 2002) in which parameters appear as the eigenvector of a matrix derived from paired comparisons. The algorithm is used successfully to recover parameters from a simulated data set. No one has previously described such an extension of the pairwise algorithm to a Rasch model that includes both ordered …