Open Access. Powered by Scholars. Published by Universities.®
Physical Sciences and Mathematics Commons™
- Keyword
- Bootstrap regression, Macroeconomic factors, U.S. home prices, Nonparametric methods. (1)
- CNN, SVM, LR, Brain Tumor Classification, MRI, Machine Learning (1)
- Cancer classification, gene expression data, RNA-Seq, machine learning, feature selection, dimensionality reduction, network analysis. (1)
- Cluster Analysis, K-means Clustering, Gaussian Mixture Models, AI-Driven Data Analysis, Adjusted Rand Index, Normalized Mutual Information, Dimensionality Reduction in Clustering, Pattern Recognition (1)
- Credit Card, Decision Tree, Machine Learning (1)
- Cyberbullying detection, social media, machine learning, classification, feature extraction (1)
- Cyberbullying, social media, machine learning, classification, feature extraction (1)
- Decision Tree, Income, Classification (1)
- Genetic study, Single nucleotide polymorphism (SNP) markers, Lasso feature selection, RMSE (Root Mean Square Error) (1)
- Linear regression regularization, genome-wide association study (GWAS), LASSO (1)
- Logistic Regression, Multinomial Naive Bayes, K-Nearest Neighbor, Extreme Gradient Boosting, Bag of Words, Term Frequency-Inverse Document Frequency (1)
- Machine Learning, Regression, Life Expectancy, Predictive Analysis (1)
- Machine learning algorithms, Heart disease prediction, Decision tree algorithms, UCI Machine Learning Repository, 5-fold cross-validation (1)
- Matrix Factorization, Recommendation System, Movie (1)
- Multiple Linear Regression, Lasso Regression, Extreme Gradient Boosting (XGBoost) (1)
- NLP, Natural Language Processing, Cyberbullying, Twitter, classification, TF-IDF, bag-of-words (1)
- Recommender System, Matrix Factorization, Sparse matrix, Predicting Movie Rating, MovieLens Dataset (1)
- SNPs, maize, regularization, crossing, Lasso, Ridge, Elastic Net (1)
- Superconducting critical temperature, multiple regression, gradient-boosted model (1)
- Superconductor, Linear Regression, Critical Temperature (1)
- Superconductor, critical temperature, regression, linear regression (1)
- Variable Selection, High Dimensional Data, Lasso Regression, Elastic Net Regression, Overfitting (1)
- XGBoost, SteamGaming, Regression Analysis (1)
Articles 1 - 25 of 25
Full-Text Articles in Physical Sciences and Mathematics
Cyberbullying Detection On Twitter Data Using Machine Learning Classifiers, Pradip Dhakal
Data Science and Data Mining
This study compares popular machine learning techniques (Logistic Regression, Multinomial Naive Bayes, K-Nearest Neighbor, and Extreme Gradient Boosting) for classifying tweets into three categories: cyberbullying based on religion, cyberbullying based on ethnicity, or no cyberbullying. First, various data-cleaning approaches are used to clean the tweet data. Once the data is clean, word embedding techniques such as bag of words and term frequency-inverse document frequency are used to convert the words into numerical vectors. Finally, the model will be fitted using the combination of the above-mentioned word embedding techniques and machine …
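The two text representations named in this abstract can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the example sentences are hypothetical, tokenization is a simple whitespace split, and the weighting uses idf = log(N / df), which is one common TF-IDF variant and not necessarily the one used in the study.

```python
import math
from collections import Counter

def bag_of_words(docs):
    """Return (vocabulary, count vectors) for whitespace-tokenized documents."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append([counts[term] for term in vocab])
    return vocab, vectors

def tf_idf(vectors):
    """Weight raw counts by idf = log(N / df), an assumed TF-IDF variant."""
    n_docs = len(vectors)
    n_terms = len(vectors[0])
    # Document frequency: number of documents containing each term.
    df = [sum(1 for vec in vectors if vec[j] > 0) for j in range(n_terms)]
    return [[vec[j] * math.log(n_docs / df[j]) for j in range(n_terms)]
            for vec in vectors]

# Hypothetical example sentences, not taken from the study's data.
docs = ["you are so kind", "you are so so mean", "have a kind day"]
vocab, counts = bag_of_words(docs)
weights = tf_idf(counts)
```

Terms that occur in fewer documents receive a larger weight, which is why TF-IDF vectors are often preferred over raw counts as classifier input.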
Predicting Superconducting Critical Temperature Using Regression Analysis, Roland Fiagbe
Data Science and Data Mining
This project estimates a regression model to predict the superconducting critical temperature from variables extracted from the superconductor’s chemical formula. The regression model, combined with stepwise variable selection, yields a reasonable predictive model with a lower prediction error (MSE). Variables based on atomic radius, valence, atomic mass, and thermal conductivity contributed most to the predictive model.
Advancing Cancer Classification Through Machine Learning Analysis Of RNA-Seq Gene Expression Data, Emil Agbemade, Amina Issoufou Anaroua, Dimitri Bamba
Data Science and Data Mining
This study delves into the classification of various cancer types using the RNA-Seq (HiSeq) PANCAN dataset from the UCI Machine Learning Repository, which encompasses a rich collection of gene expression data across multiple tumor samples. To improve cancer diagnosis and treatment, our methodology confronts the challenges inherent in high-dimensional datasets, such as the Hughes Effect and the Curse of Dimensionality, through innovative feature selection methods and machine learning approaches. A key component of our strategy includes the use of tree-based algorithms, particularly Random Forest, to refine the dataset to seventy genes of utmost relevance for tumor classification, and the application …
Combating Cyberbullying On Social Media: A Machine Learning Approach With Text Analysis On Twitter, Amir Alipour Yengejeh
Data Science and Data Mining
The popularity of electronic mobile devices, social media, and networking websites has increased tremendously in recent years. Most people around the world engage daily in a variety of cyberspace activities. Although users can benefit from these systems by exchanging ideas and information, socializing, and finding entertainment, they may also face adverse behaviors such as toxicity, bullying, extremism, and cruelty. Recent statistics report that such behaviors have grown noticeably in cyberspace, to the point that they can threaten individuals and even whole communities. Thus, …
Diagnostic In Neuroimaging: A Comparative Study Of Deep Learning And Traditional Approaches, Amina Issoufou Anaroua
Data Science and Data Mining
In the realm of medical diagnostics, precise classification of brain tumors is pivotal. This study conducts a comprehensive comparative analysis of a Convolutional Neural Network (CNN) against traditional machine learning models, Logistic Regression (LR) and Support Vector Machines (SVM), on a dataset of MRI scans for multi-class brain tumor classification. The CNN, tailored for image recognition, is evaluated alongside LR and SVM, which have established benchmarks in classification tasks. The investigation reveals that the traditional models hold their ground in terms of precision and interpretability, with the SVM, in particular, achieving remarkable accuracy. However, the CNN distinguishes itself by demonstrating …
Optimizing AI With Advanced Data Structuring: A Comparative Analysis Of K-Means And GMM Clustering Techniques, Amir Alipour Yengejeh
Data Science and Data Mining
This study presents a detailed comparison of K-means and Gaussian Mixture Model (GMM) clustering algorithms, illustrating their unique capabilities and limitations across various synthetic datasets. By utilizing metrics such as the Adjusted Rand Index (ARI) and Normalized Mutual Information (NMI), the research provides nuanced insights into how these algorithms handle datasets with varying structures and complexities. For instance, while both K-means and GMM show robust performance on well-separated clusters, GMM demonstrates a distinct advantage in scenarios with overlapping clusters or unbalanced data distributions. Conversely, K-means excels in identifying clear, distinct groupings, highlighting its utility in simpler clustering contexts. This study …
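To make the Adjusted Rand Index mentioned in this abstract concrete, here is a minimal sketch of the standard pair-counting formula in plain Python. The label vectors are hypothetical, the function assumes at least two samples, and it is not the code used in the study.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_true, labels_pred):
    """Chance-corrected agreement between two labelings (assumes n >= 2)."""
    n = len(labels_true)
    # Contingency table cells and its row/column totals.
    pair_counts = Counter(zip(labels_true, labels_pred))
    true_counts = Counter(labels_true)
    pred_counts = Counter(labels_pred)
    sum_cells = sum(comb(c, 2) for c in pair_counts.values())
    sum_true = sum(comb(c, 2) for c in true_counts.values())
    sum_pred = sum(comb(c, 2) for c in pred_counts.values())
    expected = sum_true * sum_pred / comb(n, 2)
    max_index = (sum_true + sum_pred) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labelings use a single cluster
    return (sum_cells - expected) / (max_index - expected)

# Hypothetical labelings: cluster names differ, but the grouping is identical.
truth = [0, 0, 0, 1, 1, 1]
relabelled = [1, 1, 1, 0, 0, 0]
one_mistake = [0, 0, 1, 1, 1, 1]
perfect = adjusted_rand_index(truth, relabelled)
partial = adjusted_rand_index(truth, one_mistake)
```

Because the index counts pairs of points rather than label values, it is unaffected by how a clustering algorithm happens to number its clusters, which makes it suitable for comparing K-means and GMM output against a known grouping.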
Machine Learning Approaches For Cyberbullying Detection, Roland Fiagbe
Data Science and Data Mining
Cyberbullying refers to the act of bullying using electronic means and the internet. In recent years, it has been identified as a major problem among young people and even adults. It can negatively impact one’s emotions and lead to adverse outcomes like depression, anxiety, harassment, and suicide, among others. This has led to the need to employ machine learning techniques to automatically detect cyberbullying and prevent it on various social media platforms. In this study, we want to analyze the combination of some Natural Language Processing (NLP) algorithms (such as Bag-of-Words and TF-IDF) with some popular machine learning …
XGBoost Hyperberd Model Using Steam Platform, Yuh-Haur Chen
Data Science and Data Mining
This project investigates game pricing strategies in the Steam market using an XGBoost model, drawing motivation from Professor Xie's lecture, and presenting findings through a density plot that delineates two primary pricing strategies. A free-to-play approach, indicated by a significant hot spot, is adopted by developers focusing on post-purchase revenues through DLC, aesthetic purchases, and in-game transactions. This selling strategy includes community-centric developers aiming to distribute their games for player engagement rather than profit.
The project illustrates the effectiveness of advanced modeling techniques in handling complex datasets, with significant predictive accuracy reflected by a reduced MSE from 0.3472 to 0.1397. …
Predicting Road Accident Injury Severity For Drivers In Automobile Crashes In United States Using Machine Learning Models And AI, Emil Agbemade, Benedict Kongyir
Data Science and Data Mining
This study analyzes data from the National Highway Traffic Safety Administration’s 2021 Crash Report Sampling System to identify key factors contributing to the severity of injuries in car accidents. By utilizing various machine learning algorithms and cross-validation techniques, we assessed metrics such as accuracy, sensitivity, precision, specificity, and the area under the curve (AUC) to evaluate the effectiveness of predictive models. All data preprocessing and model building was done using KNIME Analytical software [9]. Our findings reveal significant correlations between certain variables such as airbag injection, weather conditions, intoxication, vehicle state, driver distractions, and injury severity. These insights underscore the …
Bootstrap Regression For Investigating Macroeconomics Factors Affecting USA Home Prices, Benedict Kongyir, Emil Agbemade
Data Science and Data Mining
This study investigates the impact of macroeconomic indicators on US home prices, underscoring the importance of understanding these dynamics due to their significant socioeconomic consequences. Utilizing a dataset from Kaggle, originally collected by FRED, the research examines variables like the Consumer Price Index, Population, Unemployment, GDP, Stock Prices, Income, and Mortgage Rate to discern their effect on housing market fluctuations. The analysis identifies multicollinearity among predictors, necessitating a shift from traditional multiple linear regression to a more robust bootstrap regression method due to violations of parametric assumptions. Key findings reveal that Real Disposable Income is a significant predictor of home …
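As a rough illustration of the bootstrap regression idea in this abstract, the sketch below resamples (x, y) pairs with replacement and reads a percentile confidence interval for an ordinary least squares slope. The abstract does not state which bootstrap scheme the authors used, so the pairs bootstrap, the single predictor, and the toy data are all assumptions made for illustration.

```python
import random

def ols_slope(xs, ys):
    """Ordinary least squares slope for one predictor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx

def bootstrap_slope_ci(xs, ys, n_boot=2000, level=0.95, seed=0):
    """Percentile confidence interval for the slope via a pairs bootstrap."""
    rng = random.Random(seed)
    n = len(xs)
    slopes = []
    while len(slopes) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        bx = [xs[i] for i in idx]
        if len(set(bx)) < 2:
            continue  # slope is undefined when every resampled x is identical
        slopes.append(ols_slope(bx, [ys[i] for i in idx]))
    slopes.sort()
    lo = slopes[int((1 - level) / 2 * n_boot)]
    hi = slopes[int((1 + level) / 2 * n_boot) - 1]
    return lo, hi

# Hypothetical toy data: y is roughly 2x + 1 plus small fixed noise.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [3.3, 4.8, 7.1, 8.6, 11.2, 13.0, 14.9, 17.1]
slope = ols_slope(xs, ys)
lo, hi = bootstrap_slope_ci(xs, ys)
```

The interval comes from the empirical distribution of resampled slopes, so it does not rely on the normality or constant-variance assumptions behind the usual parametric standard errors.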
Modeling Health Insurance Premium Using Bayesian Hierarchical Models, Bennedict Kongyir, Emil Agbemade
Data Science and Data Mining
Insurance pricing requires pragmatism and creativity due to the unpredictable nature of risk [3]. This paper explores Bayesian hierarchical models to model health insurance premiums using individual and group predictors like demographics, health status, and geography. Data from Kaggle on health insurance policyholders was utilized, with prior distributions enhancing model interpretability and credibility. Bayesian models improve predictive accuracy and provide valuable insights for actuaries and policymakers, highlighting the significant impact of factors such as age and BMI on premium pricing.
Linear Regression With Regularization On The Genetic Architecture Of Maize Flowering Time, Roland Fiagbe
Data Science and Data Mining
For over a century, maize has been one of the most important crop species targeted for genetic investigations and experiments. One of the major experiments of interest is crossing inbred lines to produce better offspring through a process called heterosis. Crossing the inbred lines creates numerous SNP markers that determine the time to male flowering. This project seeks to explore the SNP markers and select the most relevant ones for predicting time to male flowering using linear regression with regularization methods, because p > n in our dataset. Various …
A Recommender System For Movie Ratings With Matrix Factorization Algorithm, Amir Alipour Yengejeh
Data Science and Data Mining
Nowadays, a Recommender System is a technology that aims to predict preferences based on the user’s selections. These systems are applied in numerous fields, such as movies, music, news, books, research articles, search queries, social tags, and various products. In this study, we use this tool to predict users’ ratings in the MovieLens datasets. To do so, we applied the matrix factorization algorithm and calculated the RMSE as our evaluation metric. The results show that the RMSE values for the training and test sets are 0.83 and 0.93, respectively, which are close to one another. This result indicates that …
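The matrix factorization and RMSE evaluation described in this abstract can be sketched as follows. This is a minimal illustration, assuming stochastic gradient descent with L2 regularization; the hyperparameters and the tiny rating list are hypothetical stand-ins, not the MovieLens data or the study's settings.

```python
import math
import random

def rmse(ratings, P, Q):
    """Root mean squared error of predicted ratings P[u] . Q[i]."""
    se = sum((r - sum(pu * qi for pu, qi in zip(P[u], Q[i]))) ** 2
             for u, i, r in ratings)
    return math.sqrt(se / len(ratings))

def factorize(ratings, n_users, n_items, k=2, lr=0.01, reg=0.02,
              epochs=500, seed=0):
    """Learn user factors P and item factors Q by SGD (assumed settings)."""
    rng = random.Random(seed)
    P = [[rng.uniform(0.0, 0.5) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(0.0, 0.5) for _ in range(k)] for _ in range(n_items)]
    history = [rmse(ratings, P, Q)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - sum(pu * qi for pu, qi in zip(P[u], Q[i]))
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on squared error with an L2 penalty.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
        history.append(rmse(ratings, P, Q))
    return P, Q, history

# Hypothetical (user, item, rating) triples.
ratings = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 2, 1),
           (2, 1, 2), (2, 2, 5), (3, 0, 1), (3, 2, 4)]
P, Q, history = factorize(ratings, n_users=4, n_items=3)
```

In practice the same `rmse` function would also be applied to a held-out test set, and a small gap between training and test RMSE suggests the factors are not badly overfitting the observed ratings.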
Genome-Wide Association Study Of The Maize Crop By The Lasso Regression Analysis, Amir Alipour Yengejeh
Data Science and Data Mining
Accurate estimation of the male flowering period in maize crops is key to predicting crop fertility. Recent scientific investigations have shown that genetic single nucleotide polymorphism (SNP) markers can contribute in this regard. A genome-wide association study (GWAS) is employed to generate these attributes (SNPs). However, this yields a high-dimensional dataset with 4,981 observations and 7,389 SNP attributes. Hence, in this study, we used the penalized regression approach with the least absolute shrinkage and selection operator (Lasso) to reduce the dataset. In this regard, we set the regularization parameter to 0.21. It resulted in a set …
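The way Lasso shrinks some coefficients exactly to zero can be illustrated with a small coordinate descent sketch. It assumes the objective (1/(2n))·||y − Xw||² + α·||w||₁ with no intercept (centered data); the abstract does not specify the scaling of its regularization parameter, so reusing the value 0.21 here is purely illustrative, and the two-feature data is hypothetical.

```python
def soft_threshold(value, alpha):
    """Shrink value toward zero by alpha; return exactly 0.0 inside [-alpha, alpha]."""
    if value > alpha:
        return value - alpha
    if value < -alpha:
        return value + alpha
    return 0.0

def lasso_coordinate_descent(X, y, alpha, n_iters=100):
    """Minimize (1/(2n))*||y - Xw||^2 + alpha*||w||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iters):
        for j in range(p):
            rho = 0.0
            for i in range(n):
                # Prediction from all features except j.
                partial = sum(X[i][k] * w[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - partial)
            rho /= n
            z = sum(X[i][j] ** 2 for i in range(n)) / n
            w[j] = soft_threshold(rho, alpha) / z if z > 0 else 0.0
    return w

# Hypothetical centered data: y depends only on the first feature (y = 3 * x0).
X = [[-2, 1], [-1, -1], [0, 0], [1, -1], [2, 1]]
y = [-6, -3, 0, 3, 6]
w = lasso_coordinate_descent(X, y, alpha=0.21)
```

The irrelevant second feature receives a coefficient of exactly zero, while the relevant coefficient is shrunk slightly below its true value of 3; this is the mechanism that lets Lasso reduce thousands of SNP attributes to a smaller selected set.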
Analysis Of Credit Approval By Decision Tree, Amir Alipour Yengejeh
Data Science and Data Mining
Nowadays, machine learning algorithms are commonly used by financial institutions and bankers to evaluate credit card applications. In this study, we used the decision tree algorithm to predict credit card approval based on applicant features such as age, employment status, and education level. Our results show that the applicants’ Prior Default, Debt, and Employed features contribute most to credit card approval.
Movie Recommender System Using Matrix Factorization, Roland Fiagbe
Data Science and Data Mining
Recommendation systems are a popular and beneficial field that can help people make informed decisions automatically. This technique assists users in selecting relevant information from an overwhelming amount of available data. When it comes to movie recommendations, two common methods are collaborative filtering, which compares similarities between users, and content-based filtering, which takes a user’s specific preferences into account. However, our study focuses on the collaborative filtering approach, specifically matrix factorization. Various similarity metrics are used to identify user similarities for recommendation purposes. Our project aims to predict movie ratings for unwatched movies using the MovieLens rating dataset. We developed …
Classification Of Adult Income Using Decision Tree, Roland Fiagbe
Data Science and Data Mining
A decision tree is a commonly used data mining methodology for performing classification tasks. It is a tree-based supervised machine learning algorithm that classifies or makes predictions based on how a sequence of previous questions is answered. Generally, the decision tree algorithm categorizes data into branch-like segments that develop into a tree that contains a root, nodes, and leaves. This project seeks to explore the decision tree methodology and apply it to the Adult Income dataset from the UCI Machine Learning Repository, to determine whether a person makes over 50K per year and determine the necessary factors that improve …
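The "question" asked at each node of a decision tree is typically a threshold on one feature, chosen to make the resulting branches as pure as possible. The sketch below picks such a threshold by minimizing weighted Gini impurity. The abstract does not say which splitting criterion the project used, so Gini is an assumption, and the ages and income labels are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 0.0 for a pure node, larger for mixed nodes."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(values, labels):
    """Return (threshold, weighted Gini) of the best binary split on one feature."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, float("inf"))
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate identical feature values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best

# Hypothetical ages and income classes.
ages = [22, 25, 30, 45, 50, 60]
incomes = ["<=50K", "<=50K", "<=50K", ">50K", ">50K", ">50K"]
threshold, impurity = best_split(ages, incomes)
```

A full tree repeats this search over every feature at every node, then recurses into the left and right branches until the leaves are pure or a stopping rule is reached.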
A Linear Regression Model To Predict The Critical Temperature Of A Superconductor, Amir Alipour Yengejeh
Data Science and Data Mining
Since superconductivity was first introduced, almost all studies in this area have sought to predict the critical temperature ($T_{c}$) from features extracted from the superconductor's chemical formula. In this study, we therefore explore the linear association between $T_{c}$ and the related features.
Variable Selection Using Lasso And Elastic Net Regression On High Dimensional Genetic Architecture Data Of Maize Flowering Time, Pradip Dhakal
Data Science and Data Mining
Variable selection is one of the key components of machine learning. It removes unwanted and redundant predictors from the model, which helps prevent overfitting. Because the model contains only a few significant predictors, it is less likely to learn a trend from noise. Further, the time to train the model decreases when only a few valuable variables remain.
Variable Selection And Regression Analysis, Emil Agbemade
Data Science and Data Mining
One of the most valuable crop species, maize, has been the subject of genetic study and experimentation for more than a century. However, species that share similarities and differences across a wide spectrum have developed astonishing adaptations as a result of small changes throughout time. Because it is usual practice to determine the genotypes of thousands of single nucleotide polymorphism (SNP) markers for thousands of patients, the data set we are dealing with has an issue with small n and large p. The result is that there are noticeably more predictor variables than observations. The original data …
Developing A Data-Driven Statistical Model For Accurately Predicting The Superconducting Critical Temperature Of Materials Using Multiple Regression And Gradient-Boosted Methods, Emil Agbemade
Data Science and Data Mining
This study focuses on developing a statistical model for estimating the superconducting critical temperature (Tc) of materials using a data-driven strategy. The study analyzed 21,263 superconductors and used a combination of multiple regression and gradient-boosted models to make predictions. The analysis included a descriptive analysis of the distribution of Tc, feature selection using the backward selection method, and model diagnostics. The results showed that the gradient-boosted method outperformed the multiple linear regression method with an RMSE of 12.01 and an R² value of 88.23 after fine-tuning its hyperparameters. The study concludes that the gradient-boosted method is an effective approach …
Machine Learning-Based Approaches For Predicting The Critical Temperature Of Superconductor, Pradip Dhakal
Data Science and Data Mining
This paper focuses on utilizing multiple linear regression, lasso regression, and extreme gradient boosting algorithms to predict the critical temperature of the superconductor. The model will be evaluated using the mean square error and adjusted R-squared values, and the best model will be recommended for future work related to this study.
Predicting Heart Disease Using Tree-Based Model, Emil Agbemade
Data Science and Data Mining
The paper presents a study on the use of machine learning algorithms for the prediction of heart disease, which is the leading cause of death worldwide. The study focuses on the use of decision tree algorithms, which have the advantage of considering a large number of risk factors. The heart disease data set was obtained from the UCI Machine Learning Repository and was analyzed using a decision tree classifier. The data set had 6 missing data points, which were deleted, leaving 279 instances for analysis. One-hot-encoding was performed on categorical variables with more than two responses. The decision tree classifier …
Silent Agony: Automated Detection Of Ethnic And Religious Cyberbullying Using Machine Learning, Emil Agbemade
Data Science and Data Mining
The use of electronic mobile devices, social media, and networking websites has increased tremendously in recent years. Despite the advantages of these systems, such as exchanging ideas and information, being sociable, and providing entertainment, users may encounter adverse behaviors like toxicity, bullying, extremism, and cruelty. The prevalence of such behaviors has grown significantly in cyberspace, posing a threat to individuals and communities. To address this issue, there is a high demand for automated cyberbullying detection systems. Machine learning algorithms have been widely used to build such systems by classifying and detecting cyberbullying. In this study, we employed popular machine learning …
Analyzing The Impact Of Health, Economic, And Demographic Factors On Life Expectancy: A Comparative Study Of Developed And Developing Countries, Mahyar Alinejad
Data Science and Data Mining
This study presents a comprehensive analysis of three prominent machine learning regression models—Random Forest, XGBoost, and Support Vector Machine (SVM)—in the context of predictive analysis. Leveraging a carefully curated dataset, we explore the impact of various hyperparameters on model performance through an exhaustive tuning process. The Random Forest and XGBoost models exhibit robust predictive capabilities, with the former revealing notable insights through feature importance visualization. Additionally, SVM, optimized via GridSearchCV, demonstrates competitive performance. Evaluation metrics, including Mean Squared Error and R-squared, facilitate a thorough comparison of model efficacy. Results highlight nuanced strengths and weaknesses, informing practitioners on the suitability of …