Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Data Science

California State University, San Bernardino

Theses/Dissertations

Machine Learning

Articles 1 - 6 of 6

Full-Text Articles in Physical Sciences and Mathematics

Code For Care: Hypertension Prediction In Women Aged 18-39 Years, Kruti Sheth May 2024

Electronic Theses, Projects, and Dissertations

The longstanding prevalence of hypertension, often undiagnosed, poses significant risks of severe chronic and cardiovascular complications if left untreated. This study investigated the causes and underlying risks of hypertension in females aged 18-39 years. The research questions were: (Q1.) What factors affect the occurrence of hypertension in females aged 18-39 years? (Q2.) What machine learning algorithms are suited for effectively predicting hypertension? (Q3.) How can SHAP values be leveraged to analyze the factors from model outputs? The findings are: (Q1.) Feature selection using a binary-classification logistic regression algorithm reveals an array of the 30 most influential factors at an …


General Population Projection Model With Census Population Data, Takenori Tsuruga Dec 2023

Electronic Theses, Projects, and Dissertations

The US Census Bureau offers a wide range of data, and within this array, the American Community Survey 5-Year Estimate (ACS5) serves as a valuable resource for understanding the US population. This project embarks on an exploration of Machine Learning and the Software Development process with the goal of generating effective population projections from ACS5 data. The project aims to provide methods to make predictions for every city and town in the US, encompassing their total population and population divided into 5-year age groups. It's worth noting that while the generation of these projections is grounded in the generalized statistical …


Genetic Programming To Optimize Performance Of Machine Learning Algorithms On Unbalanced Data Set, Asitha Thumpati Aug 2023

Electronic Theses, Projects, and Dissertations

Data collected from the real world is often imbalanced, meaning that the distribution of data across known classes is biased or skewed. When using machine learning classification models on such imbalanced data, predictive performance tends to be lower because these models are designed with the assumption of balanced classes or a relatively equal number of instances for each class. To address this issue, we employ data preprocessing techniques on unbalanced datasets: SMOTE (Synthetic Minority Oversampling Technique) for oversampling and random undersampling. Once the dataset is balanced, genetic programming is utilized for feature selection to …
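The two resampling steps named in this abstract can be illustrated with a minimal standard-library sketch. It is not the thesis's implementation: the function names `smote` and `random_undersample`, the neighbour count `k`, and the fixed random seed are assumptions made for illustration, and the abstract does not say how the two techniques are combined.

```python
import math
import random


def smote(minority, n_new, k=5, rng=None):
    """Create n_new synthetic minority points by SMOTE-style interpolation.

    Each synthetic point lies on the segment between a randomly chosen
    minority point and one of its k nearest minority neighbours.
    Names and defaults here are illustrative assumptions.
    """
    rng = rng or random.Random(0)
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of the base point, by Euclidean distance
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (q - b) for b, q in zip(base, neighbour)])
    return synthetic


def random_undersample(majority, n_keep, rng=None):
    """Keep a uniformly random subset of n_keep majority points."""
    rng = rng or random.Random(0)
    return rng.sample(majority, n_keep)


# Toy data: 4 minority points and 20 majority points
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
majority = [[float(i), float(i)] for i in range(20)]

new_points = smote(minority, n_new=6, k=2)
kept = random_undersample(majority, n_keep=10)
```

After this step the toy dataset has 10 minority points (4 original plus 6 synthetic) and 10 majority points. In practice a maintained library implementation would normally be preferred over a hand-written one.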


A Study Of Various Data Sizes Using Machine Learning, Sochaeta Koeum May 2023

Electronic Theses, Projects, and Dissertations

Social media is a great domain for news consumption; however, it is referred to as a double-edged sword. While it is user-friendly and low-cost, social media is the reason why fake news can spread rapidly, which is detrimental to society, businesses, and many consumers. Therefore, fake news detection is an emerging field. However, a lack of available resources, such as large datasets, has restricted researchers from developing a universal machine learning model that is fast, efficient, and reliable enough to stop the proliferation. The goal of this culminating experience project is to explore how varying datasets …


Distance Correlation Based Feature Selection In Random Forest, Jose Munoz-Lopez May 2023

Electronic Theses, Projects, and Dissertations

The Pearson correlation coefficient is a commonly used measure of correlation, but it is limited because it measures only the linear relationship between two numerical variables. In 2007, Székely et al. introduced the distance correlation, which measures all types of dependencies between random vectors X and Y in arbitrary dimensions, not just the linear ones. In this thesis, we propose a filter method that utilizes distance correlation as a criterion for feature selection in Random Forest regression. We conduct extensive simulation studies to evaluate its performance compared to existing methods under various data settings, in terms of the prediction mean …
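As a rough illustration of the statistic this abstract refers to, the sketch below computes the sample distance correlation for two scalar variables using double-centered pairwise distance matrices. It is a simplified assumption-laden example, not the thesis's filter method: it handles only one-dimensional inputs (the general definition covers vectors of arbitrary dimension), and the function names are hypothetical.

```python
import math


def _centered_distances(values):
    """Pairwise |v_i - v_j| matrix, double-centered by row, column, and grand means."""
    n = len(values)
    d = [[abs(values[i] - values[j]) for j in range(n)] for i in range(n)]
    row_means = [sum(row) / n for row in d]
    grand_mean = sum(row_means) / n
    # The distance matrix is symmetric, so column means equal row means.
    return [
        [d[i][j] - row_means[i] - row_means[j] + grand_mean for j in range(n)]
        for i in range(n)
    ]


def distance_correlation(x, y):
    """Sample distance correlation of two equal-length scalar sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    a = _centered_distances(x)
    b = _centered_distances(y)
    pairs = [(i, j) for i in range(n) for j in range(n)]
    dcov2 = sum(a[i][j] * b[i][j] for i, j in pairs) / n**2
    dvar_x = sum(a[i][j] ** 2 for i, j in pairs) / n**2
    dvar_y = sum(b[i][j] ** 2 for i, j in pairs) / n**2
    denom = math.sqrt(dvar_x * dvar_y)
    if denom == 0:
        return 0.0  # convention when either variable is constant
    return math.sqrt(max(dcov2, 0.0) / denom)


# A purely nonlinear relationship: y = x^2 on a symmetric grid.
# The Pearson correlation of these two sequences is exactly zero.
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [v * v for v in x]
dcor_nonlinear = distance_correlation(x, y)
dcor_linear = distance_correlation(x, [2.0 * v for v in x])
```

Here `dcor_nonlinear` is clearly positive even though the linear correlation vanishes, while `dcor_linear` equals 1 up to rounding. A filter method in the spirit of the abstract would presumably rank features by such a score against the response, but the exact selection rule is not given in the truncated text.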


Analysis On Suicidal Ideation Among Adolescents (12-17 Years) In The USA, Himani Raturi Jul 2020

Electronic Theses, Projects, and Dissertations

Suicide is one of the leading health concerns among adolescents in the United States, and the prevalence of suicidal ideation (SI) is quite high, with ~20-30% of adolescents reporting it at some point. Though we have seen growth and development in the prevention of suicide, there is limited research on the ability to identify adolescents who might be at risk for SI. The objective of the project is to identify adolescents with SI using machine learning.

The project shows statistics from different articles on adolescents in the U.S. For this study, adolescent data was taken from NSDUH 2018. Moreover, detailed …