Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Articles 1 - 27 of 27

Full-Text Articles in Physical Sciences and Mathematics

Ohio Recovery Housing: Resident Risk And Outcomes Assessment, Elyjiah Potter, Bivin Sadler Dec 2023

SMU Data Science Review

Addiction and substance abuse disorder is a significant problem in the United States. Over the past two decades, the United States has faced a boom in substance abuse, which has resulted in an increase in death and disruption of families across the nation. The State of Ohio has been particularly hard hit by the crisis, with overdose rates nearly double the national average. Established in the mid-1970s, Sober Living Housing is an alcohol and substance use recovery model emphasizing personal responsibility, sober living, and community support. This model has been adopted by the Ohio Recovery Housing organization, which seeks …


Differentiation Of Human, Dog, And Cat Hair Fibers Using DART TOFMS And Machine Learning, Laura Ahumada, Erin R. McClure-Price, Chad Kwong, Edgard O. Espinoza, John Santerre Dec 2023

SMU Data Science Review

Hair is found in over 90% of crime scenes and has long been analyzed as trace evidence. However, recent reviews of traditional hair fiber analysis techniques, primarily morphological examination, have cast doubt on its reliability. To address these concerns, this study employed machine learning algorithms, specifically Linear Discriminant Analysis (LDA) and Random Forest, on Direct Analysis in Real Time time-of-flight mass spectra collected from human, cat, and dog hair samples. The objective was to develop a chemistry- and statistics-based classification method for unbiased taxonomic identification of hair. The results of the study showed that LDA and Random Forest were highly …


Self-Learning Algorithms For Intrusion Detection And Prevention Systems (IDPS), Juan E. Nunez, Roger W. Tchegui Donfack, Rohit Rohit, Hayley Horn Mar 2023

SMU Data Science Review

Today, there is an increased risk to data privacy and information security due to cyberattacks that compromise data reliability and accessibility. New machine learning models are needed to detect and prevent these cyberattacks. One application of these models is cybersecurity threat detection and prevention systems that can create a baseline of a network's traffic patterns to detect anomalies without needing pre-labeled data; thus, enabling the identification of abnormal network events as threats. This research explored algorithms that can help automate anomaly detection on an enterprise network using Canadian Institute for Cybersecurity data. This study demonstrates that Neural Networks with Bayesian …


Predicting Insulin Pump Therapy Settings, Riccardo L. Ferraro, David Grijalva, Alex Trahan Sep 2022

SMU Data Science Review

Millions of people live with diabetes worldwide [7]. To mitigate some of the many symptoms associated with diabetes, an estimated 350,000 people in the United States rely on insulin pumps [17]. For many of these people, how effectively their insulin pump performs is the difference between sleeping through the night and life-threatening emergency treatment at a hospital. Three programmed insulin pump therapy settings governing effective insulin pump function are: Basal Rate (BR), Insulin Sensitivity Factor (ISF), and Carbohydrate Ratio (ICR). For many people using insulin pumps, these therapy settings are often not correct, given their physiological needs. While …


Application Of Probabilistic Ranking Systems On Women’s Junior Division Beach Volleyball, Cameron Stewart, Michael Mazel, Bivin Sadler Sep 2022

SMU Data Science Review

Women’s beach volleyball is one of the fastest growing collegiate sports today. The increase in popularity has come with an increase in valuable scholarship opportunities across the country. With thousands of athletes to sort through, college scouts depend on websites that aggregate tournament results and rank players nationally. This project partnered with the company Volleyball Life, who is the current market leader in the ranking space of junior beach volleyball players. Utilizing the tournament information provided by Volleyball Life, this study explored replacements to the current ranking systems, which are designed to aggregate player points from recent tournament placements. Three …


Adjusting Community Survey Data Benchmarks For External Factors, Allen Miller, Nicole M. Norelli, Robert Slater, Mingyang N. Yu Jun 2022

SMU Data Science Review

Abstract. Using U.S. resident survey data from the National Community Survey in combination with public data from the U.S. Census and additional sources, a Voting Regressor Model was developed to establish fair benchmark values for city performance. These benchmarks were adjusted for characteristics the city cannot easily influence that contribute to confidence in local government, such as population size, demographics, and income. This adjustment allows for a more meaningful comparison and interpretation of survey results among individual cities. Methods explored for the benchmark adjustment included cluster analysis, anomaly detection, and a variety of regression techniques, including random forest, ridge, decision …


Identification And Characterization Of Forest Fire Risk Zones Leveraging Machine Learning Methods, Joshua Balson, Matt Chinchilla, Cam Lu, Jeff Washburn, Nibhrat Lohia Dec 2021

SMU Data Science Review

Across the United States, record numbers of wildfires are observed costing billions of dollars in property damage, polluting the environment, and putting lives at risk. The ability of emergency management professionals, city planners, and private entities such as insurance companies to determine if an area is at higher risk of a fire breaking out has never been greater. This paper proposes a novel methodology for identifying and characterizing zones with increased risks of forest fires. Methods involving machine learning techniques use the widely available and recorded data, thus making it possible to implement the tool quickly.


SARS-CoV-2 Pandemic Analytical Overview With Machine Learning Predictability, Anthony Tanaydin, Jingchen Liang, Daniel W. Engels Jan 2021

SMU Data Science Review

Understanding diagnostic tests and examining important features of novel coronavirus (COVID-19) infection are essential steps for controlling the current pandemic of 2020. In this paper, we study the relationship between clinical diagnosis and analytical features of patient blood panels from the US, Mexico, and Brazil. Our analysis confirms that among adults, the risk of severe illness from COVID-19 increases with pre-existing conditions such as diabetes and immunosuppression. More than eight months into the pandemic, more data have become available to indicate that more young adults were getting infected. In addition, we expand on the definition of COVID-19 test and discuss …


Time Series Analysis Of Offshore Buoy Light Detection And Ranging (LIDAR) Windspeed Data, Aditya Garapati, Charles J. Henderson, Carl Walenciak, Brian T. Waite Sep 2020

SMU Data Science Review

In this paper, modeling techniques for the forecasting of wind speed using historical values observed by Light Detection and Ranging (LIDAR) sensors in an offshore context are described. Both univariate time series and multivariate time series modeling techniques leveraging meteorological data collected simultaneously with the LIDAR data are evaluated for potential contributions to predictive ability. Accurate and timely ability to predict wind values is essential to the effective integration of wind power into existing power grid systems. It allows for both the management of rapid ramp-up / down of base production capacity due to highly variable wind power inputs and …


Universal Vector Neural Machine Translation With Effective Attention, Joshua Yi, Satish Mylapore, Ryan Paul, Robert Slater Apr 2020

SMU Data Science Review

Neural Machine Translation (NMT) leverages one or more trained neural networks for the translation of phrases. Sutskever introduced a sequence to sequence based encoder decoder model which became the standard for NMT based systems. Attention mechanisms were later introduced to address the issues with the translation of long sentences and improving overall accuracy. In this paper, we propose two improvements to the encoder decoder based NMT approach. Most translation models are trained as one model for one translation. We introduce a neutral/universal model representation that can be used to predict more than one language depending on …


Quantitative Model For Setting Manufacturer's Suggested Retail Price, Peter Byrd, Jonathan Knowles, Dmitry Andreev, Jacob Turner, Brian Mente, Laroux Wallace Jan 2020

SMU Data Science Review

In this paper, we present a quantitative approach to model the manufacturer’s suggested retail price (MSRP) for children’s dollhouses and establish relationships among key features that contribute most to establishing MSRP. Determination of the MSRP is a critical step in how consumers respond with their wallets when purchasing an item. KidKraft, a global leader in toys and juvenile products, sets MSRP subjectively using product experts. The process is arduous and time-consuming, requiring the focus of specialized resources and knowledge of the interaction between key attributes and their impact on consumer value. An accurate prediction of MSRP during the …


Personalized Detection Of Anxiety Provoking News Events Using Semantic Network Analysis, Jacquelyn Cheun PhD, Luay Dajani, Quentin B. Thomas Dec 2019

SMU Data Science Review

In the age of hyper-connectivity, 24/7 news cycles, and instant news alerts via social media, mental health researchers don't have a way to automatically detect news content which is associated with triggering anxiety or depression in mental health patients. Using the Associated Press news wire, a semantic network was built with 1,056 news articles containing over 500,000 connections across multiple topics to provide a personalized algorithm which detects problematic news content for a given reader. We make use of Semantic Network Analysis to surface the relationship between news article text and anxiety in readers who struggle with mental health disorders. …


Predicting Wind Turbine Blade Erosion Using Machine Learning, Casey Martinez, Festus Asare Yeboah, Scott Herford, Matt Brzezinski, Viswanath Puttagunta Aug 2019

SMU Data Science Review

Using time-series data and turbine blade inspection assessments, we present a classification model in order to predict remaining turbine blade life in wind turbines. Capturing the kinetic energy of wind requires complex mechanical systems, which require sophisticated maintenance and planning strategies. There are many traditional approaches to monitoring the internal gearbox and generator, but the condition of turbine blades can be difficult to measure and access. Accurate and cost-effective estimates of turbine blade life cycles will drive optimal investments in repairs and improve overall performance. These measures will drive down costs as well as provide cheap and clean electricity …


ASL Reverse Dictionary - ASL Translation Using Deep Learning, Ann Nelson, Kj Price, Rosalie Multari May 2019

SMU Data Science Review

The challenges of learning a new language can be reduced with real-time feedback on pronunciation and language usage. Today there are readily available technologies which provide such feedback on spoken languages, by translating the voice of the learner into written text. For someone seeking to learn American Sign Language (ASL), there is however no such feedback application available. A learner of American Sign Language might reference websites or books to obtain an image of a hand sign for a word. This process is like looking up a word in a dictionary, and if the person wanted to know if they …


Leveraging Reviews To Improve User Experience, Anthony Schams, Iram Bakhtiar, Cristina Stanley May 2019

SMU Data Science Review

In this paper, we will explore and present a method of finding characteristics of a restaurant using its reviews through machine learning algorithms. We begin by building models to predict the ratings of individual reviews using text and categorical features. This is to examine the efficacy of the algorithms to the task. Both XGBoost and logistic regression will be examined. With these models, our goal is then to identify key phrases in reviews that are correlated with positive and negative experience. Our analysis makes use of review data publicly made available by Yelp. Key bigrams extracted were non-specific to the …


Visualization And Machine Learning Techniques For NASA’s EM-1 Big Data Problem, Antonio P. Garza III, Jose Quinonez, Misael Santana, Nibhrat Lohia May 2019

SMU Data Science Review

In this paper, we help NASA solve three Exploration Mission-1 (EM-1) challenges: data storage, computation time, and visualization of complex data. NASA is studying one year of trajectory data to determine available launch opportunities (about 90TBs of data). We improve data storage by introducing a cloud-based solution that provides elasticity and server upgrades. This migration will save $120k in infrastructure costs every four years, and potentially avoid schedule slips. Additionally, it increases computational efficiency by 125%. We further enhance computation via machine learning techniques that use the classic orbital elements to predict valid trajectories. Our machine learning model decreases trajectory …


Machine Learning Pipeline For Exoplanet Classification, George Clayton Sturrock, Brychan Manry, Sohail Rafiqi May 2019

SMU Data Science Review

Planet identification has typically been a task performed exclusively by teams of astronomers and astrophysicists using methods and tools accessible only to those with years of academic education and training. NASA’s Exoplanet Exploration program has introduced modern satellites capable of capturing a vast array of data regarding celestial objects of interest to assist with researching these objects. The availability of satellite data has opened up the task of planet identification to individuals capable of writing and interpreting machine learning models. In this study, several classification models and datasets are utilized to assign a probability of an observation being an exoplanet. …


Tidying And Analysis Of The 2014 Texas English II End-Of-Course Exam, David Churchman, Abigail Morton Garland May 2019

SMU Data Science Review

The state of Texas requires all public high school students to take End of Course (EOC) exams. The results of these exams are made nominally public, but in a shape and format that precludes ready analysis. To the extent possible, principles of tidy data will be applied to clean and analyze the publicly released data file for the 2014 English II EOC exam, providing insights into the EOC program and a case for better public data from the Texas Education Administration (TEA).


Pedestrian Safety -- Fundamental To A Walkable City, Joshua Herrera, Patrick McDevitt, Preeti Swaminathan, Raghuram Srinivas Jan 2019

SMU Data Science Review

In this paper, we present a method to identify urban areas with a higher likelihood of pedestrian safety related events. Pedestrian safety related events are pedestrian-vehicle interactions that result in fatalities, injuries, accidents without injury, or near-misses between pedestrians and vehicles. To develop a solution to this problem of identifying likely event locations, we assemble data, primarily from the City of Cincinnati and Hamilton County, that include safety reports from a five-year period, geographic information for these events, a citizen survey of pedestrian-reported concerns, non-emergency requests for service for any cause in the city, property values and public transportation …


Improving VIX Futures Forecasts Using Machine Learning Methods, James Hosker, Slobodan Djurdjevic, Hieu Nguyen, Robert Slater Jan 2019

SMU Data Science Review

The problem of forecasting market volatility is a difficult task for most fund managers. Volatility forecasts are used for risk management, alpha (risk) trading, and the reduction of trading friction. Improving the forecasts of future market volatility assists fund managers in adding or reducing risk in their portfolios as well as in increasing hedges to protect their portfolios in anticipation of a market sell-off event. Our analysis compares three existing financial models that forecast future market volatility using the Chicago Board Options Exchange Volatility Index (VIX) to six machine/deep learning supervised regression methods. This analysis determines which models provide best …


Random Forest Vs Logistic Regression: Binary Classification For Heterogeneous Datasets, Kaitlin Kirasich, Trace Smith, Bivin Sadler Aug 2018

SMU Data Science Review

Selecting a learning algorithm to implement for a particular application on the basis of performance still remains an ad-hoc process using fundamental benchmarks such as evaluating a classifier’s overall loss function and misclassification metrics. In this paper we address the difficulty of model selection by evaluating the overall classification performance between random forest and logistic regression for datasets comprised of various underlying structures: (1) increasing the variance in the explanatory and noise variables, (2) increasing the number of noise variables, (3) increasing the number of explanatory variables, (4) increasing the number of observations. We developed a model evaluation tool capable …


Predicting National Basketball Association Success: A Machine Learning Approach, Adarsh Kannan, Brian Kolovich, Brandon Lawrence, Sohail Rafiqi Aug 2018

SMU Data Science Review

In this paper, we present a machine learning based approach to projecting the success of National Basketball Association (NBA) draft prospects. With the proliferation of data, analytics have increasingly become a critical component in the assessment of professional and collegiate basketball players. We leverage player biometric data, college statistics, draft selection order, and positional breakdown as modelling features in our prediction algorithms. We found that a player's draft pick and their college statistics are the best predictors of their longevity in the National Basketball Association.


Yelp’s Review Filtering Algorithm, Yao Yao, Ivelin Angelov, Jack Rasmus-Vorrath, Mooyoung Lee, Daniel W. Engels Aug 2018

SMU Data Science Review

In this paper, we present an analysis of features influencing Yelp's proprietary review filtering algorithm. Classifying or misclassifying reviews as recommended or non-recommended affects average ratings, consumer decisions, and ultimately, business revenue. Our analysis involves systematically sampling and scraping Yelp restaurant reviews. Features are extracted from review metadata and engineered from metrics and scores generated using text classifiers and sentiment analysis. The coefficients of a multivariate logistic regression model were interpreted as quantifications of the relative importance of features in classifying reviews as recommended or non-recommended. The model classified review recommendations with an accuracy of 78%. We found that reviews …


Cryptocurrency Price Prediction Using Tweet Volumes And Sentiment Analysis, Jethin Abraham, Daniel Higdon, John Nelson, Juan Ibarra Aug 2018

SMU Data Science Review

In this paper, we present a method for predicting changes in Bitcoin and Ethereum prices utilizing Twitter data and Google Trends data. Bitcoin and Ethereum, the two largest cryptocurrencies in terms of market capitalization, represent over $160 billion in combined value. However, both Bitcoin and Ethereum have experienced significant price swings on both daily and long term valuations. Twitter is increasingly used as a news source influencing purchase decisions by informing users of the currency and its increasing popularity. As a result, quickly understanding the impact of tweets on price direction can provide a purchasing and selling advantage to …


Goalie Analytics: Statistical Evaluation Of Context-Specific Goalie Performance Measures In The National Hockey League, Marc Naples, Logan Gage, Amy Nussbaum Jul 2018

SMU Data Science Review

In this paper, we attempt to improve upon the classic formulation of save percentage in the NHL by controlling for the context of the shots and using measures other than save percentage. In particular, we find save percentage to be both a weakly repeatable skill and a weak predictor of future performance, and we seek other goalie performance calculations that are more robust. To do so, we use three primary tests of intra-season consistency, intra-season predictability, and inter-season consistency, and extend the analysis to disentangle team effects on goalie statistics. We find that there are multiple ways to improve upon classic save …


Data Scientist’s Analysis Toolbox: Comparison Of Python, R, And SAS Performance, Jim Brittain, Mariana Cendon, Jennifer Nizzi, John Pleis Jul 2018

SMU Data Science Review

A quantitative analysis will be performed on experiments utilizing three different tools used for Data Science. The analysis will include replication of analysis along with comparisons of code length, output, and results. Qualitative data will supplement the quantitative findings. The conclusion will provide data-supported guidance on the correct tool to use for common situations in the field of Data Science.


Cognitive Virtual Admissions Counselor, Kumar Raja Guvindan Raju, Cory Adams, Raghuram Srinivas Apr 2018

SMU Data Science Review

Abstract. In this paper, we present a cognitive virtual admissions counselor for the Master of Science in Data Science program at Southern Methodist University. The virtual admissions counselor is a system capable of providing potential students accurate information at the time that they want to know it. After the evaluation of multiple technologies, Amazon’s LEX was selected to serve as the core technology for the virtual counselor chatbot. Student surveys were leveraged to collect and generate training data to deploy the natural language capability. The cognitive virtual admissions counselor platform is currently capable of providing an end-to-end conversational dialog to …