Open Access. Powered by Scholars. Published by Universities.®
Social and Behavioral Sciences Commons™
Articles 1 - 19 of 19
Full-Text Articles in Social and Behavioral Sciences
Ohio Recovery Housing: Resident Risk And Outcomes Assessment, Elyjiah Potter, Bivin Sadler
SMU Data Science Review
Addiction and substance abuse disorder are significant problems in the United States. Over the past two decades, the United States has faced a boom in substance abuse, which has led to more deaths and to the disruption of families across the nation. The State of Ohio has been particularly hard hit by the crisis, with overdose rates nearly double the national average. Established in the mid-1970s, Sober Living Housing is an alcohol and substance use recovery model that emphasizes personal responsibility, sober living, and community support. This model has been adopted by the Ohio Recovery Housing organization, which seeks …
Using Geographic Information To Explore Player-Specific Movement And Its Effects On Play Success In The NFL, Hayley Horn, Eric Laigaie, Alexander Lopez, Shravan Reddy
SMU Data Science Review
American football is a billion-dollar industry in the United States. The analytical side of the sport is an ever-growing domain, and open-source competitions like the NFL Big Data Bowl are accelerating this growth. Because players move so much during each play, tracking data can prove valuable in many areas of football analytics. Concussion detection, catch recognition, and completion percentage prediction are all existing use cases for this data. Player-specific movement attributes, such as speed and agility, may also help predict play success. This research calculates player-specific speed and agility attributes from tracking data and supplements them with descriptive …
Bridging The Chasm Between Fundamental, Momentum, And Quantitative Investing, Allen Hoskins, Jeff Reed, Robert Slater
SMU Data Science Review
A chasm exists between the active public equity investment management industry's fundamental, momentum, and quantitative styles. In this study, the researchers explore ways to bridge this gap by leveraging domain knowledge, fundamental analysis, momentum, crowdsourcing, and data science methods. This research also seeks to test the developed tools and strategies during the volatile time period of 2020 and 2021.
Comparison Of Sampling Methods For Predicting Wine Quality Based On Physicochemical Properties, Robert Burigo, Scott Frazier, Eli Kravez, Nibhrat Lohia
SMU Data Science Review
Predicting wine quality from physicochemical properties has been the subject of numerous studies. Given the nature of these properties, the data is inherently skewed. Previous works have focused on a handful of sampling techniques to balance the data. This research compares multiple sampling techniques for predicting the target with limited data, using an ensemble model to evaluate each technique. The research found no evidence that any specific oversampling method improves a random forest classifier on a multi-class problem.
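The abstract does not say which sampling techniques were compared. For reference only, the simplest oversampling technique is random oversampling, and it can be sketched in plain Python as shown here. The function name `random_oversample`, the `(features, label)` row layout, and the toy rows are hypothetical. They are not the study's data or code.

```python
import random
from collections import Counter, defaultdict


def random_oversample(rows, seed=0):
    """Random oversampling: duplicate randomly chosen rows of each smaller
    class until every class is as large as the largest class.

    `rows` is a list of (features, label) pairs (hypothetical layout).
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for features, label in rows:
        by_label[label].append((features, label))

    target = max(len(group) for group in by_label.values())
    balanced = []
    for group in by_label.values():
        balanced.extend(group)
        # Sample with replacement to fill the gap up to the largest class.
        balanced.extend(rng.choices(group, k=target - len(group)))
    rng.shuffle(balanced)
    return balanced


# Made-up rows: ((alcohol %,), quality score); quality 5 dominates.
rows = [((9.4,), 5), ((9.8,), 5), ((10.0,), 5), ((9.5,), 5),
        ((11.2,), 6), ((10.9,), 6), ((12.8,), 8)]
balanced = random_oversample(rows)
print(Counter(label for _, label in balanced))
```

With the toy rows, each quality class ends up with four rows. In practice, oversampling is usually applied only to the training split. That way, duplicated rows cannot leak into the data used for evaluation.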
Content-Based Unsupervised Fake News Detection On Ukraine-Russia War, Yucheol Shin, Yvan Sojdehei, Limin Zheng, Brad Blanchard
SMU Data Science Review
The Ukrainian-Russian war has garnered significant attention worldwide, and fake news has obstructed the formation of public opinion and spread false information. This paper explores the use of unsupervised learning methods and Bidirectional Encoder Representations from Transformers (BERT) to detect fake news in articles from various sources. BERT topic modeling is applied to cluster news articles by topic. Summarization follows, and its output is used to compute similarity scores. The hypothesis is that topics with larger variances are more likely to contain fake news. The proposed method was evaluated using a dataset of approximately 1,000 labeled news articles related …
Examining Bias In Jury Selection For Criminal Trials In Dallas County, Megan Ball, Brandon Birmingham, Matt Farrow, Katherine Mitchell, Bivin Sadler, Lynne Stokes
SMU Data Science Review
One of the hallmarks of the American judicial system is trial by jury, specifically by an impartial jury of one's peers. Several landmark legal cases in the history of the United States have challenged this notion of equal representation by jury. The most notable is Batson v. Kentucky, 476 U.S. 79 (1986). Most previous research, focus, and legal precedent has centered on peremptory challenges and on attempts to prove whether bias was involved in excluding certain jurors from serving. Few studies, however, examine challenges for cause based on self-reported biases from the …
Application Of Probabilistic Ranking Systems On Women’s Junior Division Beach Volleyball, Cameron Stewart, Michael Mazel, Bivin Sadler
SMU Data Science Review
Women’s beach volleyball is one of the fastest-growing collegiate sports today. Its rising popularity has brought more valuable scholarship opportunities across the country. With thousands of athletes to sort through, college scouts depend on websites that aggregate tournament results and rank players nationally. This project partnered with Volleyball Life, the current market leader in ranking junior beach volleyball players. Using tournament information provided by Volleyball Life, this study explored replacements for the current ranking systems, which aggregate player points from recent tournament placements. Three …
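The abstract is cut off before it names the three ranking systems explored. What follows is therefore only a generic illustration of a probabilistic ranking system, an Elo-style rating update. Elo converts a rating gap into a win probability and then adjusts ratings according to how surprising the result was. Several choices in this sketch are assumptions: the function names, rating a doubles team as the mean of its two players' ratings, and the defaults `k=32` and `scale=400` (common chess conventions). It is not presented as the study's method.

```python
def expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Elo win probability of side A over side B (logistic in the rating gap)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))


def update_match(team_a, team_b, a_won, ratings, k=32.0):
    """Update each player's rating in place after one doubles match.

    Assumption: a team's strength is the mean of its players' ratings,
    and every player on a team receives the same rating change.
    """
    r_a = sum(ratings[p] for p in team_a) / len(team_a)
    r_b = sum(ratings[p] for p in team_b) / len(team_b)
    expected_a = expected_score(r_a, r_b)
    actual_a = 1.0 if a_won else 0.0
    delta = k * (actual_a - expected_a)  # larger when the result is a surprise
    for p in team_a:
        ratings[p] += delta
    for p in team_b:
        ratings[p] -= delta
    return ratings


# Hypothetical players, all starting at 1500.
ratings = {"A1": 1500.0, "A2": 1500.0, "B1": 1500.0, "B2": 1500.0}
update_match(("A1", "A2"), ("B1", "B2"), a_won=True, ratings=ratings)
print(ratings)  # {'A1': 1516.0, 'A2': 1516.0, 'B1': 1484.0, 'B2': 1484.0}
```

Here two evenly rated teams meet, so the expected score is 0.5 and the winners gain 16 points each. Aggregating points from placements ignores opponent strength. In this model, an upset over a much stronger team moves ratings further than an expected win does.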
Adjusting Community Survey Data Benchmarks For External Factors, Allen Miller, Nicole M. Norelli, Robert Slater, Mingyang N. Yu
SMU Data Science Review
Using U.S. resident survey data from the National Community Survey, combined with public data from the U.S. Census and additional sources, a Voting Regressor model was developed to establish fair benchmark values for city performance. These benchmarks were adjusted for characteristics that contribute to confidence in local government but that a city cannot easily influence, such as population size, demographics, and income. This adjustment allows more meaningful comparison and interpretation of survey results across individual cities. Methods explored for the benchmark adjustment included cluster analysis, anomaly detection, and a variety of regression techniques, including random forest, ridge, decision …
Aspect-Based Sentiment Analysis Of Movie Reviews, Samuel Onalaja, Eric Romero, Bosang Yun
SMU Data Science Review
This study compares classification models with two goals: determining the sentiment of review text separated by aspect, and predicting the binary sentiment of movie reviews using genre- and aspect-specific driving factors. For a broader classification analysis, five machine and deep learning algorithms were compared, including Logistic Regression (LR), Naive Bayes (NB), Support Vector Machine (SVM), and Recurrent Neural Network Long Short-Term Memory (RNN LSTM). The movie aspects used to separate the sentences are determined by aggregating aspect words from lexicon-based, supervised, and unsupervised learning. The driving factors are randomly assigned to various movie aspects and their impact tied to …
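The abstract describes separating review sentences by movie aspect using aggregated aspect words. A minimal, lexicon-only version of that step might look like the following. The aspect names and word lists in `ASPECT_LEXICON` are hypothetical. The study also drew aspect words from supervised and unsupervised learning, which this sketch does not attempt.

```python
import re

# Hypothetical aspect lexicon; the study's actual aspect words are not given.
ASPECT_LEXICON = {
    "acting": {"actor", "actress", "acting", "performance", "cast"},
    "plot": {"plot", "story", "storyline", "ending", "twist"},
    "visuals": {"cinematography", "effects", "visuals", "cgi"},
}


def split_by_aspect(review, lexicon=ASPECT_LEXICON):
    """Assign each sentence of a review to every aspect whose words it contains."""
    # Split on whitespace that follows sentence-ending punctuation.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", review) if s.strip()]
    by_aspect = {aspect: [] for aspect in lexicon}
    for sentence in sentences:
        tokens = set(re.findall(r"[a-z]+", sentence.lower()))
        for aspect, words in lexicon.items():
            if tokens & words:  # any overlap with the aspect's word list
                by_aspect[aspect].append(sentence)
    return by_aspect


review = "The cast was superb. The plot dragged, though! Great effects."
print(split_by_aspect(review))
# {'acting': ['The cast was superb.'], 'plot': ['The plot dragged, though!'], 'visuals': ['Great effects.']}
```

Each aspect's sentences could then be passed to any of the classifiers named in the abstract to score sentiment per aspect. A sentence that mentions words from several aspects is assigned to all of them.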
Urban Traffic Simulation: Network And Demand Representation Impacts On Congestion Metrics, Aaron Faltesek, Balasubramaniam Dakshinamoorthi, Sreeni Prabhala, Akbar Thobani, Anu Kuncheria, Jane Macfarlane
SMU Data Science Review
City planners often use traffic simulations as a basis for predicting the impact of policies, plans, and operations. The complexities underpinning traffic simulations are often not described in detail, yet they can significantly affect the simulation outcome. Conflating the underlying data for simulations is complex, and that complexity discourages this type of exploration. This paper aims to elucidate critical features of traffic simulations that drive the generated metrics of the modeled urban environment. Specifically, it examines differences between two road graph networks for the metropolitan region of Houston, TX: a reduced network composed of 45,675 road links and …
Using Machine Learning Methods To Predict The Movement Trajectories Of The Louisiana Black Bear, Daniel Clark, David Shaw, Armando Vela, Shane Weinstock, John Santerre, Joseph D. Clark
SMU Data Science Review
In 1992, the Louisiana black bear (Ursus americanus luteolus) was placed on the U.S. Endangered Species List. Bear populations in Louisiana were small and too isolated to intermix with other populations and grow. Interchange of individuals between subpopulations of bears in Louisiana is critical to maintaining genetic diversity and avoiding inbreeding effects. Using GPS (Global Positioning System) data gathered from 31 radio-collared bears from 2010 through 2012, this research investigates how bears traverse the landscape, which has implications for gene exchange. This paper leverages machine learning tools to improve upon existing …
Analysis Of Individual Player Performances And Their Effect On Winning In College Soccer, Angelo Bravo, Thomas Karba, Sean McWhirter, Billy Nayden
SMU Data Science Review
This study describes the process of modernizing the approach of the Southern Methodist University (SMU) Men's Soccer coaching staff through the use of location and tracking data from their matches in the 2019 season. This study utilizes a variety of modeling and analysis techniques to explore and categorize the data and use it to evaluate the types of plays that are most often correlated with victories. This study's contribution to college soccer analytics includes the implementation of a model to determine individual players' performance, the production of team-level metrics, and visualizations to increase the efficiency of the coaching staff's efforts. …
Personalized Detection Of Anxiety Provoking News Events Using Semantic Network Analysis, Jacquelyn Cheun PhD, Luay Dajani, Quentin B. Thomas
SMU Data Science Review
In the age of hyper-connectivity, 24/7 news cycles, and instant news alerts via social media, mental health researchers lack a way to automatically detect news content associated with triggering anxiety or depression in mental health patients. Using the Associated Press news wire, we built a semantic network from 1,056 news articles. The network contains over 500,000 connections across multiple topics and supports a personalized algorithm that detects problematic news content for a given reader. We use Semantic Network Analysis to surface the relationship between news article text and anxiety in readers who struggle with mental health disorders. …
A Data Science Approach To Defining A Data Scientist, Andy Ho, An Nguyen, Jodi L. Pafford, Robert Slater
SMU Data Science Review
In this paper, we present a common definition and list of skills for a Data Scientist using online job postings. The overlap and ambiguity of various roles such as data scientist, data engineer, data analyst, software engineer, database administrator, and statistician motivate the problem. To arrive at a single Data Scientist definition, we collect over 8,000 job postings from Indeed.com for the six job titles. Each corpus contains text on job qualifications, skills, responsibilities, educational preferences, and requirements. Our data science methodology and analysis rendered the single definition of a data scientist: A data scientist codes, collaborates, and communicates – …
Leveraging Natural Language Processing Applications And Microblogging Platform For Increased Transparency In Crisis Areas, Ernesto Carrera-Ruvalcaba, Johnson Ekedum, Austin Hancock, Ben Brock
SMU Data Science Review
Through microblogging applications such as Twitter, people actively document their lives, even during natural disasters such as hurricanes and earthquakes. First responders and crisis teams are able to help people who call 911 or arrive at a designated shelter. However, vast amounts of information exchanged on Twitter, including real-time, location-based alerts, go unnoticed. To use this information effectively, the Tweets must be verified for authenticity and categorized so that the proper authorities can be alerted. In this paper, we create a Crisis Message Corpus from geotagged Tweets occurring during 7 hurricanes …
Political Profiling Using Feature Engineering And NLP, Chiranjeevi Mallavarapu, Ramya Mandava, Sabitri Kc, Ginger M. Holt
SMU Data Science Review
Public surveys are the predominant tool for forecasting election outcomes. This approach has had significant successes, but surveys have also had failures, especially in accuracy and reliability. As a result, it is challenging for political parties to spend their campaign budgets in a way that builds a favorable and verifiable public opinion. It is therefore critical to develop a more accurate methodology for predicting election outcomes. In this paper, we evaluate the impact of using dynamic public data to predict the outcome of elections. Our model yielded a …
Yelp’s Review Filtering Algorithm, Yao Yao, Ivelin Angelov, Jack Rasmus-Vorrath, Mooyoung Lee, Daniel W. Engels
SMU Data Science Review
In this paper, we present an analysis of features influencing Yelp's proprietary review filtering algorithm. Classifying or misclassifying reviews as recommended or non-recommended affects average ratings, consumer decisions, and ultimately, business revenue. Our analysis involves systematically sampling and scraping Yelp restaurant reviews. Features are extracted from review metadata and engineered from metrics and scores generated using text classifiers and sentiment analysis. The coefficients of a multivariate logistic regression model were interpreted as quantifications of the relative importance of features in classifying reviews as recommended or non-recommended. The model classified review recommendations with an accuracy of 78%. We found that reviews …
Goalie Analytics: Statistical Evaluation Of Context-Specific Goalie Performance Measures In The National Hockey League, Marc Naples, Logan Gage, Amy Nussbaum
SMU Data Science Review
In this paper, we attempt to improve upon the classic formulation of save percentage in the NHL in two ways: by controlling for shot context, and by using measures other than save percentage. In particular, we find save percentage to be both a weakly repeatable skill and a weak predictor of future performance, so we seek other goalie performance calculations that are more robust. To do so, we use three primary tests to assess intra-season consistency, intra-season predictability, and inter-season consistency. We also extend the analysis to disentangle team effects on goalie statistics. We find that there are multiple ways to improve upon classic save …
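Classic save percentage is saves divided by shots on goal. The simplest way to control for shot context is to compute that ratio separately within each context, as in this sketch. The `"low"`/`"high"` danger labels and the shot counts are made up for illustration. The paper's own context definitions and alternative measures are not reproduced here.

```python
from collections import defaultdict


def save_percentage(shots):
    """Classic save percentage: saves / shots on goal.

    `shots` is a list of (context, is_goal) pairs; context is ignored here.
    """
    if not shots:
        return float("nan")
    saves = sum(1 for _, is_goal in shots if not is_goal)
    return saves / len(shots)


def save_percentage_by_context(shots):
    """Save percentage computed separately within each shot context."""
    groups = defaultdict(list)
    for context, is_goal in shots:
        groups[context].append((context, is_goal))
    return {context: save_percentage(group) for context, group in groups.items()}


# Made-up shots for one goalie: (hypothetical danger label, was it a goal?)
shots = ([("low", False)] * 18 + [("low", True)] * 2
         + [("high", False)] * 6 + [("high", True)] * 4)
print(save_percentage(shots))             # 0.8
print(save_percentage_by_context(shots))  # {'low': 0.9, 'high': 0.6}
```

Here the overall figure of 0.8 hides a large gap between low-danger (0.9) and high-danger (0.6) shots. A goalie who faces a larger share of high-danger shots can therefore look worse on the classic measure without being worse within each context.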
Comparative Study: Reducing Cost To Manage Accessibility With Existing Data, Claire Chu, Bill Kerneckel, Eric C. Larson, Nathan Mowat, Christopher Woodard
SMU Data Science Review
“Project Sidewalk” is an existing research effort that maps accessibility issues for handicapped persons to efficiently plan wheelchair- and mobility-scooter-friendly routes around Washington, D.C. As supporters of this project, we used the data “Project Sidewalk” collected to confirm predictions, based on real estate and crime data, about where problem sidewalks exist. We present a study that identifies correlations between accessibility data and crime and housing statistics in the Washington, D.C. metropolitan area. We identify the key reasons for increased accessibility and the issues with the current infrastructure management system. After a thorough …