Open Access. Powered by Scholars. Published by Universities.®
- Discipline
-
- Statistical Models (4)
- Statistical Methodology (3)
- Computer Sciences (2)
- Other Computer Sciences (2)
- Other Statistics and Probability (2)
-
- Social and Behavioral Sciences (2)
- Analysis (1)
- Applied Mathematics (1)
- Artificial Intelligence and Robotics (1)
- Biostatistics (1)
- Business (1)
- Business Analytics (1)
- Business Intelligence (1)
- Business and Corporate Communications (1)
- Categorical Data Analysis (1)
- Computer Law (1)
- Engineering (1)
- Engineering Education (1)
- Law (1)
- Legal Studies (1)
- Mathematics (1)
- Multivariate Analysis (1)
- Numerical Analysis and Computation (1)
- Other Legal Studies (1)
- Probability (1)
- Programming Languages and Compilers (1)
- Science and Technology Studies (1)
- Keyword
-
- Machine learning (2)
- Algorithm (1)
- Amazon LEX (1)
- Applied statistics (1)
- Bag of Words (1)
-
- Basketball (1)
- Biometric (1)
- Chatbot (1)
- Comparative analysis (1)
- Comparison study (1)
- Data science (1)
- Filtering (1)
- Hockey (1)
- Intent identification (1)
- Logistic regression (1)
- Machine learning models (1)
- NLP (1)
- Naive Bayes (1)
- Python (1)
- R (1)
- Random forest (1)
- Regression (1)
- Review (1)
- SAS (1)
- Sentiment (1)
- Slot variable detection (1)
- Sports (1)
- Statistics (1)
- Tool (1)
- Yelp (1)
- Publication Type
Articles 1 - 8 of 8
Full-Text Articles in Applied Statistics
Random Forest Vs Logistic Regression: Binary Classification For Heterogeneous Datasets, Kaitlin Kirasich, Trace Smith, Bivin Sadler
Random Forest Vs Logistic Regression: Binary Classification For Heterogeneous Datasets, Kaitlin Kirasich, Trace Smith, Bivin Sadler
SMU Data Science Review
Selecting a learning algorithm to implement for a particular application on the basis of performance still remains an ad-hoc process using fundamental benchmarks such as evaluating a classifier’s overall loss function and misclassification metrics. In this paper we address the difficulty of model selection by evaluating the overall classification performance between random forest and logistic regression for datasets comprised of various underlying structures: (1) increasing the variance in the explanatory and noise variables, (2) increasing the number of noise variables, (3) increasing the number of explanatory variables, (4) increasing the number of observations. We developed a model evaluation tool capable …
Predicting National Basketball Association Success: A Machine Learning Approach, Adarsh Kannan, Brian Kolovich, Brandon Lawrence, Sohail Rafiqi
Predicting National Basketball Association Success: A Machine Learning Approach, Adarsh Kannan, Brian Kolovich, Brandon Lawrence, Sohail Rafiqi
SMU Data Science Review
In this paper, we present a machine learning based approach to projecting the success of National Basketball Association (NBA) draft prospects. With the proliferation of data, analytics have increasingly be- come a critical component in the assessment of professional and collegiate basketball players. We leverage player biometric data, college statistics, draft selection order, and positional breakdown as modelling features in our prediction algorithms. We found that a player's draft pick and their college statistics are the best predictors of their longevity in the National Basketball Association.
Yelp’S Review Filtering Algorithm, Yao Yao, Ivelin Angelov, Jack Rasmus-Vorrath, Mooyoung Lee, Daniel W. Engels
Yelp’S Review Filtering Algorithm, Yao Yao, Ivelin Angelov, Jack Rasmus-Vorrath, Mooyoung Lee, Daniel W. Engels
SMU Data Science Review
In this paper, we present an analysis of features influencing Yelp's proprietary review filtering algorithm. Classifying or misclassifying reviews as recommended or non-recommended affects average ratings, consumer decisions, and ultimately, business revenue. Our analysis involves systematically sampling and scraping Yelp restaurant reviews. Features are extracted from review metadata and engineered from metrics and scores generated using text classifiers and sentiment analysis. The coefficients of a multivariate logistic regression model were interpreted as quantifications of the relative importance of features in classifying reviews as recommended or non-recommended. The model classified review recommendations with an accuracy of 78%. We found that reviews …
Cryptocurrency Price Prediction Using Tweet Volumes And Sentiment Analysis, Jethin Abraham, Daniel Higdon, John Nelson, Juan Ibarra
Cryptocurrency Price Prediction Using Tweet Volumes And Sentiment Analysis, Jethin Abraham, Daniel Higdon, John Nelson, Juan Ibarra
SMU Data Science Review
In this paper, we present a method for predicting changes in Bitcoin and Ethereum prices utilizing Twitter data and Google Trends data. Bitcoin and Ethereum, the two largest cryptocurrencies in terms of market capitalization represent over \$160 billion dollars in combined value. However, both Bitcoin and Ethereum have experienced significant price swings on both daily and long term valuations. Twitter is increasingly used as a news source influencing purchase decisions by informing users of the currency and its increasing popularity. As a result, quickly understanding the impact of tweets on price direction can provide a purchasing and selling advantage to …
Goalie Analytics: Statistical Evaluation Of Context-Specific Goalie Performance Measures In The National Hockey League, Marc Naples, Logan Gage, Amy Nussbaum
Goalie Analytics: Statistical Evaluation Of Context-Specific Goalie Performance Measures In The National Hockey League, Marc Naples, Logan Gage, Amy Nussbaum
SMU Data Science Review
In this paper, we attempt to improve upon the classic formulation of save percentage in the NHL by controlling the context of the shots and use alternative measures than save percentage. In particular, we find save percentage to be both a weakly repeatable skill and predictor of future performance, and we seek other goalie performance calculations that are more robust. To do so, we use three primary tests to test intra-season consistency, intra-season predictability, and inter-season consistency, and extend the analysis to disentangle team effects on goalie statistics. We find that there are multiple ways to improve upon classic save …
Data Scientist’S Analysis Toolbox: Comparison Of Python, R, And Sas Performance, Jim Brittain, Mariana Cendon, Jennifer Nizzi, John Pleis
Data Scientist’S Analysis Toolbox: Comparison Of Python, R, And Sas Performance, Jim Brittain, Mariana Cendon, Jennifer Nizzi, John Pleis
SMU Data Science Review
A quantitative analysis will be performed on experiments utilizing three different tools used for Data Science. The analysis will include replication of analysis along with comparisons of code length, output, and results. Qualitative data will supplement the quantitative findings. The conclusion will provide data support guidance on the correct tool to use for common situations in the field of Data Science.
Cognitive Virtual Admissions Counselor, Kumar Raja Guvindan Raju, Cory Adams, Raghuram Srinivas
Cognitive Virtual Admissions Counselor, Kumar Raja Guvindan Raju, Cory Adams, Raghuram Srinivas
SMU Data Science Review
Abstract. In this paper, we present a cognitive virtual admissions counselor for the Master of Science in Data Science program at Southern Methodist University. The virtual admissions counselor is a system capable of providing potential students accurate information at the time that they want to know it. After the evaluation of multiple technologies, Amazon’s LEX was selected to serve as the core technology for the virtual counselor chatbot. Student surveys were leveraged to collect and generate training data to deploy the natural language capability. The cognitive virtual admissions counselor platform is currently capable of providing an end-to-end conversational dialog to …
Developing Statistical Methods For Data From Platforms Measuring Gene Expression, Gaoxiang Jia
Developing Statistical Methods For Data From Platforms Measuring Gene Expression, Gaoxiang Jia
Statistical Science Theses and Dissertations
This research contains two topics: (1) PBNPA: a permutation-based non-parametric analysis of CRISPR screen data; (2) RCRnorm: an integrated system of random-coefficient hierarchical regression models for normalizing NanoString nCounter data from FFPE samples.
Clustered regularly-interspaced short palindromic repeats (CRISPR) screens are usually implemented in cultured cells to identify genes with critical functions. Although several methods have been developed or adapted to analyze CRISPR screening data, no single spe- cific algorithm has gained popularity. Thus, rigorous procedures are needed to overcome the shortcomings of existing algorithms. We developed a Permutation-Based Non-Parametric Analysis (PBNPA) algorithm, which computes p-values at the gene level …