Articles 31 - 45 of 45
Full-Text Articles in Statistical Models
Political Profiling Using Feature Engineering And NLP, Chiranjeevi Mallavarapu, Ramya Mandava, Sabitri Kc, Ginger M. Holt
SMU Data Science Review
Public surveys are predominantly used when forecasting election outcomes. While the approach has had significant successes, the surveys have had their failures as well, especially when it comes to accuracy and reliability. As a result, it becomes challenging for political parties to spend their campaign budgets in a manner that facilitates the growth of a favorable and verifiable public opinion. Consequently, it is critical that a more accurate methodology to predict election outcomes is developed. In this paper, we present an evaluation of the impact of utilizing dynamic public data on predicting the outcome of elections. Our model yielded a …
Pedestrian Safety -- Fundamental To A Walkable City, Joshua Herrera, Patrick McDevitt, Preeti Swaminathan, Raghuram Srinivas
SMU Data Science Review
In this paper, we present a method to identify urban areas with a higher likelihood of pedestrian-safety-related events. Pedestrian-safety-related events are pedestrian-vehicle interactions that result in fatalities, injuries, accidents without injury, or near-misses between pedestrians and vehicles. To develop a solution to this problem of identifying likely event locations, we assemble data, primarily from the City of Cincinnati and Hamilton County, that include safety reports from a five-year period, geographic information for these events, a citizen survey of pedestrian-reported concerns, non-emergency requests for service for any cause in the city, property values and public transportation …
Improving VIX Futures Forecasts Using Machine Learning Methods, James Hosker, Slobodan Djurdjevic, Hieu Nguyen, Robert Slater
SMU Data Science Review
The problem of forecasting market volatility is a difficult task for most fund managers. Volatility forecasts are used for risk management, alpha (risk) trading, and the reduction of trading friction. Improving the forecasts of future market volatility assists fund managers in adding or reducing risk in their portfolios as well as in increasing hedges to protect their portfolios in anticipation of a market sell-off event. Our analysis compares three existing financial models that forecast future market volatility using the Chicago Board Options Exchange Volatility Index (VIX) to six machine/deep learning supervised regression methods. This analysis determines which models provide best …
Estimation And Variable Selection In High-Dimensional Settings With Mismeasured Observations, Michael Byrd
Statistical Science Theses and Dissertations
Understanding high-dimensional data has become essential for practitioners across many disciplines. The general increase in ability to collect large amounts of data has prompted statistical methods to adapt for the rising number of possible relationships to be uncovered. The key to this adaptation has been the notion of sparse models, or, rather, models where most relationships between variables are assumed to be negligible at best. Driving these sparse models have been constraints on the solution set, yielding regularization penalties imposed on the optimization procedure. While these penalties have found great success, they are typically formulated with strong assumptions on the …
Modeling Stochastically Intransitive Relationships In Paired Comparison Data, Ryan Patrick Alexander McShane
Statistical Science Theses and Dissertations
If the Warriors beat the Rockets and the Rockets beat the Spurs, does that mean that the Warriors are better than the Spurs? Sophisticated fans would argue that the Warriors are better by the transitive property, but could Spurs fans make a legitimate argument that their team is better despite this chain of evidence?
We first explore the nature of intransitive (rock-scissors-paper) relationships with a graph theoretic approach to the method of paired comparisons framework popularized by Kendall and Smith (1940). Then, we focus on the setting where all pairs of items, teams, players, or objects have been compared to …
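To make the idea of an intransitive (rock-scissors-paper) relationship concrete, here is a minimal sketch that counts circular triads in a complete round-robin without ties, the classical quantity from the paired-comparison framework of Kendall and Smith (1940). It is an illustration only and is not taken from the thesis: the function names and the three-team results are hypothetical, and the win-count shortcut used for cross-checking assumes every pair has been compared exactly once.

```python
from itertools import combinations
from math import comb


def count_circular_triads(beats):
    """Count circular triads by brute force.

    beats[i][j] is True when item i beat item j. Assumes a complete
    round robin with no ties (exactly one of beats[i][j], beats[j][i]).
    """
    n = len(beats)
    count = 0
    for i, j, k in combinations(range(n), 3):
        # A triad is circular when the three results form a cycle,
        # in either direction.
        forward = beats[i][j] and beats[j][k] and beats[k][i]
        backward = beats[j][i] and beats[k][j] and beats[i][k]
        if forward or backward:
            count += 1
    return count


def circular_triads_from_scores(beats):
    """Same count from win totals: C(n, 3) - sum_i C(a_i, 2),
    where a_i is the number of wins of item i."""
    n = len(beats)
    wins = [sum(row) for row in beats]
    return comb(n, 3) - sum(comb(a, 2) for a in wins)


# Hypothetical results: Warriors beat Rockets, Rockets beat Spurs,
# and Spurs beat Warriors (index 0 = Warriors, 1 = Rockets, 2 = Spurs).
cyclic = [
    [False, True, False],
    [False, False, True],
    [True, False, False],
]
print(count_circular_triads(cyclic), circular_triads_from_scores(cyclic))
```

A count of zero means the results are fully consistent with a single ranking; each circular triad is one rock-scissors-paper loop that no ranking can explain.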
Overcoming Small Data Limitations In Heart Disease Prediction By Using Surrogate Data, Alfeo Sabay, Laurie Harris, Vivek Bejugama, Karen Jaceldo-Siegl
SMU Data Science Review
In this paper, we present a heart disease prediction use case showing how synthetic data can be used to address privacy concerns and overcome constraints inherent in small medical research data sets. While advanced machine learning algorithms, such as neural network models, can be implemented to improve prediction accuracy, these require very large data sets which are often not available in medical or clinical research. We examine the use of surrogate data sets composed of synthetic observations for modeling heart disease prediction. We generate surrogate data, based on the characteristics of original observations, and compare prediction accuracy results achieved from …
Minimizing The Perceived Financial Burden Due To Cancer, Hassan Azhar, Zoheb Allam, Gino Varghese, Daniel W. Engels, Sajiny John
SMU Data Science Review
In this paper, we present a regression model that predicts the perceived financial burden that a cancer patient experiences in the treatment and management of the disease. Cancer patients do not fully understand the burden associated with the cost of cancer, and their lack of understanding can increase the difficulties associated with living with the disease, in particular coping with the cost. The relationship between demographic characteristics and financial burden was examined in order to better understand the characteristics of a cancer patient and their burden, while all subsets regression was used to determine the best predictors of financial burden. Age, …
Yelp’s Review Filtering Algorithm, Yao Yao, Ivelin Angelov, Jack Rasmus-Vorrath, Mooyoung Lee, Daniel W. Engels
SMU Data Science Review
In this paper, we present an analysis of features influencing Yelp's proprietary review filtering algorithm. Classifying or misclassifying reviews as recommended or non-recommended affects average ratings, consumer decisions, and ultimately, business revenue. Our analysis involves systematically sampling and scraping Yelp restaurant reviews. Features are extracted from review metadata and engineered from metrics and scores generated using text classifiers and sentiment analysis. The coefficients of a multivariate logistic regression model were interpreted as quantifications of the relative importance of features in classifying reviews as recommended or non-recommended. The model classified review recommendations with an accuracy of 78%. We found that reviews …
Cryptocurrency Price Prediction Using Tweet Volumes And Sentiment Analysis, Jethin Abraham, Daniel Higdon, John Nelson, Juan Ibarra
SMU Data Science Review
In this paper, we present a method for predicting changes in Bitcoin and Ethereum prices utilizing Twitter data and Google Trends data. Bitcoin and Ethereum, the two largest cryptocurrencies in terms of market capitalization, represent over $160 billion in combined value. However, both Bitcoin and Ethereum have experienced significant price swings in both daily and long-term valuations. Twitter is increasingly used as a news source influencing purchase decisions by informing users of the currency and its increasing popularity. As a result, quickly understanding the impact of tweets on price direction can provide a purchasing and selling advantage to …
Fuel Flow Reduction Impact Analysis Of Drag Reducing Film Applied To Aircraft Wings, Damon Resnick, Chris Donlan, Nimish Sakalle, Cody Pinkerman
SMU Data Science Review
In this paper, we present an analysis of flight data in order to determine whether the application of the Edge Aerodynamix Conformal Vortex Generator (CVG), applied to the wings of aircraft, reduces fuel flow during cruising conditions of flight. The CVG is a special treatment and film applied to the wings of an aircraft to protect the wings and reduce the non-laminar flow of air around the wings during flight. It is thought that, by reducing the non-laminar flow or vortices around and directly behind the wings, an aircraft will move more smoothly through the air and provide a …
Predictions Generated From A Simulation Engine For Gene Expression Micro-Arrays For Use In Research Laboratories, Gopinath R. Mavankal, John Blevins, Dominique Edwards, Monnie McGee, Andrew Hardin
SMU Data Science Review
In this paper, we introduce the technical components, the biology, and the data science involved in the use of microarray technology in biological and clinical research. We discuss how the laborious experimental protocols that laboratories use to obtain these data could benefit from simulations of the data. We discuss the approach used in the simulation engine from [7]. We use this simulation engine to generate a prediction tool in Power BI, a Microsoft business intelligence tool for analytics and data visualization [22]. This tool could be used in any laboratory using micro-arrays to improve experimental design by comparing how predicted …
Predicting Game Day Outcomes In National Football League Games, Josh Klein, Anna Frowein, Chris Irwin
SMU Data Science Review
In this paper, we present a model for predicting the game day outcomes of National Football League games. Three of the most popular sources for game day predictions are analyzed for comparison. Player data and outcomes from previous games are used, but we also incorporate several weather factors into our models. Over 1,700 games were incorporated, and three separate models were created using simple regression, principal component analysis, and a recursive model. We also discuss the ethics of individuals with this specialized knowledge using data science techniques to gain an advantage over a population lacking such training.
Association Tests For Genetic Effect And Its Interaction With Environmental Factors, Zhengyang Zhou
Statistical Science Theses and Dissertations
My research is in the area of statistical genetics, and it contains three projects: (1) Differentiating the Cochran-Armitage (CA) trend test and Pearson’s chi-square test: location and dispersion; (2) Decomposing Pearson’s chi-square test: a linear regression and its departure from linearity; (3) Testing nonlinear gene-environment (GxE) interaction through varying coefficient and linear mixed models.
(1) In genetic case-control association studies, a standard practice is to perform the CA trend test with 1 degree-of-freedom (df) under the assumption of an additive model. However, when the true genetic model is recessive or near recessive, it is outperformed by Pearson’s chi-square test with …
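As a rough illustration of the two tests being contrasted, the sketch below computes the Cochran-Armitage trend statistic (1 df, additive scores 0, 1, 2) and Pearson's chi-square statistic (2 df) for a 2 x 3 genotype table. It is not the dissertation's code: the function names and the genotype counts are hypothetical, the trend statistic uses the common N-scaled form, and both case and control groups are assumed to be non-empty.

```python
from math import erfc, exp, sqrt


def pearson_chi2(cases, controls):
    """Pearson chi-square statistic for a 2 x k table (df = k - 1)."""
    R, S = sum(cases), sum(controls)
    N = R + S
    stat = 0.0
    for r, s in zip(cases, controls):
        n = r + s
        if n == 0:
            continue  # an empty genotype column contributes nothing
        exp_r, exp_s = n * R / N, n * S / N
        stat += (r - exp_r) ** 2 / exp_r + (s - exp_s) ** 2 / exp_s
    return stat


def ca_trend_chi2(cases, controls, scores=(0, 1, 2)):
    """Cochran-Armitage trend statistic (df = 1) with the given scores."""
    totals = [r + s for r, s in zip(cases, controls)]
    R, N = sum(cases), sum(totals)
    sum_rx = sum(r * x for r, x in zip(cases, scores))
    sum_nx = sum(n * x for n, x in zip(totals, scores))
    sum_nx2 = sum(n * x * x for n, x in zip(totals, scores))
    numerator = N * (N * sum_rx - R * sum_nx) ** 2
    denominator = R * (N - R) * (N * sum_nx2 - sum_nx ** 2)
    return numerator / denominator


def chi2_sf(stat, df):
    """Upper-tail chi-square probability, closed form for df 1 and 2 only."""
    if df == 1:
        return erfc(sqrt(stat / 2))
    if df == 2:
        return exp(-stat / 2)
    raise ValueError("only df = 1 or df = 2 is supported in this sketch")


# Hypothetical counts by number of risk alleles (0, 1, 2), chosen so that
# only the homozygous group differs, loosely mimicking a recessive effect.
cases = [30, 30, 40]
controls = [35, 35, 30]
trend = ca_trend_chi2(cases, controls)
pearson = pearson_chi2(cases, controls)
print("CA trend:", trend, "p =", chi2_sf(trend, 1))
print("Pearson :", pearson, "p =", chi2_sf(pearson, 2))
```

With this scaling the trend statistic never exceeds the Pearson statistic; the gap between them is the part of the association that a linear (additive) score does not capture.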
Cognitive Virtual Admissions Counselor, Kumar Raja Guvindan Raju, Cory Adams, Raghuram Srinivas
SMU Data Science Review
In this paper, we present a cognitive virtual admissions counselor for the Master of Science in Data Science program at Southern Methodist University. The virtual admissions counselor is a system capable of providing potential students accurate information at the time that they want to know it. After the evaluation of multiple technologies, Amazon’s LEX was selected to serve as the core technology for the virtual counselor chatbot. Student surveys were leveraged to collect and generate training data to deploy the natural language capability. The cognitive virtual admissions counselor platform is currently capable of providing an end-to-end conversational dialog to …
Developing Statistical Methods For Data From Platforms Measuring Gene Expression, Gaoxiang Jia
Statistical Science Theses and Dissertations
This research contains two topics: (1) PBNPA: a permutation-based non-parametric analysis of CRISPR screen data; (2) RCRnorm: an integrated system of random-coefficient hierarchical regression models for normalizing NanoString nCounter data from FFPE samples.
Clustered regularly-interspaced short palindromic repeats (CRISPR) screens are usually implemented in cultured cells to identify genes with critical functions. Although several methods have been developed or adapted to analyze CRISPR screening data, no single specific algorithm has gained popularity. Thus, rigorous procedures are needed to overcome the shortcomings of existing algorithms. We developed a Permutation-Based Non-Parametric Analysis (PBNPA) algorithm, which computes p-values at the gene level …