Open Access. Powered by Scholars. Published by Universities.®

Applied Statistics Commons

2,804 Full-Text Articles 3,720 Authors 765,903 Downloads 122 Institutions

All Articles in Applied Statistics

Understanding Sexual Violence Against Women, Maria Martinez 2018 Illinois State University

Annual Symposium on Biomathematics and Ecology: Education and Research

No abstract provided.


Dealing With Sensitive Quantitative Variables: A Comparison Of Sampling Designs For The Procedure Of Gupta And Thornton, Carlos Narciso Bouza Herrera, Prayas Sharma 2018 University of Havana

Journal of Modern Applied Statistical Methods

Randomized response procedures can reduce the number of non-responses and increase the accuracy of the responses. A new sampling strategy is developed in which the reports are scrambled using the procedure of Gupta and Thornton. The estimator of the mean and its errors are developed for the Rao-Hartley-Cochran and Ranked Set Sampling designs. The proposals are compared with the original model, which is based on simple random sampling.


Comparison Of Multiple Imputation Methods For Categorical Survey Items With High Missing Rates: Application To The Family Life, Activity, Sun, Health And Eating (FLASHE) Study, Benmei Liu, Erin Hennessy, April Oh, Laura A. Dwyer, Linda Nebeling 2018 National Cancer Institute

Journal of Modern Applied Statistical Methods

Two multiple imputation methods, the Sequential Regression Multivariate Imputation Algorithm and the Cox-Iannacchione Weighted Sequential Hotdeck, were examined and compared for imputing categorical variables with high missing rates from the Family Life, Activity, Sun, Health and Eating (FLASHE) study. This paper describes the imputation approaches and results from the study.


Bayesian And Semi-Bayesian Estimation Of The Parameters Of Generalized Inverse Weibull Distribution, Kamaljit Kaur, Kalpana K. Mahajan, Sangeeta Arora 2018 Panjab University, Chandigarh, India

Journal of Modern Applied Statistical Methods

Bayesian and semi-Bayesian estimators of the parameters of the generalized inverse Weibull distribution are obtained using Jeffreys’ prior and an informative prior under specific assumptions about the loss function. Using simulation, the relative efficiency of the proposed estimators is obtained under different set-ups. A real-life example is also given.


Random Forest Vs Logistic Regression: Binary Classification For Heterogeneous Datasets, Kaitlin Kirasich, Trace Smith, Bivin Sadler 2018 Southern Methodist University

SMU Data Science Review

Selecting a learning algorithm to implement for a particular application on the basis of performance still remains an ad-hoc process using fundamental benchmarks such as evaluating a classifier’s overall loss function and misclassification metrics. In this paper we address the difficulty of model selection by evaluating the overall classification performance between random forest and logistic regression for datasets comprised of various underlying structures: (1) increasing the variance in the explanatory and noise variables, (2) increasing the number of noise variables, (3) increasing the number of explanatory variables, (4) increasing the number of observations. We developed a model evaluation tool ...


Predicting National Basketball Association Success: A Machine Learning Approach, Adarsh Kannan, Brian Kolovich, Brandon Lawrence, Sohail Rafiqi 2018 Southern Methodist University

SMU Data Science Review

In this paper, we present a machine learning based approach to projecting the success of National Basketball Association (NBA) draft prospects. With the proliferation of data, analytics have increasingly become a critical component in the assessment of professional and collegiate basketball players. We leverage player biometric data, college statistics, draft selection order, and positional breakdown as modelling features in our prediction algorithms. We found that a player's draft pick and their college statistics are the best predictors of their longevity in the National Basketball Association.


Yelp’s Review Filtering Algorithm, Yao Yao, Ivelin Angelov, Jack Rasmus-Vorrath, Mooyoung Lee, Daniel W. Engels 2018 Southern Methodist University

SMU Data Science Review

In this paper, we present an analysis of features influencing Yelp's proprietary review filtering algorithm. Classifying or misclassifying reviews as recommended or non-recommended affects average ratings, consumer decisions, and ultimately, business revenue. Our analysis involves systematically sampling and scraping Yelp restaurant reviews. Features are extracted from review metadata and engineered from metrics and scores generated using text classifiers and sentiment analysis. The coefficients of a multivariate logistic regression model were interpreted as quantifications of the relative importance of features in classifying reviews as recommended or non-recommended. The model classified review recommendations with an accuracy of 78%. We found that ...


Cryptocurrency Price Prediction Using Tweet Volumes And Sentiment Analysis, Jethin Abraham, Daniel Higdon, John Nelson, Juan Ibarra 2018 Southern Methodist University

SMU Data Science Review

In this paper, we present a method for predicting changes in Bitcoin and Ethereum prices utilizing Twitter data and Google Trends data. Bitcoin and Ethereum, the two largest cryptocurrencies in terms of market capitalization, represent over $160 billion in combined value. However, both Bitcoin and Ethereum have experienced significant price swings on both daily and long term valuations. Twitter is increasingly used as a news source influencing purchase decisions by informing users of the currency and its increasing popularity. As a result, quickly understanding the impact of tweets on price direction can provide a purchasing and selling advantage to ...


Of Typicality And Predictive Distributions In Discriminant Function Analysis, Lyle W. Konigsberg, Susan R. Frankenberg 2018 Department of Anthropology, University of Illinois at Urbana–Champaign

Human Biology Open Access Pre-Prints

While discriminant function analysis is an inherently Bayesian method, researchers attempting to estimate ancestry in human skeletal samples often follow discriminant function analysis with the calculation of frequentist-based typicalities for assigning group membership. Such an approach is problematic in that it fails to account for admixture and for variation in why individuals may be classified as outliers, or non-members of particular groups. This paper presents an argument and methodology for employing a fully Bayesian approach in discriminant function analysis applied to cases of ancestry estimation. The approach requires adding the calculation, or estimation, of predictive distributions as the final step ...


Applications Of Game Theory, Tableau, Analytics, And R To Fashion Design, Aisha Asiri 2018 Atlanta University Center

Electronic Theses & Dissertations Collection for Atlanta University & Clark Atlanta University

This thesis presents various models for the fashion industry that predict the profits of some products. To determine the expected performance of each product in 2016, we used tools of game theory to help us identify the expected value. We went further and performed a simple linear regression and used scatter plots to help us further predict the performance of Prada's products. We used tools of game theory, analytics, and statistics to help us predict the performance of some of Prada's products. We also used the Tableau platform to visualize an overview of the products' performances. All ...


Creating A Better Technological Piano Practice Aid With Knowledge Tracing, Max Feldkamp 2018 University of Colorado, Boulder

Keyboard Graduate Theses & Dissertations

Modern music tutoring software and mobile instructional applications have great potential to help students practice at home effectively. They can offer extensive feedback on what the student is getting right and wrong and have adopted a gamified design with levels, badges, and other game-like elements to help gain wider appeal among students. Despite their advantages for motivating students and creating a safe practice environment, no current music instruction software demonstrates any knowledge about a student’s level of mastery. This can lead to awkward pedagogy and user frustration. Applying Bayesian Knowledge Tracing to tutoring systems provides an ideal way to ...
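
For readers unfamiliar with Bayesian Knowledge Tracing, the following is a minimal sketch of the standard textbook update for a single skill, not the thesis's implementation. The function name and the parameter values used in the usage note are illustrative assumptions; the sketch assumes all probabilities lie strictly between 0 and 1.

```python
def bkt_update(p_know: float, correct: bool,
               p_slip: float, p_guess: float, p_learn: float) -> float:
    """Return the updated probability that a student has mastered a skill.

    Standard Bayesian Knowledge Tracing step (hypothetical helper):
      1. Bayes update of mastery given the observed answer.
      2. Allow for learning between practice opportunities.
    """
    if correct:
        # A correct answer comes from mastery without a slip, or from a guess.
        numerator = p_know * (1.0 - p_slip)
        denominator = numerator + (1.0 - p_know) * p_guess
    else:
        # A wrong answer comes from a slip, or from non-mastery without a lucky guess.
        numerator = p_know * p_slip
        denominator = numerator + (1.0 - p_know) * (1.0 - p_guess)
    posterior = numerator / denominator
    # Transition step: an unmastered skill may become mastered after practice.
    return posterior + (1.0 - posterior) * p_learn
```

For example, with a prior mastery of 0.5, slip 0.1, guess 0.2 and learning rate 0.1 (all made-up values), `bkt_update(0.5, True, 0.1, 0.2, 0.1)` raises the estimate above the prior, while an incorrect answer lowers it.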


Clustering Mixed Data: An Extension Of The Gower Coefficient With Weighted L2 Distance, Augustine Oppong 2018 East Tennessee State University

Electronic Theses and Dissertations

Sorting data into partitions is becoming increasingly complex as the constituents of data grow outward every day. Mixed data comprises continuous, categorical, directional, functional, and other types of variables. Clustering mixed data is based on special dissimilarities of the variables. Some data types may influence the clustering solution. Assigning an appropriate weight to the functional data may improve the performance of the clustering algorithm. In this paper we use the extension of the Gower coefficient with a judiciously chosen weight for the L2 distance to cluster mixed data. The benefits of weighting are demonstrated both in applications to the Buoy data ...
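
As background, the classic Gower dissimilarity for numeric and categorical variables can be sketched as follows. This is a minimal illustration of the standard coefficient with optional per-variable weights, not the thesis's extension to functional data with a weighted L2 distance; the function name and the convention of marking categorical variables with a range of `None` are assumptions made for this sketch.

```python
from typing import Optional, Sequence, Union

Value = Union[float, str]


def gower_dissimilarity(a: Sequence[Value], b: Sequence[Value],
                        ranges: Sequence[Optional[float]],
                        weights: Optional[Sequence[float]] = None) -> float:
    """Weighted Gower dissimilarity between two mixed-type records.

    ranges[k] is the observed range of numeric variable k, or None when
    variable k is categorical (a convention assumed for this sketch).
    """
    if weights is None:
        weights = [1.0] * len(a)
    total = 0.0
    weight_sum = 0.0
    for x, y, r, w in zip(a, b, ranges, weights):
        if r is None:
            # Categorical variable: simple mismatch (0 if equal, 1 otherwise).
            d = 0.0 if x == y else 1.0
        else:
            # Numeric variable: absolute difference scaled by the range.
            d = abs(x - y) / r if r > 0 else 0.0
        total += w * d
        weight_sum += w
    return total / weight_sum
```

With equal weights every variable contributes equally; raising the weight of one variable type shifts the clustering toward that type, which is the general idea the abstract describes for functional data.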


Comparison Of Correlation, Partial Correlation, And Conditional Mutual Information For Interaction Effects Screening In Generalized Linear Models, Ji Li 2018 University of Arkansas, Fayetteville

Theses and Dissertations

Numerous screening techniques have been developed in recent years for genome-wide association studies (GWASs) (Moore et al., 2010). In this thesis, a novel model-free screening method was developed and validated by an extensive simulation study. Many screening methods were mainly focused on main effects, while very few studies considered the models containing both main effects and interaction effects. In this work, the interaction effects were fully considered and three different methods (Pearson’s Correlation Coefficient, Partial Correlation, and Conditional Mutual Information) were tested and their prediction accuracies were compared.

Pearson’s Correlation Coefficient method, which is a direct interaction screening ...


Pretrial Release And Failure-To-Appear In McLean County, IL, Jonathan Monsma 2018 Illinois State University

Stevenson Center for Community and Economic Development—Student Research

Actuarial risk assessment tools have increasingly been employed in jurisdictions across the U.S. to assist courts in deciding whether someone charged with a crime should be detained or released prior to their trial. These powerful tools should be continually monitored and researched by independent third parties to ensure that they are administered properly and used in the way most likely to provide socially optimal results. McLean County, Illinois began using the Public Safety Assessment-Court™ (PSA-Court or simply PSA) risk assessment tool in 2016. This study culls data from the McLean County ...


A Distance Based Method For Solving Multi-Objective Optimization Problems, Murshid Kamal, Syed Aqib Jalil, Syed Mohd Muneeb, Irfan Ali 2018 Aligarh Muslim University

Journal of Modern Applied Statistical Methods

A new model for the weighted method of goal programming is proposed, based on minimizing the distances between the ideal objectives and the feasible objective space. It provides the best compromise solution for Multi Objective Linear Programming Problems (MOLPP). The proposed model tackles MOLPP by solving a series of single objective sub-problems, where the objectives are transformed into constraints. The compromise solution so obtained may be improved by defining priorities in terms of the weights. A criterion is also proposed for deciding the best compromise solution. Applications of the algorithm are discussed for transportation and assignment problems involving multiple and conflicting objectives ...


Goalie Analytics: Statistical Evaluation Of Context-Specific Goalie Performance Measures In The National Hockey League, Marc Naples, Logan Gage, Amy Nussbaum 2018 Southern Methodist University

SMU Data Science Review

In this paper, we attempt to improve upon the classic formulation of save percentage in the NHL by controlling for the context of the shots and by using measures other than save percentage. In particular, we find save percentage to be both a weakly repeatable skill and a weak predictor of future performance, and we seek other goalie performance calculations that are more robust. To do so, we use three primary tests of intra-season consistency, intra-season predictability, and inter-season consistency, and extend the analysis to disentangle team effects on goalie statistics. We find that there are multiple ways to improve upon classic save ...


Data Scientist’s Analysis Toolbox: Comparison Of Python, R, And SAS Performance, Jim Brittain, Mariana Cendon, Jennifer Nizzi, John Pleis 2018 Southern Methodist University

SMU Data Science Review

A quantitative analysis will be performed on experiments utilizing three different tools used for Data Science. The analysis will include replication of analyses along with comparisons of code length, output, and results. Qualitative data will supplement the quantitative findings. The conclusion will provide data-supported guidance on the correct tool to use for common situations in the field of Data Science.


Estimation Of Finite Population Mean By Using Minimum And Maximum Values In Stratified Random Sampling, Umer Daraz, Javid Shabbir, Hina Khan 2018 Quaid-i-Azam University

Journal of Modern Applied Statistical Methods

In this paper we suggest an improved class of ratio-type estimators for estimating the finite population mean when the minimum and maximum values of the auxiliary variable are known. The properties of the suggested class of estimators, in terms of bias and mean square error, are obtained up to the first order of approximation. Two data sets are used for efficiency comparisons.
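
For context, the classical separate ratio estimator in stratified random sampling, which ratio-type estimators of this kind build on, can be sketched as below. This is the standard textbook estimator, not the improved class proposed in the paper, and it does not use the minimum and maximum values of the auxiliary variable; the function and argument names are illustrative assumptions.

```python
from typing import Sequence


def separate_ratio_estimate(stratum_weights: Sequence[float],
                            y_sample_means: Sequence[float],
                            x_sample_means: Sequence[float],
                            x_population_means: Sequence[float]) -> float:
    """Classical separate ratio estimate of a finite population mean.

    For each stratum h, the sample mean of the study variable y is scaled
    by the ratio of the known population mean of the auxiliary variable x
    to its sample mean; the results are combined with weights W_h = N_h / N.
    """
    lengths = {len(stratum_weights), len(y_sample_means),
               len(x_sample_means), len(x_population_means)}
    if len(lengths) != 1:
        raise ValueError("all inputs must have one entry per stratum")
    return sum(
        w * y_bar * (x_pop / x_bar)
        for w, y_bar, x_bar, x_pop in zip(
            stratum_weights, y_sample_means, x_sample_means, x_population_means
        )
    )
```

When the sample mean of the auxiliary variable in a stratum falls below its known population mean, the stratum's estimate is scaled upward, and downward in the opposite case.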


Computer Aided Clinical Trials For Implantable Cardiac Devices, Rahul Mangharam 2018 University of Pennsylvania

Real-Time and Embedded Systems Lab (mLAB)

In this paper we aim to answer the question, "How can modeling and simulation of physiological systems be used to evaluate life-critical implantable medical devices?" Clinical trials for medical devices are becoming increasingly inefficient as they take several years to conduct, at very high cost, and suffer from high rates of failure. For example, the Rhythm ID Goes Head-to-head Trial (RIGHT) sought to evaluate the performance of two arrhythmia discriminator algorithms for implantable cardioverter defibrillators, Vitality 2 vs. Medtronic, in terms of time-to-first inappropriate therapy, but concluded with results contrary to the initial hypothesis - after 5 years, 2,000+ patients ...


A Bayesian Beta-Mixture Model For Nonparametric IRT (BBM-IRT), Ethan A. Arenson, George Karabatsos 2018 University of Illinois at Chicago

Journal of Modern Applied Statistical Methods

Item response models typically assume that the item characteristic (step) curves follow a logistic or normal cumulative distribution function, both of which are strictly monotone functions of person test ability. Such assumptions can be overly restrictive for real item response data. A simple and more flexible Bayesian nonparametric IRT model for dichotomous items is introduced. It constructs monotone item characteristic (step) curves from a finite mixture of beta distributions, which can support the entire space of monotone curves to any desired degree of accuracy. An adaptive random-walk Metropolis-Hastings algorithm is proposed to estimate the posterior distribution of the model parameters. The Bayesian IRT ...

