Open Access. Powered by Scholars. Published by Universities.®

Other Statistics and Probability Commons

211 Full-Text Articles 439 Authors 99,237 Downloads 56 Institutions

All Articles in Other Statistics and Probability

Yelp's Review Filtering Algorithm, Yao Yao, Ivelin Angelov, Jack Rasmus-Vorrath, Mooyoung Lee, Daniel W. Engels 2018 Southern Methodist University

SMU Data Science Review

In this paper, we present an analysis of features influencing Yelp's proprietary review filtering algorithm. Classifying or misclassifying reviews as recommended or non-recommended affects average ratings, consumer decisions, and ultimately, business revenue. Our analysis involves systematically sampling and scraping Yelp restaurant reviews. Features are extracted from review metadata and engineered from metrics and scores generated using text classifiers and sentiment analysis. The coefficients of a multivariate logistic regression model were interpreted as quantifications of the relative importance of features in classifying reviews as recommended or non-recommended. The model classified review recommendations with an accuracy of 78%. We found that ...


Secondary Data Analysis Project, Jonathan M. Gallimore 2018 Embry-Riddle Aeronautical University

SF 420 PR - Gallimore - Fall 2018

This activity gives students an opportunity to apply what they have learned in statistics to a real dataset and to answer their own research questions. Students also practice reporting their results in a paper using APA format.


Quantitative Jeopardy Feud, Jonathan M. Gallimore 2018 Embry-Riddle Aeronautical University

MSF 600 PR - Gallimore - Fall 2018

This activity, Quantitative Jeopardy Feud, is a method for using a game as a final exam.


On n/p-Asymptotic Distribution Of Vector Of Weighted Traces Of Powers Of Wishart Matrices, Jolanta Maria Pielaszkiewicz, Dietrich von Rosen, Martin Singull 2018 Linnaeus University, Växjö, Sweden

Electronic Journal of Linear Algebra

The joint distribution of standardized traces of $\frac{1}{n}XX'$ and of $\Big(\frac{1}{n}XX'\Big)^2$, where the matrix $X:p\times n$ follows a matrix normal distribution, is proved to be asymptotically multivariate normal under the condition $\frac{n}{p}\overset{n,p\rightarrow\infty}{\rightarrow}c>0$. The proof relies on calculations of asymptotic moments and cumulants obtained using a recursive formula derived in Pielaszkiewicz et al. (2015). The covariance matrix of the underlying vector is given explicitly as a function of $n$ and $p$.
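The two trace statistics can be simulated directly to see what is being standardized. This is a minimal sketch under assumptions not taken from the paper: $X$ has i.i.d. standard normal entries (a matrix normal with identity row and column covariances), each trace is standardized only by dividing by $p$, and the paper's weights and covariance formula are not reproduced. Under these assumptions, with $S=\frac{1}{n}XX'$, the exact means are $E[\frac{1}{p}\mathrm{tr}(S)]=1$ and $E[\frac{1}{p}\mathrm{tr}(S^2)]=1+\frac{p+1}{n}$, the latter tending to $1+1/c$ when $n/p\rightarrow c$.

```python
import random

def trace_stats(p, n, rng):
    """One draw of X (p x n) with i.i.d. N(0, 1) entries (assumed identity
    covariances). Returns (1/p) tr(S) and (1/p) tr(S^2) for S = (1/n) X X'."""
    X = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(p)]
    # S[i][j] = (1/n) * sum_k X[i][k] * X[j][k]
    S = [[sum(a * b for a, b in zip(X[i], X[j])) / n for j in range(p)]
         for i in range(p)]
    t1 = sum(S[i][i] for i in range(p)) / p
    # S is symmetric, so tr(S^2) = sum over i, j of S[i][j]^2
    t2 = sum(S[i][j] ** 2 for i in range(p) for j in range(p)) / p
    return t1, t2

def mean_trace_stats(p, n, reps=500, seed=0):
    """Monte Carlo averages of the two standardized traces."""
    rng = random.Random(seed)
    draws = [trace_stats(p, n, rng) for _ in range(reps)]
    m1 = sum(d[0] for d in draws) / reps
    m2 = sum(d[1] for d in draws) / reps
    return m1, m2

p, n = 10, 20
m1, m2 = mean_trace_stats(p, n)
print(f"mean (1/p) tr(S)   = {m1:.3f}  (exact: 1)")
print(f"mean (1/p) tr(S^2) = {m2:.3f}  (exact: {1 + (p + 1) / n:.3f})")
```

Collecting the pairs `(t1, t2)` over many draws, instead of only their means, gives an empirical picture of the joint distribution whose normal limit the abstract describes.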


Goalie Analytics: Statistical Evaluation Of Context-Specific Goalie Performance Measures In The National Hockey League, Marc Naples, Logan Gage, Amy Nussbaum 2018 Southern Methodist University

SMU Data Science Review

In this paper, we attempt to improve upon the classic formulation of save percentage in the NHL by controlling for the context of shots and by using measures other than save percentage. In particular, we find save percentage to be only weakly repeatable as a skill and a weak predictor of future performance, and we seek other goalie performance calculations that are more robust. To do so, we use three primary tests (intra-season consistency, intra-season predictability, and inter-season consistency) and extend the analysis to disentangle team effects on goalie statistics. We find that there are multiple ways to improve upon classic save ...


Evaluation Of Using The Bootstrap Procedure To Estimate The Population Variance, Nghia Trong Nguyen 2018 Stephen F. Austin State University

Electronic Theses and Dissertations

The bootstrap procedure is widely used in nonparametric statistics to generate an empirical sampling distribution of a statistic of interest from a given sample. Generally, the results are good for location parameters such as the population mean and median, and even for estimating a population correlation. However, the results for a population variance, which is a spread parameter, are not as good because of the resampling nature of the bootstrap method. Bootstrap samples are constructed by sampling with replacement; consequently, repeated copies of the same observation, which contribute zero variance among themselves, appear in these samples. As a result, a bootstrap variance estimator will carry a ...
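One well-known consequence of the ties described above can be shown in a few lines: if each bootstrap resample's variance uses the usual $n-1$ divisor, the average over resamples is centered near $\frac{n-1}{n}s^2$ rather than the original sample variance $s^2$. The sketch below uses only Python's standard library and a made-up data set; it illustrates this downward shift and is not the estimator or correction evaluated in the thesis.

```python
import random
import statistics

def bootstrap_variances(data, n_boot=2000, seed=0):
    """Sample variance (n - 1 divisor) of each bootstrap resample."""
    rng = random.Random(seed)
    n = len(data)
    out = []
    for _ in range(n_boot):
        resample = rng.choices(data, k=n)  # sampling with replacement, so ties occur
        out.append(statistics.variance(resample))
    return out

# Hypothetical sample, for illustration only
data = [2.1, 3.4, 1.9, 5.6, 4.2, 3.3, 2.8, 4.9, 3.7, 2.5]
n = len(data)
s2 = statistics.variance(data)
boot = bootstrap_variances(data)

print(f"sample variance s^2     = {s2:.3f}")
print(f"mean bootstrap variance = {statistics.mean(boot):.3f}")
print(f"(n - 1)/n * s^2         = {(n - 1) / n * s2:.3f}")
```

The gap between the first two lines shrinks as $n$ grows, which is why the effect matters most for small samples.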


Standard And Anomalous Wave Transport Inside Random Media, Xujun Ma 2018 The Graduate Center, City University of New York

All Dissertations, Theses, and Capstone Projects

This thesis studies wave transport inside random media using random matrix theory. Anderson localization plays a central role in wave transport in random media: as a consequence of destructive interference in multiple scattering, the wave function decays exponentially inside random systems. Anderson localization is a wave effect that applies to both classical and quantum waves. Random matrix theory has been applied successfully to study the statistical properties of transport and localization of waves. In particular, the solution of the Dorokhov-Mello-Pereyra-Kumar (DMPK) equation gives the distribution of transmission.

For wave transport in standard one-dimensional random systems in ...


Initial Evidence Of Construct Validity Of Data From A Self-Assessment Instrument Of Technological Pedagogical Content Knowledge (TPACK) In 2-Year Public College Faculty In Texas, Kristin C. Scott 2018 University of Texas at Tyler

Human Resource Development Theses and Dissertations

Technological pedagogical content knowledge (TPACK) has been studied in K-12 faculty in the U.S. and around the world using survey methodology. Very few studies of TPACK in post-secondary faculty have been conducted, and no peer-reviewed studies of U.S. post-secondary faculty have been published to date. The present study is the first study of the reliability and validity of data from a TPACK survey conducted with a large sample of U.S. post-secondary faculty. Faculty at 2-year public colleges in Texas will help their institutions meet the goals of the state’s higher education strategic plan, 60x30TX. In ...


Waste Management By Waste: Removal Of Acid Dyes From Wastewaters Of Textile Coloration Using Fish Scales, S M Fijul Kabir 2018 Louisiana State University

LSU Master's Theses

There is a pressing need for economical processes that use low-cost biosorbents to remove hazardous acid dyes from wool industry wastewaters, since these dyes cause skin and respiratory diseases and disrupt other environmental components. Fish scales (FS), a solid-waste by-product of the fish industry, are usually discarded carelessly, resulting in pungent odor and an environmental burden. In this research, the FS of black drum (Pogonias cromis) were used to remove acid dyes (acid red 1 (AR1), acid blue 45 (AB45), and acid yellow 127 (AY127)) from wool industry wastewaters by an adsorption process, with a view to valorizing fish ...


Advances In Semi-Nonparametric Density Estimation And Shrinkage Regression, Hossein Zareamoghaddam 2018 The University of Western Ontario

Electronic Thesis and Dissertation Repository

This thesis advocates the use of shrinkage and penalty techniques for estimating the parameters of a regression model that comprises both parametric and nonparametric components and develops semi-nonparametric density estimation methodologies that are applicable in a regression context.

First, a moment-based approach is introduced whereby a univariate or bivariate density function is approximated by a suitable initial density function adjusted by a linear combination of orthogonal polynomials. Such adjustments are shown to be mathematically equivalent to using standard polynomials in one or two variables. Once extended to apply to density estimation, in which case ...


Building A Better Risk Prevention Model, Steven Hornyak 2018 Houston County Schools

National Youth-At-Risk Conference Savannah

This presentation chronicles the work of Houston County Schools in developing a risk prevention model built on more than ten years of longitudinal student data. Now in its second year of implementation, Houston At-Risk Profiles (HARP) has proven effective in identifying the students most in need of support and linking them to interventions and supports that lead to improved outcomes and significantly reduce the risk of failure.


Predicting The Next US President By Simulating The Electoral College, Boyan Kostadinov 2018 CUNY New York City College of Technology

Publications and Research

We develop a simulation model for predicting the outcome of the US Presidential election based on simulating the distribution of the Electoral College. The simulation model has two parts: (a) estimating the probabilities for a given candidate to win each state and DC, based on state polls, and (b) estimating the probability that a given candidate will win at least 270 electoral votes, and thus win the White House. All simulations are coded using the high-level, open-source programming language R. One of the goals of this paper is to promote computational thinking in any STEM field by illustrating how probabilistic ...
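The two-part simulation can be sketched as follows. The paper's code is written in R; this is a Python illustration under simplifying assumptions not taken from the paper: step (a) turns a single poll's two-party share into a win probability with a normal approximation to sampling error, and step (b) treats state outcomes as independent. The helper names and the toy states, vote counts, and poll numbers are hypothetical.

```python
import math
import random

def win_probability(poll_share, n_respondents):
    """Step (a), one simple choice: P(true share > 0.5) under a normal
    approximation with standard error sqrt(share * (1 - share) / n)."""
    se = math.sqrt(poll_share * (1 - poll_share) / n_respondents)
    z = (poll_share - 0.5) / se
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))  # standard normal CDF at z

def simulate_electoral_college(states, n_sims=10_000, seed=1):
    """Step (b): states maps name -> (electoral_votes, win_prob).
    Returns the estimated probability of winning a strict majority."""
    rng = random.Random(seed)
    total = sum(ev for ev, _ in states.values())
    needed = total // 2 + 1  # 270 when total is 538
    wins = 0
    for _ in range(n_sims):
        # Each state is won independently with its own probability
        ev = sum(votes for votes, p in states.values() if rng.random() < p)
        if ev >= needed:
            wins += 1
    return wins / n_sims

# Hypothetical toy map: name -> (electoral votes, win probability)
toy = {
    "State A": (20, win_probability(0.53, 800)),
    "State B": (15, win_probability(0.49, 600)),
    "State C": (10, win_probability(0.51, 500)),
}
print(simulate_electoral_college(toy))
```

With the full set of 50 states plus DC (538 electoral votes), `needed` evaluates to 270. The independence assumption ignores correlated polling errors across states, which a fuller model would need to address.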


Old English Character Recognition Using Neural Networks, Sattajit Sutradhar 2018 Georgia Southern University

Electronic Theses & Dissertations

Character recognition has captured the interest of researchers since the beginning of the twentieth century. While optical character recognition (OCR) for printed material is robust and widespread today, the recognition of handwritten material lags behind. In the digital era, more and more historical handwritten documents are digitized and made available to the general public. However, these digital copies lack the automatic content recognition available for their printed counterparts. We propose a practical, accurate, and computationally efficient method for Old English character recognition from manuscript images. Our method relies on a modern machine learning ...


Queues With Server Utilization Of One, Robert Aidoo 2018 University of Windsor

Major Papers

In most queueing systems of type GI/G/1, the stability condition requires that the server utilization be strictly less than 1. The standard exception is a D/D/1 system in which stability still holds for server utilization equal to 1. This paper presents other cases when server utilization can equal 1, and discusses their characteristics.
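The contrast between the standard exception and the usual case can be seen with Lindley's recursion for waiting times in a single-server first-come, first-served queue. This sketch compares a D/D/1 queue in which arrivals and services are both exactly 1 time unit apart (utilization 1, so every wait stays at 0) with an M/M/1 queue with equal arrival and service rates (also utilization 1), whose waiting times wander upward instead of settling. It illustrates only these standard cases; the additional cases presented in the paper are not reproduced.

```python
import random

def lindley_waits(service_times, interarrival_times):
    """Waiting times in queue via Lindley's recursion:
    W[0] = 0 and W[k+1] = max(0, W[k] + S[k] - A[k]),
    where S[k] is customer k's service time and A[k] is the time
    between the arrivals of customers k and k+1."""
    waits = [0.0]
    for s, a in zip(service_times, interarrival_times):
        waits.append(max(0.0, waits[-1] + s - a))
    return waits

N = 10_000

# D/D/1 with utilization 1: every service and every interarrival gap is 1.0
dd1 = lindley_waits([1.0] * N, [1.0] * N)

# M/M/1 with utilization 1: exponential services and gaps, both with mean 1.0
rng = random.Random(42)
mm1 = lindley_waits([rng.expovariate(1.0) for _ in range(N)],
                    [rng.expovariate(1.0) for _ in range(N)])

print("D/D/1 max wait:", max(dd1))  # prints "D/D/1 max wait: 0.0"
print("M/M/1 mean wait, first vs last 1000 customers:",
      sum(mm1[:1000]) / 1000, sum(mm1[-1000:]) / 1000)
```

In the random case the recursion behaves like a reflected random walk with zero drift, so there is no stationary waiting-time distribution; removing all variability, as in D/D/1, is what keeps the waits bounded here.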


Data Analysis With Small Samples And Non-Normal Data: Nonparametrics And Other Strategies, Carl Siebert, Darcy C. Siebert 2017 Boise State University

Carl Siebert

No abstract provided.


Making Models With Bayes, Pilar Olid 2017 California State University, San Bernardino

Electronic Theses, Projects, and Dissertations

Bayesian statistics is an important approach to modern statistical analyses. It allows us to use our prior knowledge of the unknown parameters to construct a model for our data set. The foundation of Bayesian analysis is Bayes' Rule, which in its proportional form indicates that the posterior is proportional to the prior times the likelihood. We will demonstrate how we can apply Bayesian statistical techniques to fit a linear regression model and a hierarchical linear regression model to a data set. We will show how to apply different distributions to Bayesian analyses and how the use of a prior affects ...
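The proportional form of Bayes' Rule can be checked numerically with a grid approximation: multiply prior and likelihood pointwise, then normalize. The example is a hypothetical coin-flip setting (a Beta(2, 2) prior and 7 successes in 10 trials), not the regression models from the thesis. Because the Beta prior is conjugate to the binomial likelihood, the exact posterior is Beta(9, 5) with mean 9/14, which the grid result can be compared against.

```python
def grid_posterior(prior, likelihood, grid):
    """Posterior on a grid: posterior ∝ prior × likelihood, normalized to sum to 1."""
    unnormalized = [prior(t) * likelihood(t) for t in grid]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]

def prior(theta):
    """Beta(2, 2) density up to a constant (hypothetical prior)."""
    return theta * (1 - theta)

def likelihood(theta):
    """Binomial likelihood for 7 successes in 10 trials, up to a constant."""
    return theta**7 * (1 - theta)**3

# Midpoint grid on (0, 1) for the success probability theta
grid = [(i + 0.5) / 1000 for i in range(1000)]

post = grid_posterior(prior, likelihood, grid)
post_mean = sum(t * w for t, w in zip(grid, post))
print(f"grid posterior mean   = {post_mean:.4f}")
print(f"exact Beta(9, 5) mean = {9 / 14:.4f}")
```

Neither `prior` nor `likelihood` needs its normalizing constant, because the final division by `total` supplies it; this is exactly what the proportional form of Bayes' Rule permits.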


Open Source Artificial Intelligence In A Biological/Ecological Context, Trevor Grant 2017 Illinois State University

Annual Symposium on Biomathematics and Ecology: Education and Research

No abstract provided.


Discrete Stochastic Modeling For First-Year Biology Students, Dmitry Kondrashov 2017 University of Chicago

Annual Symposium on Biomathematics and Ecology: Education and Research

No abstract provided.


Investigating The Student Enrollment Decision At WKU, Alec Brown 2017 Western Kentucky University

Honors College Capstone Experience/Thesis Projects

The purpose of this research is to investigate the relationships between the enrollment decision of first-time, first-year students admitted to Western Kentucky University and the amount of financial aid awarded, as well as demographic information. The Division of Enrollment Management provided a SAS dataset containing various information about all WKU students admitted in 2013, 2014, and 2015, as well as information about the 2016 class of admitted students. The data were analyzed in SAS Enterprise Miner using decision tree and logistic regression models. Results of these two procedures indicated the importance of credit hours earned ...


Imputation For Random Forests, Joshua Young 2017 Utah State University

All Graduate Plan B and other Reports

This project introduces two new methods for imputing missing data in random forests. The new methods are compared against other frequently used imputation methods, including those in the randomForest package in R. To test the effectiveness of these methods, missing values are generated in datasets under two missing-data mechanisms, missing at random and missing completely at random, and are then imputed. After imputation, random forests are fit to the data and prediction accuracies are obtained. Because speed is an important aspect of computing, the speeds of all tested methods are also compared.

One of the new methods ...


Digital Commons powered by bepress