Yelp’s Review Filtering Algorithm, 2018 Southern Methodist University

#### Yelp’s Review Filtering Algorithm, Yao Yao, Ivelin Angelov, Jack Rasmus-Vorrath, Mooyoung Lee, Daniel W. Engels

*SMU Data Science Review*

In this paper, we present an analysis of features influencing Yelp's proprietary review filtering algorithm. Classifying or misclassifying reviews as *recommended* or *non-recommended* affects average ratings, consumer decisions, and ultimately, business revenue. Our analysis involves systematically sampling and scraping Yelp restaurant reviews. Features are extracted from review metadata and engineered from metrics and scores generated using text classifiers and sentiment analysis. The coefficients of a multivariate logistic regression model were interpreted as quantifications of the relative importance of features in classifying reviews as recommended or non-recommended. The model classified review recommendations with an accuracy of 78%. We found that ...

Applications Of Game Theory, Tableau, Analytics, And R To Fashion Design, 2018 Atlanta University Center

#### Applications Of Game Theory, Tableau, Analytics, And R To Fashion Design, Aisha Asiri

*Electronic Theses & Dissertations Collection for Atlanta University & Clark Atlanta University*

This thesis presents several models for predicting the profits of selected products in the fashion industry. To determine the expected performance of each product in 2016, we used tools of game theory to identify the expected value. We then performed a simple linear regression and used scatter plots to further predict the performance of Prada's products. Together, these tools of game theory, analytics, and statistics help us predict the performance of some of Prada's products. We also used the Tableau platform to visualize an overview of the products' performances. All ...

Distribution Of A Sum Of Random Variables When The Sample Size Is A Poisson Distribution, 2018 East Tennessee State University

#### Distribution Of A Sum Of Random Variables When The Sample Size Is A Poisson Distribution, Mark Pfister

*Electronic Theses and Dissertations*

A probability distribution is a statistical function that describes the probability of possible outcomes in an experiment or occurrence. There are many different probability distributions that give the probability of an event happening, given some sample size *n*. An important question in statistics is to determine the distribution of the sum of independent random variables when the sample size *n* is fixed. For example, it is known that the sum of *n* independent Bernoulli random variables with success probability *p* has a Binomial distribution with parameters *n* and *p*. However, this is not true when the sample size is not ...
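As an illustration of the setting this abstract describes (not the thesis's own derivation), a standard result is that when the sample size N is Poisson with mean λ and independent of the Bernoulli(*p*) summands, the sum is Poisson with mean λ*p*, so its mean and variance should both be close to λ*p*. The sketch below checks this by simulation; the function names and the values λ = 4 and *p* = 0.3 are assumptions chosen for the example.

```python
import math
import random

def sample_poisson(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

def random_sum_of_bernoulli(lam, p, rng):
    """Sum of N Bernoulli(p) draws, where N ~ Poisson(lam) is itself random."""
    n = sample_poisson(lam, rng)
    return sum(1 for _ in range(n) if rng.random() < p)

def simulate(lam=4.0, p=0.3, trials=50_000, seed=0):
    """Return the sample mean and sample variance of the random sum."""
    rng = random.Random(seed)
    draws = [random_sum_of_bernoulli(lam, p, rng) for _ in range(trials)]
    mean = sum(draws) / trials
    var = sum((d - mean) ** 2 for d in draws) / (trials - 1)
    return mean, var

mean, var = simulate()
print(f"sample mean = {mean:.3f}, sample variance = {var:.3f}, lam*p = {4.0 * 0.3:.3f}")
```

A Binomial sum with fixed *n* would have variance *np*(1 − *p*), strictly below its mean; the near-equality of mean and variance here is what distinguishes the Poisson-sized sum.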

The Expected Number Of Patterns In A Random Generated Permutation On [N] = {1,2,...,N}, 2018 East Tennessee State University

#### The Expected Number Of Patterns In A Random Generated Permutation On [N] = {1,2,...,N}, Evelyn Fokuoh

*Electronic Theses and Dissertations*

Previous work by Flaxman (2004) and Biers-Ariel et al. (2018) focused on the number of distinct words embedded in a string of words of length n. In this thesis, we extend this work to permutations, focusing on the maximum number of distinct permutations contained in a permutation on [n] = {1,2,...,n} and on the expected number of distinct permutations contained in a random permutation on [n]. We further consider the problem where repetition of subsequences is a result of the occurrence of (Type A and/or Type B) replications. Our method of enumerating the Type A replications ...

Excess Versions Of The Minkowski And Hölder Inequalities, 2018 Michigan Technological University

#### Excess Versions Of The Minkowski And Hölder Inequalities, Iosif Pinelis

*Iosif Pinelis*

No abstract provided.

Pretrial Release And Failure-To-Appear In McLean County, IL, 2018 Illinois State University

#### Pretrial Release And Failure-To-Appear In McLean County, IL, Jonathan Monsma

*Stevenson Center for Community and Economic Development to Stevenson Center for Community and Economic Development—Student Research*

Actuarial risk assessment tools have increasingly been employed in jurisdictions across the U.S. to assist courts in deciding whether someone charged with a crime should be detained or released prior to trial. These powerful tools should be continually monitored and researched by independent third parties to ensure that they are administered properly and used proficiently enough to provide socially optimal results. McLean County, Illinois began using the Public Safety Assessment-Court™ (PSA-Court or simply PSA) risk assessment tool in 2016. This study culls data from the McLean County ...

The Statistical Exploration In The $G$-Expectation Framework: The Pseudo Simulation And Estimation Of Variance Uncertainty, 2018 The University of Western Ontario

#### The Statistical Exploration In The $G$-Expectation Framework: The Pseudo Simulation And Estimation Of Variance Uncertainty, Yifan Li

*Electronic Thesis and Dissertation Repository*

The $G$-expectation framework, motivated by problems with *uncertainty*, is a new generalization of the classical probability framework. Similar to the Choquet expectation, the $G$-expectation can be represented as the supremum of a class of linear expectations. In the past two decades, it has developed into a complete stochastic structure connected with a large family of nonlinear PDEs. Nonetheless, to apply it to real-world problems with uncertainty, it is fundamentally necessary to build up the associated statistical methodology.

This thesis explores the *computation, simulation, and estimation* of the $G$-normal distribution (a typical distribution with variance uncertainty ...

On N/P-Asymptotic Distribution Of Vector Of Weighted Traces Of Powers Of Wishart Matrices, 2018 Linnaeus University, Växjö, Sweden

#### On N/P-Asymptotic Distribution Of Vector Of Weighted Traces Of Powers Of Wishart Matrices, Jolanta Maria Pielaszkiewicz, Dietrich Von Rosen, Martin Singull

*Electronic Journal of Linear Algebra*

The joint distribution of standardized traces of $\frac{1}{n}XX'$ and of $\Big(\frac{1}{n}XX'\Big)^2$, where the matrix $X:p\times n$ follows a matrix normal distribution, is proved to be asymptotically multivariate normal under the condition $\frac{{n}}{p}\overset{n,p\rightarrow\infty}{\rightarrow}c>0$. The proof relies on calculations of asymptotic moments and cumulants obtained using a recursive formula derived in Pielaszkiewicz et al. (2015). The covariance matrix of the underlying vector is explicitly given as a function of $n$ and $p$.

Deep Learning Analysis Of Limit Order Book, 2018 Washington University in St. Louis

#### Deep Learning Analysis Of Limit Order Book, Xin Xu

*Arts & Sciences Electronic Theses and Dissertations*

In this paper, we build a deep neural network for modeling spatial structure in the limit order book and predict the future best ask or best bid price, based on the ideas of Sirignano (2016). We propose an intuitive data processing method that approximates the data unavailable to us using only level I data, which is more widely available. The model is based on the idea that there is local dependence between the best ask or best bid price and the sizes of related orders. First we use logistic regression to show that this approach is reasonable. To show the advantages ...

Golden Arm: A Probabilistic Study Of Dice Control In Craps, 2018 Monmouth University

#### Golden Arm: A Probabilistic Study Of Dice Control In Craps, Donald R. Smith, Robert Scott III

*UNLV Gaming Research & Review Journal*

This paper calculates how much control a craps shooter must possess on dice outcomes to eliminate the house advantage. A golden arm is someone who has dice control (or a rhythm roller or dice influencer). There are various strategies for dice control in craps. We discuss several possibilities of dice control that would result in several different mathematical models of control. We do not assert whether dice control is possible or not (there is a lack of published evidence). However, after studying casino-legal methods described by dice-control advocates, we can see only one realistic mathematical model that describes the resulting ...
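For context on the house advantage that dice control would need to overcome, the following sketch computes the exact pass-line win probability for fair, independent dice. It is only the uncontrolled baseline, not the paper's model of dice control; the function name and the choice of the pass-line bet are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import product

# Probability of each two-dice total under fair, independent dice.
totals = {}
for a, b in product(range(1, 7), repeat=2):
    totals[a + b] = totals.get(a + b, 0) + Fraction(1, 36)

def pass_line_win_probability(dist):
    """Exact win probability of a pass-line bet for a given distribution of totals."""
    p_seven = dist.get(7, 0)
    win = p_seven + dist.get(11, 0)      # come-out 7 or 11 wins at once
    for point in (4, 5, 6, 8, 9, 10):
        p_point = dist.get(point, 0)
        # Once a point is set, only the point (win) or a 7 (loss) ends the round.
        win += p_point * p_point / (p_point + p_seven)
    return win

p_win = pass_line_win_probability(totals)
house_edge = 1 - 2 * p_win               # even-money payout: P(lose) - P(win)
print(p_win, float(house_edge))
```

Because `pass_line_win_probability` accepts any distribution over totals, a hypothetical shooter who shifts probability away from 7 after the come-out roll could be explored by passing a modified `dist`, although a faithful model would need separate distributions for the come-out and point phases.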

Evaluation Of Using The Bootstrap Procedure To Estimate The Population Variance, 2018 Stephen F Austin State University

#### Evaluation Of Using The Bootstrap Procedure To Estimate The Population Variance, Nghia Trong Nguyen

*Electronic Theses and Dissertations*

The bootstrap procedure is widely used in nonparametric statistics to generate an empirical sampling distribution from a given sample data set for a statistic of interest. Generally, the results are good for location parameters such as population mean, median, and even for estimating a population correlation. However, the results for a population variance, which is a spread parameter, are not as good due to the resampling nature of the bootstrap method. Bootstrap samples are constructed using sampling with replacement; consequently, groups of observations with zero variance manifest in these samples. As a result, a bootstrap variance estimator will carry a ...
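A minimal sketch of the resampling scheme described above: draw resamples with replacement, compute the sample variance of each, and compare their average with the variance of the original sample. For resamples of size *n*, the expected resample variance equals (*n* − 1)/*n* times the original sample variance, which the simulation approximates. The sample size, number of resamples, and function names are assumptions for illustration and do not reproduce the thesis's evaluation.

```python
import random

def sample_variance(xs):
    """Unbiased sample variance (divisor n - 1)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def bootstrap_variances(sample, n_resamples, rng):
    """Sample variance of each resample drawn with replacement from `sample`."""
    n = len(sample)
    return [sample_variance(rng.choices(sample, k=n)) for _ in range(n_resamples)]

rng = random.Random(1)
sample = [rng.gauss(0.0, 1.0) for _ in range(10)]   # hypothetical data, n = 10

s2 = sample_variance(sample)
boot = bootstrap_variances(sample, 20_000, rng)
ratio = (sum(boot) / len(boot)) / s2

print(f"original s^2 = {s2:.4f}, mean bootstrap s^2 / s^2 = {ratio:.4f}")
```

Resamples that repeat the same observation many times have little or no spread, which is one way to see why the average resample variance falls below the variance of the original sample.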

On Passing The Buck, 2018 Cedarville University

#### On Passing The Buck, Adam J. Hammett, Anna Joy Yang

*The Research and Scholarship Symposium*

Imagine there are n>1 people seated around a table, and person S starts with a fair coin they will flip to decide whom to hand the coin next -- if "heads" they pass right, and if "tails" they pass left. This process continues until all people at the table have "touched" the coin. Curiously, it turns out that all people seated at the table other than S have the same probability 1/(n-1) of being last to touch the coin. In fact, Lovasz and Winkler ("A note on the last new vertex visited by a random walk," J. Graph Theory ...
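The uniformity claim can be checked empirically. The sketch below simulates the coin being passed around a table of *n* seats and records who touches it last; with *n* = 6 each of the five non-starting seats should come out last about 1/5 of the time. Seat numbering, the function names, and the trial count are assumptions made for the example.

```python
import random
from collections import Counter

def last_to_touch(n, rng):
    """Seat (1..n-1) of the last person to touch the coin; the starter S sits at seat 0."""
    position = 0
    touched = {0}
    while len(touched) < n:
        step = 1 if rng.random() < 0.5 else -1   # heads: pass right, tails: pass left
        position = (position + step) % n
        touched.add(position)
    # The loop exits immediately after the final untouched seat receives the coin.
    return position

def estimate_last_touch_distribution(n=6, trials=40_000, seed=0):
    """Empirical frequency with which each non-starting seat is last."""
    rng = random.Random(seed)
    counts = Counter(last_to_touch(n, rng) for _ in range(trials))
    return {seat: counts[seat] / trials for seat in range(1, n)}

freqs = estimate_last_touch_distribution()
for seat, freq in freqs.items():
    print(f"seat {seat}: {freq:.3f}")
```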

The Devil You Don’t Know: A Spatial Analysis Of Crime At Newark’s Prudential Center On Hockey Game Days, 2018 Institute for Security and Crime Science - University of Waikato

#### The Devil You Don’t Know: A Spatial Analysis Of Crime At Newark’s Prudential Center On Hockey Game Days, Justin Kurland, Eric Piza

*Journal of Sport Safety and Security*

Inspired by empirical research on spatial crime patterns in and around sports venues in the United Kingdom, this paper sought to measure the criminogenic extent of 216 hockey games that took place at the Prudential Center in Newark, NJ between 2007 and 2016. Do games generate patterns of crime in the areas beyond the arena, and if so, for what type of crime and how far? Police-recorded data for Newark are examined using a variety of exploratory methods and non-parametric permutation tests to visualize differences in crime patterns between game and non-game days across all of Newark and the downtown area. Change ...

Network Structure Sampling In Bayesian Networks Via Perfect Sampling From Linear Extensions, 2018 University of Colorado, Boulder

#### Network Structure Sampling In Bayesian Networks Via Perfect Sampling From Linear Extensions, Evan Sidrow

*Applied Mathematics Graduate Theses & Dissertations*

Bayesian networks are widely regarded as powerful tools for modeling risk assessment, uncertainty, and decision making. They have been extensively employed to develop decision support systems in a variety of domains including medical diagnosis, risk assessment and management, human cognition, industrial process and procurement, pavement and bridge management, and system reliability. Bayesian networks are convenient graphical expressions for high-dimensional probability distributions, which are used to represent complex relationships between a large number of random variables. A Bayesian network is a directed acyclic graph consisting of nodes, which represent random variables, and arrows, which correspond to probabilistic dependencies between them ...

Score Test And Likelihood Ratio Test For Zero-Inflated Binomial Distribution And Geometric Distribution, 2018 Western Kentucky University

#### Score Test And Likelihood Ratio Test For Zero-Inflated Binomial Distribution And Geometric Distribution, Xiaogang Dai

*Masters Theses & Specialist Projects*

The main purpose of this thesis is to compare the performance of the score test and the likelihood ratio test by computing type I errors and type II errors when the tests are applied to the geometric distribution and the zero-inflated binomial distribution. We first derive test statistics of the score test and the likelihood ratio test for both distributions. We then use the software package R to perform a simulation to study the behavior of the two tests. We write R code to calculate the two types of error for each distribution. We generate many samples to approximate ...

General Stochastic Integral And Itô Formula With Application To Stochastic Differential Equations And Mathematical Finance, 2018 Louisiana State University and Agricultural and Mechanical College

#### General Stochastic Integral And Itô Formula With Application To Stochastic Differential Equations And Mathematical Finance, Jiayu Zhai

*LSU Doctoral Dissertations*

A general stochastic integration theory for adapted and instantly independent stochastic processes arises when we consider anticipative stochastic differential equations. In Part I of this thesis, we investigate in greater depth the general stochastic integral introduced by W. Ayed and H.-H. Kuo in 2008. We provide a rigorous mathematical framework for the integral in Chapter 2, and prove that the integral is well-defined. Then a general Itô formula is given. In Chapter 3, we present an intrinsic property of the general stochastic integral, the near-martingale property, and Doob-Meyer's decomposition for near-submartingales. We apply the new stochastic integration theory ...

Advances In Semi-Nonparametric Density Estimation And Shrinkage Regression, 2018 The University of Western Ontario

#### Advances In Semi-Nonparametric Density Estimation And Shrinkage Regression, Hossein Zareamoghaddam

*Electronic Thesis and Dissertation Repository*

This thesis advocates the use of shrinkage and penalty techniques for estimating the parameters of a regression model that comprises both parametric and nonparametric components and develops semi-nonparametric density estimation methodologies that are applicable in a regression context.

First, a moment-based approach whereby a univariate or bivariate density function is approximated by means of a suitable initial density function that is adjusted by a linear combination of orthogonal polynomials is introduced. Such adjustments are shown to be mathematically equivalent to making use of standard polynomials in one or two variables. Once extended to apply to density estimation, in which case ...

Predicting The Next US President By Simulating The Electoral College, 2018 New York City College of Technology, CUNY

#### Predicting The Next US President By Simulating The Electoral College, Boyan Kostadinov

*Journal of Humanistic Mathematics*

We develop a simulation model for predicting the outcome of the US Presidential election based on simulating the distribution of the Electoral College. The simulation model has two parts: (a) estimating the probabilities for a given candidate to win each state and DC, based on state polls, and (b) estimating the probability that a given candidate will win at least 270 electoral votes, and thus win the White House. All simulations are coded using the high-level, open-source programming language R. One of the goals of this paper is to promote computational thinking in any STEM field by illustrating how probabilistic ...
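Part (b) of such a model can be sketched as a Monte Carlo simulation. The paper's own simulations are written in R; the Python sketch below is a separate illustration under stated assumptions: the map is a hypothetical toy with invented win probabilities (standing in for the poll-based estimates of part (a)), its blocs are treated as independent, and all names are made up for the example.

```python
import random

# Hypothetical toy map: bloc -> (electoral votes, assumed win probability).
# All numbers are invented for illustration; the votes sum to 538.
TOY_MAP = {
    "Safe bloc A": (200, 0.98),
    "Safe bloc B": (190, 0.02),
    "Swing 1": (38, 0.50),
    "Swing 2": (29, 0.55),
    "Swing 3": (20, 0.45),
    "Swing 4": (18, 0.50),
    "Swing 5": (16, 0.40),
    "Swing 6": (15, 0.60),
    "Swing 7": (12, 0.50),
}

def simulate_electoral_votes(state_map, rng):
    """One simulated election: total votes from blocs the candidate wins."""
    return sum(votes for votes, p in state_map.values() if rng.random() < p)

def win_probability(state_map, threshold=270, trials=20_000, seed=0):
    """Monte Carlo estimate of P(candidate reaches `threshold` electoral votes)."""
    rng = random.Random(seed)
    wins = sum(
        simulate_electoral_votes(state_map, rng) >= threshold for _ in range(trials)
    )
    return wins / trials

estimate = win_probability(TOY_MAP)
print(f"estimated probability of at least 270 electoral votes: {estimate:.3f}")
```

Treating blocs as independent keeps the sketch short; real state outcomes are correlated, so a more careful model would draw a shared national swing before sampling each state.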

Sampling Techniques For Big Data Analysis In Finite Population Inference, 2018 Iowa State University

#### Sampling Techniques For Big Data Analysis In Finite Population Inference, Jae Kwang Kim, Zhonglei Wang

*Statistics Preprints*

In analyzing big data for finite population inference, it is critical to adjust for the selection bias in the big data. In this paper, we propose two methods of reducing the selection bias associated with the big data sample. The first method uses a version of inverse sampling by incorporating auxiliary information from external sources, and the second one borrows the idea of data integration by combining the big data sample with an independent probability sample. Two simulation studies show that the proposed methods are unbiased and have better coverage rates than their alternatives. In addition, the proposed ...

Some Applications Of Sophisticated Mathematics To Randomized Computing, 2018 Selected Works

#### Some Applications Of Sophisticated Mathematics To Randomized Computing, Ronald I. Greenberg

*Ronald Greenberg*

No abstract provided.