Open Access. Powered by Scholars. Published by Universities.®
Articles 1 - 30 of 315
Full-Text Articles in Probability
Characterizing The Permanence And Stationary Distribution For A Family Of Malaria Stochastic Models, Divine Wanduku
Biology and Medicine Through Mathematics Conference
No abstract provided.
Characterizing The Tails Of Degree Distributions In Real-World Networks, Anna Broido
Applied Mathematics Graduate Theses & Dissertations
This is a thesis about how to characterize the statistical structure of the tails of degree distributions of real-world networks. The primary contribution is a statistical test of the prevalence of scale-free structure in real-world networks. A central claim in modern network science is that real-world networks are typically "scale free," meaning that the fraction of nodes with degree $k$ follows a power law, decaying like $k^{-\alpha}$, often with $2 < \alpha < 3$. However, empirical evidence for this belief derives from a relatively small number of real-world networks. In the first section, we test the universality of scale-free structure by applying state-of-the-art statistical tools to a large corpus of nearly 1000 network data sets drawn from social, biological, technological, and informational sources. We fit the power-law model to each degree distribution, test its statistical plausibility, and compare it via a likelihood ratio test to alternative, non-scale-free models, e.g., the log-normal. Across domains, we find that scale-free networks are rare, with only 4% exhibiting the strongest-possible evidence of scale-free structure and 52% exhibiting the weakest-possible evidence. Furthermore, evidence of scale-free structure is not uniformly distributed across sources: social networks are at best weakly scale free, while a handful of technological and biological networks can be called strongly scale free. These results undermine the universality of scale-free networks and reveal that real-world networks exhibit a rich structural diversity that will likely require new ideas and mechanisms to explain. A core methodological component of addressing the ubiquity of scale-free structure in real-world networks is an ability to fit a power law to the degree distribution.
In the second section, we numerically evaluate and compare, using both synthetic data with known structure and real-world data with unknown structure, two statistically principled methods for estimating the tail parameters for power-law distributions, showing that in practice, a method based on extreme value theory and a sophisticated bootstrap and the more commonly used method based on an empirical minimization approach exhibit similar accuracy.
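To make the tail-fitting step concrete, here is a minimal sketch of the textbook maximum-likelihood estimator for the exponent of a continuous power law with a known lower cutoff. It is not the thesis's procedure, which also has to choose the cutoff and works with discrete degree data. The function names, the cutoff `x_min`, and the synthetic sampler are all assumptions made for illustration.

```python
import math
import random


def fit_power_law_alpha(samples, x_min):
    """MLE of alpha for a density proportional to x**(-alpha) on x >= x_min.

    Continuous-case formula: alpha_hat = 1 + n / sum(ln(x_i / x_min)).
    Values below x_min are ignored; x_min is assumed known (hypothetical).
    """
    tail = [x for x in samples if x >= x_min]
    log_sum = sum(math.log(x / x_min) for x in tail)
    if log_sum <= 0.0:
        raise ValueError("need at least one sample strictly above x_min")
    return 1.0 + len(tail) / log_sum


def sample_power_law(alpha, x_min, n, rng):
    """Inverse-transform sampling from the same continuous power law."""
    # rng.random() lies in [0, 1), so 1 - u lies in (0, 1] and the power is finite.
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(0)
    data = sample_power_law(alpha=2.5, x_min=1.0, n=20000, rng=rng)
    print(fit_power_law_alpha(data, x_min=1.0))
```

On synthetic data the estimate should land close to the true exponent. Real degree sequences would additionally need a cutoff-selection step and a goodness-of-fit test, which this sketch leaves out.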
Optimal Conditional Expectation At The Video Poker Game Jacks Or Better, Stewart N. Ethier, John J. Kim, Jiyeon Lee
UNLV Gaming Research & Review Journal
There are 134,459 distinct initial hands at the video poker game Jacks or Better, taking suit exchangeability into account. A computer program can determine the optimal strategy (i.e., which cards to hold) for each such hand, but a complete list of these strategies would require a book-length manuscript. Instead, a hand-rank table, which fits on a single page and reproduces the optimal strategy perfectly, was found for Jacks or Better as early as the mid-1990s. Is there a systematic way to derive such a hand-rank table? We show that there is indeed, and it involves finding the ...
Surprise Vs. Probability As A Metric For Proof, Edward K. Cheng, Matthew Ginther
Edward Cheng
In this Symposium issue celebrating his career, Professor Michael Risinger in Leveraging Surprise proposes using "the fundamental emotion of surprise" as a way of measuring belief for purposes of legal proof. More specifically, Professor Risinger argues that we should not conceive of the burden of proof in terms of probabilities such as 51%, 95%, or even "beyond a reasonable doubt." Rather, the legal system should reference the threshold using "words of estimative surprise" asking jurors how surprised they would be if the fact in question were not true. Toward this goal (and being averse to cardinality), he suggests categories such ...
One-Dimensional Excited Random Walk With Unboundedly Many Excitations Per Site, Omar Chakhtoun
All Dissertations, Theses, and Capstone Projects
We study a discrete-time excited random walk on the integer lattice requiring a tail decay estimate on the number of excitations per site and extend the existing framework, methods, and results to a wider class of excited random walks.
We give criteria for recurrence versus transience and for ballisticity versus zero linear speed, completely classify limit laws in the transient regime, and establish functional limit laws in the recurrent regime.
Infinite Sums, Products, And Urn Models, Yiyan Ni
Major Papers
This paper considers an urn and its evolution in discrete time steps. The urn initially has two different colored balls (blue and red). We discuss different cases where k blue balls (k = 1, 2, 3, ...) will be added (or removed) at every step if a blue ball is withdrawn, with the goal of eventually withdrawing a red ball, an event whose probability we denote P(R eventually). We compute the probability of eventually withdrawing a red ball with two different methods, one using infinite sums and the other using infinite products. One advantage of this is that we can obtain P(R eventually) in a complex ...
Analysis Of Ranked Gene Tree Probability Distributions Under The Coalescent Process For Detecting Anomaly Zones, Anastasiia Kim
Shared Knowledge Conference
In phylogenetic studies, gene trees are used to reconstruct species trees. Under the multispecies coalescent model, gene tree topologies may differ from that of the species tree. An incorrect gene tree topology (one that does not match the species tree) that is more probable than the correct one is termed an anomalous gene tree (AGT). Species trees that can generate such AGTs are said to be in the anomaly zone (AZ). In this region, the method of choosing the most common gene tree as the estimate of the species tree will be inconsistent and will converge to an incorrect species tree when ...
Statistical Investigation Of Road And Railway Hazardous Materials Transportation Safety, Amirfarrokh Iranitalab
Civil Engineering Theses, Dissertations, and Student Research
Transportation of hazardous materials (hazmat) in the United States (U.S.) constituted 22.8% of the total tonnage transported in 2012 with an estimated value of more than 2.3 billion dollars. As such, hazmat transportation is a significant economic activity in the U.S. However, hazmat transportation exposes people and the environment to the infrequent but potentially severe consequences of incidents resulting in hazmat release. Trucks and trains carried 63.7% of the hazmat in the U.S. in 2012 and are the major foci of this dissertation. The main research objectives were 1) identification and quantification of the effects ...
Probabilities Involving Standard Trirectangular Tetrahedral Dice Rolls, Rulon Olmstead, Doneliezer Baize
Rose-Hulman Undergraduate Mathematics Journal
The goal is to be able to calculate probabilities involving rolls of irregularly shaped dice. Here we attempt to model the probabilities of rolling standard trirectangular tetrahedral dice on a hard surface, such as a table top. The vertices and edges of a tetrahedron were projected onto the surface of a sphere centered at the center of mass of the tetrahedron. By calculating the surface areas bounded by the resultant geodesics, baseline probabilities were obtained. Using a 3D printer, dice were constructed of uniform density and the results of rolling them were recorded. After calculating the corresponding confidence intervals, the ...
Season-Ahead Forecasting Of Water Storage And Irrigation Requirements – An Application To The Southwest Monsoon In India, Arun Ravindranath, Naresh Devineni, Upmanu Lall, Paulina Concha Larrauri
Publications and Research
Water risk management is a ubiquitous challenge faced by stakeholders in the water or agricultural sector. We present a methodological framework for forecasting water storage requirements and present an application of this methodology to risk assessment in India. The application focused on forecasting crop water stress for potatoes grown during the monsoon season in the Satara district of Maharashtra. Pre-season large-scale climate predictors used to forecast water stress were selected based on an exhaustive search method that evaluates for highest ranked probability skill score and lowest root-mean-squared error in a leave-one-out cross-validation mode. Adaptive forecasts were made in the years ...
Yelp's Review Filtering Algorithm, Yao Yao, Ivelin Angelov, Jack Rasmus-Vorrath, Mooyoung Lee, Daniel W. Engels
SMU Data Science Review
In this paper, we present an analysis of features influencing Yelp's proprietary review filtering algorithm. Classifying or misclassifying reviews as recommended or non-recommended affects average ratings, consumer decisions, and ultimately, business revenue. Our analysis involves systematically sampling and scraping Yelp restaurant reviews. Features are extracted from review metadata and engineered from metrics and scores generated using text classifiers and sentiment analysis. The coefficients of a multivariate logistic regression model were interpreted as quantifications of the relative importance of features in classifying reviews as recommended or non-recommended. The model classified review recommendations with an accuracy of 78%. We found that ...
Applications Of Game Theory, Tableau, Analytics, And R To Fashion Design, Aisha Asiri
Electronic Theses & Dissertations Collection for Atlanta University & Clark Atlanta University
This thesis presents various models to the fashion industry to predict the profits for some products. To determine the expected performance of each product in 2016, we used tools of game theory to help us identify the expected value. We went further and performed a simple linear regression and used scatter plots to help us predict further the performance of the products of Prada. We used tools of game theory, analytics, and statistics to help us predict the performance of some of Prada's products. We also used the Tableau platform to visualize an overview of the products' performances. All ...
The Expected Number Of Patterns In A Random Generated Permutation On [N] = {1,2,...,N}, Evelyn Fokuoh
Electronic Theses and Dissertations
Previous work by Flaxman (2004) and Biers-Ariel et al. (2018) focused on the number of distinct words embedded in a string of words of length n. In this thesis, we extend this work to permutations, focusing on the maximum number of distinct permutations contained in a permutation on [n] = {1,2,...,n} and on the expected number of distinct permutations contained in a random permutation on [n]. We further consider the problem where repetition of subsequences is a result of the occurrence of (Type A and/or Type B) replications. Our method of enumerating the Type A replications ...
Distribution Of A Sum Of Random Variables When The Sample Size Is A Poisson Distribution, Mark Pfister
Electronic Theses and Dissertations
A probability distribution is a statistical function that describes the probability of possible outcomes in an experiment or occurrence. There are many different probability distributions that give the probability of an event happening, given some sample size n. An important question in statistics is to determine the distribution of the sum of independent random variables when the sample size n is fixed. For example, it is known that the sum of n independent Bernoulli random variables with success probability p has a Binomial distribution with parameters n and p. However, this is not true when the sample size is not ...
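As an illustration of what changes when the sample size is random, the sketch below numerically checks one standard identity: if N is Poisson with mean λ and, given N, the summands are independent Bernoulli(p), then the sum is Poisson with mean λp. This is a well-known fact (Poisson thinning) and is not taken from the truncated abstract. The function names and the truncation point `n_max` are assumptions.

```python
import math


def poisson_pmf(k, lam):
    """P(N = k) for N ~ Poisson(lam), computed in log space for stability."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))


def binomial_pmf(k, n, p):
    """P(B = k) for B ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1.0 - p) ** (n - k)


def random_size_bernoulli_sum_pmf(k, lam, p, n_max=150):
    """P(S = k) where S = X_1 + ... + X_N, X_i ~ Bernoulli(p), N ~ Poisson(lam).

    Conditions on N = n and sums over n; the series is truncated at n_max
    (an assumed cutoff that is ample for moderate lam).
    """
    return sum(poisson_pmf(n, lam) * binomial_pmf(k, n, p) for n in range(k, n_max + 1))


if __name__ == "__main__":
    lam, p = 4.0, 0.3
    for k in range(6):
        # The two columns should agree: mixed-Binomial sum vs. Poisson(lam * p).
        print(k, random_size_bernoulli_sum_pmf(k, lam, p), poisson_pmf(k, lam * p))
```

The fixed-n Binomial law is recovered by dropping the outer sum. Averaging it over a Poisson sample size gives a different family altogether.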
Excess Versions Of The Minkowski And Hölder Inequalities, Iosif Pinelis
Iosif Pinelis
No abstract provided.
Pretrial Release And Failure-To-Appear In McLean County, IL, Jonathan Monsma
Stevenson Center for Community and Economic Development to Stevenson Center for Community and Economic Development—Student Research
Actuarial risk assessment tools increasingly have been employed in jurisdictions across the U.S. to assist courts in the decision of whether someone charged with a crime should be detained or released prior to their trial. These tools should be continually monitored and researched by independent third parties to ensure that they are being administered properly and used in the most proficient way so as to provide socially optimal results. McLean County, Illinois began using the Public Safety Assessment-Court™ (PSA-Court or simply PSA) risk assessment tool in 2016. This study culls data from the McLean County ...
The Statistical Exploration In The $G$-Expectation Framework: The Pseudo Simulation And Estimation Of Variance Uncertainty, Yifan Li
Electronic Thesis and Dissertation Repository
The $G$-expectation framework, motivated by problems with uncertainty, is a new generalization of the classical probability framework. Similar to the Choquet expectation, the $G$-expectation can be represented as the supremum of a class of linear expectations. In the past two decades, it has developed into a complete stochastic structure connected with a large family of nonlinear PDEs. Nonetheless, to apply it to real-world problems with uncertainty, it is fundamentally necessary to build up the associated statistical methodology.
This thesis explores the computation, simulation, and estimation of the $G$-normal distribution (a typical distribution with variance uncertainty ...
On N/P-Asymptotic Distribution Of Vector Of Weighted Traces Of Powers Of Wishart Matrices, Jolanta Maria Pielaszkiewicz, Dietrich Von Rosen, Martin Singull
Electronic Journal of Linear Algebra
The joint distribution of standardized traces of $\frac{1}{n}XX'$ and of $\Big(\frac{1}{n}XX'\Big)^2$, where the matrix $X:p\times n$ follows a matrix normal distribution, is proved asymptotically to be multivariate normal under the condition $\frac{n}{p}\overset{n,p\rightarrow\infty}{\rightarrow}c>0$. The proof relies on calculations of asymptotic moments and cumulants obtained using a recursive formula derived in Pielaszkiewicz et al. (2015). The covariance matrix of the underlying vector is explicitly given as a function of $n$ and $p$.
Mixed Logical And Probabilistic Reasoning In The Game Of Clue, Todd W. Neller, Ziqian Luo
Computer Science Faculty Publications
Neller and Ziqian Luo ’18 presented a means of mixed logical and probabilistic reasoning with knowledge in the popular deductive mystery game Clue. Using at-least constraints, we more efficiently represented and reasoned about cardinality constraints on Clue card deal knowledge, and then employed a WalkSAT-based solution sampling algorithm with a tabu search metaheuristic in order to estimate the probabilities of unknown card places.
Deep Learning Analysis Of Limit Order Book, Xin Xu
Arts & Sciences Electronic Theses and Dissertations
In this paper, we build a deep neural network for modeling spatial structure in the limit order book and predict the future best ask or best bid price, based on the ideas of Sirignano (2016). We propose an intuitive data processing method to approximate data that is not available to us, based only on level I data, which is more widely available. The model is based on the idea that there is local dependence between the best ask or best bid price and the sizes of related orders. First we use logistic regression to prove that this approach is reasonable. To show the advantages ...
Golden Arm: A Probabilistic Study Of Dice Control In Craps, Donald R. Smith, Robert Scott III
UNLV Gaming Research & Review Journal
This paper calculates how much control a craps shooter must possess on dice outcomes to eliminate the house advantage. A golden arm is someone who has dice control (or a rhythm roller or dice influencer). There are various strategies for dice control in craps. We discuss several possibilities of dice control that would result in several different mathematical models of control. We do not assert whether dice control is possible or not (there is a lack of published evidence). However, after studying casino-legal methods described by dice-control advocates, we can see only one realistic mathematical model that describes the resulting ...
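For context on the house advantage that a dice controller would need to overcome, the sketch below computes the exact win probability of the standard pass-line bet with fair, independent dice. It does not reproduce any of the paper's control models. The function names are assumptions, and `pass_line_win_probability` accepts any distribution of the two-dice total, so a reader could plug in a hypothetical controlled-roll distribution.

```python
from fractions import Fraction


def fair_two_dice_distribution():
    """Exact distribution of the total of two fair, independent six-sided dice."""
    dist = {total: Fraction(0) for total in range(2, 13)}
    for a in range(1, 7):
        for b in range(1, 7):
            dist[a + b] += Fraction(1, 36)
    return dist


def pass_line_win_probability(dist):
    """Probability of winning a pass-line bet given a distribution of totals.

    Come-out roll: 7 or 11 wins, 2, 3 or 12 loses. Any other total becomes
    the point, which must be rolled again before a 7 appears.
    """
    win = dist[7] + dist[11]
    for point in (4, 5, 6, 8, 9, 10):
        win += dist[point] * dist[point] / (dist[point] + dist[7])
    return win


def house_advantage(win_probability):
    """Expected loss per unit staked on an even-money bet."""
    return 1 - 2 * win_probability


if __name__ == "__main__":
    p_win = pass_line_win_probability(fair_two_dice_distribution())
    print(p_win, house_advantage(p_win))
```

With fair dice the win probability is 244/495, so the house advantage is 7/495, about 1.41%. Eliminating it means shifting the distribution of totals enough to push the win probability to at least 1/2.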
Evaluation Of Using The Bootstrap Procedure To Estimate The Population Variance, Nghia Trong Nguyen
Electronic Theses and Dissertations
The bootstrap procedure is widely used in nonparametric statistics to generate an empirical sampling distribution from a given sample data set for a statistic of interest. Generally, the results are good for location parameters such as population mean, median, and even for estimating a population correlation. However, the results for a population variance, which is a spread parameter, are not as good due to the resampling nature of the bootstrap method. Bootstrap samples are constructed using sampling with replacement; consequently, groups of observations with zero variance manifest in these samples. As a result, a bootstrap variance estimator will carry a ...
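The resampling effect described above can be inspected with a small experiment. The sketch below draws bootstrap resamples from a tiny data set and records the sample variance of each. The data values and function name are hypothetical, and this is not the thesis's simulation design.

```python
import random
import statistics


def bootstrap_variances(data, n_resamples, rng):
    """Sample variance (n - 1 denominator) of each bootstrap resample of data."""
    n = len(data)
    variances = []
    for _ in range(n_resamples):
        # Sampling with replacement: the same observation may appear repeatedly.
        resample = [data[rng.randrange(n)] for _ in range(n)]
        variances.append(statistics.variance(resample))
    return variances


if __name__ == "__main__":
    data = [1.0, 2.0, 4.0, 7.0, 11.0]  # hypothetical sample
    rng = random.Random(42)
    boot = bootstrap_variances(data, 10000, rng)
    print("original sample variance:", statistics.variance(data))
    print("mean bootstrap variance:", statistics.fmean(boot))
    print("resamples with zero variance:", sum(1 for v in boot if v == 0.0))
```

Because resamples are drawn from the empirical distribution, the average bootstrap variance targets (n - 1)/n times the original sample variance, not the sample variance itself. Resamples that repeat a single observation n times have variance exactly zero.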
On Passing The Buck, Adam J. Hammett, Anna Joy Yang
The Research and Scholarship Symposium
Imagine there are n > 1 people seated around a table, and person S starts with a fair coin that they flip to decide to whom to hand the coin next: if "heads" they pass right, and if "tails" they pass left. This process continues until all people at the table have "touched" the coin. Curiously, it turns out that all people seated at the table other than S have the same probability 1/(n-1) of being last to touch the coin. In fact, Lovasz and Winkler ("A note on the last new vertex visited by a random walk," J. Graph Theory ...
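The uniform "last to touch" claim can be checked empirically with a short Monte Carlo sketch. Seats are numbered 0 to n-1 with S at seat 0. The numbering, function names, and trial count are assumptions chosen for illustration.

```python
import random
from collections import Counter


def last_to_touch(n, rng):
    """Simulate one game with n seats; return the seat that touches the coin last.

    Seat 0 (person S) starts with the coin. Each holder flips a fair coin
    and passes one seat to the right (+1) or to the left (-1).
    """
    position = 0
    touched = {0}
    while len(touched) < n:
        step = 1 if rng.random() < 0.5 else -1
        position = (position + step) % n
        touched.add(position)
    # The loop ends on the step that adds the final untouched seat.
    return position


def last_touch_frequencies(n, trials, rng):
    """Empirical frequency with which each seat is last to touch the coin."""
    counts = Counter(last_to_touch(n, rng) for _ in range(trials))
    return {seat: counts[seat] / trials for seat in range(n)}


if __name__ == "__main__":
    print(last_touch_frequencies(n=6, trials=20000, rng=random.Random(1)))
```

Seat 0 is never last, and each other seat's frequency should hover near 1/(n-1) regardless of its distance from S.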
The Devil You Don't Know: A Spatial Analysis Of Crime At Newark's Prudential Center On Hockey Game Days, Justin Kurland, Eric Piza
Journal of Sport Safety and Security
Inspired by empirical research on spatial crime patterns in and around sports venues in the United Kingdom, this paper sought to measure the criminogenic extent of 216 hockey games that took place at the Prudential Center in Newark, NJ between 2007 and 2016. Do games generate patterns of crime in the areas beyond the arena, and if so, for what type of crime and how far? Police-recorded data for Newark are examined using a variety of exploratory methods and nonparametric permutation tests to visualize differences in crime patterns between game and non-game days across all of Newark and the downtown area. Change ...
Network Structure Sampling In Bayesian Networks Via Perfect Sampling From Linear Extensions, Evan Sidrow
Applied Mathematics Graduate Theses & Dissertations
Bayesian networks are widely considered as powerful tools for modeling risk assessment, uncertainty, and decision making. They have been extensively employed to develop decision support systems in a variety of domains including medical diagnosis, risk assessment and management, human cognition, industrial process and procurement, pavement and bridge management, and system reliability. Bayesian networks are convenient graphical expressions for high dimensional probability distributions which are used to represent complex relationships between a large number of random variables. A Bayesian network is a directed acyclic graph consisting of nodes which represent random variables and arrows which correspond to probabilistic dependencies between them ...
Score Test And Likelihood Ratio Test For Zero-Inflated Binomial Distribution And Geometric Distribution, Xiaogang Dai
Masters Theses & Specialist Projects
The main purpose of this thesis is to compare the performance of the score test and the likelihood ratio test by computing type I errors and type II errors when the tests are applied to the geometric distribution and inflated binomial distribution. We first derive test statistics of the score test and the likelihood ratio test for both distributions. We then use the software package R to perform a simulation to study the behavior of the two tests. We write R code to calculate the two types of error for each distribution. We create many samples to approximate ...
General Stochastic Integral And Itô Formula With Application To Stochastic Differential Equations And Mathematical Finance, Jiayu Zhai
LSU Doctoral Dissertations
A general stochastic integration theory for adapted and instantly independent stochastic processes arises when we consider anticipative stochastic differential equations. In Part I of this thesis, we study in greater depth the general stochastic integral introduced by W. Ayed and H.-H. Kuo in 2008. We provide a rigorous mathematical framework for the integral in Chapter 2, and prove that the integral is well-defined. Then a general Itô formula is given. In Chapter 3, we present an intrinsic property of the general stochastic integral, the near-martingale property, and a Doob-Meyer decomposition for near-submartingales. We apply the new stochastic integration theory ...
Advances In Semi-Nonparametric Density Estimation And Shrinkage Regression, Hossein Zareamoghaddam
Electronic Thesis and Dissertation Repository
This thesis advocates the use of shrinkage and penalty techniques for estimating the parameters of a regression model that comprises both parametric and nonparametric components and develops semi-nonparametric density estimation methodologies that are applicable in a regression context.
First, a moment-based approach whereby a univariate or bivariate density function is approximated by means of a suitable initial density function that is adjusted by a linear combination of orthogonal polynomials is introduced. Such adjustments are shown to be mathematically equivalent to making use of standard polynomials in one or two variables. Once extended to apply to density estimation, in which case ...
Predicting The Next US President By Simulating The Electoral College, Boyan Kostadinov
Journal of Humanistic Mathematics
We develop a simulation model for predicting the outcome of the US Presidential election based on simulating the distribution of the Electoral College. The simulation model has two parts: (a) estimating the probabilities for a given candidate to win each state and DC, based on state polls, and (b) estimating the probability that a given candidate will win at least 270 electoral votes, and thus win the White House. All simulations are coded using the high-level, open-source programming language R. One of the goals of this paper is to promote computational thinking in any STEM field by illustrating how probabilistic ...
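Part (b) of the model can be sketched as a simple Monte Carlo loop. The version below is written in Python, not the paper's R, and it assumes that states are decided independently given their win probabilities, which the abstract does not state. The function name, the toy map, and its probabilities are all hypothetical.

```python
import random


def estimate_win_probability(state_win_probs, electoral_votes, votes_to_win, trials, rng):
    """Estimate P(candidate's electoral votes >= votes_to_win) by simulation.

    state_win_probs: state -> probability the candidate carries that state
    electoral_votes: state -> electoral votes awarded for carrying it
    States are simulated independently (a simplifying assumption).
    """
    wins = 0
    for _ in range(trials):
        total = sum(
            electoral_votes[state]
            for state, p in state_win_probs.items()
            if rng.random() < p
        )
        if total >= votes_to_win:
            wins += 1
    return wins / trials


if __name__ == "__main__":
    # Hypothetical three-state map, not real polling data.
    probs = {"A": 0.9, "B": 0.5, "C": 0.2}
    votes = {"A": 10, "B": 6, "C": 8}
    print(estimate_win_probability(probs, votes, votes_to_win=13, trials=50000, rng=random.Random(7)))
```

With real inputs, the dictionaries would cover every state and DC, and `votes_to_win` would be 270 as in the abstract. Part (a), turning state polls into win probabilities, is a separate modeling step not shown here.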
Sampling Techniques For Big Data Analysis In Finite Population Inference, Jae Kwang Kim, Zhonglei Wang
Statistics Preprints
In analyzing big data for finite population inference, it is critical to adjust for the selection bias in the big data. In this paper, we propose two methods of reducing the selection bias associated with the big data sample. The first method uses a version of inverse sampling by incorporating auxiliary information from external sources, and the second one borrows the idea of data integration by combining the big data sample with an independent probability sample. Two simulation studies show that the proposed methods are unbiased and have better coverage rates than their alternatives. In addition, the proposed ...