Open Access. Powered by Scholars. Published by Universities.®

Statistics and Probability Commons


Simulation

Articles 1 - 18 of 18

Full-Text Articles in Statistics and Probability

UConn Baseball Batting Order Optimization, Gavin Rublewski May 2023


Honors Scholar Theses

Challenging conventional wisdom is at the very core of baseball analytics. Using data and statistical analysis, the sets of rules by which coaches make decisions can be justified, or possibly refuted. One of those sets of rules relates to the construction of a batting order. Through data collection, data adjustment, the construction of a baseball simulator, and the use of Monte Carlo simulation, I have assessed thousands of possible batting orders to determine the roster-specific strategies that lead to optimal run production for the 2023 UConn baseball team. This paper details a repeatable process in which basic player statistics …
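The pipeline the abstract describes (player statistics → simulator → Monte Carlo evaluation of candidate orders) can be illustrated with a deliberately crude sketch. Everything here is hypothetical: the on-base percentages are invented, and the single-statistic base-runner model is far simpler than the thesis's simulator.

```python
import itertools
import random

def simulate_runs(order, obp, rng, innings=9):
    """Crude game model: each plate appearance is either on-base
    (prob = OBP) or an out; a run scores whenever a fourth runner
    reaches base."""
    runs, batter = 0, 0
    for _ in range(innings):
        outs, runners = 0, 0
        while outs < 3:
            if rng.random() < obp[order[batter]]:
                runners += 1
                if runners > 3:      # bases already loaded: a run is forced in
                    runs += 1
                    runners = 3
            else:
                outs += 1
            batter = (batter + 1) % len(order)
    return runs

def best_order(obp, trials=200, seed=1):
    """Monte Carlo search: average simulated runs for every permutation."""
    rng = random.Random(seed)
    best, best_mean = None, -1.0
    for order in itertools.permutations(obp):
        mean = sum(simulate_runs(order, obp, rng) for _ in range(trials)) / trials
        if mean > best_mean:
            best, best_mean = order, mean
    return best, best_mean
```

Exhaustive enumeration is only feasible for small rosters (9 players already give 9! = 362,880 orders), which is one reason assessing "thousands of possible batting orders" by simulation is the practical route.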


Interval Estimation Of Proportion Of Second-Level Variance In Multi-Level Modeling, Steven Svoboda Oct 2020


The Nebraska Educator: A Student-Led Journal

Physical, behavioral and psychological research questions often relate to hierarchical data systems. Examples of hierarchical data systems include repeated measures of students nested within classrooms, nested within schools and employees nested within supervisors, nested within organizations. Applied researchers studying hierarchical data structures should have an estimate of the intraclass correlation coefficient (ICC) for every nested level in their analyses because ignoring even relatively small amounts of interdependence is known to inflate Type I error rate in single-level models. Traditionally, researchers rely upon the ICC as a point estimate of the amount of interdependency in their data. Recent methods utilizing an …
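For reference, the ICC mentioned above is the share of total variance that sits at the group (second) level. A minimal sketch of the point estimate, using the textbook one-way ANOVA moment estimator for balanced data — the article itself concerns interval estimates, which this sketch does not cover:

```python
def icc_from_variances(tau00, sigma2):
    """ICC = between-group variance / (between + within variance)."""
    return tau00 / (tau00 + sigma2)

def anova_icc(groups):
    """One-way ANOVA moment estimator of the ICC for k balanced groups
    of size n: tau00-hat = (MSB - MSW) / n, sigma2-hat = MSW."""
    k, n = len(groups), len(groups[0])
    grand = sum(sum(g) for g in groups) / (k * n)
    msb = n * sum((sum(g) / n - grand) ** 2 for g in groups) / (k - 1)
    msw = sum((x - sum(g) / n) ** 2 for g in groups for x in g) / (k * (n - 1))
    tau00 = max((msb - msw) / n, 0.0)   # truncate negative variance estimates
    return icc_from_variances(tau00, msw)
```

With two perfectly separated groups the ICC is 1; with identical group means it is 0, and single-level models are then safe.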


Dot: Gene-Set Analysis By Combining Decorrelated Association Statistics, Olga A. Vsevolozhskaya, Min Shi, Fengjiao Hu, Dmitri V. Zaykin Apr 2020


Biostatistics Faculty Publications

Historically, the majority of statistical association methods have been designed assuming availability of SNP-level information. However, modern genetic and sequencing data present new challenges to access and sharing of genotype-phenotype datasets, including cost of management, difficulties in consolidation of records across research groups, etc. These issues make methods based on SNP-level summary statistics particularly appealing. The most common form of combining statistics is a sum of SNP-level squared scores, possibly weighted, as in burden tests for rare variants. The overall significance of the resulting statistic is evaluated using its distribution under the null hypothesis. Here, we demonstrate that this basic …
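The "sum of squared scores" statistic, and why decorrelation matters for its null distribution, can be sketched in the simplest case of two SNP-level Z-scores with a known correlation r. This illustrates the general idea only — not the DOT method itself, which handles arbitrary correlation matrices:

```python
import math

def decorrelated_sum(z1, z2, r):
    """Transform two correlated N(0,1) scores into independent ones,
    then combine as a sum of squares; under the null the sum is
    chi-square with 2 df, so its p-value is exp(-q / 2)."""
    x1 = z1
    x2 = (z2 - r * z1) / math.sqrt(1.0 - r * r)
    q = x1 * x1 + x2 * x2
    return q, math.exp(-q / 2.0)
```

Without the decorrelation step, z1**2 + z2**2 would not follow the chi-square reference distribution, and the resulting p-values would be miscalibrated.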


A Comparison Of Some Confidence Intervals For Estimating The Kurtosis Parameter, Guensley Jerome Jun 2017


FIU Electronic Theses and Dissertations

Several methods have been proposed to estimate the kurtosis of a distribution. The three common estimators are g2, G2 and b2. This thesis addressed the performance of these estimators by comparing them under the same simulation environments and conditions. The performance of these estimators is compared through confidence intervals by determining the average width and the probability of capturing the kurtosis parameter of a distribution. We considered and compared classical and non-parametric methods in constructing these intervals. The classical method assumes normality to construct the confidence intervals, while the non-parametric methods rely on bootstrap techniques. The bootstrap …
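As a concrete reference for two of the pieces named above — the moment estimator b2 and Efron's percentile bootstrap interval — here is a small sketch (hypothetical data; the thesis compares several further estimators and interval constructions):

```python
import random

def b2(x):
    """Moment estimator of kurtosis: m4 / m2**2
    (close to 3 for large normal samples)."""
    n = len(x)
    m = sum(x) / n
    m2 = sum((v - m) ** 2 for v in x) / n
    m4 = sum((v - m) ** 4 for v in x) / n
    return m4 / (m2 * m2)

def percentile_ci(x, stat, level=0.95, reps=2000, seed=0):
    """Efron's percentile bootstrap interval: resample with replacement,
    take empirical quantiles of the recomputed statistic."""
    rng = random.Random(seed)
    n = len(x)
    boots = sorted(stat([x[rng.randrange(n)] for _ in range(n)])
                   for _ in range(reps))
    alpha = (1.0 - level) / 2.0
    return boots[int(alpha * reps)], boots[int((1.0 - alpha) * reps) - 1]
```

Average interval width and the frequency with which intervals capture the true kurtosis — the two criteria the abstract names — can then be estimated by repeating this over many simulated samples.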


Technology Design: The Movement Of Means, Yu Gu Jan 2017


Open Educational Resources

In order to promote students’ conceptual understanding and learning experience in introductory statistics, a technology task, which focuses on the probability distribution in which means are defined, was created using TinkerPlots, an exploratory dataanalysis and modeling software. The targeted audiences range from senior high school grade levels to college freshmen who are starting their introductory course in statistics. Students will be guided to explore and discover the movement behaviors of means of a set of numbers randomly generated from a fixed range of values characterized by a predetermined probability distribution. The cognitive, mathematical, technological and pedagogical natures of the task, …
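The "movement of means" behaviour the task targets is the running mean of successive random draws settling toward the distribution mean. Outside TinkerPlots it can be reproduced in a few lines (the values and probabilities below are illustrative, not from the task):

```python
import random

def running_means(values, probs, n, seed=0):
    """Path of the running mean of n draws from a discrete distribution;
    by the law of large numbers it drifts toward the true mean."""
    rng = random.Random(seed)
    total, path = 0.0, []
    for i in range(1, n + 1):
        total += rng.choices(values, weights=probs, k=1)[0]
        path.append(total / i)
    return path
```

Plotting such a path shows the early volatility and later stabilisation that students are meant to discover.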


On Some Test Statistics For Testing The Population Skewness And Kurtosis: An Empirical Study, Yawen Guo Aug 2016


FIU Electronic Theses and Dissertations

The purpose of this thesis is to propose some test statistics for testing the skewness and kurtosis parameters of a distribution, not limited to a normal distribution. Since a theoretical comparison is not possible, a simulation study has been conducted to compare the performance of the test statistics. We have compared both parametric methods (classical method with normality assumption) and non-parametric methods (bootstrap in Bias Corrected Standard Method, Efron’s Percentile Method, Hall’s Percentile Method and Bias Corrected Percentile Method). Our simulation results for testing the skewness parameter indicate that the power of the tests differs significantly across sample sizes, the …


Simulating Longer Vectors Of Correlated Binary Random Variables Via Multinomial Sampling, Justine Shults Mar 2016


UPenn Biostatistics Working Papers

The ability to simulate correlated binary data is important for sample size calculation and comparison of methods for analysis of clustered and longitudinal data with dichotomous outcomes. One available approach for simulating length n vectors of dichotomous random variables is to sample from the multinomial distribution of all possible length n permutations of zeros and ones. However, the multinomial sampling method has only been implemented in general form (without first making restrictive assumptions) for vectors of length 2 and 3, because specifying the multinomial distribution is very challenging for longer vectors. I overcome this difficulty by presenting an algorithm for …
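For vectors of length 2 the multinomial distribution in question can be written out in closed form, which shows both the method and why it gets hard for longer vectors (2**n cells). A sketch with hypothetical marginal probabilities and correlation:

```python
import math
import random

def joint_probs(p1, p2, rho):
    """All four cell probabilities for a pair of Bernoulli variables with
    means p1, p2 and correlation rho (rho must be feasible for p1, p2)."""
    p11 = p1 * p2 + rho * math.sqrt(p1 * (1 - p1) * p2 * (1 - p2))
    probs = {(1, 1): p11, (1, 0): p1 - p11,
             (0, 1): p2 - p11, (0, 0): 1 - p1 - p2 + p11}
    assert all(p >= 0 for p in probs.values()), "rho outside feasible range"
    return probs

def sample_pairs(p1, p2, rho, size, seed=0):
    """Multinomial sampling over the 2**2 permutations of zeros and ones."""
    rng = random.Random(seed)
    outcomes, weights = zip(*joint_probs(p1, p2, rho).items())
    return rng.choices(outcomes, weights=weights, k=size)
```

For length n the same construction needs 2**n cell probabilities consistent with all marginal means and pairwise correlations — the specification problem the paper's algorithm addresses.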


A Recommendation System For Meta-Modeling: A Meta-Learning Based Approach, Can Cui, Mengqi Hu, Jeffery D. Weir, Teresa Wu Jan 2016


Faculty Publications

Various meta-modeling techniques have been developed to replace computationally expensive simulation models. The performance of these meta-modeling techniques varies across models, which makes existing model selection/recommendation approaches (e.g., trial-and-error, ensemble) problematic. To address these research gaps, we propose a general meta-modeling recommendation system using meta-learning which can automate the meta-modeling recommendation process by intelligently adapting the learning bias to problem characterizations. The proposed intelligent recommendation system includes four modules: (1) problem module, (2) meta-feature module which includes a comprehensive set of meta-features to characterize the geometrical properties of problems, (3) meta-learner module which compares the performance of instance-based …


Global Network Inference From Ego Network Samples: Testing A Simulation Approach, Jeffrey A. Smith Apr 2015


Department of Sociology: Faculty Publications

Network sampling poses a radical idea: that it is possible to measure global network structure without the full population coverage assumed in most network studies. Network sampling is only useful, however, if a researcher can produce accurate global network estimates. This article explores the practicality of making network inference, focusing on the approach introduced in Smith (2012). The method uses sampled ego network data and simulation techniques to make inference about the global features of the true, unknown network. The validity check here includes more difficult scenarios than previous tests, including those that go beyond the initial scope conditions of …


Comparison Of Some Improved Estimators For Linear Regression Model Under Different Conditions, Smit Shah Mar 2015


FIU Electronic Theses and Dissertations

The multiple linear regression model plays a key role in statistical inference, and it has extensive applications in business, environmental, physical and social sciences. Multicollinearity has been a considerable problem in multiple regression analysis. When the regressor variables are multicollinear, it becomes difficult to make precise statistical inferences about the regression coefficients. The statistical methods discussed in this thesis are the ridge regression, Liu, two-parameter biased and LASSO estimators. Firstly, an analytical comparison on the basis of risk was made among the ridge, Liu and LASSO estimators under the orthonormal regression model. I found that …
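Under the orthonormal design the abstract mentions (X'X = I), the ridge and Liu estimators reduce to componentwise shrinkage of OLS, which is what makes an analytical risk comparison tractable. A sketch of those standard textbook closed forms (not the thesis's full comparison):

```python
def ridge_orthonormal(b_ols, k):
    """Ridge under X'X = I: each OLS coefficient shrinks by 1 / (1 + k)."""
    return [b / (1.0 + k) for b in b_ols]

def liu_orthonormal(b_ols, d):
    """Liu estimator under X'X = I: shrinkage factor (1 + d) / 2."""
    return [(1.0 + d) / 2.0 * b for b in b_ols]

def ridge_component_risk(beta, sigma2, k):
    """MSE of one ridge component under orthonormality:
    variance sigma2 / (1+k)**2 plus squared bias k**2 beta**2 / (1+k)**2."""
    return (sigma2 + k * k * beta * beta) / (1.0 + k) ** 2
```

The risk formula shows the bias-variance trade-off directly: k = 0 recovers the OLS risk sigma2, and a sufficiently small k > 0 strictly lowers the componentwise risk — the classical motivation for ridge regression.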


Quantitative Evidence For The Use Of Simulation And Randomization In The Introductory Statistics Course, Nathan L. Tintle, Ally Rogers, Beth Chance, George Cobb, Allan Rossman, Soma Roy, Todd Swanson, Jill Vanderstoep Jul 2014


Faculty Work Comprehensive List

The use of simulation and randomization in the introductory statistics course is gaining popularity, but what evidence is there that these approaches are improving students’ conceptual understanding and attitudes as we hope? In this talk I will discuss evidence from early full-length versions of such a curriculum, covering issues such as (a) items and scales showing improved conceptual performance compared to traditional curriculum, (b) transferability of findings to different institutions, (c) retention of conceptual understanding post-course and (d) student attitudes. Along the way I will discuss a few areas in which students in both simulation/randomization courses and the traditional course …


In Silico Surveillance: Evaluating Outbreak Detection With Simulation Models, Bryan Lewis, Stephen Eubank, Allyson M. Abrams, Ken Kleinman Jan 2013


Public Health Department Faculty Publication Series

Background

Detecting outbreaks is a crucial task for public health officials, yet gaps remain in the systematic evaluation of outbreak detection protocols. The authors’ objectives were to design, implement, and test a flexible methodology for generating detailed synthetic surveillance data that provides realistic geographical and temporal clustering of cases, and to use these data to evaluate outbreak detection protocols.

Methods

A detailed representation of the Boston area was constructed, based on data about individuals, locations, and activity patterns. Influenza-like illness (ILI) transmission was simulated, producing 100 years of in silico ILI data. Six different surveillance systems were designed and developed using gathered cases …


Retention Of Statistical Concepts In A Preliminary Randomization-Based Introductory Statistics Curriculum, Nathan L. Tintle, Kylie Topliff, Jill Vanderstoep, Vicki-Lynn Holmes, Todd Swanson May 2012


Faculty Work Comprehensive List

Previous research suggests that a randomization-based introductory statistics course may improve student learning compared to the consensus curriculum. However, it is unclear whether these gains are retained by students post-course. We compared the conceptual understanding of a cohort of students who took a randomization-based curriculum (n = 76) to a cohort of students who used the consensus curriculum (n = 79). Overall, students taking the randomization-based curriculum showed higher conceptual retention in areas emphasized in the curriculum, with no significant decrease in conceptual retention in other areas. This study provides additional support for the use of randomization methods in teaching introductory …


Parsing The Relationship Between Baserunning And Batting Abilities Within Lineups, Ben S. Baumer, James Piette, Brad Null Jan 2012


Statistical and Data Sciences: Faculty Publications

A baseball team's offensive prowess is a function of two types of abilities: batting and baserunning. While each has been studied extensively in isolation, the effects of their interaction are not well understood. We model offensive output as a scalar function f of an individual player's batting and baserunning profile z. Each of these profiles is in turn estimated from Retrosheet data using hierarchical Bayesian models. We then use the SimulOutCome simulation engine as a method to generate values of f(z) over a fine grid of points. Finally, for each of several methods of taking the extra base, we graphically …


A Framework For Generating Data To Simulate Application Scoring, Kenneth Kennedy, Sarah Jane Delany, Brian Mac Namee Aug 2011


Conference papers

In this paper we propose a framework to generate artificial data that can be used to simulate credit risk scenarios. Artificial data is useful in the credit scoring domain for two reasons. Firstly, the use of artificial data allows for the introduction and control of variability that can realistically be expected to occur, but has yet to materialise in practice. The ability to control parameters allows for a thorough exploration of the performance of classification models under different conditions. Secondly, due to non-disclosure agreements and commercial sensitivities, obtaining real credit scoring data is a problematic and time consuming task. By …
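One standard way to get that kind of control is to draw features from known distributions and labels from a logistic model, so class balance and the feature-label signal are parameters rather than accidents of a real portfolio. A minimal sketch of that controllability idea only — the framework in the paper is considerably richer:

```python
import math
import random

def generate_applicants(n, coefs, intercept, seed=0):
    """Artificial application-scoring data: standard-normal features, and
    a good/bad label drawn from a logistic model whose intercept sets the
    base default rate and whose coefficients set the signal strength."""
    rng = random.Random(seed)
    records = []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in coefs]
        eta = intercept + sum(c * v for c, v in zip(coefs, x))
        bad = 1 if rng.random() < 1.0 / (1.0 + math.exp(-eta)) else 0
        records.append((x, bad))
    return records
```

Sweeping the intercept or coefficients then yields datasets with systematically varied default rates and difficulty, against which classification models can be stress-tested.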


Combining Information From Two Surveys To Estimate County-Level Prevalence Rates Of Cancer Risk Factors And Screening, Trivellore E. Raghunathan, Dawei Xie, Nathaniel Schenker, Van Parsons, William W. Davis, Kevin W. Dodd, Eric J. Feuer May 2006


The University of Michigan Department of Biostatistics Working Paper Series

Cancer surveillance requires estimates of the prevalence of cancer risk factors and screening for small areas such as counties. Two popular data sources are the Behavioral Risk Factor Surveillance System (BRFSS), a telephone survey conducted by state agencies, and the National Health Interview Survey (NHIS), an area probability sample survey conducted through face-to-face interviews. Both data sources have advantages and disadvantages. The BRFSS is a larger survey, and almost every county is included in the survey; but it has lower response rates as is typical with telephone surveys, and it does not include subjects who live in households with no …
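The abstract truncates before the paper's method, but the baseline intuition for combining two surveys is inverse-variance weighting of two unbiased estimates: the pooled estimate has smaller variance than either input. A generic textbook sketch, not the approach this paper develops:

```python
def combine_estimates(est1, var1, est2, var2):
    """Inverse-variance weighted pooling of two unbiased estimates;
    returns the combined estimate and its (smaller) variance."""
    w1, w2 = 1.0 / var1, 1.0 / var2
    return (w1 * est1 + w2 * est2) / (w1 + w2), 1.0 / (w1 + w2)
```

In the survey setting, the weighting lets the larger but lower-response BRFSS and the smaller but higher-quality NHIS each contribute in proportion to its precision.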


Regionalization Of Flood Data Using Probability Distributions And Their Parameters, Nageshwar Rao Bhaskar, Carol Alf O'Connor, Harold Andrew Myers, William Paul Puckett Dec 1989


KWRRI Research Reports

The U.S. Geological Survey recently used the method of residuals to delineate seven flood regions for the State of Kentucky. As an alternative approach, the FASTCLUS clustering procedure of the Statistical Analysis System (SAS) is used in this study to delineate five to six cluster regions in conjunction with statistical properties of the AMF series, such as the coefficient of variation estimated using the method of L-moments (LCV), the parameters of the EV1 and GEV flood frequency distributions, and the specific mean annual flood, QSP. For both cluster and USGS flood regions, regionalized flood frequency growth curves are developed and …
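SAS's FASTCLUS is a k-means-style procedure; the clustering step can be sketched generically on hypothetical (LCV, QSP)-like site statistics. This is the generic algorithm, not the study's actual regionalization:

```python
def kmeans(points, k, iters=50):
    """Plain k-means: assign each site to the nearest center, then move
    each center to its group mean (deterministic first-k initialisation)."""
    centers = [tuple(p) for p in points[:k]]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            i = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            groups[i].append(p)
        centers = [tuple(sum(col) / len(g) for col in zip(*g)) if g else centers[i]
                   for i, g in enumerate(groups)]
    return centers, groups
```

Each resulting group of gauging sites plays the role of a candidate flood region, within which a common growth curve can then be fitted.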


Simulating Regional Interindustry Models For Western States, William A. Schaffer, Kong Chu Jan 1969


Applications

Although regional input-output models are now most frequently constructed on the basis of reasonably adequate surveys, simulation (estimating) techniques not based on original survey data are still in use by many regional scientists for quick and less costly results. We will modify our original aggregation procedures, examine our results through several statistical tests of tables constructed for three Western states, and discuss a possible correction procedure for improving raw estimates of interindustry transactions.