Open Access. Powered by Scholars. Published by Universities.®

Statistics and Probability Commons

11,021 Full-Text Articles 16,264 Authors 3,366,482 Downloads 226 Institutions

All Articles in Statistics and Probability

11,021 full-text articles. Page 7 of 321.

The Power Law Distribution Of Agricultural Land Size, Lauren Chamberlain 2018 Utah State University

All Graduate Theses and Dissertations

This paper demonstrates that the distribution of county-level agricultural land size in the United States is best described by a power-law distribution, a distribution that displays extremely heavy tails. This indicates that the majority of farmland exists in the upper tail. Our analysis indicates that the top 5% of agricultural counties account for about 25% of agricultural land between 1997 and 2012. The power-law distribution of farm size has important implications for the design of more efficient regional and national agricultural policies, as counties close to the mean account for little of the cumulative distribution of total agricultural land. This has ...
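The "top 5% of counties hold about 25% of the land" figure is a top-share statistic. The following is a minimal sketch of how such a share could be computed from a list of land sizes; the function name `top_share` and the sample values are illustrative assumptions, not taken from the thesis.

```python
import math


def top_share(values, fraction):
    """Share of the total held by the largest `fraction` of units.

    Hypothetical helper for illustration only; not the thesis's method.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    ordered = sorted(values, reverse=True)
    if not ordered:
        raise ValueError("values must not be empty")
    total = sum(ordered)
    if total <= 0:
        raise ValueError("total must be positive")
    # Number of units in the top group, at least one.
    k = max(1, math.ceil(fraction * len(ordered)))
    return sum(ordered[:k]) / total


# Made-up land sizes: the single largest unit holds 70 of 100.
share = top_share([70, 10, 10, 10], 0.25)
```

With heavy-tailed data, this share stays large even for a small `fraction`, which is the pattern the abstract describes.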


Surviving A Civil War: Expanding The Scope Of Survival Analysis In Political Science, Andrew B. Whetten 2018 Utah State University

All Graduate Theses and Dissertations

Survival Analysis in the context of Political Science is frequently used to study the duration of agreements, political party influence, wars, senator term lengths, etc. This paper surveys a collection of methods implemented on a modified version of the Power-Sharing Event Dataset (which documents civil war peace agreement durations in the Post-Cold War era) in order to identify the research questions that are optimally addressed by each method. A primary comparison will be made between a Cox Proportional Hazards Model using some advanced capabilities in the glmnet package, a Survival Random Forest Model, and a Survival SVM. En route to ...


Text Analytics Approach To Extract Course Improvement Suggestions From Students’ Feedback, Swapna Gottipati, Venky Shankararaman, Jeff Rongsheng Lin 2018 Singapore Management University

Research Collection School Of Information Systems

In academic institutions, it is normal practice that at the end of each term, students are required to complete a questionnaire that is designed to gather students’ perceptions of the instructor and their learning experience in the course. Students’ feedback includes numerical answers to Likert scale questions and textual comments to open-ended questions. Within the textual comments given by the students are embedded suggestions. A suggestion can be explicit or implicit. Any suggestion provides useful pointers on how the instructor can further enhance the student learning experience. However, it is tedious to manually go through all the qualitative comments and ...


Role Of Misclassification Estimates In Estimating Disease Prevalence And A Non-Linear Approach To Study Synchrony Using Heart Rate Variability In Chickens, Dola Pathak 2018 University of Nebraska-Lincoln

Dissertations and Theses in Statistics

Infectious disease assays can be imperfect. When estimating disease prevalence, these imperfections are accounted for by incorporating assay sensitivity and specificity into point and variance estimates. Unfortunately, these accuracy measures are often treated as fixed constants, rather than acknowledging that they are estimates from an assay validation process. The purpose of this study is to show the detrimental effect of not taking into account this sampling variability when samples are obtained through group testing (also known as pooled testing). We show that confidence interval coverage can dramatically decline as the sample size increases for the main sample of interest. As a remedy ...
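As a point of reference for how sensitivity and specificity enter a prevalence point estimate, here is a minimal sketch of the standard Rogan-Gladen correction for individual (non-pooled) testing. This is a simpler setting than the group-testing estimators the dissertation studies, and it treats the accuracy measures as fixed constants, which is the very simplification the abstract criticizes. The function name and the input values are illustrative assumptions.

```python
def rogan_gladen(apparent_prevalence, sensitivity, specificity):
    """Adjust an apparent (test-positive) proportion for assay error.

    Individual-testing case only; sensitivity and specificity are
    treated as known constants here, which is an assumption.
    """
    denom = sensitivity + specificity - 1.0
    if denom <= 0:
        raise ValueError("assay must satisfy sensitivity + specificity > 1")
    estimate = (apparent_prevalence + specificity - 1.0) / denom
    # Clip to the valid range, since the raw estimate can fall outside [0, 1].
    return min(1.0, max(0.0, estimate))


# Made-up inputs: 20% test positive, sensitivity 0.90, specificity 0.95.
adjusted = rogan_gladen(0.20, 0.90, 0.95)
```

Because the estimate divides by `sensitivity + specificity - 1`, any uncertainty in those two quantities carries into the prevalence estimate, which is why ignoring it can distort interval coverage.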


Sequential Inference For Hidden Markov Models, Michael Ellis 2018 University of Arkansas, Fayetteville

Theses and Dissertations

In many applications, data are collected sequentially in time with very short time intervals between observations. If one is interested in using new observations as they arrive, then non-sequential Bayesian inference methods, such as Markov Chain Monte Carlo (MCMC) sampling, can be too slow. Increasingly, state space models are being used to model nonlinear and non-Gaussian systems. The structure of state space models allows for sequential Bayesian inference so that an approximation to the posterior distribution of interest can be updated as new observations arrive. In special cases, the exact posterior distribution can be updated through conjugate Bayesian ...


Budget-Constrained Regression Model Selection Using Mixed Integer Nonlinear Programming, Jingying Zhang 2018 University of Arkansas, Fayetteville

Theses and Dissertations

Regression analysis fits predictive models to data on a response variable and corresponding values for a set of explanatory variables. Often data on the explanatory variables come at a cost from commercial databases, so the available budget may limit which ones are used in the final model.

In this dissertation, two budget-constrained regression models are proposed for continuous and categorical variables respectively using Mixed Integer Nonlinear Programming (MINLP) to choose the explanatory variables to be included in solutions. First, we propose a budget-constrained linear regression model for continuous response variables. Properties such as solvability and global optimality of the proposed ...


Spatio-Temporal Reconstruction Of Remote Sensing Observations, Kamrul Khan 2018 University of Arkansas, Fayetteville

Theses and Dissertations

The USDA Forest Service aims to use satellite imagery for monitoring and predicting changes in forest conditions over time within the country. We specifically focus on a 230,400-hectare region in north-central Wisconsin between 2003 and 2012. The auxiliary data collected from the satellite imagery of this region are relatively dense in space and time and can be used to efficiently predict how the forest condition changed over that decade. However, these records have a significant proportion of missing values due to weather conditions and system failures. To fill in these missing values, we build spatiotemporal models based on fixed ...


A Generative Statistical Approach For Data Classification In A Biologically Inspired Design Tool, Marvin Manuel Arroyo Rujano 2018 University of Arkansas, Fayetteville

Theses and Dissertations

The objective of the research this thesis describes is to find a way to classify text-based descriptions of biological adaptation to support biologically inspired design. Biologically inspired design is a fairly new field with ongoing research. There are different tools to assist designers and biologists in bio-inspired design. Some of the most common are BioTRIZ and AskNature. In recent years, more tools have been proposed to aid and make research in the field easier, for example, the Biologically Inspired Adaptive System Design (BIASD) tool. This tool was designed with the goal of helping designers in early design stages generate more ...


Different Estimation Methods For The Basic Independent Component Analysis Model, Zhenyi An 2018 Washington University in St. Louis

Arts & Sciences Electronic Theses and Dissertations

Inspired by the classic cocktail-party problem, the basic Independent Component Analysis (ICA) model was created. What distinguishes ICA from other kinds of analysis is the intrinsic non-Gaussian assumption about the data. Several approaches have been proposed based on maximizing the non-Gaussianity of the data, which is measured by kurtosis, mutual information, and others. With each estimation, we need to optimize the functions of expectations of non-quadratic functions, since this gives access to the higher-order statistics of the non-Gaussian part of the data. In this thesis, our goal is to review one of the most efficient estimation methods ...


Rfviz: An Interactive Visualization Package For Random Forests In R, Christopher Beckett 2018 Utah State University

All Graduate Plan B and other Reports

Random forests are very popular tools for predictive analysis and data science. They work for both classification (where there is a categorical response variable) and regression (where the response is continuous). Random forests provide proximities, and both local and global measures of variable importance. However, these quantities require special tools to be effectively used to interpret the forest. Rfviz is a sophisticated interactive visualization package and toolkit in R, specially designed for interpreting the results of a random forest in a user-friendly way. Rfviz uses a recently developed R package (loon) from the Comprehensive R Archive Network (CRAN) to create ...


Concentrations Of Criteria Pollutants In The Contiguous U.S., 1979 – 2015: Role Of Model Parsimony In Integrated Empirical Geographic Regression, Sun-Young Kim, Matthew Bechle, Steve Hankey, Elizabeth (Lianne) A. Sheppard, Adam A. Szpiro, Julian D. Marshall 2018 University of Washington - Seattle Campus

UW Biostatistics Working Paper Series

BACKGROUND: National- or regional-scale prediction models that estimate individual-level air pollution concentrations commonly include hundreds of geographic variables. However, these many variables may not be necessary, and a parsimonious approach including a small number of variables may achieve sufficient prediction ability. This parsimonious approach can also be applied to most criteria pollutants. This approach will be powerful when generating publicly available datasets of model predictions that support research in environmental health and other fields. OBJECTIVES: We aim to (1) build annual-average integrated empirical geographic (IEG) regression models for the contiguous U.S. for six criteria pollutants, for all years with regulatory monitoring ...


Stochastic Lanczos Likelihood Estimation Of Genomic Variance Components, Richard Border 2018 University of Colorado, Boulder

Applied Mathematics Graduate Theses & Dissertations

Genomic variance components analysis seeks to estimate the extent to which interindividual variation in a given trait can be attributed to genetic similarity. Likelihood estimation of such models involves computationally expensive operations on large, dense, and unstructured matrices of high rank. As a result, standard estimation procedures relying on direct matrix methods become prohibitively expensive as sample sizes increase. We propose a novel estimation procedure that uses the Lanczos process and stochastic Lanczos quadrature to approximate the likelihood for an initial choice of parameter values. Then, by identifying the variance components parameter space with a family of shifted linear systems ...


Seasonal Warranty Prediction Based On Recurrent Event Data, Qianqian Shan, Yili Hong, William Q. Meeker Jr. 2018 Iowa State University

Statistics Preprints

Warranty return data from repairable systems, such as vehicles, usually result in recurrent event data. The non-homogeneous Poisson process (NHPP) model is used widely to describe such data. Seasonality in the repair frequencies and other variabilities, however, complicate the modeling of recurrent event data. Not much work has been done to address the seasonality, and this paper provides a general approach for the application of NHPP models with dynamic covariates to predict seasonal warranty returns. A hierarchical clustering method is used to stratify the population into groups that are more homogeneous than the overall population. The stratification facilitates ...


A Data Set Of Bloodstain Patterns For Teaching And Research In Bloodstain Pattern Analysis: Gunshot Backspatters, Daniel Attinger, Yu Liu, Ricky Faflak, Yalin Rao, Bryce A. Struttman, Kris De Brabanter, Patrick M. Comiskey, Alex L. Yarin 2018 Iowa State University

Mechanical Engineering Publications

This is a data set of blood spatter patterns scanned at high resolution, generated in controlled experiments. The spatter patterns were generated with a rifle or a handgun, and different ammunitions. The resulting atomized blood droplets travelled opposite to the bullet direction, generating a gunshot backspatter on a poster board target sheet. Fresh blood with anticoagulants was used; its hematocrit and temperature were measured. Main parameters of the study were the bullet shape, size and speed, and the distance between the blood source and target sheet. Several other parameters were explored in a less systematic way. This new and original ...


An Analysis Of Classroom Collusion Using Latent Dirichlet Allocation, Charles B. Shrader, Sue P. Ravenscroft, Jeffrey Kaufmann 2018 Iowa State University

Management Conference Papers, Posters and Proceedings

In this study, we use Latent Dirichlet Allocation to explore the reflections of students who faced a demanding classroom challenge, to which some responded by colluding. Our five-topic LDA solution describes the cheating event in terms of the nature of the course assignment itself, teams as a resource and support mechanism, the repercussions of cheating, and differences between majors or course tracks. The most relevant topics were the differences between the tracks and the repercussions of cheating. Teams and teammates also play a large role in the students’ reflections. We conclude with the implications of these topics in future research.


Dynamics Of Paramagnetic And Ferromagnetic Ellipsoidal Particles In Shear Flow Under A Uniform Magnetic Field, Christopher A. Sobecki, Jie Zhang, Yanzhi Zhang, Cheng Wang 2018 Missouri University of Science and Technology

Yanzhi Zhang

We investigate the two-dimensional dynamic motion of magnetic particles of ellipsoidal shapes in shear flow under the influence of a uniform magnetic field. In the first part, we present a theoretical analysis of the rotational dynamics of the particles in simple shear flow. By considering paramagnetic and ferromagnetic particles, we study the effects of the direction and strength of the magnetic field on the particle rotation. The critical magnetic-field strength, at which particle rotation is impeded, is determined. In a weak-field regime (i.e., below the critical strength) where the particles execute complete rotations, the symmetry property of the rotational ...


Decoupled, Linear, And Energy Stable Finite Element Method For The Cahn-Hilliard-Navier-Stokes-Darcy Phase Field Model, Yali Gao, Xiaoming He, Liquan Mei, Xiaofeng Yang 2018 Missouri University of Science and Technology

Xiaoming He

In this paper, we consider the numerical approximation for a phase field model of the coupled two-phase free flow and two-phase porous media flow. This model consists of Cahn-Hilliard-Navier-Stokes equations in the free flow region and Cahn-Hilliard-Darcy equations in the porous media region that are coupled by seven interface conditions. The coupled system is decoupled based on the interface conditions and the solution values on the interface from the previous time step. A fully discretized scheme with finite elements for the spatial discretization is developed to solve the decoupled system. In order to deal with ...


Learning Statistics Through Guided Block Play: A Pre-Curriculum In Statistical Literacy, Robert P. Giebitz 2018 University of New Mexico - Main Campus

Organization, Information and Learning Sciences ETDs

Learning to use data to investigate the world and make decisions has become an essential skill for all citizens. Play and curiosity are powerful motivators for learning. Inquiry – the process of asking questions and seeking answers – can engage the natural curiosity of young learners and motivate early learning. Recent research in statistics education has shown that children as young as 4 and 5 years old can learn to collect, organize, and interpret data they acquire through observation, counting, and measuring in a process of guided inquiry. Guided block play has been used for over 100 years to enable children to ...


The GAISE College Report: The American Statistical Association Meets Sound Pedagogy In Central Virginia, Beverly Wood 2018 Embry-Riddle Aeronautical University

Beverly Wood

Research in undergraduate statistics education often centers on the introductory course required for a large percentage of college students. While acknowledging the diverse setting, audience, and purpose of introductory courses, existing research assumes that courses offered by different disciplines share the same goals and teaching practices. The purpose of this study is to examine the objectives for student outcomes and pedagogical delivery of introductory statistics courses in various academic departments to provide explicit evidence for this assumption. The American Statistical Association’s Guidelines for Assessment and Instruction in Statistics Education (GAISE) are meant to apply to all introductory courses. The ...


Guidelines For Assessment And Instruction In Statistics Education (GAISE) College Report 2016, Robert Carver, Michelle Everson, John Gabrosek, Nicholas Horton, Robin Lock, Megan Mocko, Allan Rossman, Ginger Holmes Roswell, Paul Velleman, Jeffrey Witmer, Beverly Wood 2018 Stonehill College

Beverly Wood

In 2005 the American Statistical Association (ASA) endorsed the Guidelines for Assessment and Instruction in Statistics Education (GAISE) College Report. This report has had a profound impact on the teaching of introductory statistics in two- and four-year institutions, and the six recommendations put forward in the report have stood the test of time. Much has happened within the statistics education community and beyond in the intervening 10 years, making it critical to re-evaluate and update this important report. For readers who are unfamiliar with the original GAISE College Report or who are new to the statistics education community, the full ...

