Open Access. Powered by Scholars. Published by Universities.®

Statistics and Probability Commons

12,103 Full-Text Articles 18,562 Authors 3,366,482 Downloads 247 Institutions

All Articles in Statistics and Probability

'lmshapemaker': Utilizing The 'rmapshaper' R Package To Modify Shapefiles For Use In Linked Micromap Plots, Braden D. Probst 2020 Utah State University

All Graduate Theses and Dissertations

To create effective map-based visualizations, some modifications to the map are needed so that it is readable and interpretable. Several issues must be addressed to achieve this. A country's boundaries may be overly complex, particularly along its coastline. Some regions may be too small to be seen in the final plot, as is the case with many capital districts, such as Washington, D.C., and the Federal District of Mexico City. In other countries, regions may lie geographically far away from the rest ...
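Overly complex coastlines are usually handled by line simplification. The sketch below is a plain-Python version of the Ramer-Douglas-Peucker algorithm, one of the simplification methods offered by Mapshaper, the JavaScript library that rmapshaper wraps. It is an illustration only, not code from the thesis; the function names and tolerance values are hypothetical.

```python
import math

def point_segment_distance(p, a, b):
    """Distance from point p to the line segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp the projection to its endpoints.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def douglas_peucker(points, tolerance):
    """Simplify a polyline, keeping vertices farther than `tolerance` from the chord."""
    if len(points) < 3:
        return list(points)
    # Find the interior vertex farthest from the segment joining the endpoints.
    max_dist, index = 0.0, 0
    for i in range(1, len(points) - 1):
        d = point_segment_distance(points[i], points[0], points[-1])
        if d > max_dist:
            max_dist, index = d, i
    if max_dist <= tolerance:
        # All interior vertices are within tolerance: keep only the endpoints.
        return [points[0], points[-1]]
    # Otherwise split at the farthest vertex and simplify both halves recursively.
    left = douglas_peucker(points[: index + 1], tolerance)
    right = douglas_peucker(points[index:], tolerance)
    return left[:-1] + right  # drop the duplicated split vertex
```

Applied polygon by polygon, this can open gaps along shared borders between neighbouring regions, which is one reason topology-aware tools such as Mapshaper are preferred for real shapefiles.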


Modeling Movement: A Machine-Learning Approach To Track Migration Routes After Displacement, Ethan Harrison 2020 William & Mary

Undergraduate Honors Theses

Over the past decade, the number of people internally displaced by conflict (internally displaced persons, or IDPs) has reached unprecedented levels. Humanitarian actors and first responders face persistent information gaps in meeting the needs of these populations. Specifically, they struggle to understand where and how IDPs move after they are displaced, knowledge that is needed to locate them in conflict-affected situations and provide them with life-saving assistance. In this paper, I propose a framework that uses established machine-learning methods to forecast the migration routes of these displaced populations (Chapter 1). In a case study of displacement in Yemen, my models predict 80% of IDPs' migration routes ...


Applications Of Machine Learning In High-Frequency Trade Direction Classification, Jared E. Hansen 2020 Utah State University

All Graduate Theses and Dissertations

The correct classification of trades as buyer-initiated or seller-initiated is paramount in many quantitative finance studies. Because many data sets available to researchers do not record the sign of each executed trade, simple decision-rule methods have been used to sign trades. By using these decision-rule methods and engineering new variables from the available data, we demonstrate that machine learning models outperform prior methods at signing trades as buys or sells, achieving state-of-the-art results. The best model was 4.5 percentage points more accurate than older methods on unseen data. Since finance and ...
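The abstract does not list which decision rules were used. One widely used example is the tick test: a trade at a higher price than the previous trade is classified as a buy, a trade at a lower price as a sell, and a trade at the same price inherits the most recent classification. The minimal sketch below implements only that rule; the function name is hypothetical and this is not the thesis's model.

```python
from typing import List, Optional

def tick_rule(prices: List[float]) -> List[Optional[int]]:
    """Sign trades with the tick test: +1 = buyer-initiated, -1 = seller-initiated.

    Trades with no earlier price change (e.g. the first trade) are left as None.
    """
    signs: List[Optional[int]] = []
    last_sign: Optional[int] = None
    for i, price in enumerate(prices):
        if i > 0 and price > prices[i - 1]:
            last_sign = 1   # uptick: classify as a buy
        elif i > 0 and price < prices[i - 1]:
            last_sign = -1  # downtick: classify as a sell
        # Zero tick (or first trade): carry the most recent sign forward.
        signs.append(last_sign)
    return signs
```

For example, `tick_rule([10.0, 10.1, 10.1, 10.05])` gives `[None, 1, 1, -1]`. Rule outputs like these could serve as input features for a classifier alongside other engineered variables.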


Life And Death: Quantifying The Risk Of Heart Disease With Machine Learning, Jack Scott Glienke 2020 University of Northern Iowa

Honors Program Theses

Coronary heart disease has long been a key focus of public health discussion. As such, numerous studies have been conducted with the specific aim of identifying risk factors that lead to the onset of cardiovascular conditions. Many statistical procedures can be used to estimate an individual's risk of developing heart disease, yet regression models tend to be researchers' default tool. Using data from the most influential cardiovascular study to date, the Framingham Heart Study, this analysis applies machine learning techniques to generate and test the predictive power ...


Introduction To Research Statistical Analysis: An Overview Of The Basics, Christian Vandever 2020 HCA Healthcare

HCA Healthcare Journal of Medicine

This article covers many statistical ideas essential to research statistical analysis. Sample size is explained through the concepts of significance level and statistical power. Variable types and definitions are included to clarify how the analysis should be interpreted: categorical and quantitative variables are defined, as are response and predictor variables. The statistical tests described include t-tests, ANOVA, and chi-square tests. Multiple regression is also covered, in both its linear and logistic forms. Finally, the statistics most commonly produced by these methods are explained.
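As a worked example of how the significance level and power determine sample size, a common normal-approximation formula for comparing two means with equal group sizes is n per group = 2(z_{1-α/2} + z_{1-β})²σ²/δ², where δ is the smallest difference worth detecting and σ the common standard deviation. The sketch below is a generic textbook calculation, not taken from the article; an exact t-based calculation gives a slightly larger n.

```python
import math
from statistics import NormalDist

def n_per_group(delta: float, sigma: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate per-group n for a two-sided, two-sample comparison of means."""
    z = NormalDist()                    # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the significance level
    z_beta = z.inv_cdf(power)           # quantile corresponding to the desired power
    n = 2 * (z_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2
    return math.ceil(n)                 # round up so power meets the target
```

With α = 0.05 and 80% power, detecting a difference of half a standard deviation needs `n_per_group(5, 10) == 63` subjects per group; doubling the effect size to one standard deviation cuts this to 16.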


RDC Data Alternatives: Conducting Research During COVID-19, Kristi Thompson, Elizabeth Hill 2020 Western University

Western Libraries Presentations

Recent physical distancing protocols related to the COVID-19 pandemic have meant that researchers who use Statistics Canada Research Data Centres (RDCs) need to find alternative ways of carrying out their research. The Real Time Remote Access (RTRA) program offers one alternative way to access confidential Statistics Canada data. Other options include using Statistics Canada public use files and analyzing data from other sources.

The presenters, data librarians from Western Libraries, will discuss the differences between the data that can be accessed through the RTRA and through the RDC. RTRA data is a very useful option for some types of questions but also has some important limitations. We will ...


A Simulation Study On Increasing Capture Periods In Bayesian Closed Population Capture-Recapture Models With Heterogeneity, Ross M. Gosky, Joel Sanqui 2020 Appalachian State University

Journal of Modern Applied Statistical Methods

Capture-recapture models are useful for estimating unknown population sizes. A common modeling challenge for closed population models is unequal catchability among animals in each capture period, referred to as animal heterogeneity. Inference about the population size N depends on the assumed distribution of capture probabilities in the population, and different models can fit a data set equally well yet provide contradictory inferences about N. Three common Bayesian capture-recapture heterogeneity models are applied to simulated data to study how prevalent contradictory inferences are across population sizes with relatively low capture probabilities, specifically at different numbers of capture ...
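The three Bayesian models are not named in the excerpt, so the sketch below does not implement them. It illustrates the underlying problem with the simplest closed-population estimator, Chapman's bias-corrected Lincoln-Petersen estimator for two capture periods, N_hat = (n1 + 1)(n2 + 1)/(m2 + 1) - 1, where n1 and n2 are the numbers caught in each period and m2 the number caught in both. When each animal has its own capture probability (drawn here from a Beta distribution, an assumption), easily caught animals tend to be recaptured, m2 is inflated, and N is underestimated. Function names and parameter values are hypothetical.

```python
import random

def chapman_estimate(n1: int, n2: int, m2: int) -> float:
    """Chapman's bias-corrected Lincoln-Petersen estimate of a closed population size."""
    return (n1 + 1) * (n2 + 1) / (m2 + 1) - 1

def simulate_two_periods(N: int, a: float, b: float, seed: int = 0):
    """Simulate two capture periods with animal heterogeneity.

    Each animal keeps one capture probability p ~ Beta(a, b) in both periods.
    Returns (n1, n2, m2): counts caught in period 1, in period 2, and in both.
    """
    rng = random.Random(seed)
    probs = [rng.betavariate(a, b) for _ in range(N)]
    first = [rng.random() < p for p in probs]
    second = [rng.random() < p for p in probs]
    n1, n2 = sum(first), sum(second)
    m2 = sum(f and s for f, s in zip(first, second))
    return n1, n2, m2

if __name__ == "__main__":
    # Low, unequal catchability (mean p = 0.2); the true N is 1000.
    n1, n2, m2 = simulate_two_periods(N=1000, a=1.0, b=4.0, seed=1)
    print(n1, n2, m2, round(chapman_estimate(n1, n2, m2)))
```

With Beta(1, 4) catchability the expected counts are roughly n1 = n2 = 200 and m2 = 67, so the estimate lands near 600 rather than 1000; with equal catchability (p = 0.2 for every animal) m2 would be about 40 and the estimate close to the truth.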


Logistic Growth Modeling With Markov Chain Monte Carlo Estimation, Jaehwa Choi, Jinsong Chen, Jeffrey R. Harring 2020 The George Washington University

Journal of Modern Applied Statistical Methods

A new growth modeling approach is proposed that can fit an inherently nonlinear (i.e., logistic) function without constraints or reparameterization. A simulation study is used to investigate the feasibility and performance of a Markov chain Monte Carlo method, within a Bayesian estimation framework, for estimating a fully random version of a logistic growth curve model under manipulated conditions such as the number and timing of measurement occasions and the sample size.
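The model in the paper is fully random (parameters vary across individuals), which is beyond a short example. As a hedged illustration of the estimation idea only, the sketch below fits a single three-parameter logistic curve, y = A / (1 + exp(-(t - m)/s)), with a random-walk Metropolis sampler. The parameterization, the weak normal priors, the step sizes, and the simulated data are all assumptions, not the authors' specification.

```python
import math
import random

def logistic(t, asym, mid, scale):
    """Three-parameter logistic curve: asymptote, midpoint (inflection time), scale."""
    z = -(t - mid) / scale
    if z > 700:  # avoid math.exp overflow; the curve is effectively 0 here
        return 0.0
    return asym / (1.0 + math.exp(z))

def log_posterior(theta, times, ys):
    """Gaussian log-likelihood plus weak normal priors (both assumed for illustration)."""
    asym, mid, log_scale, log_sigma = theta
    scale, sigma = math.exp(log_scale), math.exp(log_sigma)
    resid = [y - logistic(t, asym, mid, scale) for t, y in zip(times, ys)]
    log_lik = -len(ys) * log_sigma - 0.5 * sum(r * r for r in resid) / sigma ** 2
    log_prior = -0.5 * ((asym / 100) ** 2 + (mid / 100) ** 2
                        + (log_scale / 10) ** 2 + (log_sigma / 10) ** 2)
    return log_lik + log_prior

def metropolis(times, ys, start, step, n_iter=20000, seed=0):
    """Random-walk Metropolis with independent Gaussian proposals per parameter."""
    rng = random.Random(seed)
    current = list(start)
    current_lp = log_posterior(current, times, ys)
    draws = []
    for _ in range(n_iter):
        proposal = [c + rng.gauss(0.0, s) for c, s in zip(current, step)]
        proposal_lp = log_posterior(proposal, times, ys)
        # Accept with probability min(1, exp(proposal_lp - current_lp)).
        if math.log(1.0 - rng.random()) < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
        draws.append(list(current))
    return draws

if __name__ == "__main__":
    data_rng = random.Random(42)
    times = list(range(15))
    # Hypothetical data from a known curve: asymptote 10, midpoint 5, scale 1.5, noise sd 0.3.
    ys = [logistic(t, 10.0, 5.0, 1.5) + data_rng.gauss(0.0, 0.3) for t in times]
    draws = metropolis(times, ys, start=[8.0, 4.0, 0.0, math.log(0.5)],
                       step=[0.1, 0.1, 0.05, 0.1])
    kept = draws[5000:]  # discard burn-in
    print("posterior mean asymptote:", sum(d[0] for d in kept) / len(kept))
```

Sampling the scale and noise parameters on the log scale keeps them positive without explicit constraints; a fully random model would add individual-level parameters and population-level hyperparameters to the same posterior.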


Forecasting San Francisco Bay Area Rapid Transit (BART) Ridership, Swee K. Chew, Alec Lepe, Aaron Tomkins, Peter Scheirer 2020 Southern Methodist University (SMU)

SMU Data Science Review

In this paper, we present a forecasting analysis of San Francisco Bay Area Rapid Transit (BART) ridership data using several time series methods. BART is a major public transportation system in the Bay Area and relies heavily on its riders' fares; models that produce accurate ridership forecasts help the agency project revenue and manage future expenses. To predict monthly ridership, we used autoregressive integrated moving average (ARIMA) models, deep neural networks (DNNs), state space models, and long short-term memory (LSTM) networks. As there is such a wide range ...
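Comparisons among forecasting methods like these are often judged against a simple benchmark on held-out months. The sketch below shows a seasonal naive forecast for monthly data (each forecast month repeats the value from 12 months earlier) and the mean absolute percentage error (MAPE). It is a generic baseline, not one of the paper's models, and the function names are hypothetical.

```python
from typing import List

def seasonal_naive(history: List[float], horizon: int, season: int = 12) -> List[float]:
    """Forecast each future month with the value observed one season earlier."""
    if len(history) < season:
        raise ValueError("need at least one full season of history")
    last_season = history[-season:]
    return [last_season[h % season] for h in range(horizon)]

def mape(actual: List[float], forecast: List[float]) -> float:
    """Mean absolute percentage error, in percent (actual values must be nonzero)."""
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)
```

A candidate model such as ARIMA or an LSTM would be worth its added complexity only if its MAPE on the held-out months beats this baseline's.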


484— Modeling Social Distancing Methods And Their Effectiveness In Combating The Spread Of Ebola, Rachel Fair 2020 SUNY Geneseo

GREAT Day

Ebola virus disease (EVD) is a rare but severe disease that is transmitted among humans through direct contact with, and close proximity to, infected bodily fluids. From 2014 to 2016, West Africa experienced the largest Ebola outbreak ever recorded, which infected over 28,000 people and killed over 11,000. Although the symptoms of EVD are treatable, the disease can be extremely deadly: on average, about 50% of EVD cases are fatal. In areas where health care is scarce and vaccines are not readily available, social distancing and self-quarantine have been shown to be highly effective in combating the spread of ...
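The abstract does not give the model's equations. A common way to represent social distancing in a compartmental model is to scale down the transmission rate β. The sketch below integrates a basic SIR model with forward Euler steps and reports how many people are ever infected; the SIR structure, the `contact_reduction` parameter, and all parameter values are illustrative assumptions, not estimates for Ebola or the author's model.

```python
def sir_final_size(beta, gamma, contact_reduction=0.0, n=1_000_000, i0=10, days=730, dt=0.1):
    """Return how many people were ever infected in an SIR model (forward Euler).

    Social distancing is represented, as an assumption, by scaling the
    transmission rate beta by (1 - contact_reduction).
    """
    s, i = float(n - i0), float(i0)
    b = beta * (1.0 - contact_reduction)
    for _ in range(round(days / dt)):
        new_infections = b * s * i / n * dt  # S -> I flow over one step
        new_removals = gamma * i * dt        # I -> R flow (recovery or death)
        s -= new_infections
        i += new_infections - new_removals
    return n - s  # everyone who has left the susceptible class
```

With β = 0.2 and γ = 0.1 the basic reproduction number β/γ is 2, and roughly 80% of the population is eventually infected. Cutting contacts by 60% lowers the effective reproduction number to 0.8, below 1, so the outbreak dies out after a few dozen cases.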


483— Effectiveness Of MMR Vaccination In Orthodox Jewish Neighborhoods, Meenu Mundackal 2020 SUNY Geneseo

GREAT Day

Measles is a highly contagious disease; large outbreaks arise through direct contact between susceptible (unvaccinated) and infectious individuals. Many Orthodox Jewish neighborhoods were affected by measles from 2018 to 2019. To quantify the vaccination effort in this susceptible population, a retrospective analysis of the NYC and Rockland County populations was carried out using a differential equations model. A second model, a realistically structured network model, studied only the NYC population, taking typical household size into account. Vaccination strategies were applied to three cohorts: unvaccinated family members, members with one prior MMR dose, and members with two prior MMR doses. The ...
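A standard back-of-envelope companion to such models is the critical vaccination coverage for herd immunity, p_c = (1 - 1/R0)/E, where R0 is the basic reproduction number and E the vaccine effectiveness. The sketch below computes it. The values in the usage line are commonly cited textbook figures used only for illustration (measles R0 is often quoted as 12 to 18, and two MMR doses are about 97% effective against measles), not results from this study.

```python
def critical_coverage(r0: float, effectiveness: float = 1.0) -> float:
    """Fraction of the population that must be vaccinated to push R below 1.

    Assumes homogeneous mixing, which clustered communities can violate.
    A result above 1.0 means vaccination alone cannot reach the threshold.
    """
    if r0 <= 1.0:
        return 0.0  # the infection cannot sustain itself; no threshold needed
    return (1.0 - 1.0 / r0) / effectiveness
```

For example, `critical_coverage(15, 0.97)` is about 0.96, meaning roughly 96% coverage. This homogeneous-mixing figure is a simplification; household- and network-structured models such as the one described exist precisely because contact is not uniform.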


465— Modeling Vaccine Efficacy For Tuberculosis In A Prison Population, Kaitlyn Mundackal 2020 SUNY Geneseo

GREAT Day

Tuberculosis is a highly contagious disease and is particularly problematic in confined communities such as prisons. I simulated how tuberculosis spreads through a prison population and tested how much vaccination effort is needed to control its spread. To explore this, I added ever-increasing numbers of randomly placed edges to a network and determined the size of the largest component. I then removed edges to simulate the effect of vaccination, using two methods: removing edges at random, and removing edges starting with the prisoners who had the most connections. My results show ...
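The procedure described (add random edges, track the largest connected component, then remove edges either at random or starting from the most connected prisoners) can be sketched with a union-find structure. This is a minimal illustration; the function names, the network size, and the rule of deleting all edges of the current highest-degree node first are assumptions about details the abstract leaves out.

```python
import random
from collections import Counter

def largest_component(n, edges):
    """Size of the largest connected component among nodes 0..n-1, via union-find."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
    return max(Counter(find(x) for x in range(n)).values())

def random_edges(n, m, rng):
    """m distinct random edges without self-loops (requires m <= n*(n-1)/2)."""
    edges = set()
    while len(edges) < m:
        u, v = rng.sample(range(n), 2)
        edges.add((min(u, v), max(u, v)))
    return list(edges)

def remove_random(edges, k, rng):
    """Delete k edges chosen uniformly at random."""
    keep = list(edges)
    rng.shuffle(keep)
    return keep[k:]

def remove_targeted(edges, k):
    """Delete k edges, taking the edges of the currently best-connected node first."""
    remaining = list(edges)
    removed = 0
    while removed < k and remaining:
        degree = Counter()
        for u, v in remaining:
            degree[u] += 1
            degree[v] += 1
        hub = degree.most_common(1)[0][0]
        drop = set([e for e in remaining if hub in e][: k - removed])
        remaining = [e for e in remaining if e not in drop]
        removed += len(drop)
    return remaining

if __name__ == "__main__":
    rng = random.Random(0)
    n = 500
    edges = random_edges(n, 600, rng)  # average degree 2.4
    print("before removal:", largest_component(n, edges))
    print("random removal:", largest_component(n, remove_random(edges, 200, rng)))
    print("targeted removal:", largest_component(n, remove_targeted(edges, 200)))
```

In random networks a giant component emerges once the average degree exceeds 1, so the largest component here is a rough proxy for how many prisoners one case could eventually reach.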


Universal Vector Neural Machine Translation With Effective Attention, Joshua Yi, Satish Mylapore, Ryan Paul, Robert Slater 2020 SMU

SMU Data Science Review

Neural machine translation (NMT) uses one or more trained neural networks to translate phrases. Sutskever et al. introduced a sequence-to-sequence encoder-decoder model that became the standard for NMT systems. Attention mechanisms were later introduced to address problems with translating long sentences and to improve overall accuracy. In this paper, we propose two improvements to the encoder-decoder NMT approach. Most translation models are trained as one model per language pair. We introduce a neutral/universal model representation that can be used to predict more than one language depending ...
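The abstract does not specify which attention variant the authors use. As background, the sketch below shows scaled dot-product attention, one common variant, for a single decoder query over encoder states in plain Python: scores are dot products divided by the square root of the vector dimension, a softmax turns them into weights, and the context vector is the weighted sum of the value vectors. The function name and toy vectors are illustrative.

```python
import math
from typing import List, Tuple

Vector = List[float]

def attend(query: Vector, keys: List[Vector], values: List[Vector]) -> Tuple[Vector, Vector]:
    """Return (weights, context) for scaled dot-product attention over one query."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    # Softmax, shifted by the max score for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Context vector: attention-weighted average of the value vectors.
    context = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
    return weights, context
```

Because the weights are recomputed for every output step, the decoder can focus on different source positions as it translates, which is what relieves the single fixed-length encoding on long sentences.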


Demand Forecasting In Wholesale Alcohol Distribution: An Ensemble Approach, Tanvi Arora, Rajat Chandna, Stacy Conant, Bivin Sadler, Robert Slater 2020 Southern Methodist University

SMU Data Science Review

In this paper, historical data from a wholesale alcoholic beverage distributor were used to forecast sales demand. Demand forecasting is a vital part of the sale and distribution of many goods. Accurate forecasts can be used to optimize inventory, improve cash flow, and enhance customer service. However, demand forecasting is challenging because of the many unknowns that can affect sales, such as the weather and the state of the economy. While many studies focus on modeling consumer demand and endpoint retail sales, this study focused on demand forecasting from the distributor's perspective. An ensemble approach was applied ...
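The abstract does not say which models make up the ensemble or how they are combined. The simplest form of ensemble forecasting averages the predictions of several base forecasters; the sketch below does this with three elementary baselines (naive, moving average, and drift). The base models, the window size, the equal default weights, and the function names are assumptions for illustration only.

```python
from statistics import mean
from typing import Callable, List, Optional, Sequence

Forecaster = Callable[[Sequence[float], int], List[float]]

def naive(history: Sequence[float], horizon: int) -> List[float]:
    """Repeat the last observed value."""
    return [history[-1]] * horizon

def moving_average(history: Sequence[float], horizon: int) -> List[float]:
    """Repeat the mean of the last four observations (window size is an assumption)."""
    return [mean(history[-4:])] * horizon

def drift(history: Sequence[float], horizon: int) -> List[float]:
    """Extend the average change between the first and last observations."""
    slope = (history[-1] - history[0]) / (len(history) - 1)
    return [history[-1] + slope * (h + 1) for h in range(horizon)]

def ensemble_forecast(history: Sequence[float], horizon: int,
                      models: Sequence[Forecaster],
                      weights: Optional[Sequence[float]] = None) -> List[float]:
    """Weighted average of the base models' forecasts (equal weights by default)."""
    w = list(weights) if weights is not None else [1.0] * len(models)
    forecasts = [model(history, horizon) for model in models]
    return [sum(wi * f[h] for wi, f in zip(w, forecasts)) / sum(w) for h in range(horizon)]
```

Averaging tends to cancel the individual models' errors when those errors are not strongly correlated; in practice the weights are often tuned on a validation period rather than left equal.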


Demand Forecasting For Alcoholic Beverage Distribution, Lei Jiang, Kristen M. Rollins, Meredith Ludlow, Bivin Sadler 2020 Southern Methodist University

SMU Data Science Review

Forecasting demand is one of the biggest challenges in any business, and the ability to make such predictions is an invaluable resource to a company. Although difficult, demand prediction should become increasingly accessible given the volume of data businesses collect and continuing advances in machine learning models. This paper presents forecasting models for two vodka products of an alcoholic beverage distributor in the United States, with the aim of improving the company's ability to forecast demand for those products. The results include an exploratory data analysis to determine the most important variables affecting ...


ACT Scores Across Minnesota's Congressional Districts, Katie Moynihan 2020 Concordia University St. Paul

Research and Scholarship Symposium Posters

Data analysis was conducted to test factors that could affect the ACT scores of Minnesota high school students. Average composite scores across the state's eight congressional districts were evaluated. Factors studied include family income, parental education, diversity, district location, graduating class size, and graduation rate. Methodology and results will be discussed.


The Impact Of PEV User Charging Behavior In Building Public Charging Infrastructure, Ahmad Almaghrebi 2020 University of Nebraska - Lincoln

Architectural Engineering -- Dissertations and Student Research

Plug-in electric vehicles (PEVs) play a significant role in the development of green cities since they generate less pollution than conventional vehicles. To promote PEV adoption and mitigate range anxiety, charging infrastructure should be deployed at strategic locations that are readily accessible to the public. Nebraska is working on the expansion of charging infrastructure around the state; however, stakeholders face several difficulties in trying to minimize irregular charging behaviors. Most electric vehicle users plug in and leave their vehicles for an extended time at public parking lots designated for PEVs. Some users even leave their vehicles for longer than 24 ...


An Exploration Of Link Functions Used In Ordinal Regression, Thomas J. Smith, David A. Walker, Cornelius M. McKenna 2020 Northern Illinois University

Journal of Modern Applied Statistical Methods

This study examines issues involved in choosing a link function for generalized linear models with ordinal outcomes. Distributional appropriateness, link specificity, and palindromic invariance are discussed, and an exemplar analysis is provided using data from the Pew Research Center's 25th Anniversary of the Web Omnibus Survey. Simulated data are used to compare the relative palindromic invariance of four distinct indices of determination/discrimination, including a new index proposed by Smith et al. (2017).
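In a cumulative link model for an ordinal response with categories 1..J, P(Y <= j) = F(θ_j - η), where η is the linear predictor, the θ_j are increasing cutpoints, and F is the inverse link (logistic, standard normal, complementary log-log, and so on). Reversing the category order leaves a symmetric link such as the logit or probit unchanged apart from sign flips, whereas the complementary log-log turns into the log-log link. The sketch below computes category probabilities under several links and checks that property numerically. It concerns the links themselves, not the indices compared in the paper; the cutpoints, η, and function names are illustrative assumptions.

```python
import math
from statistics import NormalDist
from typing import Callable, Dict, List

# Inverse link functions F, each mapping the real line to (0, 1).
INVERSE_LINKS: Dict[str, Callable[[float], float]] = {
    "logit": lambda x: 1.0 / (1.0 + math.exp(-x)),
    "probit": NormalDist().cdf,
    "cloglog": lambda x: 1.0 - math.exp(-math.exp(x)),
    "loglog": lambda x: math.exp(-math.exp(-x)),
}

def category_probs(cutpoints: List[float], eta: float, link: str) -> List[float]:
    """P(Y = j) for j = 1..J under the cumulative model P(Y <= j) = F(theta_j - eta)."""
    F = INVERSE_LINKS[link]
    cumulative = [F(theta - eta) for theta in cutpoints] + [1.0]
    return [c - prev for c, prev in zip(cumulative, [0.0] + cumulative[:-1])]

def reversal_gap(cutpoints: List[float], eta: float, link: str, reversed_link: str) -> float:
    """Largest difference between the original probabilities (in reversed order) and
    a model for the reversed categories with mirrored cutpoints and eta' = -eta."""
    original = category_probs(cutpoints, eta, link)[::-1]
    mirrored = [-t for t in reversed(cutpoints)]
    flipped = category_probs(mirrored, -eta, reversed_link)
    return max(abs(a - b) for a, b in zip(original, flipped))
```

With cutpoints `[-1.0, 0.5, 2.0]` and η = 0.3, the gap is zero up to rounding for `("logit", "logit")`, `("probit", "probit")`, and `("cloglog", "loglog")`, but clearly positive for `("cloglog", "cloglog")`: reversing the categories of a complementary log-log model yields a log-log model.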


Analysis Of Gas Mileage Of A Car, Joshua Ballard-Myer 2020 Georgia College

Georgia College Student Research Events

The objective of this work is to analyze the Auto data set from the R package ISLR, which accompanies An Introduction to Statistical Learning with Applications in R. The data set includes 392 observations on 9 variables, including gas mileage, horsepower, weight in pounds, and engine displacement in cubic inches. The data set was taken from the StatLib library maintained at Carnegie Mellon University. The primary response variable is gas mileage in miles per gallon, with all other variables serving as predictors, but relationships involving other response variables, such as acceleration, are also explored. Results were largely as expected; traits desirable ...
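As a minimal illustration of the simplest model in such an analysis, the sketch below fits a one-predictor least-squares line (for example, mpg against weight) from its closed-form solution: slope = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)², intercept = ȳ - slope·x̄. It does not load the Auto data, and the function name is hypothetical; in R the corresponding fit would be `lm(mpg ~ weight, data = Auto)`.

```python
from statistics import mean
from typing import Sequence, Tuple

def simple_ols(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Return (intercept, slope, r_squared) for the least-squares line y = a + b*x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # R^2: share of the variance in y explained by the fitted line.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return intercept, slope, 1.0 - ss_res / ss_tot
```

With several predictors, as in the analysis described, the same least-squares idea is solved in matrix form rather than with these two scalar formulas.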


Personal Foul: How Head Trauma And The Insurance Industry Are Threatening Sports, Zachary Cooler 2020 Liberty University

Senior Honors Theses

This thesis will investigate the growing problem of head trauma in contact sports such as football, hockey, and soccer by examining medical studies, implications for the insurance industry, and ongoing litigation. It will review medical studies that are finding more evidence to support the claim that contact-sport players are more likely to develop symptoms of head trauma such as memory loss, mood swings, and, in extreme cases, even Lou Gehrig's disease. The thesis will also demonstrate that these medical symptoms and the monetary losses from medical claims are convincing insurance companies to withdraw insurance coverage for sports leagues, which they are ...


Digital Commons powered by bepress