Open Access. Powered by Scholars. Published by Universities.®

Statistics and Probability Commons

10,405 Full-Text Articles 15,104 Authors 2,527,700 Downloads 217 Institutions

All Articles in Statistics and Probability

Decision Making In A Changing Environment, Alan Veliz-Cuba 2018 University of Dayton

Annual Symposium on Biomathematics and Ecology: Education and Research

No abstract provided.


Snakebite Dynamics Of Colombia: Effects Of Precipitation Seasonality Of Incidence, Carlos Cruz 2018 Illinois State University

Annual Symposium on Biomathematics and Ecology: Education and Research

No abstract provided.


Understanding Sexual Violence Against Women, Maria Martinez 2018 Illinois State University

Annual Symposium on Biomathematics and Ecology: Education and Research

No abstract provided.


Selected Adventures In Policy Modeling, Edward Kaplan 2018 Illinois State University

Annual Symposium on Biomathematics and Ecology: Education and Research

No abstract provided.


Identifying Treatment Effects In The Presence Of Confounded Types, Desire Kedagni 2018 Iowa State University

Economics Working Papers

In this paper, I consider identification of treatment effects when the treatment is endogenous. The use of instrumental variables is a popular solution to deal with endogeneity, but this may give misleading answers when the instrument is invalid. I show that when the instrument is invalid due to correlation with the first-stage unobserved heterogeneity, a second (also possibly invalid) instrument allows partial identification of not only the local average treatment effect but also the entire potential outcomes distributions for compliers. I exploit the fact that the distribution of the observed outcome in each group defined by the treatment and ...


Dealing With Sensitive Quantitative Variables: A Comparison Of Sampling Designs For The Procedure Of Gupta And Thornton, Carlos Narciso Bouza Herrera, Prayas Sharma 2018 University of Havana

Journal of Modern Applied Statistical Methods

The use of randomized response procedures reduces the number of non-responses and increases the accuracy of the responses. A new sampling strategy is developed where the reports are scrambled using the procedure of Gupta and Thornton. The estimator of the mean as well as the errors are developed for the Rao-Hartley-Cochran and Ranked Set Sampling designs. The proposals are compared with the original model based on the use of simple random sampling.


Cross-Sectional HIV Incidence Estimation Accounting For Heterogeneity Across Communities, Yuejia Xu, Oliver B. Laeyendecker, Rui Wang 2018 University of Cambridge

Harvard University Biostatistics Working Paper Series

No abstract provided.


Comparison Of Multiple Imputation Methods For Categorical Survey Items With High Missing Rates: Application To The Family Life, Activity, Sun, Health And Eating (FLASHE) Study, Benmei Liu, Erin Hennessy, April Oh, Laura A. Dwyer, Linda Nebeling 2018 National Cancer Institute

Journal of Modern Applied Statistical Methods

Two multiple imputation methods, the Sequential Regression Multivariate Imputation Algorithm and the Cox-Iannacchione Weighted Sequential Hotdeck, were examined and compared for imputing categorical variables with high missing rates from the Family Life, Activity, Sun, Health and Eating (FLASHE) study. This paper describes the imputation approaches and results from the study.


Bayesian And Semi-Bayesian Estimation Of The Parameters Of Generalized Inverse Weibull Distribution, Kamaljit Kaur, Kalpana K. Mahajan, Sangeeta Arora 2018 Panjab University, Chandigarh, India

Journal of Modern Applied Statistical Methods

Bayesian and semi-Bayesian estimators of the parameters of the generalized inverse Weibull distribution are obtained using Jeffreys’ prior and an informative prior under specific assumptions about the loss function. Using simulation, the relative efficiency of the proposed estimators is obtained under different set-ups. A real-life example is also given.


Overcoming Small Data Limitations In Heart Disease Prediction By Using Surrogate Data, Alfeo Sabay, Laurie Harris, Vivek Bejugama, Karen Jaceldo-Siegl 2018 Southern Methodist University

SMU Data Science Review

In this paper, we present a heart disease prediction use case showing how synthetic data can be used to address privacy concerns and overcome constraints inherent in small medical research data sets. While advanced machine learning algorithms, such as neural network models, can be implemented to improve prediction accuracy, these require very large data sets, which are often not available in medical or clinical research. We examine the use of surrogate data sets composed of synthetic observations for modeling heart disease prediction. We generate surrogate data, based on the characteristics of original observations, and compare prediction accuracy results achieved from ...


Random Forest Vs Logistic Regression: Binary Classification For Heterogeneous Datasets, Kaitlin Kirasich, Trace Smith, Bivin Sadler 2018 Southern Methodist University

SMU Data Science Review

Selecting a learning algorithm to implement for a particular application on the basis of performance remains an ad hoc process using fundamental benchmarks such as evaluating a classifier’s overall loss function and misclassification metrics. In this paper we address the difficulty of model selection by comparing the overall classification performance of random forest and logistic regression on datasets with various underlying structures: (1) increasing the variance in the explanatory and noise variables, (2) increasing the number of noise variables, (3) increasing the number of explanatory variables, (4) increasing the number of observations. We developed a model evaluation tool ...


Predicting National Basketball Association Success: A Machine Learning Approach, Adarsh Kannan, Brian Kolovich, Brandon Lawrence, Sohail Rafiqi 2018 Southern Methodist University

SMU Data Science Review

In this paper, we present a machine-learning-based approach to projecting the success of National Basketball Association (NBA) draft prospects. With the proliferation of data, analytics have increasingly become a critical component in the assessment of professional and collegiate basketball players. We leverage player biometric data, college statistics, draft selection order, and positional breakdown as modelling features in our prediction algorithms. We found that a player's draft pick and their college statistics are the best predictors of their longevity in the National Basketball Association.


Minimizing The Perceived Financial Burden Due To Cancer, Hassan Azhar, Zoheb Allam, Gino Varghese, Daniel W. Engels, Sajiny John 2018 Southern Methodist University

SMU Data Science Review

In this paper, we present a regression model that predicts the perceived financial burden that a cancer patient experiences in the treatment and management of the disease. Cancer patients do not fully understand the burden associated with the cost of cancer, and their lack of understanding can increase the difficulties associated with living with the disease, in particular coping with the cost. The relationship between demographic characteristics and financial burden was examined in order to better understand the characteristics of a cancer patient and their burden, while all subsets regression was used to determine the best predictors of financial burden. Age ...


Yelp’s Review Filtering Algorithm, Yao Yao, Ivelin Angelov, Jack Rasmus-Vorrath, Mooyoung Lee, Daniel W. Engels 2018 Southern Methodist University

SMU Data Science Review

In this paper, we present an analysis of features influencing Yelp's proprietary review filtering algorithm. Classifying or misclassifying reviews as recommended or non-recommended affects average ratings, consumer decisions, and ultimately, business revenue. Our analysis involves systematically sampling and scraping Yelp restaurant reviews. Features are extracted from review metadata and engineered from metrics and scores generated using text classifiers and sentiment analysis. The coefficients of a multivariate logistic regression model were interpreted as quantifications of the relative importance of features in classifying reviews as recommended or non-recommended. The model classified review recommendations with an accuracy of 78%. We found that ...


Cryptocurrency Price Prediction Using Tweet Volumes And Sentiment Analysis, Jethin Abraham, Daniel Higdon, John Nelson, Juan Ibarra 2018 Southern Methodist University

SMU Data Science Review

In this paper, we present a method for predicting changes in Bitcoin and Ethereum prices utilizing Twitter data and Google Trends data. Bitcoin and Ethereum, the two largest cryptocurrencies in terms of market capitalization, represent over $160 billion in combined value. However, both Bitcoin and Ethereum have experienced significant price swings on both daily and long term valuations. Twitter is increasingly used as a news source influencing purchase decisions by informing users of the currency and its increasing popularity. As a result, quickly understanding the impact of tweets on price direction can provide a purchasing and selling advantage to ...


Coastal Wetland Dynamics Under Sea-Level Rise And Wetland Restoration In The Northern Gulf Of Mexico Using Bayesian Multilevel Models And A Web Tool, Tyler Hardy 2018 The University of Southern Mississippi

Master's Theses

There is currently no modeling framework to predict how relative sea-level rise (SLR), combined with restoration activities, affects landscapes of coastal wetlands across the entire northern Gulf of Mexico (NGOM) with uncertainties accounted for. I developed such a modeling framework – Bayesian multi-level models to study the spatial pattern of wetland loss in the NGOM, driven by relative SLR, vegetation productivity, tidal range, coastal slope, and wave height – all interacting with river-borne sediment availability, indicated by hydrological regimes. These interactions have not been comprehensively investigated before. I further modified this model to assess the efficacy of restoration projects from ...


Of Typicality And Predictive Distributions In Discriminant Function Analysis, Lyle W. Konigsberg, Susan R. Frankenberg 2018 Department of Anthropology, University of Illinois at Urbana–Champaign

Human Biology Open Access Pre-Prints

While discriminant function analysis is an inherently Bayesian method, researchers attempting to estimate ancestry in human skeletal samples often follow discriminant function analysis with the calculation of frequentist-based typicalities for assigning group membership. Such an approach is problematic in that it fails to account for admixture and for variation in why individuals may be classified as outliers, or non-members of particular groups. This paper presents an argument and methodology for employing a fully Bayesian approach in discriminant function analysis applied to cases of ancestry estimation. The approach requires adding the calculation, or estimation, of predictive distributions as the final step ...


A Math Research Project Inspired By Twin Motherhood, Tiffany N. Kolba 2018 Valparaiso University

Tiffany N Kolba

The phenomenon of twins, triplets, quadruplets, and other higher order multiples has fascinated humans for centuries and has even captured the attention of mathematicians who have sought to model the probabilities of multiple births. However, there has not been extensive research into the phenomenon of polyovulation, which is one of the biological mechanisms that produces multiple births. In this paper, I describe how my own experience becoming a mother to twins led me on a quest to better understand the scientific processes going on inside my own body and motivated me to conduct research on polyovulation frequencies. An overview of ...


Secondary Data Analysis Project, Jonathan M. Gallimore 2018 Embry-Riddle Aeronautical University

SF 420 PR - Gallimore - Fall 2018

This activity is designed to give students an opportunity to apply what they have learned in statistics to a real dataset.

This activity will help students apply what they have learned in statistics to real world data and answer their own research questions. Students will also practice reporting their results in a paper using APA format.


Quantitative Jeopardy Feud, Jonathan M. Gallimore 2018 Embry-Riddle Aeronautical University

MSF 600 PR - Gallimore - Fall 2018

This activity - Quantitative Jeopardy Feud - is a method for using a game as a final exam.

