Open Access. Powered by Scholars. Published by Universities.®

Statistics and Probability Commons

10,517 Full-Text Articles 15,304 Authors 2,527,700 Downloads 218 Institutions

All Articles in Statistics and Probability

Of Rats And Men, Thomas S. Walsh 2018 CUNY Graduate School of Journalism

Capstones

This capstone is a data-driven investigation into New York City's rat problem. By using publicly available government data to map rat activity in NYC, I identified several socio-economic variables that correlate with rat populations at the community district, borough, and city scales. I used these findings (mainly that rat problems are linked to lower incomes) as the basis of an investigation, which includes interviews with residents, experts, and city officials. Prof. Bobby Corrigan, an urban rodentologist formerly with the NYC Department of Health, criticizes the city's efforts on the record for the first time.

https://thomasseiyawalsh.wixsite.com/ratstone


Group-Lasso Estimation In High-Dimensional Factor Models With Structural Breaks, Yujie Song 2018 University of Windsor

Major Papers

In this major paper, we study the influence of structural breaks in a financial market model with high-dimensional data. We present a model capable of detecting changes in factor loadings, determining the number of factors, and detecting the break date. We consider both the known and the unknown break-date cases and identify the type of instability. For the unknown break-date case, we propose a group-LASSO estimator to determine the numbers of pre- and post-break factors, the break date, and whether the factor loadings are unstable when the number of factors is constant. We ...


Estimation In High-Dimensional Factor Models With Structural Instabilities, Wen Gao 2018 University of Windsor

Major Papers

In this major paper, we use high-dimensional models to analyze macroeconomic data that is influenced by a break point. In particular, we detect the break point and study how the number of factors and the factor loadings change under structural instability.

Concretely, we propose two factor models that explain the processes of the pre- and post-break periods. We then treat the break point as either known or unknown. In both situations, we derive shrinkage estimators by minimizing a penalized least-squares function and calculate the estimators of the numbers of pre- and post-break factors ...


On Projection Of A Positive Definite Matrix On A Cone Of Nonnegative Definite Toeplitz Matrices, Katarzyna Filipiak, Augustyn Markiewicz, Adam Mieldzioc, Aneta Sawikowska 2018 Poznań University Of Technology

Electronic Journal of Linear Algebra

We consider the approximation of a given positive definite matrix by nonnegative definite banded Toeplitz matrices. We show that the projection onto the linear space of Toeplitz matrices does not always preserve nonnegative definiteness. Therefore, we characterize a convex cone of nonnegative definite banded Toeplitz matrices which depends on the matrix dimensions, and we show that the condition of positive definiteness given by Parter [Numer. Math. 4, 293-295, 1962] characterizes the asymptotic cone. In this paper, we give a methodology and a numerical algorithm for the projection based on the properties of a cone of nonnegative definite Toeplitz matrices. This problem can be ...


Using Cyclical Components To Improve The Forecasts Of The Stock Market And Macroeconomic Variables, Kenneth R. Szulczyk, Shibley Sadique 2018 Curtin University Malaysia

Journal of Modern Applied Statistical Methods

Economic variables such as stock market indices, interest rates, and national output measures contain cyclical components. Forecasting methods excluding these cyclical components yield inaccurate out-of-sample forecasts. Accordingly, a three-stage procedure is developed to estimate a vector autoregression (VAR) with cyclical components. A Monte Carlo simulation shows the procedure estimates the parameters accurately. Subsequently, a VAR with cyclical components improves the root-mean-square error of out-of-sample forecasts by 50% for a stock market model with macroeconomic variables.


Decision Making In A Changing Environment, Alan Veliz-Cuba 2018 University of Dayton

Annual Symposium on Biomathematics and Ecology: Education and Research

No abstract provided.


Snakebite Dynamics Of Colombia: Effects Of Precipitation Seasonality Of Incidence, Carlos Cruz 2018 Illinois State University

Annual Symposium on Biomathematics and Ecology: Education and Research

No abstract provided.


Understanding Sexual Violence Against Women, Maria Martinez 2018 Illinois State University

Annual Symposium on Biomathematics and Ecology: Education and Research

No abstract provided.


Selected Adventures In Policy Modeling, Edward Kaplan 2018 Illinois State University

Annual Symposium on Biomathematics and Ecology: Education and Research

No abstract provided.


Identifying Treatment Effects In The Presence Of Confounded Types, Desire Kedagni 2018 Iowa State University

Economics Working Papers

In this paper, I consider identification of treatment effects when the treatment is endogenous. The use of instrumental variables is a popular solution to deal with endogeneity, but this may give misleading answers when the instrument is invalid. I show that when the instrument is invalid due to correlation with the first-stage unobserved heterogeneity, a second (also possibly invalid) instrument allows one to partially identify not only the local average treatment effect but also the entire potential outcome distributions for compliers. I exploit the fact that the distribution of the observed outcome in each group defined by the treatment and ...


Statistical Modeling Of CO2 Flux Data, Fang He 2018 The University of Western Ontario

Electronic Thesis and Dissertation Repository

Carbon dioxide (CO2) flux is important for agriculture and carbon cycle studies. Only a small proportion of the land is currently covered by proper equipment to directly collect CO2 flux data. CO2 flux data have an obvious annual cycle whose phase changes from year to year. Building a model to estimate the annual effect and seasonal dynamics is a challenging task. Thanks to the Moderate Resolution Imaging Spectroradiometer (MODIS), which is carried by NASA satellites, corresponding data, such as the normalized difference vegetation index (NDVI), are freely available from NASA. Our goals are modeling the ...


Cross-Sectional HIV Incidence Estimation Accounting For Heterogeneity Across Communities, Yuejia Xu, Oliver B. Laeyendecker, Rui Wang 2018 University of Cambridge

Harvard University Biostatistics Working Paper Series

No abstract provided.


Comparison Of Multiple Imputation Methods For Categorical Survey Items With High Missing Rates: Application To The Family Life, Activity, Sun, Health And Eating (FLASHE) Study, Benmei Liu, Erin Hennessy, April Oh, Laura A. Dwyer, Linda Nebeling 2018 National Cancer Institute

Journal of Modern Applied Statistical Methods

Two multiple imputation methods, the Sequential Regression Multivariate Imputation Algorithm and the Cox-Iannacchione Weighted Sequential Hotdeck, were examined and compared for imputing categorical variables with high missing rates from the Family Life, Activity, Sun, Health and Eating (FLASHE) study. This paper describes the imputation approaches and results from the study.


Dealing With Sensitive Quantitative Variables: A Comparison Of Sampling Designs For The Procedure Of Gupta And Thornton, Carlos Narciso Bouza Herrera, Prayas Sharma 2018 University of Havana

Journal of Modern Applied Statistical Methods

The use of randomized response procedures reduces the number of non-responses and increases the accuracy of the responses. A new sampling strategy is developed in which the reports are scrambled using the procedure of Gupta and Thornton. The estimator of the mean and its errors are developed for the Rao-Hartley-Cochran and Ranked Set Sampling designs. The proposals are compared with the original model, which is based on simple random sampling.


Bayesian And Semi-Bayesian Estimation Of The Parameters Of Generalized Inverse Weibull Distribution, Kamaljit Kaur, Kalpana K. Mahajan, Sangeeta Arora 2018 Panjab University, Chandigarh, India

Journal of Modern Applied Statistical Methods

Bayesian and semi-Bayesian estimators of the parameters of the generalized inverse Weibull distribution are obtained using Jeffreys’ prior and an informative prior under specific loss-function assumptions. Using simulation, the relative efficiency of the proposed estimators is obtained under different set-ups. A real-life example is also given.


Overcoming Small Data Limitations In Heart Disease Prediction By Using Surrogate Data, Alfeo Sabay, Laurie Harris, Vivek Bejugama, Karen Jaceldo-Siegl 2018 Southern Methodist University

SMU Data Science Review

In this paper, we present a heart disease prediction use case showing how synthetic data can be used to address privacy concerns and overcome constraints inherent in small medical research data sets. While advanced machine learning algorithms, such as neural network models, can be implemented to improve prediction accuracy, these require very large data sets, which are often not available in medical or clinical research. We examine the use of surrogate data sets composed of synthetic observations for modeling heart disease prediction. We generate surrogate data, based on the characteristics of original observations, and compare prediction accuracy results achieved from ...


Random Forest Vs Logistic Regression: Binary Classification For Heterogeneous Datasets, Kaitlin Kirasich, Trace Smith, Bivin Sadler 2018 Southern Methodist University

SMU Data Science Review

Selecting a learning algorithm for a particular application on the basis of performance remains an ad hoc process that relies on fundamental benchmarks such as a classifier’s overall loss function and misclassification metrics. In this paper, we address the difficulty of model selection by comparing the overall classification performance of random forest and logistic regression on datasets with various underlying structures: (1) increasing the variance in the explanatory and noise variables, (2) increasing the number of noise variables, (3) increasing the number of explanatory variables, (4) increasing the number of observations. We developed a model evaluation tool ...


Predicting National Basketball Association Success: A Machine Learning Approach, Adarsh Kannan, Brian Kolovich, Brandon Lawrence, Sohail Rafiqi 2018 Southern Methodist University

SMU Data Science Review

In this paper, we present a machine-learning-based approach to projecting the success of National Basketball Association (NBA) draft prospects. With the proliferation of data, analytics have increasingly become a critical component in the assessment of professional and collegiate basketball players. We leverage player biometric data, college statistics, draft selection order, and positional breakdown as modelling features in our prediction algorithms. We found that a player's draft pick and their college statistics are the best predictors of their longevity in the National Basketball Association.


Minimizing The Perceived Financial Burden Due To Cancer, Hassan Azhar, Zoheb Allam, Gino Varghese, Daniel W. Engels, Sajiny John 2018 Southern Methodist University

SMU Data Science Review

In this paper, we present a regression model that predicts the perceived financial burden that a cancer patient experiences in the treatment and management of the disease. Cancer patients do not fully understand the burden associated with the cost of cancer, and their lack of understanding can increase the difficulties associated with living with the disease, in particular coping with the cost. The relationship between demographic characteristics and financial burden was examined in order to better understand the characteristics of a cancer patient and their burden, while all-subsets regression was used to determine the best predictors of financial burden. Age ...


Yelp’s Review Filtering Algorithm, Yao Yao, Ivelin Angelov, Jack Rasmus-Vorrath, Mooyoung Lee, Daniel W. Engels 2018 Southern Methodist University

SMU Data Science Review

In this paper, we present an analysis of features influencing Yelp's proprietary review filtering algorithm. Classifying or misclassifying reviews as recommended or non-recommended affects average ratings, consumer decisions, and ultimately, business revenue. Our analysis involves systematically sampling and scraping Yelp restaurant reviews. Features are extracted from review metadata and engineered from metrics and scores generated using text classifiers and sentiment analysis. The coefficients of a multivariate logistic regression model were interpreted as quantifications of the relative importance of features in classifying reviews as recommended or non-recommended. The model classified review recommendations with an accuracy of 78%. We found that ...

