Of Rats And Men, 2018 CUNY Graduate School of Journalism

#### Of Rats And Men, Thomas S. Walsh

*Capstones*

This capstone is a data-driven investigation into New York City's rat problem. By using publicly available government data to map rat activity in NYC, I identified several socio-economic variables that correlate with rat populations at the community district, borough, and city scales. I used these findings (mainly that rat problems are linked to lower incomes) as the basis of an investigation that includes interviews with residents, experts, and city officials. Prof. Bobby Corrigan, an urban rodentologist formerly with the NYC Department of Health, criticizes the city's efforts on the record for the first time.

https://thomasseiyawalsh.wixsite.com/ratstone

Group-Lasso Estimation In High-Dimensional Factor Models With Structural Breaks, 2018 University of Windsor

#### Group-Lasso Estimation In High-Dimensional Factor Models With Structural Breaks, Yujie Song

*Major Papers*

In this major paper, we study the influence of structural breaks in a financial market model with high-dimensional data. We present a model that is capable of detecting changes in factor loadings, determining the number of factors, and detecting the break date. We consider both the case where the break date is known and the case where it is unknown, and we identify the type of instability. For the unknown break date case, we propose a group-LASSO estimator to determine the number of pre- and post-break factors, the break date, and the existence of instability in the factor loadings when the number of factors is constant. We ...

Estimation In High-Dimensional Factor Models With Structural Instabilities, 2018 University of Windsor

#### Estimation In High-Dimensional Factor Models With Structural Instabilities, Wen Gao

*Major Papers*

In this major paper, we use high-dimensional models to analyze macroeconomic data that are influenced by a break point. In particular, we consider detecting the break point and study the changes in the number of factors and in the factor loadings under structural instability.

Concretely, we propose two factor models that explain the processes of the pre- and post-break periods. Then, we consider the break point as known or unknown. In both situations, we derive the shrinkage estimators by minimizing the penalized least squares function and calculate the estimators of the numbers of pre- and post-break factors ...

On Projection Of A Positive Definite Matrix On A Cone Of Nonnegative Definite Toeplitz Matrices, 2018 Poznań University Of Technology

#### On Projection Of A Positive Definite Matrix On A Cone Of Nonnegative Definite Toeplitz Matrices, Katarzyna Filipiak, Augustyn Markiewicz, Adam Mieldzioc, Aneta Sawikowska

*Electronic Journal of Linear Algebra*

We consider the approximation of a given positive definite matrix by nonnegative definite banded Toeplitz matrices. We show that the projection onto the linear space of Toeplitz matrices does not always preserve nonnegative definiteness. Therefore we characterize a convex cone of nonnegative definite banded Toeplitz matrices which depends on the matrix dimensions, and we show that the condition of positive definiteness given by Parter [*Numer. Math.* 4, 293-295, 1962] characterizes the asymptotic cone. In this paper we give a methodology and a numerical algorithm for the projection, based on the properties of a cone of nonnegative definite Toeplitz matrices. This problem can be ...
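The claim that projecting onto the linear space of Toeplitz matrices can lose nonnegative definiteness can be checked numerically. The sketch below is an illustration only, not the paper's algorithm: it assumes the Frobenius-norm projection (averaging each diagonal inside the band and zeroing everything outside it), and the helper name `project_banded_toeplitz` and the AR(1)-type test matrix are hypothetical choices made for this example.

```python
import numpy as np


def project_banded_toeplitz(a, bandwidth):
    """Frobenius projection of a symmetric matrix onto the linear space of
    symmetric banded Toeplitz matrices: average each diagonal within the
    band, set every entry outside the band to zero."""
    n = a.shape[0]
    t = np.zeros_like(a, dtype=float)
    for k in range(bandwidth + 1):
        # Mean over the k-th super- and sub-diagonal (n - k entries each).
        mean_k = (np.trace(a, offset=k) + np.trace(a, offset=-k)) / (2 * (n - k))
        idx = np.arange(n - k)
        t[idx, idx + k] = mean_k
        t[idx + k, idx] = mean_k
    return t


# Assumed test case: a 5 x 5 AR(1)-type correlation matrix, which is
# positive definite for |rho| < 1.
n, rho = 5, 0.9
i = np.arange(n)
a = rho ** np.abs(i[:, None] - i[None, :])

# Project onto tridiagonal (bandwidth 1) symmetric Toeplitz matrices.
t = project_banded_toeplitz(a, bandwidth=1)

min_eig_a = np.linalg.eigvalsh(a).min()
min_eig_t = np.linalg.eigvalsh(t).min()
print(f"smallest eigenvalue before projection: {min_eig_a:.4f}")
print(f"smallest eigenvalue after projection:  {min_eig_t:.4f}")
```

Here the projected matrix is tridiagonal Toeplitz with 1 on the diagonal and 0.9 next to it, so its smallest eigenvalue is 1 - 1.8 cos(π/6), which is negative. A plain linear projection is therefore not enough, which is the motivation the abstract gives for working with the cone of nonnegative definite Toeplitz matrices instead.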

Using Cyclical Components To Improve The Forecasts Of The Stock Market And Macroeconomic Variables, 2018 Curtin University Malaysia

#### Using Cyclical Components To Improve The Forecasts Of The Stock Market And Macroeconomic Variables, Kenneth R. Szulczyk, Shibley Sadique

*Journal of Modern Applied Statistical Methods*

Economic variables such as stock market indices, interest rates, and national output measures contain cyclical components. Forecasting methods excluding these cyclical components yield inaccurate out-of-sample forecasts. Accordingly, a three-stage procedure is developed to estimate a vector autoregression (VAR) with cyclical components. A Monte Carlo simulation shows the procedure estimates the parameters accurately. Subsequently, a VAR with cyclical components improves the root-mean-square error of out-of-sample forecasts by 50% for a stock market model with macroeconomic variables.

Decision Making In A Changing Environment, 2018 University of Dayton

#### Decision Making In A Changing Environment, Alan Veliz-Cuba

*Annual Symposium on Biomathematics and Ecology: Education and Research*

No abstract provided.

Snakebite Dynamics Of Colombia: Effects Of Precipitation Seasonality Of Incidence, 2018 Illinois State University

#### Snakebite Dynamics Of Colombia: Effects Of Precipitation Seasonality Of Incidence, Carlos Cruz

*Annual Symposium on Biomathematics and Ecology: Education and Research*

No abstract provided.

Understanding Sexual Violence Against Women, 2018 Illinois State University

#### Understanding Sexual Violence Against Women, Maria Martinez

*Annual Symposium on Biomathematics and Ecology: Education and Research*

No abstract provided.

Selected Adventures In Policy Modeling, 2018 Illinois State University

#### Selected Adventures In Policy Modeling, Edward Kaplan

*Annual Symposium on Biomathematics and Ecology: Education and Research*

No abstract provided.

Identifying Treatment Effects In The Presence Of Confounded Types, 2018 Iowa State University

#### Identifying Treatment Effects In The Presence Of Confounded Types, Desire Kedagni

*Economics Working Papers*

In this paper, I consider identification of treatment effects when the treatment is endogenous. The use of instrumental variables is a popular solution to deal with endogeneity, but this may give misleading answers when the instrument is invalid. I show that when the instrument is invalid due to correlation with the first-stage unobserved heterogeneity, a second (also possibly invalid) instrument allows one to partially identify not only the local average treatment effect but also the entire potential outcomes distributions for compliers. I exploit the fact that the distribution of the observed outcome in each group defined by the treatment and ...

Statistical Modeling Of CO2 Flux Data, 2018 The University of Western Ontario

#### Statistical Modeling Of CO2 Flux Data, Fang He

*Electronic Thesis and Dissertation Repository*

Carbon dioxide (CO2) flux is important for agriculture and carbon cycle studies. Only a small proportion of the land is currently covered by equipment that can directly collect CO2 flux data. CO2 flux data have an obvious annual cycle, with the phase changing from year to year. Building a model that estimates the annual effect and the seasonal dynamics is a challenging task. Thanks to the Moderate Resolution Imaging Spectroradiometer (MODIS), which is carried by NASA satellites, corresponding data, such as the normalized difference vegetation index (NDVI), are freely available from NASA. Our goals are modeling the ...

Cross-Sectional HIV Incidence Estimation Accounting For Heterogeneity Across Communities, 2018 University of Cambridge

#### Cross-Sectional HIV Incidence Estimation Accounting For Heterogeneity Across Communities, Yuejia Xu, Oliver B. Laeyendecker, Rui Wang

*Harvard University Biostatistics Working Paper Series*

No abstract provided.

Comparison Of Multiple Imputation Methods For Categorical Survey Items With High Missing Rates: Application To The Family Life, Activity, Sun, Health And Eating (FLASHE) Study, 2018 National Cancer Institute

#### Comparison Of Multiple Imputation Methods For Categorical Survey Items With High Missing Rates: Application To The Family Life, Activity, Sun, Health And Eating (FLASHE) Study, Benmei Liu, Erin Hennessy, April Oh, Laura A. Dwyer, Linda Nebeling

*Journal of Modern Applied Statistical Methods*

Two multiple imputation methods, the Sequential Regression Multivariate Imputation Algorithm and the Cox-Iannacchione Weighted Sequential Hotdeck, were examined and compared for imputing categorical variables with high missing rates from the Family Life, Activity, Sun, Health and Eating (FLASHE) study. This paper describes the imputation approaches and results from the study.

Dealing With Sensitive Quantitative Variables: A Comparison Of Sampling Designs For The Procedure Of Gupta And Thornton, 2018 University of Havana

#### Dealing With Sensitive Quantitative Variables: A Comparison Of Sampling Designs For The Procedure Of Gupta And Thornton, Carlos Narciso Bouza Herrera, Prayas Sharma

*Journal of Modern Applied Statistical Methods*

The use of randomized response procedures reduces the number of non-responses and increases the accuracy of the responses. A new sampling strategy is developed where the reports are scrambled using the procedure of Gupta and Thornton. The estimator of the mean and its errors are developed for the Rao-Hartley-Cochran and Ranked Set Sampling designs. The proposals are compared with the original model based on the use of simple random sampling.

Bayesian And Semi-Bayesian Estimation Of The Parameters Of Generalized Inverse Weibull Distribution, 2018 Panjab University, Chandigarh, India

#### Bayesian And Semi-Bayesian Estimation Of The Parameters Of Generalized Inverse Weibull Distribution, Kamaljit Kaur, Kalpana K. Mahajan, Sangeeta Arora

*Journal of Modern Applied Statistical Methods*

Bayesian and semi-Bayesian estimators of the parameters of the generalized inverse Weibull distribution are obtained using Jeffreys’ prior and an informative prior under specific assumptions about the loss function. Using simulation, the relative efficiency of the proposed estimators is obtained under different set-ups. A real-life example is also given.

Overcoming Small Data Limitations In Heart Disease Prediction By Using Surrogate Data, 2018 Southern Methodist University

#### Overcoming Small Data Limitations In Heart Disease Prediction By Using Surrogate Data, Alfeo Sabay, Laurie Harris, Vivek Bejugama, Karen Jaceldo-Siegl

*SMU Data Science Review*

In this paper, we present a heart disease prediction use case showing how synthetic data can be used to address privacy concerns and overcome constraints inherent in small medical research data sets. While advanced machine learning algorithms, such as neural network models, can be implemented to improve prediction accuracy, these require very large data sets, which are often not available in medical or clinical research. We examine the use of surrogate data sets composed of synthetic observations for modeling heart disease prediction. We generate surrogate data, based on the characteristics of original observations, and compare prediction accuracy results achieved from ...

Random Forest Vs Logistic Regression: Binary Classification For Heterogeneous Datasets, 2018 Southern Methodist University

#### Random Forest Vs Logistic Regression: Binary Classification For Heterogeneous Datasets, Kaitlin Kirasich, Trace Smith, Bivin Sadler

*SMU Data Science Review*

Selecting a learning algorithm to implement for a particular application on the basis of performance remains an ad hoc process that relies on fundamental benchmarks, such as evaluating a classifier’s overall loss function and misclassification metrics. In this paper we address the difficulty of model selection by comparing the overall classification performance of random forest and logistic regression on datasets with various underlying structures: (1) increasing the variance in the explanatory and noise variables, (2) increasing the number of noise variables, (3) increasing the number of explanatory variables, (4) increasing the number of observations. We developed a model evaluation tool ...

Predicting National Basketball Association Success: A Machine Learning Approach, 2018 Southern Methodist University

#### Predicting National Basketball Association Success: A Machine Learning Approach, Adarsh Kannan, Brian Kolovich, Brandon Lawrence, Sohail Rafiqi

*SMU Data Science Review*

In this paper, we present a machine learning based approach to projecting the success of National Basketball Association (NBA) draft prospects. With the proliferation of data, analytics have increasingly become a critical component in the assessment of professional and collegiate basketball players. We leverage player biometric data, college statistics, draft selection order, and positional breakdown as modelling features in our prediction algorithms. We found that a player's draft pick and their college statistics are the best predictors of their longevity in the National Basketball Association.

Minimizing The Perceived Financial Burden Due To Cancer, 2018 Southern Methodist University

#### Minimizing The Perceived Financial Burden Due To Cancer, Hassan Azhar, Zoheb Allam, Gino Varghese, Daniel W. Engels, Sajiny John

*SMU Data Science Review*

In this paper, we present a regression model that predicts the perceived financial burden that a cancer patient experiences in the treatment and management of the disease. Cancer patients do not fully understand the burden associated with the cost of cancer, and their lack of understanding can increase the difficulties associated with living with the disease, in particular coping with the cost. The relationship between demographic characteristics and financial burden was examined in order to better understand the characteristics of a cancer patient and their burden, while all-subsets regression was used to determine the best predictors of financial burden. Age ...

Yelp’s Review Filtering Algorithm, 2018 Southern Methodist University

#### Yelp’s Review Filtering Algorithm, Yao Yao, Ivelin Angelov, Jack Rasmus-Vorrath, Mooyoung Lee, Daniel W. Engels

*SMU Data Science Review*

In this paper, we present an analysis of features influencing Yelp's proprietary review filtering algorithm. Classifying or misclassifying reviews as *recommended* or *non-recommended* affects average ratings, consumer decisions, and ultimately, business revenue. Our analysis involves systematically sampling and scraping Yelp restaurant reviews. Features are extracted from review metadata and engineered from metrics and scores generated using text classifiers and sentiment analysis. The coefficients of a multivariate logistic regression model were interpreted as quantifications of the relative importance of features in classifying reviews as recommended or non-recommended. The model classified review recommendations with an accuracy of 78%. We found that ...
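The idea of reading logistic regression coefficients as relative feature importance can be sketched on synthetic data. This is a minimal illustration, not the authors' pipeline: the three features, the "true" coefficients, and the helper `fit_logistic` are all assumptions made for the example, and no Yelp data is involved. The features are drawn on a common scale, which is what makes the coefficient magnitudes comparable.

```python
import numpy as np


def fit_logistic(x, y, lr=0.5, steps=2000):
    """Fit an unregularized logistic regression by batch gradient descent."""
    n, d = x.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(x @ w + b)))  # predicted probabilities
        w -= lr * (x.T @ (p - y)) / n           # gradient of mean log-loss wrt w
        b -= lr * np.mean(p - y)                # gradient of mean log-loss wrt b
    return w, b


# Synthetic, standardized features with assumed effects:
# feature 0 is strong, feature 1 is irrelevant, feature 2 is moderate.
rng = np.random.default_rng(0)
x = rng.standard_normal((2000, 3))
true_w = np.array([2.0, 0.0, -1.0])
prob = 1.0 / (1.0 + np.exp(-(x @ true_w)))
y = (rng.random(2000) < prob).astype(float)

w, b = fit_logistic(x, y)

# Rank features by absolute coefficient, largest first.
ranking = np.argsort(-np.abs(w))
print("fitted coefficients:", np.round(w, 2))
print("features ranked by |coefficient|:", ranking.tolist())
```

The sign of each coefficient gives the direction of the effect on the log-odds, and exponentiating a coefficient gives the corresponding odds ratio for a one-unit change in that feature.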