Open Access. Powered by Scholars. Published by Universities.®

Econometrics Commons

Open Access. Powered by Scholars. Published by Universities.®

Articles 1 - 2 of 2

Full-Text Articles in Econometrics

The Paradox Of Big Data, Gary N. Smith Jan 2019

The Paradox Of Big Data, Gary N. Smith

Pomona Economics

Data-mining is often used to discover patterns in Big Data. It is tempting believe that because an unearthed pattern is unusual it must be meaningful, but patterns are inevitable in Big Data and usually meaningless. The paradox of Big Data is that data mining is most seductive when there are a large number of variables, but a large number of variables exacerbates the perils of data mining.


Step Away From Stepwise, Gary N. Smith Jan 2019

Step Away From Stepwise, Gary N. Smith

Pomona Economics

Stepwise regression is a popular data-mining tool that uses statistical significance to select the explanatory variables to be used in a multiple-regression model. A fundamental problem with stepwise regression is that some real explanatory variables that have causal effects on the dependent variable may happen to not be statistically significant, while nuisance variables may be coincidentally significant. As a result, the model may fit the data well in-sample, but do poorly out-of-sample. Many Big-Data researchers believe that, the larger the number of possible explanatory variables, the more useful is stepwise regression for selecting explanatory variables. The reality is that stepwise …