Open Access. Powered by Scholars. Published by Universities.®

Statistics and Probability Commons

Statistical and Data Sciences: Faculty Publications

Statistical computing

Full-Text Articles in Statistics and Probability

An Educator’s Perspective Of The Tidyverse, Mine Çetinkaya-Rundel, Johanna Hardin, Benjamin Baumer, Amelia McNamara, Nicholas J. Horton, Colin W. Rundel Apr 2022

Computing makes up a large and growing component of data science and statistics courses. Many of those courses, especially when taught by faculty who are statisticians by training, teach R as the programming language. A number of instructors have opted to build much of their teaching around use of the tidyverse. The tidyverse, in the words of its developers, “is a collection of R packages that share a high-level design philosophy and low-level grammar and data structures, so that learning one package makes it easier to learn the next” (Wickham et al. 2019). These shared principles have led to the …


A Grammar For Reproducible And Painless Extract-Transform-Load Operations On Medium Data, Benjamin S. Baumer Apr 2019

Many interesting datasets available on the Internet are of a medium size—too big to fit into a personal computer’s memory, but not so large that they would not fit comfortably on its hard disk. In the coming years, datasets of this magnitude will inform vital research in a wide array of application domains. However, due to a variety of constraints they are cumbersome to ingest, wrangle, analyze, and share in a reproducible fashion. These obstructions hamper thorough peer-review and thus disrupt the forward progress of science. We propose a predictable and pipeable framework for R (the state-of-the-art statistical computing environment) …