Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Articles 1 - 10 of 10

Full-Text Articles in Physical Sciences and Mathematics

Creating Optimal Conditions For Reproducible Data Analysis In R With ‘Fertile’, Audrey M. Bertin, Benjamin Baumer Nov 2020

Statistical and Data Sciences: Faculty Publications

The advancement of scientific knowledge increasingly depends on ensuring that data-driven research is reproducible: that two people with the same data obtain the same results. However, while the necessity of reproducibility is clear, there are significant behavioral and technical challenges that impede its widespread implementation and no clear consensus on standards of what constitutes reproducibility in published research. We present fertile, an R package that focuses on a series of common mistakes programmers make while conducting data science projects in R, primarily through the RStudio integrated development environment. fertile operates in two modes: proactively, to prevent reproducibility mistakes from happening …


Teaching Computational Machine Learning (Without Statistics), Katherine M. Kinnaird Sep 2020

Statistical and Data Sciences: Faculty Publications

This paper presents an undergraduate machine learning course that emphasizes algorithmic understanding and programming skills while assuming no statistical training. Emphasizing the development of good habits of mind, this course trains students to be independent machine learning practitioners through an iterative, cyclical framework for teaching concepts while adding increasing depth and nuance. Beginning with unsupervised learning, this course is sequenced as a series of machine learning ideas and concepts with specific algorithms acting as concrete examples. This paper also details course organization including evaluation practices and logistics.


“Playing The Whole Game”: A Data Collection And Analysis Exercise With Google Calendar, Albert Y. Kim, Johanna Hardin Aug 2020

Statistical and Data Sciences: Faculty Publications

We provide a computational exercise suitable for early introduction in an undergraduate statistics or data science course that allows students to “play the whole game” of data science: performing both data collection and data analysis. While many teaching resources exist for data analysis, such resources are not as abundant for data collection given the inherent difficulty of the task. Our proposed exercise centers around student use of Google Calendar to collect data with the goal of answering the question “How do I spend my time?” On the one hand, the exercise involves answering a question with near universal appeal, but …


Integrating Data Science Ethics Into An Undergraduate Major, Benjamin Baumer, Randi L. Garcia, Albert Y. Kim, Katherine M. Kinnaird, Miles Q. Ott Jul 2020

Statistical and Data Sciences: Faculty Publications

We present a programmatic approach to incorporating ethics into an undergraduate major in statistical and data sciences. We discuss departmental-level initiatives designed to meet the National Academy of Sciences recommendation for weaving ethics into the curriculum from top-to-bottom as our majors progress from our introductory courses to our senior capstone course, as well as from side-to-side through co-curricular programming. We also provide six examples of data science ethics modules used in five different courses at our liberal arts college, each focusing on a different ethical consideration. The modules are designed to be portable such that they can be flexibly incorporated …


Circada: Shiny Apps For Exploration Of Experimental And Synthetic Circadian Time Series With An Educational Emphasis, Lisa Cenek, Liubou Klindziuk, Cindy Lopez, Eleanor McCartney, Blanca Martin Burgos, Selma Tir, Mary E. Harrington, Tanya L. Leise Apr 2020

Psychology: Faculty Publications

Circadian rhythms are daily oscillations in physiology and behavior that can be assessed by recording body temperature, locomotor activity, or bioluminescent reporters, among other measures. These different types of data can vary greatly in waveform, noise characteristics, typical sampling rate, and length of recording. We developed 2 Shiny apps for exploration of these data, enabling visualization and analysis of circadian parameters such as period and phase. Methods include the discrete wavelet transform, sine fitting, the Lomb-Scargle periodogram, autocorrelation, and maximum entropy spectral analysis, giving a sense of how well each method works on each type of data. The apps also …
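As an illustration of one of the listed methods, sine fitting, the following is a minimal sketch and not the apps' actual implementation. It assumes evenly sampled data, scans a hypothetical grid of candidate periods, and fits a single sinusoid to the mean-centered series by least squares; the function names `fit_sine` and `estimate_period_and_phase` and the synthetic signal are all assumptions made for this example.

```python
import math

def fit_sine(times, values, period):
    """Least-squares fit of a*cos(wt) + b*sin(wt) to mean-centered values.

    Returns (a, b, rss), where rss is the residual sum of squares.
    """
    w = 2 * math.pi / period
    mean = sum(values) / len(values)
    y = [v - mean for v in values]
    c = [math.cos(w * t) for t in times]
    s = [math.sin(w * t) for t in times]
    # Entries of the 2x2 normal equations.
    scc = sum(ci * ci for ci in c)
    sss = sum(si * si for si in s)
    scs = sum(ci * si for ci, si in zip(c, s))
    scy = sum(ci * yi for ci, yi in zip(c, y))
    ssy = sum(si * yi for si, yi in zip(s, y))
    det = scc * sss - scs * scs
    a = (scy * sss - ssy * scs) / det
    b = (ssy * scc - scy * scs) / det
    rss = sum((yi - a * ci - b * si) ** 2 for yi, ci, si in zip(y, c, s))
    return a, b, rss

def estimate_period_and_phase(times, values, candidates):
    """Pick the candidate period with the smallest residual; report the peak time."""
    best = min(candidates, key=lambda p: fit_sine(times, values, p)[2])
    a, b, _ = fit_sine(times, values, best)
    # a*cos(wt) + b*sin(wt) = R*cos(wt - phi), so the first peak is at phi / w.
    phi = math.atan2(b, a) % (2 * math.pi)
    return best, phi * best / (2 * math.pi)

# Synthetic recording (hypothetical): 5 days sampled every 0.5 h,
# true period 24 h, peak at hour 6.
times = [0.5 * i for i in range(240)]
values = [10 + 3 * math.cos(2 * math.pi * (t - 6) / 24) for t in times]
candidates = [20.0 + 0.5 * k for k in range(17)]  # 20 h to 28 h

period, peak_time = estimate_period_and_phase(times, values, candidates)
print(period, round(peak_time, 3))
```

On real recordings with noise, trends, or non-sinusoidal waveforms, a single-sinusoid fit can be misleading, which is one reason comparing several methods on each data type is useful.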


The Influence Of Peer And Parental Norms On First-Generation College Students’ Binge Drinking Trajectories, Graham T. DiGuiseppi, Jordan P. Davis, Matthew K. Meisel, Melissa A. Clark, Mya L. Roberson, Miles Q. Ott, Nancy P. Barnett Apr 2020

Statistical and Data Sciences: Faculty Publications

Introduction: First-generation college students are those whose parents have not completed a four-year college degree. The current study addressed the lack of research on first-generation college students’ alcohol use by comparing the binge drinking trajectories of first-generation and continuing-generation students over their first three semesters. The dynamic influence of peer and parental social norms on students’ binge drinking frequencies was also examined. Methods: 1342 college students (n = 225 first-generation) at one private university completed online surveys. Group differences were examined at Time 1, and latent growth-curve models tested the association between first-generation status and social norms (peer descriptive, peer …


A Permutation Test And Spatial Cross-Validation Approach To Assess Models Of Interspecific Competition Between Trees, David Allen, Albert Y. Kim Mar 2020

Statistical and Data Sciences: Faculty Publications

Measuring species-specific competitive interactions is key to understanding plant communities. Repeat censused large forest dynamics plots offer an ideal setting to measure these interactions by estimating the species-specific competitive effect on neighboring tree growth. Estimating these interaction values can be difficult, however, because the number of them grows with the square of the number of species. Furthermore, confidence in the estimates can be overestimated if any spatial structure of model errors is not considered. Here we measured these interactions in a forest dynamics plot in a transitional oak-hickory forest. We analytically fit Bayesian linear regression models of annual tree radial …


Identification And Description Of Potentially Influential Social Network Members Using The Strategic Player Approach, Miles Q. Ott, Sara G. Balestrieri, Graham DiGuiseppi, Melissa A. Clark, Michael Bernstein, Sarah Helseth, Nancy P. Barnett Mar 2020

Statistical and Data Sciences: Faculty Publications

Background: Diffusion of innovations theory posits that ideas and behaviors can be spread through social network ties. In intervention work, intervening upon certain network members may lead to intervention effects “diffusing” into the network to affect the behavior of network members who did not receive the intervention. The strategic players (SP) method, an extension of Borgatti’s Key Players approach, is used to balance the (sometimes) opposing goals of spreading the intervention to as many members of the target group as possible, while preventing the spread of the intervention to others. Objectives: We sought to test whether members of the SP …


Teaching Introductory Statistics With DataCamp, Benjamin Baumer, Andrew P. Bray, Mine Çetinkaya-Rundel, Johanna S. Hardin Jan 2020

Statistical and Data Sciences: Faculty Publications

We designed a sequence of courses for the DataCamp online learning platform that approximates the content of a typical introductory statistics course. We discuss the design and implementation of these courses and illustrate how they can be successfully integrated into a brick-and-mortar class. We reflect on the process of creating content for online consumers, ruminate on the pedagogical considerations we faced, and describe an R package for statistical inference that became a by-product of this development process. We discuss the pros and cons of creating the course sequence and express our view that some aspects were particularly problematic. The issues …


SuPP & MaPP: Adaptable Structure-Based Representations For MIR Tasks, Claire Savard, Erin H. Bugbee, Melissa R. McGuirl, Katherine M. Kinnaird Jan 2020

Statistical and Data Sciences: Faculty Publications

Accurate and flexible representations of music data are paramount to addressing MIR tasks, yet many of the existing approaches are difficult to interpret or rigid in nature. This work introduces two new song representations for structure-based retrieval methods: Surface Pattern Preservation (SuPP), a continuous song representation, and Matrix Pattern Preservation (MaPP), SuPP’s discrete counterpart. These representations come equipped with several user-defined parameters so that they are adaptable for a range of MIR tasks. Experimental results show MaPP as successful in addressing the cover song task on a set of Mazurka scores, with a mean precision of 0.965 and recall of …
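For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how precision and recall are computed for a single retrieval query. The evaluation protocol used in the paper is not described in this excerpt, so the function name `precision_recall` and the toy song identifiers are purely illustrative assumptions.

```python
def precision_recall(retrieved, relevant):
    """Precision and recall for one query.

    precision = correctly retrieved / all retrieved
    recall    = correctly retrieved / all relevant
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical query: four items returned, three true matches exist.
retrieved = ["song_a", "song_b", "song_c", "song_d"]
relevant = ["song_a", "song_b", "song_e"]

precision, recall = precision_recall(retrieved, relevant)
print(precision, round(recall, 3))
```

A mean precision, as mentioned in the abstract, would typically be an average of such per-query values over all queries; that averaging step is an assumption here, not a detail taken from the paper.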