Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Articles 1 - 25 of 25

Full-Text Articles in Physical Sciences and Mathematics

Big Ideas In Sports Analytics And Statistical Tools For Their Investigation, Benjamin S. Baumer, Gregory J. Matthews, Quang Nguyen May 2023

Statistical and Data Sciences: Faculty Publications

Sports analytics, broadly defined as the pursuit of improvement in athletic performance through the analysis of data, has expanded its footprint both in the professional sports industry and in academia over the past 30 years. In this article, we connect four big ideas that are common across multiple sports: the expected value of a game state, win probability, measures of team strength, and the use of sports betting market data. For each, we explore both the shared similarities and individual idiosyncrasies of analytical approaches in each sport. While our focus is on the concepts underlying each type of analysis, any implementation necessarily …


Data Science Transfer Pathways From Associate's To Bachelor's Programs, Benjamin S. Baumer, Nicholas J. Horton Jan 2023

Statistical and Data Sciences: Faculty Publications

A substantial fraction of students who complete their college education at a public university in the United States begin their journey at one of the 935 public 2-year colleges. While the number of 4-year colleges offering bachelor’s degrees in data science continues to increase, data science instruction at many 2-year colleges lags behind. A major impediment is the relative paucity of introductory data science courses that serve multiple student audiences and can easily transfer. In addition, the lack of predefined transfer pathways (or articulation agreements) for data science creates a growing disconnect that leaves students who want to study data …


Evaluation Of EDISON's Data Science Competency Framework Through A Comparative Literature Analysis, Karl R. B. Schmitt, Linda Clark, Katherine M. Kinnaird, Ruth E. H. Wertz, Björn Sandstede Jan 2023

Statistical and Data Sciences: Faculty Publications

During the emergence of Data Science as a distinct discipline, discussions of what exactly constitutes Data Science have been a source of contention, with no clear resolution. These disagreements have been exacerbated by the lack of a clear single disciplinary 'parent.' Many early efforts at defining curricula and courses exist, with the EDISON Project's Data Science Framework (EDISON-DSF) from the European Union being the most complete. The EDISON-DSF includes both a Data Science Body of Knowledge (DS-BoK) and Competency Framework (CF-DS). This paper takes a critical look at how EDISON's CF-DS compares to recent work and other published curricular or …


Implementing GitHub Actions Continuous Integration To Reduce Error Rates In Ecological Data Collection, Albert Y. Kim, Valentine Herrmann, Ross Barreto, Brianna Calkins, Erika Gonzalez-Akre, Daniel J. Johnson, Jennifer A. Jordan, Lukas Magee, Ian R. McGregor, Nicolle Montero, Karl Novak, Teagan Rogers, Jessica Shue, Kristina J. Anderson-Teixeira Sep 2022

Statistical and Data Sciences: Faculty Publications

Accurate field data are essential to understanding ecological systems and forecasting their responses to global change. Yet, data collection errors are common, and data analysis often lags far enough behind its collection that many errors can no longer be corrected, nor can anomalous observations be revisited. Needed is a system in which data quality assurance and control (QA/QC), along with the production of basic data summaries, can be automated immediately following data collection.

Here, we implement and test a system to satisfy these needs. For two annual tree mortality censuses and a dendrometer band survey at two forest research sites, …


An Educator’s Perspective Of The Tidyverse, Mine Çetinkaya-Rundel, Johanna Hardin, Benjamin Baumer, Amelia McNamara, Nicholas J. Horton, Colin W. Rundel Apr 2022

Statistical and Data Sciences: Faculty Publications

Computing makes up a large and growing component of data science and statistics courses. Many of those courses, especially when taught by faculty who are statisticians by training, teach R as the programming language. A number of instructors have opted to build much of their teaching around use of the tidyverse. The tidyverse, in the words of its developers, “is a collection of R packages that share a high-level design philosophy and low-level grammar and data structures, so that learning one package makes it easier to learn the next” (Wickham et al. 2019). These shared principles have led to the …


Comparison Of Caregiver- And Child-Reported Quality Of Life In Children With Sleep-Disordered Breathing, Phoebe Kuo Yu, Kaitlyn Cook, Jiayan Liu, Raouf S. Amin, Craig Derkay, Lisa M. Elden, Susan L. Garetz, Alisha S. George, Sally Ibrahim, Stacey L. Ishman, Erin M. Kirkham, S. Kamal Naqvi, Jerilynn Radcliffe, Kristie R. Ross, Gopi B. Shah, Ignacio E. Tapia, H. Gerry Taylor, David A. Zopf, Susan Redline, Cristina M. Baldassari Jan 2022

Statistical and Data Sciences: Faculty Publications

Objective. Caregivers frequently report poor quality of life (QOL) in children with sleep-disordered breathing (SDB). Our objective is to assess the correlation between caregiver- and child-reported QOL in children with mild SDB and identify factors associated with differences between caregiver and child report.

Study Design. Analysis of baseline data from a multi-institutional randomized trial.

Setting. Pediatric Adenotonsillectomy Trial for Snoring, where children with mild SDB (obstructive apnea-hypopnea index <3) were randomized to observation or adenotonsillectomy.

Methods. The Pediatric Quality of Life Inventory (PedsQL) assessed baseline global QOL in participating children 5 to 12 years old and their caregivers. Caregiver and child scores were compared. Multivariable regression …


The Forestecology R Package For Fitting And Assessing Neighborhood Models Of The Effect Of Interspecific Competition On The Growth Of Trees, Albert Y. Kim, David N. Allen, Simon P. Couch Nov 2021

Statistical and Data Sciences: Faculty Publications

Neighborhood competition models are powerful tools to measure the effect of interspecific competition. Statistical methods to ease the application of these models are currently lacking. We present the forestecology package providing methods to (a) specify neighborhood competition models, (b) evaluate the effect of competitor species identity using permutation tests, and (c) measure model performance using spatial cross-validation. Following Allen and Kim (PLoS One, 15, 2020, e0229930), we implement a Bayesian linear regression neighborhood competition model. We demonstrate the package's functionality using data from the Smithsonian Conservation Biology Institute's large forest dynamics plot, part of the ForestGEO global network of research …


Facilitating Team-Based Data Science: Lessons Learned From The DSC-WAV Project, Chelsey Legacy, Andrew Zieffler, Benjamin S. Baumer, Valerie Barr, Nicholas J. Horton Oct 2021

Statistical and Data Sciences: Faculty Publications

While coursework provides undergraduate data science students with some relevant analytic skills, many are not given the rich experiences with data and computing they need to be successful in the workplace. Additionally, students often have limited exposure to team-based data science and the principles and tools of collaboration that are encountered outside of school. In this paper, we describe the DSC-WAV program, an NSF-funded data science workforce development project in which teams of undergraduate sophomores and juniors work with a local non-profit organization on a data-focused problem. To help students develop a sense of agency and improve confidence in their …


Infer: An R Package For Tidyverse-Friendly Statistical Inference, Simon P. Couch, Andrew P. Bray, Chester Ismay, Evgeni Chasnovski, B. Baumer, Mine Cetinkaya-Rundel Sep 2021

Statistical and Data Sciences: Faculty Publications

infer implements an expressive grammar to perform statistical inference that adheres to the tidyverse design framework (Wickham et al., 2019). Rather than providing methods for specific statistical tests, this package consolidates the principles that are shared among common hypothesis tests and confidence intervals into a set of four main verbs (functions), supplemented with many utilities to visualize and extract value from their outputs.


Automatic Hierarchy Expansion For Improved Structure And Chord Evaluation, Katherine M. Kinnaird, Brian McFee Jan 2021

Statistical and Data Sciences: Faculty Publications

No abstract provided.


The Data Science Corps Wrangle-Analyze-Visualize Program: Building Data Acumen For Undergraduate Students, Nicholas J. Horton, Benjamin Baumer, Andrew Zieffler, Valerie Barr Jan 2021

Statistical and Data Sciences: Faculty Publications

We congratulate Kolaczyk, Wright, and Yajima on their innovative statistics practicum that places “practice” at the center of data science education (Kolaczyk et al., 2021, this issue). Their year-long practicum course focuses on the data science life cycle with engagement with external partners and university consulting projects. We agree that training postgraduates in practice needs to be foregrounded in the curriculum in order for students to develop necessary depth in data science practice.


Creating Optimal Conditions For Reproducible Data Analysis In R With ‘Fertile’, Audrey M. Bertin, Benjamin Baumer Nov 2020

Statistical and Data Sciences: Faculty Publications

The advancement of scientific knowledge increasingly depends on ensuring that data-driven research is reproducible: that two people with the same data obtain the same results. However, while the necessity of reproducibility is clear, there are significant behavioral and technical challenges that impede its widespread implementation and no clear consensus on standards of what constitutes reproducibility in published research. We present fertile, an R package that focuses on a series of common mistakes programmers make while conducting data science projects in R, primarily through the RStudio integrated development environment. fertile operates in two modes: proactively, to prevent reproducibility mistakes from happening …


Teaching Computational Machine Learning (Without Statistics), Katherine M. Kinnaird Sep 2020

Statistical and Data Sciences: Faculty Publications

This paper presents an undergraduate machine learning course that emphasizes algorithmic understanding and programming skills while assuming no statistical training. Emphasizing the development of good habits of mind, this course trains students to be independent machine learning practitioners through an iterative, cyclical framework for teaching concepts while adding increasing depth and nuance. Beginning with unsupervised learning, this course is sequenced as a series of machine learning ideas and concepts with specific algorithms acting as concrete examples. This paper also details course organization including evaluation practices and logistics.


The Influence Of Peer And Parental Norms On First-Generation College Students’ Binge Drinking Trajectories, Graham T. Diguiseppi, Jordan P. Davis, Matthew K. Meisel, Melissa A. Clark, Mya L. Roberson, Miles Q. Ott, Nancy P. Barnett Apr 2020

Statistical and Data Sciences: Faculty Publications

Introduction: First-generation college students are those whose parents have not completed a four-year college degree. The current study addressed the lack of research on first-generation college students’ alcohol use by comparing the binge drinking trajectories of first-generation and continuing-generation students over their first three semesters. The dynamic influence of peer and parental social norms on students’ binge drinking frequencies was also examined. Methods: 1342 college students (n = 225 first-generation) at one private university completed online surveys. Group differences were examined at Time 1, and latent growth-curve models tested the association between first-generation status and social norms (peer descriptive, peer …


Identification And Description Of Potentially Influential Social Network Members Using The Strategic Player Approach, Miles Q. Ott, Sara G. Balestrieri, Graham Diguiseppi, Melissa A. Clark, Michael Bernstein, Sarah Helseth, Nancy P. Barnett Mar 2020

Statistical and Data Sciences: Faculty Publications

Background: Diffusion of innovations theory posits that ideas and behaviors can be spread through social network ties. In intervention work, intervening upon certain network members may lead to intervention effects “diffusing” into the network to affect the behavior of network members who did not receive the intervention. The strategic players (SP) method, an extension of Borgatti’s Key Players approach, is used to balance the (sometimes) opposing goals of spreading the intervention to as many members of the target group as possible, while preventing the spread of the intervention to others. Objectives: We sought to test whether members of the SP …


SuPP & MaPP: Adaptable Structure-Based Representations For MIR Tasks, Claire Savard, Erin H. Bugbee, Melissa R. McGuirl, Katherine M. Kinnaird Jan 2020

Statistical and Data Sciences: Faculty Publications

Accurate and flexible representations of music data are paramount to addressing MIR tasks, yet many of the existing approaches are difficult to interpret or rigid in nature. This work introduces two new song representations for structure-based retrieval methods: Surface Pattern Preservation (SuPP), a continuous song representation, and Matrix Pattern Preservation (MaPP), SuPP’s discrete counterpart. These representations come equipped with several user-defined parameters so that they are adaptable for a range of MIR tasks. Experimental results show MaPP as successful in addressing the cover song task on a set of Mazurka scores, with a mean precision of 0.965 and recall of …


Do Misperceptions Of Peer Drinking Influence Personal Drinking Behavior? Results From A Complete Social Network Of First-Year College Students, Melissa J. Cox, Angelo M. Dibello, Matthew K. Meisel, Miles Q. Ott, Shannon R. Kenney, Melissa A. Clark, Nancy P. Barnett May 2019

Statistical and Data Sciences: Faculty Publications

This study considered the influence of misperceptions of typical versus self-identified important peers' heavy drinking on personal heavy drinking intentions and frequency utilizing data from a complete social network of college students. The study sample included data from 1,313 students (44% male, 57% White, 15% Hispanic/Latinx) collected during the fall and spring semesters of their freshman year. Students provided perceived heavy drinking frequency for a typical student peer and up to 10 identified important peers. Personal past-month heavy drinking frequency was assessed for all participants at both time points. By comparing actual with perceived heavy drinking frequencies, measures of misperceptions …


A Grammar For Reproducible And Painless Extract-Transform-Load Operations On Medium Data, Benjamin S. Baumer Apr 2019

Statistical and Data Sciences: Faculty Publications

Many interesting datasets available on the Internet are of a medium size—too big to fit into a personal computer’s memory, but not so large that they would not fit comfortably on its hard disk. In the coming years, datasets of this magnitude will inform vital research in a wide array of application domains. However, due to a variety of constraints they are cumbersome to ingest, wrangle, analyze, and share in a reproducible fashion. These obstructions hamper thorough peer-review and thus disrupt the forward progress of science. We propose a predictable and pipeable framework for R (the state-of-the-art statistical computing environment) …


Enrollment And Assessment Of A First-Year College Class Social Network For A Controlled Trial Of The Indirect Effect Of A Brief Motivational Intervention, Nancy P. Barnett, Melissa A. Clark, Shannon R. Kenney, Graham Diguiseppi, Matthew K. Meisel, Sara Balestrieri, Miles Q. Ott, John Light Jan 2019

Statistical and Data Sciences: Faculty Publications

Heavy drinking and its consequences among college students represent a serious public health problem, and peer social networks are a robust predictor of drinking-related risk behaviors. In a recent trial, we administered a Brief Motivational Intervention (BMI) to a small number of first-year college students to assess the indirect effects of the intervention on peers not receiving the intervention. Objectives: To present the research design, describe the methods used to successfully enroll a high proportion of a first-year college class network, and document participant characteristics. Methods: Prior to study enrollment, we consulted with a student advisory group and campus stakeholders …


Relationships Between Social Network Characteristics, Alcohol Use, And Alcohol-Related Consequences In A Large Network Of First-Year College Students: How Do Peer Drinking Norms Fit In?, Graham T. Diguiseppi, Matthew K. Meisel, Sara G. Balestrieri, Miles Q. Ott, Melissa A. Clark, Nancy P. Barnett Dec 2018

Statistical and Data Sciences: Faculty Publications

A burgeoning area of research is using social network analysis to investigate college students' substance use behaviors. However, little research has incorporated students' perceived peer drinking norms into these analyses. The present study investigated the association between social network characteristics, alcohol use, and alcohol-related consequences among first-year college students (N = 1,342; 81% of the first-year class) at one university. The moderating role of descriptive norms was also examined. Network characteristics and descriptive norms were derived from participants' nominations of up to 10 other students who were important to them; individual network characteristics included popularity (indegree), network expansiveness (outdegree), relationship reciprocity, …


Strategic Players For Identifying Optimal Social Network Intervention Subjects, Miles Q. Ott, John M. Light, Melissa A. Clark, Nancy P. Barnett Oct 2018

Statistical and Data Sciences: Faculty Publications

We present a method whereby social network ties are used to identify behavioral leaders who are situated in the network such that these individuals are: 1) able to influence other individuals who are in need of and most receptive to intervention, thereby optimizing the impact of the intervention; and 2) not embedded with ties to individuals that are likely to be behaviorally antagonistic to the intervention or that would compromise the optimal impact of intervention. In this study we developed a method that we call Strategic Players, which is a solution for identifying a set of players who are close …


Data Science In Statistics Curricula: Preparing Students To “Think With Data”, J. Hardin, R. Hoerl, Nicholas J. Horton, D. Nolan, B. Baumer, O. Hall-Holt, P. Murrell, R. Peng, P. Roback, D. Temple Lang, M. D. Ward Oct 2015

Statistical and Data Sciences: Faculty Publications

A growing number of students are completing undergraduate degrees in statistics and entering the workforce as data analysts. In these positions, they are expected to understand how to use databases and other data warehouses, scrape data from Internet sources, program solutions to complex problems in multiple languages, and think algorithmically as well as statistically. These data science topics have not traditionally been a major component of undergraduate programs in statistics. Consequently, a curricular shift is needed to address additional learning outcomes. The goal of this article is to motivate the importance of data science proficiency and to provide examples and …


As Strong As The Weakest Link: Mining Diverse Cliques In Weighted Graphs, Petko Bogdanov, Ben Baumer, Prithwish Basu, Amotz Bar-Noy, Ambuj K. Singh Oct 2013

Statistical and Data Sciences: Faculty Publications

Mining for cliques in networks provides an essential tool for the discovery of strong associations among entities. Applications vary, from extracting core subgroups in team performance data arising in sports, entertainment, research and business; to the discovery of functional complexes in high-throughput gene interaction data. A challenge in all of these scenarios is the large size of real-world networks and the computational complexity associated with clique enumeration. Furthermore, when mining for multiple cliques within the same network, the results need to be diversified in order to extract meaningful information that is both comprehensive and representative of the whole dataset. We …


Maximizing Network Lifetime On The Line With Adjustable Sensing Ranges, Amotz Bar-Noy, Ben Baumer Feb 2012

Statistical and Data Sciences: Faculty Publications

Given n sensors on a line, each of which is equipped with a unit battery charge and an adjustable sensing radius, what schedule will maximize the lifetime of a network that covers the entire line? Trivially, any reasonable algorithm is at least a 1/2-approximation, but we prove tighter bounds for several natural algorithms. We focus on developing a linear time algorithm that maximizes the expected lifetime under a random uniform model of sensor distribution. We demonstrate one such algorithm that achieves an average-case approximation ratio of almost 0.9. Most of the algorithms that we consider come from a family based …


Parsing The Relationship Between Baserunning And Batting Abilities Within Lineups, Ben S. Baumer, James Piette, Brad Null Jan 2012

Statistical and Data Sciences: Faculty Publications

A baseball team's offensive prowess is a function of two types of abilities: batting and baserunning. While each has been studied extensively in isolation, the effects of their interaction are not well understood. We model offensive output as a scalar function f of an individual player's batting and baserunning profile z. Each of these profiles is in turn estimated from Retrosheet data using hierarchical Bayesian models. We then use the SimulOutCome simulation engine as a method to generate values of f(z) over a fine grid of points. Finally, for each of several methods of taking the extra base, we graphically …