Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Data

Statistics and Probability

Articles 1 - 30 of 37

Full-Text Articles in Physical Sciences and Mathematics

Identifying Rural Health Clinics Within The Transformed Medicaid Statistical Information System (T-MSIS) Analytic Files, Katherine Ahrens MPH, PhD, Zachariah Croll, Yvonne Jonk PhD, John Gale MS, Heidi O'Connor MS Mar 2024

Rural Health Clinics

Researchers at the Maine Rural Health Research Center describe a methodology for identifying Rural Health Clinic encounters within the Medicaid claims data using Transformed Medicaid Statistical Information System (T-MSIS) Analytic Files.

Background: There is limited information on the extent to which Rural Health Clinics (RHC) provide pediatric and pregnancy-related services to individuals enrolled in state Medicaid/CHIP programs. In part this is because methods to identify RHC encounters within Medicaid claims data are outdated.

Methods: We used a 100% sample of the 2018 Medicaid Demographic and Eligibility and Other Services Transformed Medicaid Statistical Information System (T-MSIS) Analytic Files for 20 states …


Using Geographic Information To Explore Player-Specific Movement And Its Effects On Play Success In The NFL, Hayley Horn, Eric Laigaie, Alexander Lopez, Shravan Reddy Aug 2023

SMU Data Science Review

American football is a billion-dollar industry in the United States. The analytical aspect of the sport is an ever-growing domain, with open-source competitions like the NFL Big Data Bowl accelerating this growth. Given the amount of player movement during each play, tracking data can prove valuable in many areas of football analytics. While concussion detection, catch recognition, and completion percentage prediction are all existing use cases for this data, player-specific movement attributes, such as speed and agility, may be helpful in predicting play success. This research calculates player-specific speed and agility attributes from tracking data and supplements them with descriptive …
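
A minimal sketch of deriving speed and an agility-style attribute from tracking coordinates. It assumes positions sampled at a fixed rate `hz` and uses a hypothetical agility definition (mean absolute change in direction of travel, in radians per second); the study's own definitions are not given in the abstract.

```python
import numpy as np

def speed_and_agility(x, y, hz=10.0):
    """Per-interval speed (yards/s) and a hypothetical agility score for one play.

    x, y: player coordinates (yards) on consecutive frames.
    hz:   assumed sampling rate in frames per second.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx, dy = np.diff(x), np.diff(y)
    speed = np.hypot(dx, dy) * hz                # distance per frame -> per second
    # Direction of travel per interval (frames with zero displacement get
    # heading 0, which a real analysis would need to handle explicitly).
    heading = np.arctan2(dy, dx)
    turn = np.diff(heading)
    turn = (turn + np.pi) % (2 * np.pi) - np.pi  # wrap turning angle to [-pi, pi)
    agility = np.abs(turn).mean() * hz if turn.size else 0.0
    return speed, agility
```

Per-player attributes could then be summaries such as the maximum of `speed` or the mean `agility` across that player's plays.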


A Bayesian Spatial Scan Statistic For Normal Data, Laasya Velamakanni Jul 2023

Theses and Dissertations

Scan statistics are useful methods for detecting spatial clustering. While they were initially developed to detect regions with an excess of binomial or Poisson events, spatial scan statistics have been extended to detect hotspots in other types of data including continuous data. They have many applications in different fields such as epidemiology (e.g. detecting disease outbreaks), sociology (e.g. detecting crime hotspots), and environmental health (e.g. detecting high-pollution areas). Spatial scan statistics identify a ‘most likely cluster’ and then use a likelihood ratio test to determine if this cluster is statistically significant. Spatial scan statistics have been extended to the Bayesian …
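
The classical (non-Bayesian) procedure described above, finding a most likely cluster and then testing it with a likelihood ratio, can be sketched for continuous data as follows. This assumes a normal model with a common variance, circular zones built from each location and its nearest neighbours, a hypothetical `max_frac` cap on zone size, and a permutation-based Monte Carlo p-value; it is not the Bayesian extension developed in the thesis.

```python
import numpy as np

def normal_llr(values, inside):
    """Log-likelihood ratio for a 'high-mean' zone under a normal model
    with a common variance (one mean under H0, two means under H1)."""
    n = values.size
    x_in, x_out = values[inside], values[~inside]
    if x_in.size == 0 or x_out.size == 0 or x_in.mean() <= x_out.mean():
        return 0.0                                   # only scan for high clusters
    var0 = values.var()                              # MLE variance under H0
    var1 = (((x_in - x_in.mean()) ** 2).sum()
            + ((x_out - x_out.mean()) ** 2).sum()) / n  # MLE variance under H1
    return 0.5 * n * np.log(var0 / var1)

def most_likely_cluster(coords, values, max_frac=0.5):
    """Scan circular zones: each location plus its k nearest neighbours."""
    n = values.size
    dists = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    order = np.argsort(dists, axis=1, kind="stable")
    best_llr, best_zone = 0.0, np.zeros(n, dtype=bool)
    for center in range(n):
        inside = np.zeros(n, dtype=bool)
        for k in range(max(1, int(max_frac * n))):
            inside[order[center, k]] = True
            llr = normal_llr(values, inside)
            if llr > best_llr:
                best_llr, best_zone = llr, inside.copy()
    return best_llr, best_zone

def scan_test(coords, values, n_perm=99, seed=0):
    """Monte Carlo p-value: compare the observed maximum LLR with the
    maximum LLR obtained after randomly permuting values over locations."""
    rng = np.random.default_rng(seed)
    observed, zone = most_likely_cluster(coords, values)
    exceed = sum(most_likely_cluster(coords, rng.permutation(values))[0] >= observed
                 for _ in range(n_perm))
    return observed, zone, (exceed + 1) / (n_perm + 1)
```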


Data-Optimized Spatial Field Predictions For Robotic Adaptive Sampling: A Gaussian Process Approach, Zachary Nathan May 2023

Computer Science Senior Theses

We introduce a framework that combines Gaussian Process models, robotic sensor measurements, and sampling data to predict spatial fields. In this context, a spatial field refers to the distribution of a variable throughout a specific area, such as temperature or pH variations over the surface of a lake. Whereas existing methods tend to analyze only the particular field(s) of interest, our approach optimizes predictions through the effective use of all available data. We validated our framework on several datasets, showing that errors can decline by up to two-thirds through the inclusion of additional colocated measurements. In support of adaptive sampling, …
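
A minimal single-variable sketch of Gaussian Process regression for a spatial field, assuming a zero-mean GP with a squared-exponential kernel and fixed, hand-picked hyperparameters (`length_scale`, `variance`, `noise`). The thesis's framework, which also exploits colocated measurements of other variables, is not reproduced here.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0, variance=1.0):
    """Squared-exponential covariance between point sets a (n, d) and b (m, d)."""
    sq = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return variance * np.exp(-0.5 * sq / length_scale**2)

def gp_predict(X_train, y_train, X_test, length_scale=1.0, variance=1.0, noise=1e-2):
    """Posterior mean and variance of a zero-mean GP at the X_test locations."""
    K = rbf_kernel(X_train, X_train, length_scale, variance) + noise * np.eye(len(X_train))
    K_s = rbf_kernel(X_train, X_test, length_scale, variance)
    L = np.linalg.cholesky(K)                    # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = variance - (v**2).sum(axis=0)          # predictive variance (noise-free field)
    return mean, var
```

The predictive variance is what an adaptive-sampling scheme would typically consult when choosing where the robot should measure next.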


Baseball’s Evolution In The 21st Century, And How It Exemplifies Human Response To Change, Jonathan Sharpe May 2023

Honors Projects

The game of baseball has changed a lot in the past twenty years. This change can be attributed primarily to the explosion in data analytics and how analytics are used to evaluate baseball players. This led to different player profiles being preferred and, eventually, to changes in how players are developed. As a result, the strategies employed have also evolved, and the game has become different from the one played only a couple of decades ago. This paper will explore the changes that the game has seen. On the other hand, Major League Baseball has also implemented its own changes to try and …


How Blockchain Solutions Enable Better Decision Making Through Blockchain Analytics, Sammy Ter Haar May 2022

Information Systems Undergraduate Honors Theses

Since the advent of computers, data scientists have been able to engineer devices that increase individuals’ opportunities to communicate with each other. In the 1990s, the internet took over, with many people not understanding its utility. Fast forward 30 years, and we cannot live without our connection to the internet. The early internet was an internet of information, with individuals posting blogs for others to read; this was known as Web 1.0. As it progressed, platforms became social, allowing individuals in different areas to communicate and engage with each other; this was known as Web 2.0. As Dr. …


Accurate And Integrative Detection Of Copy Number Variants With High-Throughput Data, Xizhi Luo Jul 2021

Theses and Dissertations

Copy number variants (CNVs), a major source of genetic variation in the human genome, are gains or losses of DNA segments. Copy number variation has gained considerable interest because it plays important roles in complex human diseases. Therefore, accurate detection of CNVs with data generated by modern genotyping technologies, such as SNP arrays and whole-exome sequencing (WES), is a critical step toward a better understanding of disease etiology. However, current statistical methodologies for CNV detection still face analytical challenges due to numerous genetic and technological factors that may lead to spurious findings. First, existing methods assume the independent observations …
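
For orientation, hidden Markov models are a common baseline for calling CNVs from SNP-array signal (PennCNV, for example, is HMM-based). The sketch below is such a baseline, not the thesis's method: a 3-state Viterbi decoder over per-probe log2 intensity ratios, with assumed emission means, a shared assumed standard deviation `sd`, and an assumed probability `p_stay` of remaining in the same state between adjacent probes.

```python
import numpy as np

STATES = ("deletion", "normal", "duplication")

def viterbi_cnv(log2_ratio, means=(-1.0, 0.0, 0.58), sd=0.2, p_stay=0.99):
    """Most likely copy-number state per probe under a 3-state Gaussian HMM.

    means: assumed log2-ratio means per state
           (log2(1/2) = -1, log2(2/2) = 0, log2(3/2) ~ 0.58).
    """
    x = np.asarray(log2_ratio, dtype=float)
    n_states = len(means)
    log_trans = np.full((n_states, n_states), np.log((1 - p_stay) / (n_states - 1)))
    np.fill_diagonal(log_trans, np.log(p_stay))
    # Gaussian log-density; constant terms are dropped because sd is shared.
    log_emit = -0.5 * ((x[:, None] - np.asarray(means)[None, :]) / sd) ** 2
    score = np.log(np.full(n_states, 1.0 / n_states)) + log_emit[0]
    back = np.zeros((x.size, n_states), dtype=int)
    for t in range(1, x.size):
        cand = score[:, None] + log_trans        # cand[i, j]: move from state i to j
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0) + log_emit[t]
    path = [int(score.argmax())]                 # best final state
    for t in range(x.size - 1, 0, -1):           # trace back to the first probe
        path.append(int(back[t, path[-1]]))
    return [STATES[s] for s in reversed(path)]
```

The transition penalty is what makes neighbouring probes dependent: an isolated outlying probe is absorbed into the surrounding state, while a run of several shifted probes is called as a CNV.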


Data Analysis And Visualization To Dismantle Gender Discrimination In The Field Of Technology, Quinn Bolewicki Jun 2021

Dissertations, Theses, and Capstone Projects

In the United States, a significant population is facing an uphill battle trying to thrive in an industry that has seen exponential growth in recent years. Women, who account for approximately 50.8% of the U.S. population, are statistically underpaid and underrepresented in science, technology, engineering, and mathematics (STEM). Despite women-led technology teams achieving a 21% greater return on investment than other teams, and young women largely outperforming men in math according to a 2015 study, only three Fortune 500 companies are led by women, and women comprise only 10% of internet entrepreneurs. Research generates hundreds of articles, infographics, …


How Risk-Related Statistics, As Reported In News And Social Media, Are Linked To The Use Of The Public Transit System, Prashiddhi Pokhrel Apr 2021

Thinking Matters Symposium

Due to the pandemic, people have started relying more on television, news, social media, and other outlets for guidance. Moreover, with the increasing amount of news, data, and information, there is also an increase in the amount of misleading statistics. People’s opinions and decisions depend significantly on the data, statistics, and information that they are exposed to, as well as on their sources. For this project, we want to look at how information and its sources are affecting the general public’s decisions about using the Portland Transit System. It is very important to know why …


Logistic Regression Under Sparse Data Conditions, David A. Walker, Thomas J. Smith Sep 2020

Journal of Modern Applied Statistical Methods

This study examined the impact of sparse data conditions in one or more predictor variables in logistic regression and assessed the effectiveness of the Firth (1993) procedure in reducing potential parameter estimation bias. Results indicated that sparseness in binary predictors introduces bias that is substantial with small sample sizes, and that the Firth procedure can effectively correct this bias.
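
A minimal sketch of Firth's bias-reduced logistic regression, assuming a Newton-Raphson solver on the Firth-modified score X'(y - p + h(1/2 - p)), where h is the diagonal of the weighted hat matrix. Production implementations add step-halving and likelihood-ratio confidence intervals; this sketch omits both.

```python
import numpy as np

def firth_logistic(X, y, max_iter=100, tol=1e-8):
    """Firth-penalized logistic regression coefficients.

    X: (n, p) design matrix (include a column of ones for an intercept).
    y: (n,) array of 0/1 outcomes.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        w = p * (1.0 - p)
        info = X.T @ (w[:, None] * X)            # Fisher information X'WX
        info_inv = np.linalg.inv(info)
        # Diagonal of H = W^1/2 X (X'WX)^-1 X' W^1/2
        h = w * np.einsum("ij,jk,ik->i", X, info_inv, X)
        score = X.T @ (y - p + h * (0.5 - p))    # Firth-modified score
        step = info_inv @ score
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta
```

Under complete separation (every x = 1 case drafted as 1, every x = 0 case as 0) ordinary maximum likelihood has no finite solution, whereas this estimator converges; for a saturated 2 x 2 design it reproduces the familiar "add 0.5 to each cell" log odds.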


RDC Data Alternatives: Conducting Research During COVID-19, Kristi Thompson, Elizabeth Hill Apr 2020

Western Libraries Presentations

Recent physical distancing protocols pertaining to the COVID-19 pandemic have meant that RDC researchers need to find alternative ways of carrying out their research. The Real Time Remote Access (RTRA) program offers one alternative way to access confidential Statistics Canada data. Other options include using the Statistics Canada public use files and analyzing data from other sources.

The presenters, data librarians from Western Libraries, will discuss the differences between the data that can be accessed through the RTRA and through the RDC. RTRA data is a very useful option for some types of questions but also has some important limitations. We will …


Boom Or Bust: Examining The Relationship Between High School Recruiting Rankings And The NFL Draft, Nicholas E. Tice Apr 2020

Senior Theses

The goal of this thesis is to model the probability that a high school football player is drafted, based on information taken from the player’s recruiting profile. The response variable is binary and defined as drafted (1) or undrafted (0). The independent variables, including height, weight, position, hometown, recruiting grade, and other socioeconomic factors based on the player’s high school, were collected by scraping data from recruiting websites. 247Sports and ESPN were the two recruiting services used and compared in this study. Because of the binary nature of the dependent variable, logistic regression and decision trees were chosen …
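
To illustrate how a decision tree uses a predictor such as recruiting grade, the sketch below finds the single threshold that minimizes weighted Gini impurity for a binary drafted/undrafted response, which is the step a CART-style tree repeats at every node. The data in the usage check are hypothetical, not from the thesis.

```python
import numpy as np

def gini(labels):
    """Gini impurity of a 0/1 label array: 1 - p^2 - (1 - p)^2 = 2p(1 - p)."""
    if labels.size == 0:
        return 0.0
    p = labels.mean()
    return 2.0 * p * (1.0 - p)

def best_split(feature, drafted):
    """Threshold on one feature that minimizes weighted Gini impurity.

    Returns (threshold, impurity); threshold is None if no split helps.
    """
    x = np.asarray(feature, dtype=float)
    y = np.asarray(drafted, dtype=float)
    values = np.unique(x)
    best = (None, gini(y))                       # impurity with no split
    for lo, hi in zip(values[:-1], values[1:]):
        thr = (lo + hi) / 2.0                    # midpoint between observed values
        left, right = y[x <= thr], y[x > thr]
        imp = (left.size * gini(left) + right.size * gini(right)) / y.size
        if imp < best[1]:
            best = (thr, imp)
    return best
```

For example, hypothetical grades `[70, 75, 80, 85, 90, 95]` with outcomes `[0, 0, 0, 1, 1, 1]` are split perfectly at 82.5.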


Art, Artfulness, Or Artifice?: A Review Of The Art Of Statistics: How To Learn From Data, By David Spiegelhalter, Jason Makansi Jan 2020

Numeracy

David Spiegelhalter. 2019. The Art of Statistics: How to Learn From Data. (London: The Penguin Group). 444 pp. ISBN 978-1541618510

The author successfully eases the reader away from the rigor of statistical methods and calculations and into the realm of statistical thinking. Despite an engaging style and attention-grabbing examples, the reader of The Art of Statistics will need more than a casual grounding in statistics to get what Spiegelhalter, I believe, intends from his book. It should be viewed as a companion to a more rigorous textbook on statistical methods but not necessarily a book that makes statistics any …


Playfair's Introduction Of Bar And Pie Charts To Represent Data, Diana White, River Bond, Joshua Eastes, Negar Janani Jan 2020

Statistics and Probability

No abstract provided.


Representing And Interpreting Data From Playfair, Diana White, River Bond, Joshua Eastes, Negar Janani Jan 2020

Statistics and Probability

No abstract provided.


Time Series Analysis Of Weather Data In South Carolina, Geophrey Odero Oct 2019

Theses and Dissertations

This thesis discusses time series analysis of weather data for Columbia, Greenville, and North Myrtle Beach, South Carolina, over the last fifteen years (January 2003 to December 2017). The first part presents a brief overview of the variables used in the analysis, namely temperature, dew point, humidity, and sea level pressure, and briefly introduces time series data. The second part is about modeling the variables: the models of choice are presented and fitted, and model diagnostics are carried out. In the third part, we discuss background on climates of the cities and model …
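
The thesis's chosen models are elided above, so the following is only an illustrative sketch of a common first pass for monthly weather series: remove each calendar month's mean (a simple seasonal adjustment), then fit an AR(1) coefficient to the anomalies by least squares. It assumes a gap-free monthly series starting in January, for example the 180 months from January 2003 to December 2017.

```python
import numpy as np

def monthly_anomalies(series, period=12):
    """Subtract each calendar month's mean (monthly climatology).
    Assumes the series starts in January and has no gaps."""
    x = np.asarray(series, dtype=float)
    months = np.arange(x.size) % period
    climatology = np.array([x[months == m].mean() for m in range(period)])
    return x - climatology[months]

def fit_ar1(anomalies):
    """Least-squares AR(1) coefficient for a zero-mean series:
    x_t = phi * x_{t-1} + e_t."""
    z = np.asarray(anomalies, dtype=float)
    return float(z[1:] @ z[:-1] / (z[:-1] @ z[:-1]))
```

A purely seasonal series leaves zero anomalies, and a noise-free series with x_t = 0.5 x_{t-1} returns phi = 0.5.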


What Can We Do? Puzzling Over The Interpretation Of Heredity And Variation From Galton To Genetic Engineering, Peter J. Taylor May 2019

Working Papers on Science in a Changing World

First six chapters of a book motivated as follows: When I had mentioned to colleagues that I was exploring some significant issues overlooked by both sides in nature-nurture debates, the typical response was “we know, of course, that nature and nurture are intertwined”; they never asked “which nature-nurture science are you referring to?” It occurred to me that, in the long history of nature-nurture debates, opposing sides had always assumed or implied that these different scientific approaches were speaking to the same issues. If that were the case, then the challenge—something I was already puzzling over—was how best to draw …


Measuring Birth Trauma Rates In Maine Using Public Data, Mike Lapika Apr 2019

Thinking Matters Symposium Archive

An increasing number of states are creating databases that collect and organize health insurance claims from public and private health care payers. Since December 2016, at least 18 states have these “all-payer claims databases” (APCDs), including Maine. APCDs are intended to inform cost containment and quality improvement by increasing transparency and informing consumer choice. For this project, we assessed how Maine’s APCD data might be used to produce standardized quality measures across facilities in the state. Specifically, we tested a birth outcome quality measure developed by the Agency for Healthcare Research and Quality (AHRQ), Birth Trauma – Injury to Neonate …


Microarray Data Analysis And Classification Of Cancers, Grant Gates Jan 2019

Williams Honors College, Honors Research Projects

When it comes to cancer, there is no standardized approach for identifying new cancer classes, nor is there a standardized approach for assigning cancer tumors to existing classes. These two ideas are known as class discovery and class prediction. For a cancer patient to receive proper treatment, it is important that the type of cancer be accurately identified. For my Senior Honors Project, I would like to use this opportunity to research a topic in bioinformatics. Bioinformatics combines several subjects, including biology, computer science, and statistics. An intricate method for class discovery and class prediction is …
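
The project's own method is elided above, so as a simple illustration of class prediction only, here is a nearest-centroid classifier: each class is summarized by its mean expression profile across genes, and a new sample is assigned to the closest class mean. Class labels and expression values in the usage check are hypothetical.

```python
import numpy as np

def fit_centroids(expr, labels):
    """Mean expression profile per class. expr: (samples, genes)."""
    expr = np.asarray(expr, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    return classes, np.vstack([expr[labels == c].mean(axis=0) for c in classes])

def predict(expr, classes, centroids):
    """Assign each sample to the class whose centroid is nearest (Euclidean)."""
    expr = np.atleast_2d(np.asarray(expr, dtype=float))
    d = ((expr[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return classes[d.argmin(axis=1)]
```

Real microarray data have thousands of genes, so in practice this is usually preceded by gene selection or shrinkage of the centroids.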


Seeing And Understanding Data, Beverly Wood, Charlotte Bolch Oct 2018

Statistics and Probability

No abstract provided.


Implementing The Use Of Personal Activity Data In An Introductory Statistics Course, Lacy Christensen Aug 2018

All Graduate Theses and Dissertations, Spring 1920 to Summer 2023

Integrating real data into a classroom is one of the recommendations in the Guidelines for Assessment and Instruction in Statistics Education (GAISE) college report which lays out guidelines for an introductory statistics course (Committee, GAISE College Report ASA Revision, 2016). In order to assess the effect of using real data in a classroom, the students received physical activity trackers to wear during an undergraduate introductory statistics course taught in the summer. This tracker, a Fitbit, enabled students to monitor and record their steps, calories, and active time throughout the class. Collecting personal activity data (PAD) creates a large database which …


Students’ Interpretations Of Categorical Data Using Dynamic Graphical Representations, Adam Eide Jan 2018

Master's Theses and Doctoral Dissertations

Statistical association is an important concept in statistics. An exploratory study examined how students reason about statistical association utilizing graphical representations constructed with CODAP, a dynamic statistical graphing software. Task-based interviews were conducted with three 6th grade students prior to formal instruction. Students’ conceptions of a statistical relationship, proportional reasoning skill level, ability to interpret bivariate categorical graphs (particularly segmented bar graphs and two-way binned plots), and ability to identify association of two categorical variables were all investigated through interview tasks and responses to inquiry. Students were found to have developing proportional reasoning skills and struggled to correctly define and …
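
The quantity a segmented bar graph displays, and the one students must reason about proportionally to judge association, is the conditional distribution of one categorical variable within each level of the other. A minimal sketch, using hypothetical variables: if the row-conditional proportions differ across rows, the two variables are associated in the sample.

```python
from collections import Counter

def conditional_proportions(pairs):
    """Row-conditional proportions P(column | row) from (row, column)
    observations, i.e. the segment heights in a segmented bar graph."""
    counts = Counter(pairs)
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs})
    table = {}
    for r in rows:
        total = sum(counts[(r, c)] for c in cols)
        table[r] = {c: counts[(r, c)] / total for c in cols}
    return table
```

For instance, with hypothetical data where 2 of 3 pet owners but only 1 of 4 non-owners prefer the outdoors, the conditional proportions 2/3 and 1/4 differ, suggesting an association even though the raw counts of "outdoor" are similar (2 and 1).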


Of Rats And Men, Thomas S. Walsh Dec 2017

Capstones

This capstone is a data-driven investigation into New York City's rat problem. By using publicly available government data to map rat activity in NYC, I identified several socio-economic variables that correlate with rat populations at the community district, borough, and city scale. I used these findings (mainly that rat problems are linked to lower incomes) as the basis of an investigation, which includes interviews with residents, experts, and city officials. Prof. Bobby Corrigan, an urban rodentologist formerly with the NYC Department of Health, criticizes the city's efforts on the record for the first time.

https://thomasseiyawalsh.wixsite.com/ratstone


Statistics-Bierce Library Study, Tyler J. Hushour Jan 2017

Williams Honors College, Honors Research Projects

This is a report on two surveys that I created and administered to students and faculty at Bierce Library who came to the Circulation Desk or the Tech Desk, as well as some of my other findings from periodically looking around the library to see where students like to study or hang out. There was a written survey given at the Circulation Desk, and a different survey given at the Tech Check-Out Desk. The project is for Melanie Smith-Farrell, the head of Access Services, and is based on a similar study Ian McCullough did in the science library. While this is …


Review Of Naked Statistics: Stripping The Dread From Data By Charles Wheelan, Michael T. Catalano Jan 2015

Numeracy

Wheelan, Charles. Naked Statistics: Stripping the Dread from Data (New York, NY, W. W. Norton & Company, 2014). 282 pp. ISBN 978-0-393-07195-5

In his review of What Numbers Say and The Numbers Game, Rob Root (Numeracy 3(1): 9) writes “Popular books on quantitative literacy need to be easy to read, reasonably comprehensive in scope, and include examples that are thought-provoking and memorable.” Wheelan’s book certainly meets this description, and should be of interest to both the general public and those with a professional interest in numeracy. A moderately diligent learner can get a decent understanding of basic statistics …


The Art Of Personal Science, Jeff Fajans Feb 2014

The STEAM Journal

Quantified Self isn’t really about finding answers or solving problems—it’s about asking new questions.


On Covariance Structure In Noisy, Big Data, Randy Paffenroth, Ryan Nong, Philip Du Toit Sep 2013

Randy C. Paffenroth

Herein we describe theory and algorithms for detecting covariance structures in large, noisy data sets. Our work uses ideas from matrix completion and robust principal component analysis to detect the presence of low-rank covariance matrices, even when the data is noisy, distorted by large corruptions, and only partially observed. In fact, the ability to handle partial observations combined with ideas from randomized algorithms for matrix decomposition enables us to produce asymptotically fast algorithms. Herein we will provide numerical demonstrations of the methods and their convergence properties. While such methods have applicability to many problems, including mathematical finance, crime analysis, and …
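
To make the robust-PCA idea concrete, the sketch below implements standard principal component pursuit, which splits a matrix M into a low-rank part L and a sparse corruption part S, using a plain augmented-Lagrangian (ADMM-style) iteration with commonly used default parameters. It assumes a fully observed matrix and an exact SVD; the paper's methods also handle partial observation and use randomized decompositions, neither of which is shown here.

```python
import numpy as np

def shrink(X, tau):
    """Entry-wise soft thresholding (proximal step for the l1 norm)."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def svt(X, tau):
    """Singular value thresholding (proximal step for the nuclear norm)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * shrink(s, tau)) @ Vt

def robust_pca(M, lam=None, mu=None, tol=1e-7, max_iter=1000):
    """Principal component pursuit: minimize ||L||_* + lam * ||S||_1 s.t. L + S = M."""
    M = np.asarray(M, dtype=float)
    m, n = M.shape
    lam = 1.0 / np.sqrt(max(m, n)) if lam is None else lam
    mu = m * n / (4.0 * np.abs(M).sum()) if mu is None else mu
    L = np.zeros_like(M)
    S = np.zeros_like(M)
    Y = np.zeros_like(M)                          # Lagrange multiplier
    norm_M = np.linalg.norm(M)
    for _ in range(max_iter):
        L = svt(M - S + Y / mu, 1.0 / mu)
        S = shrink(M - L + Y / mu, lam / mu)
        residual = M - L - S
        Y = Y + mu * residual
        if np.linalg.norm(residual) <= tol * norm_M:
            break
    return L, S
```

Applied to a sample covariance or data matrix, L estimates the low-rank structure while S absorbs large, sparse corruptions.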


A Reply To David Richards’ Review Of Measuring Human Rights, Todd Landman, Edzia Carvalho Jan 2012

Human Rights & Human Welfare

Professor Richards highlights, in his generous review of our book Measuring Human Rights, that one of the aims of the book is to bring to the forefront the importance of conceptualization before operationalization: conceptual clarity (or the lack of it) is at the heart of the problems concerning the measurement of human rights. He draws out three key issues from the book as the springboard for further discussion on measurement of the concept: a) the “Respect, Protect and Fulfill” (RPF) framework, b) the lack of reliable data sources, and c) the conceptual links between human rights, human development, …


Theory Of Planned Behavior Model Fit Using ATOD Prevention Program Data, Ying Jin Jul 2009

All Graduate Plan B and other Reports, Spring 1920 to Spring 2023

This report tests the fit of the Theory of Planned Behavior (TpB) model using data collected from the ATOD Prevention Program conducted by the Operation Snowball Program from 2004 to 2007 in Naperville, Illinois. A measurement model and structural equation modeling are used as the principal modeling methods to test, respectively, the internal consistency of the assigned measures for each construct and the dependency between constructs. The results show that the ATOD Prevention Program data do not fit the TpB model perfectly. Extra paths should be added to the original theoretical model in order to obtain a satisfactory model fit.
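
The report assesses internal consistency through a measurement model; a simpler, widely used companion statistic for the same question is Cronbach's alpha, sketched below. The abstract does not say whether alpha was computed, so this is an illustrative substitute, not the report's method. It assumes a complete respondents-by-items score matrix for one construct.

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_respondents, k_items) score matrix:
    alpha = k / (k - 1) * (1 - sum of item variances / variance of total score)."""
    X = np.asarray(items, dtype=float)
    k = X.shape[1]
    item_var = X.var(axis=0, ddof=1).sum()       # sum of individual item variances
    total_var = X.sum(axis=1).var(ddof=1)        # variance of the summed scale
    return k / (k - 1) * (1.0 - item_var / total_var)
```

Identical items give alpha = 1; items that vary independently push alpha toward 0.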


A Comparison Of Methods For Longitudinal Analysis With Missing Data, James Algina, H. J. Keselman May 2004

Journal of Modern Applied Statistical Methods

In a longitudinal two-group randomized trial design, also referred to as a randomized parallel-groups design or split-plot repeated measures design, the important hypothesis of interest is whether there are differential rates of change over time, that is, whether there is a group-by-time interaction. Several analytic methods have been presented in the literature for testing this important hypothesis when data are incomplete. We studied these methods for the case in which the missing data pattern is non-monotone. In agreement with earlier work on monotone missing data patterns, our results on bias, sampling variability, Type I error and power support the …
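
The group-by-time interaction can be made concrete with a simple two-stage summary approach: fit an ordinary least-squares slope for each subject from whichever occasions were observed, then compare mean slopes between groups. This is an illustrative assumption, not necessarily one of the methods compared in the paper; it tolerates non-monotone missingness but ignores that slopes estimated from fewer observations are less precise.

```python
import numpy as np

def subject_slopes(times, scores):
    """OLS slope of score on time for each subject (row of scores), using
    only the non-NaN occasions; NaN if fewer than two are observed."""
    t = np.asarray(times, dtype=float)
    Y = np.asarray(scores, dtype=float)
    slopes = np.full(Y.shape[0], np.nan)
    for i, y in enumerate(Y):
        ok = ~np.isnan(y)
        if ok.sum() >= 2:
            slopes[i] = np.polyfit(t[ok], y[ok], 1)[0]   # [slope, intercept]
    return slopes

def slope_difference(times, scores, group):
    """Mean slope of group 1 minus group 0: the group-by-time contrast."""
    s = subject_slopes(times, scores)
    g = np.asarray(group)
    return np.nanmean(s[g == 1]) - np.nanmean(s[g == 0])
```

A two-sample test on the per-subject slopes (for example, a Welch t-test) would then address whether the rates of change differ.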