Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons


Articles 1 - 21 of 21

Full-Text Articles in Physical Sciences and Mathematics

A Bayesian Spatial Scan Statistic For Normal Data, Laasya Velamakanni Jul 2023


Theses and Dissertations

Scan statistics are useful methods for detecting spatial clustering. While they were initially developed to detect regions with an excess of binomial or Poisson events, spatial scan statistics have been extended to detect hotspots in other types of data including continuous data. They have many applications in different fields such as epidemiology (e.g. detecting disease outbreaks), sociology (e.g. detecting crime hotspots), and environmental health (e.g. detecting high-pollution areas). Spatial scan statistics identify a ‘most likely cluster’ and then use a likelihood ratio test to determine if this cluster is statistically significant. Spatial scan statistics have been extended to the Bayesian …


Data-Optimized Spatial Field Predictions For Robotic Adaptive Sampling: A Gaussian Process Approach, Zachary Nathan May 2023


Computer Science Senior Theses

We introduce a framework that combines Gaussian Process models, robotic sensor measurements, and sampling data to predict spatial fields. In this context, a spatial field refers to the distribution of a variable throughout a specific area, such as temperature or pH variations over the surface of a lake. Whereas existing methods tend to analyze only the particular field(s) of interest, our approach optimizes predictions through the effective use of all available data. We validated our framework on several datasets, showing that errors can decline by up to two-thirds through the inclusion of additional colocated measurements. In support of adaptive sampling, …


Baseball’s Evolution In The 21st Century, And How It Exemplifies Human Response To Change, Jonathan Sharpe May 2023


Honors Projects

The game of baseball has changed a lot in the past twenty years. The change can be primarily attributed to the explosion in data analytics and how it is used to evaluate baseball players. This led to different player profiles being preferred and eventually to changes in how players are developed. As a result, the strategies employed have also evolved, turning baseball into a different game than the one seen only a couple of decades ago. This paper will explore the changes that the game has seen. On the other hand, Major League Baseball has also implemented its own changes to try and …


How Blockchain Solutions Enable Better Decision Making Through Blockchain Analytics, Sammy Ter Haar May 2022


Information Systems Undergraduate Honors Theses

Since the founding of computers, data scientists have been able to engineer devices that increase individuals’ opportunities to communicate with each other. In the 1990s, the internet took over, with many people not understanding its utility. Fast forward 30 years, and we cannot live without our connection to the internet. The early internet, in which individuals posted blogs for others to read, was an internet of information known as Web 1.0. As it progressed, platforms became social, allowing individuals in different areas to communicate and engage with each other; this was known as Web 2.0. As Dr. …


Accurate And Integrative Detection Of Copy Number Variants With High-Throughput Data, Xizhi Luo Jul 2021


Theses and Dissertations

Copy number variations (CNVs), a major source of genetic variation in the human genome, are gains or losses of DNA segments. Copy number variation has gained considerable interest as it plays important roles in complex human diseases. Therefore, accurate detection of CNVs with data generated by modern genotyping technologies, such as SNP arrays and whole-exome sequencing (WES), is a critical step toward a better understanding of disease etiology. However, current statistical methodologies for CNV detection still face analytical challenges due to numerous genetic and technological factors that may lead to spurious findings. First, existing methods assume the independent observations …


Data Analysis And Visualization To Dismantle Gender Discrimination In The Field Of Technology, Quinn Bolewicki Jun 2021


Dissertations, Theses, and Capstone Projects

In the United States, a significant population is facing an uphill battle trying to thrive in an industry that has seen exponential growth in recent years. Women, who account for approximately 50.8% of the U.S. population, are statistically underpaid and underrepresented in science, technology, engineering, and mathematics (STEM). Despite women-led technology teams achieving a 21% greater return on investment than teams that are not women-led, and young women largely outperforming men in math according to a 2015 study, there are only three Fortune 500 companies led by women, and women comprise only 10% of internet entrepreneurs. Research generates hundreds of articles, infographics, …


Boom Or Bust: Examining The Relationship Between High School Recruiting Rankings And The NFL Draft, Nicholas E. Tice Apr 2020


Senior Theses

The goal of this thesis is to model the probability that a high school football player is drafted, based on information taken from the player’s recruiting profile. The response variable is binary and defined as drafted (1) or undrafted (0). The independent variables were collected by scraping the recruiting websites and include height, weight, position, hometown, recruiting grade and other socioeconomic factors based on the player’s high school. 247Sports and ESPN were the two recruiting services used and compared in this study. Because of the binary nature of the dependent variable, logistic regression and decision trees were chosen …


Time Series Analysis Of Weather Data In South Carolina, Geophrey Odero Oct 2019


Theses and Dissertations

This thesis discusses time series analysis of weather data in South Carolina for the last fifteen years (January 2003 to December 2017) for Columbia, Greenville and North Myrtle Beach. The first part presents a brief overview of the variables used in the analysis: temperature, dew point, humidity and sea level pressure. A short discussion of time series data is also introduced. The second part is about modeling the variables. The models of choice are presented and fitted, and model diagnostics are carried out. In the third part, we discuss background on climates of the cities and model …


Measuring Birth Trauma Rates In Maine Using Public Data, Mike Lapika Apr 2019


Thinking Matters Symposium Archive

An increasing number of states are creating databases that collect and organize health insurance claims from public and private health care payers. As of December 2016, at least 18 states have these “all-payer claims databases” (APCDs), including Maine. APCDs are intended to inform cost containment and quality improvement by increasing transparency and informing consumer choice. For this project, we assessed how Maine’s APCD data might be used to produce standardized quality measures across facilities in the state. Specifically, we tested a birth outcome quality measure developed by the Agency for Healthcare Research and Quality (AHRQ), Birth Trauma – Injury to Neonate …


Microarray Data Analysis And Classification Of Cancers, Grant Gates Jan 2019


Williams Honors College, Honors Research Projects

When it comes to cancer, there is no standardized approach for identifying new cancer classes nor is there a standardized approach for assigning cancer tumors to existing classes. These two ideas are known as class discovery and class prediction. For a cancer patient to receive proper treatment, it is important that the type of cancer be accurately identified. For my Senior Honors Project, I would like to use this opportunity to research a topic in bioinformatics. Bioinformatics incorporates a few different subjects into one including biology, computer science and statistics. An intricate method for class discovery and class prediction is …


Implementing The Use Of Personal Activity Data In An Introductory Statistics Course, Lacy Christensen Aug 2018


All Graduate Theses and Dissertations, Spring 1920 to Summer 2023

Integrating real data into a classroom is one of the recommendations in the Guidelines for Assessment and Instruction in Statistics Education (GAISE) college report which lays out guidelines for an introductory statistics course (Committee, GAISE College Report ASA Revision, 2016). In order to assess the effect of using real data in a classroom, the students received physical activity trackers to wear during an undergraduate introductory statistics course taught in the summer. This tracker, a Fitbit, enabled students to monitor and record their steps, calories, and active time throughout the class. Collecting personal activity data (PAD) creates a large database which …


Students’ Interpretations Of Categorical Data Using Dynamic Graphical Representations, Adam Eide Jan 2018


Master's Theses and Doctoral Dissertations

Statistical association is an important concept in statistics. An exploratory study examined how students reason about statistical association utilizing graphical representations constructed with CODAP, a dynamic statistical graphing software. Task-based interviews were conducted with three 6th grade students prior to formal instruction. Students’ conceptions of a statistical relationship, proportional reasoning skill level, ability to interpret bivariate categorical graphs (particularly segmented bar graphs and two-way binned plots), and ability to identify association of two categorical variables were all investigated through interview tasks and responses to inquiry. Students were found to have developing proportional reasoning skills and struggled to correctly define and …


Of Rats And Men, Thomas S. Walsh Dec 2017


Capstones

This capstone is a data-driven investigation into New York City's rat problem. By using publicly available government data to map rat activity in NYC, I identified several socio-economic variables that correlate with rat populations at the community district, borough, and city scale. I used these findings (mainly that rat problems are linked to lower incomes) as the basis of an investigation, which includes interviews with residents, experts, and city officials. Prof. Bobby Corrigan, an urban rodentologist formerly with the NYC Department of Health, criticizes the city's efforts for the first time on the record.

https://thomasseiyawalsh.wixsite.com/ratstone


Statistics-Bierce Library Study, Tyler J. Hushour Jan 2017


Williams Honors College, Honors Research Projects

This is a report on two surveys that I created and administered to students and faculty at Bierce Library who came to the Circulation Desk or the Tech Desk, as well as some of my other findings from periodically looking around the library to see where students like to study or hang out. There was a written survey given at the Circulation Desk, and a different survey given at the Tech Check-Out Desk. The project is for Melanie Smith-Farrell, the head of Access Services, and is based on a similar study Ian McCullough did in the science library. While this is …


Theory Of Planned Behavior Model Fit Using ATOD Prevention Program Data, Ying Jin Jul 2009


All Graduate Plan B and other Reports, Spring 1920 to Spring 2023

This report tests the fit of the Theory of Planned Behavior (TpB) model using data collected from the ATOD Prevention Program conducted by the Operation Snowball Program from 2004 to 2007 in Naperville, Illinois. A measurement model and structural equation modeling are used as the principal modeling methods to test, respectively, the internal consistency of the measures assigned to each construct and the dependency between constructs. The results show that the ATOD Prevention Program data do not fit the TpB model perfectly. Extra paths should be added to the original theoretical model in order to obtain a satisfactory model fit.


The Robustness Of Factor Analyses When The Data Does Not Conform To Standard Parametric Requirements, Haisong Peng May 2004


All Graduate Plan B and other Reports, Spring 1920 to Spring 2023

Objective: To assess the robustness of factor analyses when the data do not conform to standard parametric requirements.

Methods: Data were simulated in R. Maximum likelihood was used to fit and assess the factor models. Chi-square statistics were obtained to test hypotheses about the correct number of factors in simulated settings where the true number of factors was known. The number of true factors varied between 1 and 3; the number of observed variables was either 6 (for 1 factor) or 3 per factor for 2 or more factors.

Results: With standard normal factor populations, and normal errors added …


Linear Models For Multivariate Repeated Measures Data, Shantha S. Rao Apr 1996


Mathematics & Statistics Theses & Dissertations

In this dissertation we focus mainly on the analysis of continuous multivariate repeated measurements data based on the assumption of multivariate normality. However, certain aspects of the analysis of univariate repeated measures data are also considered. Typically, we have measurements on $p$ variables (possibly correlated) in the form of $p \times 1$ vectors $y_{ijk}$ observed at $k = 1, 2, \ldots, t_{ij}$ occasions on $j = 1, 2, \ldots, n_i$ individuals from $i = 1, 2, \ldots, g$ groups. We assume a naturally occurring covariance structure $V_{ij} \otimes \Sigma$ among the $p$ variables on the $j$th individual from the $i$th group …


A Test For Determining An Appropriate Model For Accelerated Life Data, Yuan-Who Chen May 1987


All Graduate Theses and Dissertations, Spring 1920 to Summer 2023

The purpose of this thesis was to evaluate a method for testing the appropriateness of an accelerated life model. This method is based upon a polynomial approximation. The parameters are estimated and used for testing the appropriateness of the model.

An example illustrates the polynomial method, and the method is applied to real data. Comparison with another method demonstrates that the polynomial method is much simpler and has comparable accuracy.


Comparison Of Transition Matrices Between Metropolitan And Non-Metropolitan Areas In The State Of Utah Using Juvenile Court Data, Sung-Ik Song May 1974


All Graduate Theses and Dissertations, Spring 1920 to Summer 2023

The purpose of this paper is to use Markov Chains for the study of youths referred to the juvenile court in the metropolitan and non-metropolitan areas of the state of Utah.

Two computer programs were written for creating case histories for each person referred to the court and for testing for the significance of the difference among several transition matrices.

Another computer program, which was written by Soo Hong Uh, was used for analyzing realizations of Markov chains up to the 4th order; a third computer program, originally written by David White, was used for interpreting Markov chains.

The …


Analysis Of Case Histories By Markov Chains Using Juvenile Court Data Of State Of Utah, Soo-Hong Uh May 1973


All Graduate Theses and Dissertations, Spring 1920 to Summer 2023

The purpose of this paper is to analyze juvenile court data using Markov Chains. A computer program was generalized with a single array orientation for analyzing realizations of a Markov Chain to the kth order within machine limitations. The data used in this paper were gathered by the Juvenile Court of the State of Utah for administrative purposes and limited to District II. The results from the paper, "Statistical Inference About Markov Chains" by Anderson and Goodman, were applied for testing hypotheses. The paper is divided into five chapters: introduction, statistical background, methodology, analysis and summary, conclusions.
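The abstract does not reproduce the program it describes. As a minimal sketch of what analyzing realizations of a Markov chain to the kth order can involve, the Python below counts how often each history of k states is followed by each next state and converts those counts to estimated transition probabilities. The function names and the state labels "A", "B", "C" are hypothetical and are not taken from the thesis or from the juvenile court data.

```python
from collections import Counter, defaultdict


def transition_counts(sequences, order=1):
    """Count transitions from each length-`order` history to the next state."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for i in range(order, len(seq)):
            history = tuple(seq[i - order:i])  # the preceding `order` states
            counts[history][seq[i]] += 1
    return counts


def transition_probabilities(counts):
    """Turn raw counts into row-normalized relative frequencies."""
    probs = {}
    for history, next_states in counts.items():
        total = sum(next_states.values())
        probs[history] = {state: n / total for state, n in next_states.items()}
    return probs


# Hypothetical case histories; each list is one realization of the chain.
histories = [
    ["A", "B", "A", "C"],
    ["A", "B", "B", "C"],
]

first_order = transition_counts(histories, order=1)
second_order = transition_counts(histories, order=2)
first_order_probs = transition_probabilities(first_order)
```

Keeping every history as a tuple key in one dictionary lets the same routine handle any order k, at the cost of a table that grows quickly with k. Hypothesis tests on the order of the chain, such as those the abstract attributes to Anderson and Goodman, would be built on counts like these but are not shown here.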


Fitting Some Families Of Contagious Distributions To Biological And Accident Data, Yung-Sung Lee May 1971


All Graduate Theses and Dissertations, Spring 1920 to Summer 2023

Four families of contagious distributions--generalized Poisson distributions, generalized binomial distributions, generalized Pascal distributions, and generalized log-zero distributions--are investigated in this thesis.

The family of generalized Poisson distributions contains five distributions: the Neyman Type A, the "Short," the Poisson binomial, the Poisson Pascal, and the negative binomial. The family of generalized binomial distributions contains eight distributions: the binomial Poisson, the binomial binomial, the binomial Pascal, the binomial log-zero, the Poisson with zeros, the binomial with zeros, the Pascal with zeros, and the log-zero with zeros. The family of generalized Pascal distributions contains four distributions: the Pascal Poisson, the Pascal binomial, the …