Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons


Data

Theses/Dissertations


Articles 1 - 30 of 62

Full-Text Articles in Physical Sciences and Mathematics

A Bayesian Spatial Scan Statistic For Normal Data, Laasya Velamakanni Jul 2023


Theses and Dissertations

Scan statistics are useful methods for detecting spatial clustering. While they were initially developed to detect regions with an excess of binomial or Poisson events, spatial scan statistics have been extended to detect hotspots in other types of data including continuous data. They have many applications in different fields such as epidemiology (e.g. detecting disease outbreaks), sociology (e.g. detecting crime hotspots), and environmental health (e.g. detecting high-pollution areas). Spatial scan statistics identify a ‘most likely cluster’ and then use a likelihood ratio test to determine if this cluster is statistically significant. Spatial scan statistics have been extended to the Bayesian …


Phantom Shootings, Allan Ambris Jun 2023


Dissertations, Theses, and Capstone Projects

This capstone is a website designed to critique NYC Open Data reporting with respect to shootings through a series of visualizations and discoveries. The NYPD Shooting Incidents datasets (Historic and Year to Date) introduce themselves to the user by claiming to be a “list of every shooting incident that occurred in NYC.” The supplied documentation reveals that this is not the case.

After understanding the supporting materials, there are still undisclosed truths. My exploration of the data revealed that a single victim may be represented across multiple entries. Additionally, multiple victims may be represented by a single entry. It is …


Data-Optimized Spatial Field Predictions For Robotic Adaptive Sampling: A Gaussian Process Approach, Zachary Nathan May 2023


Computer Science Senior Theses

We introduce a framework that combines Gaussian Process models, robotic sensor measurements, and sampling data to predict spatial fields. In this context, a spatial field refers to the distribution of a variable throughout a specific area, such as temperature or pH variations over the surface of a lake. Whereas existing methods tend to analyze only the particular field(s) of interest, our approach optimizes predictions through the effective use of all available data. We validated our framework on several datasets, showing that errors can decline by up to two-thirds through the inclusion of additional colocated measurements. In support of adaptive sampling, …


Baseball’s Evolution In The 21st Century, And How It Exemplifies Human Response To Change, Jonathan Sharpe May 2023


Honors Projects

The game of baseball has changed a lot in the past twenty years. The change can be attributed primarily to the explosion in data analytics and how analytics are used to evaluate baseball players. This led to different player profiles being preferred and eventually changed how players are developed. As a result, the strategies employed have also evolved, producing a different game than the one seen only a couple of decades ago. This paper will explore the changes that the game has seen. Meanwhile, Major League Baseball has also implemented its own changes to try and …


Social Impacts Of Robotics On The Labor And Employment Market, Kelvin Espinal Feb 2023


Dissertations, Theses, and Capstone Projects

Robotics have been introduced into the workplace to perform tasks that human beings have traditionally fulfilled. Complementing or substituting human labor with robotics eliminates human involvement in functions involving hazardous environments, heavy lifting, toxic substances, and repetitive low-level tasks. Robots are also meant to be more efficient and cost-effective, saving money, time, and labor. However, since the introduction of robotics into the workforce, there has been societal opposition to this branch of technology for fear of losing employment, wages, and purpose.

Previous studies have reported an overarching societal fear that adopting robotics in the workplace and industry …


Big Data Analytics Of Medical Data, Ashwin Rajasankar Dec 2022


Culminating Experience Projects

Data has become a huge part of modern decision making. With the improvements in computing performance and storage in the past two decades, storing large amounts of data has become much easier. Analyzing large amounts of data and creating data models with them can help organizations obtain insights and information that support their decision making. Big data analytics has become an integral part of many fields such as retail, real estate, education, and medicine. In this project, the goal is to understand the working of Apache Spark and its different storage methods and create a data warehouse to analyze data. …


Mapping The COVID-19 Pandemic In Staten Island, Vincenzo Mezzio May 2022


Student Theses

COVID-19 has had diverging effects in New York City. Out of the five boroughs, Staten Island has one of the largest percentages of COVID-19 cases relative to population. This research examines key social and spatial factors that contribute to the increase in COVID-19 cases in Staten Island. It asks: Which parts of Staten Island have higher rates of transmission of COVID-19? Which parts of the borough have larger populations that are more vulnerable to COVID-19? What is the relationship between the location of vaccination centers and the rates of COVID-19 cases? Using Geographic Information Systems (GIS), this research examines the …


How Blockchain Solutions Enable Better Decision Making Through Blockchain Analytics, Sammy Ter Haar May 2022


Information Systems Undergraduate Honors Theses

Since the advent of computers, data scientists have been able to engineer devices that increase individuals’ opportunities to communicate with each other. In the 1990s, the internet took off, though many people did not understand its utility. Fast forward 30 years, and we cannot live without our connection to the internet. The early internet, in which individuals posted blogs for others to read, was an internet of information known as Web 1.0. As the internet progressed, platforms became social, allowing individuals in different areas to communicate and engage with each other; this was known as Web 2.0. As Dr. …


MEAs: Exploring Links Between Implementation And Standards Mastery, Noah Silver Apr 2022


Honors Projects

To effectively enhance their mathematical understanding and development, students need to engage in problem solving. Model eliciting activities, or MEAs, provide students with tasks that promote higher level thinking and the ability to utilize mathematics outside of the classroom; they also align with and promote the utilization of the Common Core State Standards and Standards for Mathematical Practice. Research suggests that the language and motivation promoted by MEAs enrich engagement and increase student ability and performance in traditional and real-world mathematics. Use of technology further supports these goals. Through the analysis of checkpoint …


Outvoice: Bringing Transparency To Healthcare, Autumn Clark Feb 2022


Undergraduate Honors Theses

Industries are not incentivized to price reasonably and spend responsibly if consumers do not have the ability to shop around within that industry, and shopping around is not possible without pricing transparency (knowing how much a good or service costs before purchasing it). But in the healthcare industry, we typically default to whichever clinic or hospital is closest, with no prior knowledge of what costs we can expect to incur at that particular institution. According to a poll published by Harvard University, nine out of ten Americans feel the healthcare industry is too opaque and greater transparency is needed.

We …


Blockchain: Key Principles, Nadezda Chikurova Feb 2022


Dissertations, Theses, and Capstone Projects

“Blockchain: Key Principles” is an interactive visual project that explains the importance of data privacy and security, decentralized computing, and open-source software in the modern digital world through the history of the underlying principles of blockchain technology. Some of these key concepts have their roots in the time before the Information Age. By explaining the history of these principles, I want to present the fact that over the past centuries, humanity has been fighting for their privacy, security, and the ability to efficiently express themselves one way or another. Blockchain technology, which was introduced to the public in 2008 through …


Predicting Outcomes Of El Clásico Using Random Forests And Extreme Gradient Boosting, Emanuel Jarquin Jan 2022


CMC Senior Theses

In the modern era, sports betting is becoming increasingly popular. This is especially true in the realm of soccer (or ‘football’ as it is known outside the United States). As a result, the concept of attempting to predict the outcomes of soccer matches using machine learning has garnered much attention in recent years. In this thesis, I utilize well-known machine learning techniques to predict the outcomes of El Clásico matchups and compare the predictive performance of these techniques. The predictive methods employed for this thesis are random forests using the party package in R and extreme gradient boosting using the …


The Temple Of Immensity: For Choir And Electronics, Steven Naylor Dec 2021


Honors Projects

the temple of immensity is a composition for 16-part choir and fixed media electronics composed by Steven Naylor using astronomical data concerning the stars nearest to Earth and their properties. “The temple of immensity” is an archaic and rarely used term, defined as “the universe or the complete overhead expanse of the heavens, especially as conceived as an object of religious reverence.” This piece seeks to convey feelings of wonder and awe for outer space through the setting of an original self-composed poem and through the use of star data to determine musical aspects. The resulting 28-minute composition blends voices …


Accurate And Integrative Detection Of Copy Number Variants With High-Throughput Data, Xizhi Luo Jul 2021


Theses and Dissertations

Copy number variations (CNVs), a major source of genetic variation in the human genome, are gains or losses of DNA segments. Copy number variation has gained considerable interest as it plays important roles in complex human diseases. Therefore, accurate detection of CNVs with data generated by modern genotyping technologies, such as SNP array and whole-exome sequencing (WES), comprises a critical step toward a better understanding of disease etiology. However, current statistical methodologies for CNV detection still face analytical challenges due to numerous genetic and technological factors that may lead to spurious findings. First, existing methods assume the independent observations …


Data Analysis And Visualization To Dismantle Gender Discrimination In The Field Of Technology, Quinn Bolewicki Jun 2021


Dissertations, Theses, and Capstone Projects

In the United States, a significant population is facing an uphill battle trying to thrive in an industry that has seen exponential growth in recent years. Women, who account for approximately 50.8% of the U.S. population, are statistically underpaid and underrepresented in science, technology, engineering, and mathematics (STEM). Despite women-led technology teams establishing a 21% greater return on investment than teams not led by women, and young women largely outperforming men in math according to a 2015 study, there are only three Fortune 500 companies led by women, and women make up only 10% of internet entrepreneurs. Research generates hundreds of articles, infographics, …


Security Against Data Falsification Attacks In Smart City Applications, Venkata Praveen Kumar Madhavarapu Jan 2021


Doctoral Dissertations

Smart city applications like smart grid, smart transportation, and healthcare deal with very important data collected from IoT devices. False reporting of data consumption from device failures or by organized adversaries may have drastic consequences on the quality of operations. To deal with this, we propose a coarse-grained and a fine-grained anomaly-based security event detection technique that uses indicators such as deviation and directional change in the time series of the proposed anomaly detection metrics to detect different attacks. We also built a trust scoring metric to filter out the malicious devices. Another challenging problem is injection of …


Data And Assessment Management In Collegiate Recreation, Jeana Carow Dec 2020


Graduate Theses and Dissertations

Collegiate recreation programs and centers typically provide traditional programming space in addition to a range of physical activity spaces and resources as a valuable part of the student experience. The external pressures of identifying and communicating departmental value and impact on the campus community have resulted in collegiate recreation departments’ use of data to communicate the effectiveness and impact of their work. The purpose of the study was to identify the data collection and assessment management practices of collegiate recreation departments, particularly focusing on the organization of data and assessment strategies as well as data collection, storage, reporting, analyzing, and …


Boom Or Bust: Examining The Relationship Between High School Recruiting Rankings And The NFL Draft, Nicholas E. Tice Apr 2020


Senior Theses

The goal of this thesis is to model the probability that a high school football player is drafted, based on information taken from their recruiting profile. The response variable is binary and defined as drafted (1) or undrafted (0). The independent variables were collected by scraping data from the recruiting websites and include height, weight, position, hometown, recruiting grade, and socioeconomic factors based on the player’s high school. 247Sports and ESPN were the two recruiting services used and compared in this study. Because of the binary nature of the dependent variable, logistic regression and decision trees were chosen …


Complex Systems Analysis In Selected Domains: Animal Biosecurity & Genetic Expression, Luke Trinity Jan 2020


Graduate College Dissertations and Theses

I first broadly define the study of complex systems, identifying language to describe and characterize mechanisms of such systems which is applicable across disciplines. An overview of methods is provided, including the description of a software development methodology which defines how a combination of computer science, statistics, and mathematics is applied to specified domains. This work describes strategies to facilitate timely completion of robust and adaptable projects which vary in complexity and scope. A biosecurity informatics pipeline is outlined, which is an abstraction useful in organizing the analysis of biological data from cells. This is followed by specific applications of …


A Machine Learning Approach To The Perception Of Phrase Boundaries In Music, Evan Matthew Petratos Jan 2020


Senior Projects Fall 2020

Segmentation is a well-studied area of research for speech, but the segmentation of music has typically been treated as a separate domain, even though the same acoustic cues that constitute information in speech (e.g., intensity, timbre, and rhythm) are present in music. This study aims to bridge the gap between research on speech segmentation and research on music segmentation. Musicians can discern where musical phrases are segmented. In this study, these boundaries are predicted using an algorithmic, machine learning approach to audio processing of acoustic features. The acoustic features of musical sounds have localized patterns within sections of the music that create aurally …


Time Series Analysis Of Weather Data In South Carolina, Geophrey Odero Oct 2019


Theses and Dissertations

This thesis discusses time series analysis of weather data in South Carolina for the last fifteen years (January 2003 to December 2017) for Columbia, Greenville and North Myrtle Beach. The first part presents a brief overview of the variables that are used in the analysis: temperature, dew point, humidity and sea level pressure. A short discussion of time series data is also included. The second part is about modeling the variables. The models of choice are presented and fitted, and model diagnostics are carried out. In the third part, we discuss background on climates of the cities and model …


Measuring Birth Trauma Rates In Maine Using Public Data, Mike Lapika Apr 2019


Thinking Matters Symposium Archive

An increasing number of states are creating databases that collect and organize health insurance claims from public and private health care payers. As of December 2016, at least 18 states have these “all-payer claims databases” (APCDs), including Maine. APCDs are intended to inform cost containment and quality improvement by increasing transparency and informing consumer choice. For this project, we assessed how Maine’s APCD data might be used to produce standardized quality measures across facilities in the state. Specifically, we tested a birth outcome quality measure developed by the Agency for Healthcare Research and Quality (AHRQ), Birth Trauma – Injury to Neonate …


Representation And Reconstruction Of Linear, Time-Invariant Networks, Nathan Scott Woodbury Apr 2019


Theses and Dissertations

Network reconstruction is the process of recovering a unique structured representation of some dynamic system using input-output data and some additional knowledge about the structure of the system. Many network reconstruction algorithms have been proposed in recent years, most dealing with the reconstruction of strictly proper networks (i.e., networks that require delays in all dynamics between measured variables). However, no reconstruction technique presently exists capable of recovering both the structure and dynamics of networks where links are proper (delays in dynamics are not required) and not necessarily strictly proper. The ultimate objective of this dissertation is to develop algorithms capable of …


A Bottom-Up Modeling Methodology Using Knowledge Graphs For Composite Metric Development Applied To Traffic Crashes In The State Of Texas, Daniel Michael Mejia Jan 2019


Open Access Theses & Dissertations

Data is a key factor for understanding real-world phenomena. Data can be discovered and integrated from multiple sources and has the potential to be interpreted in a multitude of ways. Traffic crashes, for example, are common events that occur in cities and provide a significant amount of data that has potential to be analyzed and disseminated in a way that can improve mobility of people, and ultimately improve the quality of life. Improving the quality of life of city residents through the use of data and technology is at the core of Smart Cities solutions. Measuring the improvement that Smart …


@Yourlocation: A Spatial Analysis Of Geotagged Tweets In The Us, Ocean Mckinney Jan 2019


CMC Senior Theses

This project examines the spatial network properties observable from geo-located tweet data. Conventional exploration examines characteristics of a variety of network attributes, but few employ spatial edge correlations in their analysis. Recent studies have demonstrated the improvements that these correlations contribute to drawing conclusions about network structure. This thesis expands upon social network research utilizing spatial edge correlations and presents processing and formatting techniques for JSON (JavaScript Object Notation) data.


A Delphi Study Analysis Of Best Practices For Data Quality And Management In Healthcare Information Systems, Olivia L. Pollard Jan 2019


Walden Dissertations and Doctoral Studies

Healthcare in the US continues to suffer from poor data quality practices and from a lack of processes that would ensure accuracy of patient health care records and information. A lack of current scholarly research on best practices in data quality and records management has left potential flaws within the relatively new electronic health records environment unidentified; these flaws affect not only cost, reimbursements, and services but also, most importantly, patient safety. The focus of this study was to identify current best practices using a panel of 25 health care industry data quality experts. The conceptual lens was developed from the International Monetary …


U.S. Census Explorer: A GUI And Visualization Tool For The U.S. Census Data API, Timothy Snyder Jan 2019


Williams Honors College, Honors Research Projects

U.S. Census Explorer is a software application that is designed to provide tools for intuitive exploration and analysis of United States census data for non-technical users. The application serves as an interface into the U.S. Census Bureau’s data API that enables a complete workflow from data acquisition to data visualization without the need for technical intervention from the user. The suite of tools provided include a graphical user interface for dynamically querying U.S. census data, geographic visualizations, and the ability to download your work to common spreadsheet and image formats for inclusion in external works.


Microarray Data Analysis And Classification Of Cancers, Grant Gates Jan 2019


Williams Honors College, Honors Research Projects

When it comes to cancer, there is no standardized approach for identifying new cancer classes nor is there a standardized approach for assigning cancer tumors to existing classes. These two ideas are known as class discovery and class prediction. For a cancer patient to receive proper treatment, it is important that the type of cancer be accurately identified. For my Senior Honors Project, I would like to use this opportunity to research a topic in bioinformatics. Bioinformatics combines several subjects, including biology, computer science, and statistics. An intricate method for class discovery and class prediction is …


Protecting Privacy Of Data In The Internet Of Things With Policy Enforcement Fog Module, Abduljaleel Al-Hasnawi Dec 2018


Dissertations

The growth of IoT applications has resulted in generating massive volumes of data about people and their surroundings. Significant portions of these data are sensitive since they reflect people's behaviors, interests, lifestyles, etc. Protecting sensitive IoT data from privacy violations is a challenge since these data need to be handled by public networks, servers and clouds, most of which are untrusted parties for data owners. In this study, a solution called Policy Enforcement Fog Module (PEFM) is proposed for protecting sensitive IoT data. The primary task of the PEFM solution is mandatory enforcement of privacy policies for sensitive IoT data whenever …


Fully Convolutional Neural Networks For Pixel Classification In Historical Document Images, Seth Andrew Stewart Oct 2018


Theses and Dissertations

We use a Fully Convolutional Neural Network (FCNN) to classify pixels in historical document images, enabling the extraction of high-quality, pixel-precise and semantically consistent layers of masked content. We also analyze a dataset of hand-labeled historical form images of unprecedented detail and complexity. The semantic categories we consider in this new dataset include handwriting, machine-printed text, dotted and solid lines, and stamps. Segmentation of document images into distinct layers allows handwriting, machine print, and other content to be processed and recognized discriminatively, and therefore more intelligently than might be possible with content-unaware methods. We show that an efficient FCNN with …