Open Access. Powered by Scholars. Published by Universities.®

Digital Commons Network

Computer Sciences

PDF

Theses/Dissertations

Data

Articles 1 - 26 of 26

Full-Text Articles in Entire DC Network

Data-Optimized Spatial Field Predictions For Robotic Adaptive Sampling: A Gaussian Process Approach, Zachary Nathan May 2023

Computer Science Senior Theses

We introduce a framework that combines Gaussian Process models, robotic sensor measurements, and sampling data to predict spatial fields. In this context, a spatial field refers to the distribution of a variable throughout a specific area, such as temperature or pH variations over the surface of a lake. Whereas existing methods tend to analyze only the particular field(s) of interest, our approach optimizes predictions through the effective use of all available data. We validated our framework on several datasets, showing that errors can decline by up to two-thirds through the inclusion of additional colocated measurements. In support of adaptive sampling, …


Big Data Analytics Of Medical Data, Ashwin Rajasankar Dec 2022

Culminating Experience Projects

Data has become a huge part of modern decision making. With the improvements in computing performance and storage in the past two decades, storing large amounts of data has become much easier. Analyzing large amounts of data and creating data models with them can help organizations obtain insights and information that support their decision making. Big data analytics has become an integral part of many fields such as retail, real estate, education, and medicine. In this project, the goal is to understand the working of Apache Spark and its different storage methods and create a data warehouse to analyze data. …


Mapping The COVID-19 Pandemic In Staten Island, Vincenzo Mezzio May 2022

Student Theses

COVID-19 has had diverging effects in New York City. Out of the five boroughs, Staten Island has one of the largest percentages of COVID-19 cases relative to population. This research examines key social and spatial factors that contribute to the increase in COVID-19 cases in Staten Island. It asks: Which parts of Staten Island have higher rates of transmission of COVID-19? Which parts of the borough have larger populations that are more vulnerable to COVID-19? What is the relationship between the location of vaccination centers and the rates of COVID-19 cases? Using Geographic Information Systems (GIS), this research examines the …


Outvoice: Bringing Transparency To Healthcare, Autumn Clark Feb 2022

Undergraduate Honors Theses

Industries are not incentivized to price reasonably and spend responsibly if consumers do not have the ability to shop around within that industry, and shopping around is not possible without pricing transparency (knowing how much a good or service costs before purchasing it). But in the healthcare industry, we typically default to whichever clinic or hospital is closest, with no prior knowledge of what costs we can expect to incur at that particular institution. According to a poll published by Harvard University, nine out of ten Americans feel the healthcare industry is too opaque and greater transparency is needed.

We …


Security Against Data Falsification Attacks In Smart City Applications, Venkata Praveen Kumar Madhavarapu Jan 2021

Doctoral Dissertations

Smart city applications like smart grid, smart transportation, and healthcare deal with very important data collected from IoT devices. False reporting of data consumption from device failures or by organized adversaries may have drastic consequences on the quality of operations. To deal with this, we propose a coarse-grained and a fine-grained anomaly-based security event detection technique that uses indicators such as deviation and directional change in the time series of the proposed anomaly detection metrics to detect different attacks. We also built a trust scoring metric to filter out the malicious devices. Another challenging problem is injection of …


A Machine Learning Approach To The Perception Of Phrase Boundaries In Music, Evan Matthew Petratos Jan 2020

Senior Projects Fall 2020

Segmentation is a well-studied area of research for speech, but the segmentation of music has typically been treated as a separate domain, even though the same acoustic cues that constitute information in speech (e.g., intensity, timbre, and rhythm) are present in music. This study aims to bridge the gap between research on speech segmentation and research on music segmentation. Musicians can discern where musical phrases are segmented. In this study, these boundaries are predicted using an algorithmic, machine learning approach to audio processing of acoustic features. The acoustic features of musical sounds have localized patterns within sections of the music that create aurally …


Complex Systems Analysis In Selected Domains: Animal Biosecurity & Genetic Expression, Luke Trinity Jan 2020

Graduate College Dissertations and Theses

I first broadly define the study of complex systems, identifying language to describe and characterize mechanisms of such systems that is applicable across disciplines. An overview of methods is provided, including the description of a software development methodology which defines how a combination of computer science, statistics, and mathematics is applied to specified domains. This work describes strategies to facilitate timely completion of robust and adaptable projects which vary in complexity and scope. A biosecurity informatics pipeline is outlined, which is an abstraction useful in organizing the analysis of biological data from cells. This is followed by specific applications of …


Representation And Reconstruction Of Linear, Time-Invariant Networks, Nathan Scott Woodbury Apr 2019

Theses and Dissertations

Network reconstruction is the process of recovering a unique structured representation of some dynamic system using input-output data and some additional knowledge about the structure of the system. Many network reconstruction algorithms have been proposed in recent years, most dealing with the reconstruction of strictly proper networks (i.e., networks that require delays in all dynamics between measured variables). However, no reconstruction technique presently exists capable of recovering both the structure and dynamics of networks where links are proper (delays in dynamics are not required) and not necessarily strictly proper. The ultimate objective of this dissertation is to develop algorithms capable of …


A Bottom-Up Modeling Methodology Using Knowledge Graphs For Composite Metric Development Applied To Traffic Crashes In The State Of Texas, Daniel Michael Mejia Jan 2019

Open Access Theses & Dissertations

Data is a key factor for understanding real-world phenomena. Data can be discovered and integrated from multiple sources and has the potential to be interpreted in a multitude of ways. Traffic crashes, for example, are common events that occur in cities and provide a significant amount of data that has potential to be analyzed and disseminated in a way that can improve mobility of people, and ultimately improve the quality of life. Improving the quality of life of city residents through the use of data and technology is at the core of Smart Cities solutions. Measuring the improvement that Smart …


A Delphi Study Analysis Of Best Practices For Data Quality And Management In Healthcare Information Systems, Olivia L. Pollard Jan 2019

Walden Dissertations and Doctoral Studies

Healthcare in the US continues to suffer from poor data quality practices and a lack of processes that would ensure the accuracy of patient health care records and information. A lack of current scholarly research on best practices in data quality and records management has left unidentified potential flaws within the relatively new electronic health records environment that affect not only cost, reimbursements, and services but also, most importantly, patient safety. The focus of this study was to identify current best practices using a panel of 25 health care industry data quality experts. The conceptual lens was developed from the International Monetary …


@Yourlocation: A Spatial Analysis Of Geotagged Tweets In The US, Ocean Mckinney Jan 2019

CMC Senior Theses

This project examines the spatial network properties observable from geo-located tweet data. Conventional exploration examines characteristics of a variety of network attributes, but few employ spatial edge correlations in their analysis. Recent studies have demonstrated the improvements that these correlations contribute to drawing conclusions about network structure. This thesis expands upon social network research utilizing spatial edge correlations and presents processing and formatting techniques for JSON (JavaScript Object Notation) data.


Microarray Data Analysis And Classification Of Cancers, Grant Gates Jan 2019

Williams Honors College, Honors Research Projects

When it comes to cancer, there is no standardized approach for identifying new cancer classes, nor is there a standardized approach for assigning cancer tumors to existing classes. These two tasks are known as class discovery and class prediction. For a cancer patient to receive proper treatment, it is important that the type of cancer be accurately identified. For my Senior Honors Project, I would like to use this opportunity to research a topic in bioinformatics. Bioinformatics combines several subjects, including biology, computer science, and statistics. An intricate method for class discovery and class prediction is …


U.S. Census Explorer: A GUI And Visualization Tool For The U.S. Census Data API, Timothy Snyder Jan 2019

Williams Honors College, Honors Research Projects

U.S. Census Explorer is a software application designed to give non-technical users tools for intuitive exploration and analysis of United States census data. The application serves as an interface into the U.S. Census Bureau’s data API that enables a complete workflow from data acquisition to data visualization without the need for technical intervention from the user. The suite of tools provided includes a graphical user interface for dynamically querying U.S. census data, geographic visualizations, and the ability to download your work to common spreadsheet and image formats for inclusion in external works.


Protecting Privacy Of Data In The Internet Of Things With Policy Enforcement Fog Module, Abduljaleel Al-Hasnawi Dec 2018

Dissertations

The growth of IoT applications has resulted in massive volumes of data about people and their surroundings. Significant portions of these data are sensitive since they reflect people's behaviors, interests, lifestyles, etc. Protecting sensitive IoT data from privacy violations is a challenge since these data need to be handled by public networks, servers and clouds, most of which are untrusted parties for data owners. In this study, a solution called Policy Enforcement Fog Module (PEFM) is proposed for protecting sensitive IoT data. The primary task of the PEFM solution is mandatory enforcement of privacy policies for sensitive IoT data, whenever …


Fully Convolutional Neural Networks For Pixel Classification In Historical Document Images, Seth Andrew Stewart Oct 2018

Theses and Dissertations

We use a Fully Convolutional Neural Network (FCNN) to classify pixels in historical document images, enabling the extraction of high-quality, pixel-precise and semantically consistent layers of masked content. We also analyze a dataset of hand-labeled historical form images of unprecedented detail and complexity. The semantic categories we consider in this new dataset include handwriting, machine-printed text, dotted and solid lines, and stamps. Segmentation of document images into distinct layers allows handwriting, machine print, and other content to be processed and recognized discriminatively, and therefore more intelligently than might be possible with content-unaware methods. We show that an efficient FCNN with …


Breadcrumbs: Privacy As A Privilege, Prachi Bhardwaj Dec 2017

Capstones

In 2017, the world saw more data breaches than in any year prior. The count was more than the all-time high record in 2016, which was 40 percent more than the year before that.

That’s because consumer data is incredibly valuable today. In the last three decades, data storage has gone from physical to almost entirely digital, which means consumer data is more accessible and applicable to business strategies. As a result, companies are gathering data in ways previously unknown to the average consumer, and hackers are …


Prosense, Johnny Favazza II, Casey Glasgow, Matt Epperson Jun 2016

Computer Engineering

This project aims to gather advanced data sets from MEMS sensors and GPS and deliver them to the user, who can capitalize on the data. The once negligible half-degree difference of your board barreling down a wave can be recorded from a gyro and exploited for the perfect turn. The exact speed dreaded by longboarders, where speed wobbles turn into road rash, can be analysed and consequently avoided. Ascertaining the summit of your flight using combined GPS sensors from the ski ramp allows for the correct timing of tricks. When it comes to pursuing excellence in professional sports, amateur …


Ultrasonic Data Steganography, Alexander Orosz Edwards Mar 2016

KSU Journey Honors College Capstones and Theses

What started off as a question on the possibility of data transmission via sound above the level of human hearing evolved into a project exploring the possibility of ultrasonic data infiltration and exfiltration in an information security context. It is well known that sound can be used to transmit data, as can be seen in many old technologies, most notably and simply DTMF tones for phone networks. But what if the sound used to transmit signals was in the ultrasonic range? It would go generally unnoticed by anyone not looking for it with tools such as a spectrum …


Classifying System Call Traces Using Anomalous Detection, William Doyle Jun 2015

Honors Theses

We used data mining techniques to detect intrusions among system call traces and have outlined our results. Recent work at the intersection of security and machine learning has led to a better understanding of anomalous intrusion detection. There is a need to more thoroughly understand how anomaly detection can be used because of its potential applications and advantages over current standard methods. In this thesis, we report on a new approach to anomalous detection using system call traces. Our goal is to create a system that can accurately detect hacking attacks by analyzing the sequences of system calls …


Scalable Detection And Extraction Of Data In Lists In OCRed Text For Ontology Population Using Semi-Supervised And Unsupervised Active Wrapper Induction, Thomas L. Packer Oct 2014

Theses and Dissertations

Lists of records in machine-printed documents contain much useful information. As one example, the thousands of family history books scanned, OCRed, and placed on-line by FamilySearch.org probably contain hundreds of millions of fact assertions about people, places, family relationships, and life events. Data like this cannot be fully utilized until a person or process locates the data in the document text, extracts it, and structures it with respect to an ontology or database schema. Yet, in the family history industry and other industries, data in lists goes largely unused because no known approach adequately addresses all of the costs, challenges, …


A Medical Data Cleaner, Jahnavi Yetukuri May 2013

All Graduate Plan B and other Reports, Spring 1920 to Spring 2023

This report describes a medical-data cleaning tool called MedDataCleaner that can detect outliers in medical data and assist database administrators in resolving data-related problems. Specifically, MedDataCleaner enables users to define cleaning rules and offers the ability to choose classification methods that help determine whether the data is good or bad. MedDataCleaner uses Vitruvian DB objects for object-relational mapping (ORM) support and Vitruvian alignment links for designing the GUI.

My contribution to this work includes designing the user interfaces using Vitruvian alignment links, and designing and implementing mean, standard deviation, and neural classification methods using Vitruvian DB objects.


A Confidence-Prioritization Approach To Data Processing In Noisy Data Sets And Resulting Estimation Models For Predicting Streamflow Diel Signals In The Pacific Northwest, Nathaniel Lee Gustafson Aug 2012

Theses and Dissertations

Streams in small watersheds are often known to exhibit diel fluctuations, in which streamflow oscillates on a 24-hour cycle. Streamflow diel fluctuations, which we investigate in this study, are an informative indicator of environmental processes. However, in environmental data sets, as in many others, there is a range of noise associated with individual data points. Some points are extracted under relatively clear and defined conditions, while others may include a range of known or unknown confounding factors, which may decrease those points' validity. These points may or may not remain useful for training, depending on how much uncertainty they …


Rapid Decoding Of Digital Data Streams Using Field Programmable Gate Arrays, Andrew Hernandez Jan 2012

Dissertations and Theses

No abstract provided.


Pedagogical Tool For Usability Science Final Project Report, Daniel D. Mendelsohn Jun 2011

Honors Theses

A Sophomore Research Seminar (SRS) at Union College teaches about usability science, the study of designing interfaces that allow the user to accomplish a given task with less time and frustration. In this context, an interface can be anything that allows interaction with a physical or virtual device such as a web browser or the knobs on a stove. In this SRS, students design interface mockups, called prototypes, out of inexpensive material such as cardboard. Students use these prototypes to test their interfaces on real people, who are asked to perform a task that would be performed on a real …


Optimal Candidate Generation In Spatial Co-Location Mining, Zhongshan Lin May 2009

All Graduate Theses and Dissertations, Spring 1920 to Summer 2023

Existing level-wise spatial co-location algorithms suffer from generating extra, nonclique candidate instances. Thus, they require cliqueness checking at every level. In this thesis, a novel spatial co-location mining algorithm is proposed that automatically generates co-located spatial features without generating any nonclique candidates at any level. Consequently, this algorithm generates fewer candidates than other existing level-wise co-location algorithms without losing any pertinent information. The benefits of this algorithm have been clearly observed at early stages in the mining process.


Techniques To Explore Time-Related Correlation In Large Datasets, Sumeet Dua Jan 2002

LSU Doctoral Dissertations

The next generation of database management and computing systems will be significantly complex, with data distributed both in functionality and operation. The complexity arises, at least in part, from the data types involved and the types of information requests made by the database user. Time sequence databases are generated in many practical applications. Detecting similar sequences and subsequences within these databases is an important research area and has generated a lot of interest recently. Previous studies in this area have concentrated on calculating similitude between (sub)sequences of equal sizes. The question of unequal-sized (sub)sequence comparison to report similitude has been an …