Open Access. Powered by Scholars. Published by Universities.®

Databases and Information Systems Commons

Data Science

2021

Articles 1 - 30 of 41

Full-Text Articles in Databases and Information Systems

On Performance Optimization And Prediction Of Parallel Computing Frameworks In Big Data Systems, Haifa Alquwaiee Dec 2021

Dissertations

A wide spectrum of big data applications in science, engineering, and industry generate large datasets, which must be managed and processed in a timely and reliable manner for knowledge discovery. These tasks are now commonly executed in big data computing systems exemplified by Hadoop based on parallel processing and distributed storage and management. For example, many companies and research institutions have developed and deployed big data systems on top of NoSQL databases such as HBase and MongoDB, and parallel computing frameworks such as MapReduce and Spark, to ensure timely data analyses and efficient result delivery for decision making and business …


Messiness: Automating IoT Data Streaming Spatial Analysis, Christopher White, Atilio Barreda II Dec 2021

Publications and Research

The spaces we live in go through many transformations over the course of a year, a month, or a day; my room has seen tremendous clutter and pristine order within the span of a few hours. My goal is to discover patterns within my space and formulate an understanding of the changes that occur. This insight will provide actionable direction for maintaining a cleaner environment, as well as some information about the optimal times for productivity and energy preservation.

Using a Raspberry Pi, I will set up automated image capture in a room in my home. These images will …


Robust Bipoly-Matching For Multi-Granular Entities, Ween Jiann Lee, Maksim Tkachenko, Hady W. Lauw Dec 2021

Research Collection School Of Computing and Information Systems

Entity matching across two data sources is a prevalent need in many domains, including e-commerce. Of interest is the scenario where entities have varying granularity, e.g., a coarse product category may match multiple finer categories. Previous work in one-to-many matching generally presumes the 'one' necessarily comes from a designated source and the 'many' from the other source. In contrast, we propose a novel formulation that allows concurrent one-to-many bidirectional matching in any direction. Beyond flexibility, we also seek matching that is more robust to noisy similarity values arising from diverse entity descriptions, by introducing receptivity and reclusivity notions. In addition …


Integration Of Blockchain Technology Into Automobiles To Prevent And Study The Causes Of Accidents, John Kim Dec 2021

Electronic Theses, Projects, and Dissertations

Automobile collisions occur daily. We now live in an information-driven world, one where technology is quickly evolving. Blockchain technology can change the automotive industry, the safety of the motoring public and its surrounding environment by incorporating this vast array of information. It can put safety and efficiency at the forefront for pedestrians and public establishments, and provide public agencies with pertinent information securely and efficiently. Other industries in which Blockchain technology has been effective include supply chain management, logistics, and banking. This paper reviews some statistical information regarding automobile collisions, Blockchain technology, Smart Contracts, Smart Cities; assesses the feasibility …


Integration Of Internet Of Things And Health Recommender Systems, Moonkyung Yang Dec 2021

Electronic Theses, Projects, and Dissertations

The Internet of Things (IoT) has become a part of our lives and has provided many enhancements to day-to-day living. In this project, IoT in healthcare is reviewed. IoT-based healthcare is utilized in remote health monitoring, observing chronic diseases, individual fitness programs, helping the elderly, and many other healthcare fields. There are three main architectures of smart IoT healthcare: Three-Layer Architecture, Service-Oriented Based Architecture (SoA), and The Middleware-Based IoT Architecture. Depending on the required services, different IoT architectures are used. In addition, IoT healthcare services, IoT healthcare service enablers, IoT healthcare applications, and IoT healthcare services focusing on Smartwatch …


Pre-Earthquake Ionospheric Perturbation Identification Using CSES Data Via Transfer Learning, Pan Xiong, Cheng Long, Huiyu Zhou, Roberto Battiston, Angelo De Santis, Dimitar Ouzounov, Xuemin Zhang, Xuhui Shen Nov 2021

Mathematics, Physics, and Computer Science Faculty Articles and Research

During the lithospheric buildup to an earthquake, complex physical changes occur within the earthquake hypocenter. Data pertaining to the changes in the ionosphere may be obtained by satellites, and the analysis of data anomalies can help identify earthquake precursors. In this paper, we present a deep-learning model, SeqNetQuake, that uses data from the first China Seismo-Electromagnetic Satellite (CSES) to identify ionospheric perturbations prior to earthquakes. SeqNetQuake achieves the best performance [F-measure (F1) = 0.6792 and Matthews correlation coefficient (MCC) = 0.427] when directly trained on the CSES dataset with a spatial window centered on the earthquake epicenter with the Dobrovolsky …
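For readers unfamiliar with the two scores quoted in this abstract, the following is a minimal sketch of how F1 and the Matthews correlation coefficient are computed from a binary confusion matrix. The function names and the example counts are illustrative assumptions and are not taken from the paper.

```python
import math


def f1_score(tp: int, fp: int, fn: int) -> float:
    """F-measure: harmonic mean of precision and recall."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient, ranging from -1 to 1."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical counts, chosen only to exercise the functions.
example_f1 = f1_score(tp=40, fp=5, fn=10)
example_mcc = mcc(tp=40, tn=45, fp=5, fn=10)
```

Unlike F1, MCC also accounts for true negatives, which is one reason the two scores are often reported together on imbalanced data.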


Transfer-Learned Pruned Deep Convolutional Neural Networks For Efficient Plant Classification In Resource-Constrained Environments, Martinson Ofori Nov 2021

Masters Theses & Doctoral Dissertations

Traditional means of on-farm weed control mostly rely on manual labor. This process is time-consuming and costly, and it contributes to major yield losses. Further, the conventional application of chemical weed control can be economically and environmentally inefficient. Site-specific weed management (SSWM) counteracts this by reducing the amount of chemical application with localized spraying of weed species. To solve this using computer vision, precision agriculture researchers have used remote sensing weed maps, but this has been largely ineffective for early season weed control due to problems such as solar reflectance and cloud cover in satellite imagery. With the current advances in artificial …


Representation Learning On Multi-Layered Heterogeneous Network, Delvin Ce Zhang, Hady W. Lauw Nov 2021

Research Collection School Of Computing and Information Systems

Network data can often be represented in a multi-layered structure with rich semantics. One example is e-commerce data, containing a user-user social network layer and an item-item context layer, with cross-layer user-item interactions. Given the dual characteristics of homogeneity within each layer and heterogeneity across layers, we seek to learn node representations from such a multi-layered heterogeneous network while jointly preserving structural information and network semantics. In contrast, previous works on network embedding mainly focus on single-layered or homogeneous networks with one type of nodes and links. In this paper we propose intra- and cross-layer proximity concepts. Intra-layer proximity simulates propagation along …


Topic Modeling For Multi-Aspect Listwise Comparison, Delvin Ce Zhang, Hady W. Lauw Nov 2021

Research Collection School Of Computing and Information Systems

As a well-established probabilistic method, topic models seek to uncover latent semantics from plain text. In addition to having textual content, we observe that documents are usually compared in listwise rankings based on their content. For instance, countries worldwide are compared in an international ranking in terms of electricity production based on their national reports. Such document comparisons constitute additional information that reveals documents' relative similarities. Incorporating them into topic modeling could yield comparative topics that help to differentiate and rank documents. Furthermore, based on different comparison criteria, the observed document comparisons usually cover multiple aspects, each expressing a distinct …


Crest Or Trough? How Research Libraries Used Emerging Technologies To Survive The Pandemic, So Far, Scout Calvert Oct 2021

UNL Libraries: Faculty Publications

Introduction

In the first months of the COVID-19 pandemic, it was impossible to tell if we were at the crest of a wave of new transmissions, or a trough of a much larger wave, still yet to peak. As of this writing, as colleges and universities prepare for mostly in-person fall 2021 semesters, case counts in the United States are increasing again after a decline that coincided with easier access to the COVID vaccine. Plans for a return to campus made with confidence this spring may be in doubt, as we climb the curve of what is already the second …


Towards Source-Aligned Variational Models For Cross-Domain Recommendation, Aghiles Salah, Thanh-Binh Tran, Hady W. Lauw Oct 2021

Research Collection School Of Computing and Information Systems

Data sparsity is a long-standing challenge in recommender systems. Among existing approaches to alleviate this problem, cross-domain recommendation consists in leveraging knowledge from a source domain or category (e.g., Movies) to improve item recommendation in a target domain (e.g., Books). In this work, we advocate a probabilistic approach to cross-domain recommendation and rely on variational autoencoders (VAEs) as our latent variable models. More precisely, we assume that we have access to a VAE trained on the source domain that we seek to leverage to improve preference modeling in the target domain. To this end, we propose a model which learns …


Client Access Feature Engineering For The Homeless Community Of The City Of Portland, Oswaldo Ceballos Jr Aug 2021

altREU Projects

Given the severity of homelessness in many cities across the country, the project at hand attempts to assist a service provider organization called Central City Concern (CCC) with its mission of providing services to the community of Portland. These services include housing, recovery, health care, and jobs. With many different types of services available through the work of CCC, there exists an abundance of information and data pertaining to the individuals that interact with the CCC service system. The goal of this project is to perform an exploratory analysis and feature engineer the existing datasets CCC has collected over the …


Exploratory Search With Archetype-Based Language Models, Brent D. Davis Aug 2021

Electronic Thesis and Dissertation Repository

This dissertation explores how machine learning, natural language processing and information retrieval may assist the exploratory search task. Exploratory search is a search where the ideal outcome of the search is unknown, and thus the ideal language to use in a retrieval query to match it is unavailable. Three algorithms represent the contribution of this work. Archetype-based Modeling and Search provides a way to use previously identified archetypal documents relevant to an archetype to form a notion of similarity and find related documents that match the defined archetype. This is beneficial for exploratory search as it can generalize beyond standard …


Multilateration Index, Chip Lynch Aug 2021

Electronic Theses and Dissertations

We present an alternative method for pre-processing and storing point data, particularly for Geospatial points, by storing multilateration distances to fixed points rather than coordinates such as Latitude and Longitude. We explore the use of this data to improve query performance for some distance related queries such as nearest neighbor and query-within-radius (i.e. “find all points in a set P within distance d of query point q”). Further, we discuss the problem of “Network Adequacy” common to medical and communications businesses, to analyze questions such as “are at least 90% of patients living within 50 miles of a covered emergency …
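The query-within-radius idea described in this abstract can be sketched as follows. This is an illustrative toy under stated assumptions, not the dissertation's implementation: it uses planar Euclidean distance rather than a geodesic distance, and the class name `MultilaterationIndex`, the helper `dist`, and the choice of anchor points are all hypothetical. Each point is stored with its distances to a few fixed anchors, and the triangle inequality discards candidates before any exact distance is computed.

```python
import bisect
import math


def dist(a, b):
    """Planar Euclidean distance (an assumption; geodata would need a geodesic metric)."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


class MultilaterationIndex:
    def __init__(self, points, anchors):
        self.points = list(points)
        self.anchors = list(anchors)
        # For every point, precompute its distance to every fixed anchor.
        self.table = [[dist(p, a) for a in self.anchors] for p in self.points]
        # Keep point ids sorted by distance to the first anchor for range lookup.
        self.order = sorted(range(len(self.points)), key=lambda i: self.table[i][0])
        self.keys = [self.table[i][0] for i in self.order]

    def within_radius(self, q, d):
        """Return all indexed points within distance d of query point q."""
        qd = [dist(q, a) for a in self.anchors]
        # Triangle inequality: |dist(p, a) - dist(q, a)| <= dist(p, q) for any anchor a,
        # so only points in this band around the first anchor can qualify.
        lo = bisect.bisect_left(self.keys, qd[0] - d)
        hi = bisect.bisect_right(self.keys, qd[0] + d)
        result = []
        for i in self.order[lo:hi]:
            row = self.table[i]
            # Prune with the remaining anchors before the exact check.
            if any(abs(row[k] - qd[k]) > d for k in range(1, len(qd))):
                continue
            if dist(self.points[i], q) <= d:
                result.append(self.points[i])
        return result


index = MultilaterationIndex(
    points=[(0, 0), (1, 1), (5, 5), (10, 0)],
    anchors=[(0, 0), (10, 10), (0, 10)],
)
nearby = index.within_radius((0.5, 0.5), 1.0)
```

The pruning step never discards a true match (up to floating-point rounding at the exact boundary), so the final exact check only serves to remove false positives that survive the anchor filters.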


Privacy-Preserving Cloud-Assisted Data Analytics, Wei Bao Jul 2021

Graduate Theses and Dissertations

Nowadays industries are collecting a massive and exponentially growing amount of data that can be utilized to extract useful insights for improving various aspects of our life. Data analytics (e.g., via the use of machine learning) has been extensively applied to make important decisions in various real world applications. However, it is challenging for resource-limited clients to analyze their data in an efficient way when its scale is large. Additionally, the data resources are increasingly distributed among different owners. Nonetheless, users' data may contain private information that needs to be protected.

Cloud computing has become more and more popular in …


Exploring Cross-Modality Utilization In Recommender Systems, Quoc Tuan Truong, Aghiles Salah, Thanh-Binh Tran, Jingyao Guo, Hady W. Lauw Jul 2021

Research Collection School Of Computing and Information Systems

Multimodal recommender systems alleviate the sparsity of historical user-item interactions. They are commonly catalogued based on the type of auxiliary data (modality) they leverage, such as preference data plus user-network (social), user/item texts (textual), or item images (visual) respectively. One consequence of this categorization is the tendency for virtual walls to arise between modalities. For instance, a study involving images would compare to only baselines ostensibly designed for images. However, a closer look at existing models' statistical assumptions about any one modality would reveal that many could work just as well with other modalities. Therefore, we pursue a systematic investigation …


Variational Learning From Implicit Bandit Feedback, Quoc Tuan Truong, Hady W. Lauw Jul 2021

Research Collection School Of Computing and Information Systems

Recommendations are prevalent in Web applications (e.g., search ranking, item recommendation, advertisement placement). Learning from bandit feedback is challenging due to the sparsity of feedback limited to system-provided actions. In this work, we focus on batch learning from logs of recommender systems involving both bandit and organic feedback. We develop a probabilistic framework with a likelihood function for estimating not only explicit positive observations but also implicit negative observations inferred from the data. Moreover, we introduce a latent variable model for organic-bandit feedback to robustly capture user preference distributions. Next, we analyze the behavior of the new likelihood under two …


Counting And Sampling Small Structures In Graph And Hypergraph Data Streams, Themistoklis Haris Jun 2021

Dartmouth College Undergraduate Theses

In this thesis, we explore the problem of approximating the number of elementary substructures called simplices in large k-uniform hypergraphs. The hypergraphs are assumed to be too large to be stored in memory, so we adopt a data stream model, where the hypergraph is defined by a sequence of hyperedges.

First we propose an algorithm that (ε, δ)-estimates the number of simplices using $O(m^{1+1/k} / T)$ bits of space. In addition, we prove that no constant-pass streaming algorithm can (ε, δ)-approximate the number of simplices using less than $O(m^{1+1/k} / T)$ bits of space. Thus …


Reimagining The Archive For Computational Analysis At Scale, Jamie Rogers Jun 2021

Works of the FIU Libraries

This presentation was part of a three-segment panel discussion sponsored by IS&T, the Society for Imaging Science and Technology, titled "OCR and Text Recognition: Workflows, Trends, and New Applications." This segment covers ways in which we have re-conceptualized archive materials as computationally useful data as well as the value of utilizing data at scale to impact research possibilities. We have been able to accomplish this through an ongoing project "dLOC as Data: A Thematic Approach to Caribbean Newspapers," a collaborative initiative between the Digital Library of the Caribbean, University of Florida, and Florida International University.


A Configurable Social Network For Running IRB-Approved Experiments, Mihovil Mandic Jun 2021

Dartmouth College Undergraduate Theses

Our world has never been more connected, and the size of the social media landscape draws a great deal of attention from academia. However, social networks are also a growing challenge for the Institutional Review Boards concerned with the subjects’ privacy. These networks contain a monumental variety of personal information of almost 4 billion people, allow for precise social profiling, and serve as a primary news source for many users. They are perfect environments for influence operations that are becoming difficult to defend against. Motivated to study online social influence via IRB-approved experiments, we designed and implemented a flexible, scalable, …


Soarnet, Deep Learning Thermal Detection For Free Flight, Jake T. Tallman Jun 2021

Master's Theses

Thermals are regions of rising hot air formed on the ground through the warming of the surface by the sun. Thermals are commonly used by birds and glider pilots to extend flight duration, increase cross-country distance, and conserve energy. This kind of powerless flight using natural sources of lift is called soaring. Once a thermal is encountered, the pilot flies in circles to keep within the thermal, gaining altitude before flying off to the next thermal and towards the destination. A single thermal can net a pilot thousands of feet of elevation gain; however, estimating thermal locations is not …


Federated Learning In Gaze Recognition (FLIGR), Arun Gopal Govindaswamy May 2021

College of Computing and Digital Media Dissertations

The efficiency and generalizability of a deep learning model depend on the amount and diversity of training data. Although huge amounts of data are being collected, these data are not stored in centralized servers for further data processing. It is often infeasible to collect and share data in centralized servers due to various medical data regulations. This need for diversely distributed data and infeasible storage solutions calls for Federated Learning (FL). FL is a clever way of utilizing privately stored data in model building without the need for data sharing. The idea is to train several different models locally …
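The abstract is cut off before it says how the locally trained models are combined. One common aggregation rule in the FL literature is federated averaging, sketched below purely as an assumption about the general idea; it is not confirmed to be the method used in this dissertation. The function name `federated_average` and the flat-list representation of model parameters are hypothetical simplifications.

```python
def federated_average(client_weights, client_sizes):
    """Combine locally trained parameter vectors into one global vector.

    client_weights: one flat list of floats per client (same length each).
    client_sizes:   number of local training examples held by each client.
    Only parameters leave the clients; the raw data stays where it is stored.
    """
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[j] * n for w, n in zip(client_weights, client_sizes)) / total
        for j in range(n_params)
    ]


# Two hypothetical clients; the second holds three times as much data.
global_weights = federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3])
```

Weighting by local dataset size lets clients with more data pull the global model further toward their parameters, which is the design choice that distinguishes this rule from a plain unweighted mean.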


Machine Learning Approaches To Dribble Hand-Off Action Classification With SportVU NBA Player Coordinate Data, Dembe Stephanos May 2021

Electronic Theses and Dissertations

Recently, strategies of National Basketball Association teams have evolved with the skillsets of players and the emergence of advanced analytics. One of the most effective actions in dynamic offensive strategies in basketball is the dribble hand-off (DHO). This thesis proposes an architecture for a classification pipeline for detecting DHOs in an accurate and automated manner. This pipeline consists of a combination of player tracking data and event labels, a rule set to identify candidate actions, manually reviewing game recordings to label the candidates, and embedding player trajectories into hexbin cell paths before passing the completed training set to the classification …


SCOPE: Building And Testing An Integrated Manual-Automated Event Extraction Tool For Online Text-Based Media Sources, Matthew Crittenden May 2021

Undergraduate Honors Theses

Building on insights from two years of manually extracting events information from online news media, an interactive information extraction environment (IIEE) was developed. SCOPE, the Scientific Collection of Open-source Policy Evidence, is a Python Django-based tool divided across specialized modules for extracting structured events data from unstructured text. These modules are grouped into a flexible framework which enables the user to tailor the tool to meet their needs. Following principles of user-oriented learning for information extraction (IE), SCOPE offers an alternative approach to developing AI-assisted IE systems. In this piece, we detail the ongoing development of the SCOPE tool, present …


Exploring AI And Multiplayer In Java, Ronni Kurtzhals Apr 2021

Student Academic Conference

I conducted research into three topics: artificial intelligence, package deployment, and multiplayer servers in Java. This research came together to form my project presentation on the implementation of these topics, which I felt accurately demonstrated the various things I have learned from my courses at Moorhead State University. Several resources were consulted throughout the project, including the work of W3Schools and StackOverflow as well as relevant assignments and textbooks from previous classes. I found this project relevant to computer science and information systems for several reasons, such as the AI component and use of SQL data tables; but it was …


The Role Of Privacy Within The Realm Of Healthcare Wearables' Acceptance And Use, Thomas Jernejcic Apr 2021

Masters Theses & Doctoral Dissertations

The flexibility and vitality of the Internet along with technological innovation have fueled an industry focused on the design of portable devices capable of supporting personal activities and wellbeing. These computing devices, known as wearables, differ from other computers in that they are portable, specific in function, and worn or carried by the user. While there are definite benefits attributable to wearables, there are also notable risks, especially in the realm of security where personal information and/or activities are often accessible to third parties. In addition, protecting one’s private information is regularly an afterthought and thus lacking in maturity. …


Predicting The Outcome Of NBA Games, Matthew Houde Apr 2021

Honors Projects in Data Science

The aim of the project is to create a machine learning model to predict NBA games. The purpose is to build upon and improve existing models. Research into other predictive sports models and machine learning techniques was conducted to understand what is currently being done to predict NBA games and how effective it is in doing so. After a thorough literature review, the model was created using Python and a variety of machine learning techniques. The dataset used had an array of team statistics for both the home and away team for each corresponding matchup and two supporting features were …


Sentiment-Oriented Metric Learning For Text-To-Image Retrieval, Quoc Tuan Truong, Hady W. Lauw Apr 2021

Research Collection School Of Computing and Information Systems

In this era of multimedia Web, text-to-image retrieval is a critical function of search engines and visually-oriented online platforms. Traditionally, the task primarily deals with matching a text query with the most relevant images available in the corpus. To an increasing extent, the Web also features visual expressions of preferences, imbuing images with sentiments that express those preferences. Cases in point include photos in online reviews as well as social media. In this work, we study the effects of sentiment information on text-to-image retrieval. Particularly, we present two approaches for incorporating sentiment orientation into metric learning for cross-modal retrieval. Each …


Collections As Data At Florida International University, Jamie Rogers Apr 2021

Works of the FIU Libraries

This presentation provides an overview of the concept of collections as data; shares information about our "dLOC as Data" grant initiative, a collaboration between the Digital Library of the Caribbean (dLOC), the Florida International University (FIU) Libraries Digital Collections Center, and the University of Florida Libraries, funded by the Mellon sub-award program, "Collections as Data: Part to Whole"; and provides an opportunity to talk about how we can share more collections as data resources and undertake new and exciting projects at FIU.

Although the concept of collections as data isn't new, it is becoming more mainstream. As …


Mass Incarceration In Nebraska: Data And Historical Analysis Of Inmates From 1980-2020, Anna Krause Mar 2021

Honors Theses

This study examines Nebraska Department of Corrections inmate data from 1980-2020, looking specifically at inmate demographics and offense trends. State-of-the-art data analysis is conducted to collect, modify, and visualize the data sources. Inmates are organized by each decade they were incarcerated within. The current active prison population is also examined as its own research group. The demographic and offense trends are compared with previous local and national research. Historical context is given for evolving trends in offenses. Solutions for Nebraska prison overcrowding are presented from various interest groups. This study aims to enlighten all interested Nebraskans on who inhabits their …