Open Access. Powered by Scholars. Published by Universities.®

Databases and Information Systems Commons

University of Massachusetts Amherst

Articles 1 - 30 of 40

Full-Text Articles in Databases and Information Systems

A Very Small Pond: Discovery Systems That Can Be Used With FOLIO In Academic Libraries, Jaime Taylor, Aaron Neslin Jan 2023

University Libraries Presentations Series

FOLIO, an open source library services platform, does not have a front-end patron interface for searching and using library materials. Any library installing FOLIO will need at least one other software product to perform those functions. This article evaluates which systems, in a limited marketplace, are available for academic libraries to use with FOLIO.


NENS, Jaime Taylor Jan 2023

University Libraries Presentations Series

NENS (non-student non-employee) are a group of designations for people who are somehow connected to UMass, but who are neither students nor employees. A person’s NENS designation determines what they have access to at UMass, including at the Libraries.


Neural Approaches For Language-Agnostic Search And Recommendation, Hamed Rezanejad Asl Bonab Oct 2022

Doctoral Dissertations

There are significant efforts toward developing better neural approaches for information retrieval problems. However, the vast majority of these studies are conducted using English-only data. In fact, trends and statistics of non-English content and users on the Internet show exponential growth, and novel information retrieval systems need to be language-agnostic; they need to bridge the language barrier between users and content, leverage data from high-resource settings for lower-resourced settings, and be able to extend to new languages and local markets easily. To this end, we focus on search and recommendation as two vital components of information systems. We explore …


Scalable Data Analytics For Relational Databases, Graphs And Videos, Fubao Wu Jun 2022

Doctoral Dissertations

Data analytics is the process of analyzing raw data to mine insights, trends, and patterns from it. Due to the dramatic increase in data volume and size in recent years with the development of big data and cloud storage, big data analytics algorithms and techniques have been faced with more challenges. Moreover, there are various types of data formats, such as relational databases, text data, audio data, and image/video data. It is challenging to generate a unified framework or algorithm for data analytics on various data formats. Different data formats still need refined and scalable algorithms. In this dissertation, we explore three …


Practical Methods For High-Dimensional Data Publication With Differential Privacy, Ryan H. Mckenna Jun 2022

Doctoral Dissertations

In recent years, differential privacy has seen significant growth, and has been widely embraced as the dominant privacy definition by the research community. Much progress has been made on designing theoretically principled and practically sound privacy mechanisms. There have even been some real-world deployments of differential privacy, although it has not yet seen widespread adoption. One challenge is that for some problems, there is a gap between the privacy budget required to have a meaningful privacy guarantee and to retain data utility. A second challenge is that many privacy mechanisms have trouble scaling to high-dimensional data, limiting their applicability to …


Nonparametric Contextual Reasoning For Question Answering Over Large Knowledge Bases, Rajarshi Das Jun 2022

Doctoral Dissertations

Question answering (QA) over knowledge bases provides a user-friendly way of accessing the massive amount of information stored in them. We have experienced tremendous progress in the performance of QA systems, thanks to the recent advancements in representation learning by deep neural models. However, such deep models function as black boxes with an opaque reasoning process, are brittle, and offer very limited control (e.g. for debugging an erroneous model prediction). It is also unclear how to reliably add or update knowledge stored in their model parameters. This thesis proposes nonparametric models for question answering that disentangle logic from knowledge. For …


Enhancing Usability And Explainability Of Data Systems, Anna Fariha Oct 2021

Doctoral Dissertations

The recent growth of data science expanded its reach to an ever-growing user base of nonexperts, increasing the need for usability, understandability, and explainability in these systems. Enhancing usability makes data systems accessible to people with different skills and backgrounds alike, leading to democratization of data systems. Furthermore, proper understanding of data and data-driven systems is necessary for the users to trust the function of the systems that learn from data. Finally, data systems should be transparent: when a data system behaves unexpectedly or malfunctions, the users deserve proper explanation of what caused the observed incident. Unfortunately, …


History Modeling For Conversational Information Retrieval, Chen Qu Oct 2021

Doctoral Dissertations

Conversational search is an embodiment of an iterative and interactive approach to information retrieval (IR) that has been studied for decades. Due to the recent rise of intelligent personal assistants, such as Siri, Alexa, AliMe, Cortana, and Google Assistant, a growing part of the population is moving their information-seeking activities to voice- or text-based conversational interfaces. One of the major challenges of conversational search is to leverage the conversation history to understand and fulfill the users' information needs. In this dissertation work, we investigate history modeling approaches for conversational information retrieval. We start from history modeling for user intent prediction. …


Enabling Declarative And Scalable Prescriptive Analytics In Relational Data, Matteo Brucato Oct 2021

Doctoral Dissertations

Constrained optimization problems are at the heart of significant applications in a broad range of domains, including finance, transportation, manufacturing, and healthcare. They are often found at the final step of business analytics, namely prescriptive analytics, to allow businesses to transform a rich understanding of data, typically provided by advanced predictive models, into actionable decisions. Modeling and solving these problems has relied on application-specific solutions, which are often complex, error-prone, and do not generalize. Our goal is to create a domain-independent, declarative approach, supported and powered by the system where the data relevant to these problems typically resides: the database. …


Neural Approaches To Feedback In Information Retrieval, Keping Bi Oct 2021

Doctoral Dissertations

Relevance feedback on search results indicates users' search intent and preferences. Extensive studies have shown that incorporating relevance feedback (RF) on the top k (usually 10) ranked results significantly improves the performance of re-ranking. However, most existing research on user feedback focuses on word-based retrieval models. Recently, neural retrieval models have shown their efficacy in capturing relevance matching in retrieval, but little research has been conducted on neural approaches to feedback. This leads us to study different aspects of feedback with neural approaches in this dissertation. RF techniques are seldom used in real search scenarios since they can require significant …


Towards Practical Differentially Private Mechanism Design And Deployment, Dan Zhang Jul 2021

Doctoral Dissertations

As the collection of personal data has increased, many institutions face an urgent need for reliable protection of sensitive data. Among the emerging privacy protection mechanisms, differential privacy offers a persuasive and provable assurance to individuals and has become the dominant model in the research community. However, despite growing adoption, the complexity of designing differentially private algorithms and effectively deploying them in real-world applications remains high. In this thesis, we address two main questions: 1) how can we aid programmers in developing private programs with high utility? and 2) how can we deploy differentially private algorithms to visual analytics systems? …


Neural Methods For Answer Passage Retrieval Over Sparse Collections, Daniel Cohen Apr 2021

Doctoral Dissertations

Recent advances in machine learning have allowed information retrieval (IR) techniques to advance beyond the stage of handcrafting domain specific features. Specifically, deep neural models incorporate varying levels of features to learn whether a document answers the information need of a query. However, these neural models rely on a large number of parameters to successfully learn a relation between a query and a relevant document.

This reliance on a large number of parameters, combined with current optimization methods that rely on small updates, necessitates numerous samples to allow the neural model to converge on an effective relevance function. This …


Quantifying The Impact Of Non-Stationarity In Reinforcement Learning-Based Traffic Signal Control, Lucas N. Alegre, Ana L.C. Bazzan, Bruno C. Da Silva Jan 2021

Computer Science Department Faculty Publication Series

In reinforcement learning (RL), dealing with non-stationarity is a challenging issue. However, some domains such as traffic optimization are inherently non-stationary. Causes for and effects of this are manifold. In particular, when dealing with traffic signal controls, addressing non-stationarity is key since traffic conditions change over time and as a function of traffic control decisions taken in other parts of a network. In this paper we analyze the effects that different sources of non-stationarity have in a network of traffic signals, in which each signal is modeled as a learning agent. More precisely, we study both the effects of changing …


Cloud And Edge Computation Offloading For Latency Limited Services, Ivana Kovacevic, Erkki Harjula, Savo Glisic, Beatriz Lorenzo, Mika Ylianttila Jan 2021

Electrical and Computer Engineering Faculty Publication Series

Multi-access Edge Computing (MEC) is recognised as a solution in future networks to offload computation and data storage from mobile and IoT devices to the servers at the edge of mobile networks. It reduces the network traffic and service latency compared to passing all data to cloud data centers while offering greater processing power than handling tasks locally at terminals. Since MEC servers are scattered throughout the radio access network, their computation capacities are modest in comparison to large cloud data centers. Therefore, the offloading decision between MEC and cloud servers should minimize the usage of the resources while maximizing the …


Neural Models For Information Retrieval Without Labeled Data, Hamed Zamani Oct 2019

Doctoral Dissertations

Recent developments of machine learning models, and in particular deep neural networks, have yielded significant improvements on several computer vision, natural language processing, and speech recognition tasks. Progress with information retrieval (IR) tasks has been slower, however, due to the lack of large-scale training data as well as neural network models specifically designed for effective information retrieval. In this dissertation, we address these two issues by introducing task-specific neural network architectures for a set of IR tasks and proposing novel unsupervised or weakly supervised solutions for training the models. The proposed learning solutions do not require labeled training data. Instead, …


Response Retrieval In Information-Seeking Conversations, Liu Yang Oct 2019

Doctoral Dissertations

The increasing popularity of mobile Internet has led to several crucial changes in the way that people use search engines compared with traditional Web search on desktops. On one hand, there is limited output bandwidth with the small screen sizes of most mobile devices. Mobile Internet users prefer direct answers on the search engine result page (SERP). On the other hand, voice-based and text-based conversational interfaces are becoming increasingly popular, as shown by the wide adoption of intelligent assistant services and devices such as Amazon Echo, Microsoft Cortana and Google Assistant around the world. These important changes have triggered several …


Neural Generative Models And Representation Learning For Information Retrieval, Qingyao Ai Oct 2019

Doctoral Dissertations

Information Retrieval (IR) is concerned with the structure, analysis, organization, storage, and retrieval of information. Among different retrieval models proposed in the past decades, generative retrieval models, especially those under the statistical probabilistic framework, are among the most popular techniques that have been widely applied to Information Retrieval problems. While they are famous for their well-grounded theory and good empirical performance in text retrieval, their applications in IR are often limited by their complexity and limited extensibility in the modeling of high-dimensional information. Recently, advances in deep learning techniques provide new opportunities for representation learning and generative models for information …


Probabilistic Models For Identifying And Explaining Controversy, Myungha Jang Jul 2019

Doctoral Dissertations

Navigating controversial topics on the Web encourages social awareness, supports civil discourse, and promotes critical literacy. While search of controversial topics particularly requires users to use their critical literacy skills on the content, educating people to be more critical readers is known to be a complex and long-term process. Therefore, we are in need of search engines that are equipped with techniques to help users to understand controversial topics by identifying them and explaining why they are controversial. A few approaches for identifying controversy have worked reasonably well in practice, but they are narrow in scope and exhibit limited performance. …


The River Process Corridor: A Modular River Assessment Method Based On Process Units And Widely Available Data In The Northeast US, John D. Gartner, Christine E. Hatch, Eve Vogel, et al. Jan 2019

Water Reports

We define the river process corridor (RPC) as the area adjacent to a river that is likely to affect and be affected by river and floodplain processes. Here we present a novel approach for delineating the RPC that utilizes widely available geospatial data, can be applied uniformly across broad and multi-scalar spatial extents, requires relatively low levels of expertise and cost, and allows for modular additions and adaptations using additional data that is available in particular areas. Land managers are increasingly using a variety of delineated river and floodplain areas for applied purposes such as hazard avoidance, ecological conservation, and …


Supporting Scientific Analytics Under Data Uncertainty And Query Uncertainty, Liping Peng Mar 2018

Doctoral Dissertations

Data management is becoming increasingly important in many applications, in particular, in large scientific databases where (1) data can be naturally modeled by continuous random variables, and (2) queries can involve complex predicates and/or be difficult for users to express explicitly. My thesis work aims to provide efficient support to both the "data uncertainty" and the "query uncertainty". When data is uncertain, an important class of queries requires query answers to be returned if their existence probabilities pass a threshold. I start with optimizing such threshold query processing for continuous uncertain data in the relational model by (i) expediting selections …
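The threshold queries described in this abstract can be illustrated with a minimal sketch. It assumes each uncertain attribute is an independent Gaussian and that the predicate is a simple range selection; the function names, the tuple layout, and the Gaussian assumption are hypothetical illustrations, not the dissertation's method.

```python
import math


def normal_cdf(x, mean, std):
    """Cumulative distribution function of a Gaussian with the given mean and std."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


def threshold_select(tuples, low, high, threshold):
    """Return (id, probability) for tuples whose uncertain value lies in
    [low, high] with probability at least `threshold`.

    Each input tuple is (id, mean, std): an assumed Gaussian model of the
    uncertain attribute.
    """
    results = []
    for tuple_id, mean, std in tuples:
        # Probability mass of the Gaussian inside the selection range.
        prob = normal_cdf(high, mean, std) - normal_cdf(low, mean, std)
        if prob >= threshold:
            results.append((tuple_id, prob))
    return results


readings = [("a", 5.0, 1.0), ("b", 20.0, 1.0)]
selected = threshold_select(readings, low=4.0, high=6.0, threshold=0.5)
```

Here tuple "a" has roughly 68% of its probability mass inside the range and is returned, while "b" has almost none and is filtered out.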


Database Usability Enhancement In Data Exploration, Yue Wang Nov 2017

Doctoral Dissertations

Database usability has become an important research topic over the last decade. In the early days, database management systems were maintained by sophisticated users like database administrators. Today, due to the availability of data and computing resources, more non-expert users are involved in database computation. From their point of view, database systems lack ease of use. So researchers believe that usability is as important as the performance and functionality of databases and therefore developed many techniques such as natural language interface to enhance the ease of use of databases. In this thesis, we find some deeper technical issues in database …


Controversy Analysis And Detection, Shiri Dori-Hacohen Nov 2017

Doctoral Dissertations

Seeking information on a controversial topic is often a complex task. Alerting users about controversial search results can encourage critical literacy, promote healthy civic discourse and counteract the "filter bubble" effect, and therefore would be a useful feature in a search engine or browser extension. Additionally, presenting information to the user about the different stances or sides of the debate can help her navigate the landscape of search results beyond a simple "list of 10 links". This thesis has made strides in the emerging niche of controversy detection and analysis. The body of work in this thesis revolves around two …


The Complexity Of Resilience, Cibele Matos Freire Nov 2017

Doctoral Dissertations

One focus area in data management research is to understand how changes in the data can affect the output of a view or standing query. Example applications are explaining query results and propagating updates through views. In this thesis we study the complexity of the Resilience problem, which is the problem of finding the minimum number of tuples that need to be deleted from the database in order to change the result of a query. We will see that resilience is closely related to the well-studied problems of deletion propagation and causal responsibility, and that analyzing its complexity offers important …
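The Resilience problem defined in this abstract can be made concrete with a brute-force sketch. It assumes a Boolean join query q :- R(x, y), S(y, z) over two small relations, and reads "change the result" as making the query false; the relation names and functions are hypothetical. The exhaustive search is exponential and is only meant to show the definition, not the techniques studied in the thesis.

```python
from itertools import combinations


def query_holds(r, s):
    """Boolean query q :- R(x, y), S(y, z): true if some R tuple joins an S tuple."""
    return any(ry == sy for (_, ry) in r for (sy, _) in s)


def resilience(r, s):
    """Return (k, removed): the minimum number k of tuple deletions that makes
    the query false, and one witness set of deleted facts (exhaustive search)."""
    facts = [("R", t) for t in r] + [("S", t) for t in s]
    # Try deletion sets in order of increasing size, so the first hit is minimal.
    for k in range(len(facts) + 1):
        for removed in combinations(facts, k):
            gone = set(removed)
            r_left = {t for t in r if ("R", t) not in gone}
            s_left = {t for t in s if ("S", t) not in gone}
            if not query_holds(r_left, s_left):
                return k, gone


R = {(1, 2), (3, 2)}
S = {(2, 5)}
k, removed = resilience(R, S)
```

In this instance both R tuples join with the single S tuple, so deleting S(2, 5) alone falsifies the query and the resilience is 1.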


Using OSGeo Solutions For Local Development Systems Implementation. The Experience For The Northern Region Of Costa Rica, López-Villegas Oscar, Víquez-Acuña Oscar, Víquez-Acuña Leonardo Sep 2017

Free and Open Source Software for Geospatial (FOSS4G) Conference Proceedings

Although some general definitions classify Spatial Data Infrastructures (SDI) as technological standards, institutional and even political agreements, which allow the discovery and use of geospatial information by users for different purposes [Kuhn 2005], computationally these platforms are valuable data repositories that should reach people efficiently and effectively for analysis and decision making on issues of collective interest. Costa Rica has several SDI experiences at the national level (SNIT - http://www.snitcr.go.cr), regional level (IDEHN - http://www.idehn.tec.ac.cr) or local/cantonal level (IDESCA - http://idesca.cr). Those infrastructures can facilitate access between geospatial information managers and their consumers through the implementation of particular software applications. The …


Kadaster Data Platform - Overview Architecture, Erwin Folmer, Wouter Beek Sep 2017

Free and Open Source Software for Geospatial (FOSS4G) Conference Proceedings

The Dutch Cadastre is publishing its geospatial data assets as Linked Open Data through the Kadaster Data Platform (KDP). The KDP supports the following three Linked Data browsing paradigms: (1) graph navigation, (2) hierarchical browsing, and (3) faceted browsing. Graph navigation uses the graph shape of the RDF data model to display concepts and instances as nodes, and properties between them as edges between those nodes. Graph navigation works well for explorative browsing. For graph navigation the KDP uses LODView (http://lodview.it), an existing OSS tool. Hierarchical browsing uses the tree structure of the concept hierarchy in order to display the various classes and …


Evaluation Of The Micro-Tasking Method For OpenStreetMap Imports, Atle Frenvik Sveen, Anne Sofie Strand Erichsen Sep 2017

Free and Open Source Software for Geospatial (FOSS4G) Conference Proceedings

Open Geospatial Data, capable of enriching OpenStreetMap, is being released by governments around the world at an increasing rate. The OSM import methods have been refined since the massive TIGER-import, moving towards assisted methods such as the micro-tasking method used by the LA and NY buildings imports. While these imports serve as great case studies of imports, they do not deal with complex datasets or updates to the data, nor do they deal with partitioning of tasks. We examine how the Norwegian FKB-dataset can be imported to OSM using micro-tasking, and perform a user-test to determine the best partition …


Towards A Web-Enabled Geo-Sample Web: An Open Source Resource Registration And Management System For Connecting Geo-Samples To The Web, Anusuriya Devaraju, Jens Klump, Victor Tey, Simon Cox, Ryan Fraser Sep 2017

Free and Open Source Software for Geospatial (FOSS4G) Conference Proceedings

Within the earth sciences, the curation and sharing of geo-samples is crucial to supporting reproducible research, in addition to extending the use of the samples in new research, and saving costs by avoiding sample loss and duplicate sampling activities. In the Commonwealth Scientific and Industrial Research Organisation (CSIRO), researchers gather various geo-samples as part of their field studies and collaborative projects. The diversity of the samples and their unsystematic management led to ambiguous sample numbers, incomplete sample descriptions, and difficulties in finding the samples and their related data. These problems are also found in universities, research institutes and government agencies, which …


The Billion Object Platform (BOP): A System To Lower Barriers To Support Big, Streaming, Spatio-Temporal Data Sources, Devika Kakkar, Ben Lewis, David Smiley, Ariel Nunez Sep 2017

Free and Open Source Software for Geospatial (FOSS4G) Conference Proceedings

With funding from the Sloan Foundation and Harvard Dataverse, the Harvard Center for Geographic Analysis (CGA) has developed a big spatio-temporal data visualization platform called the Billion Object Platform or "BOP". The goal of the project is to lower barriers for scholars who wish to access large, streaming, spatio-temporal datasets. Since once archived, streaming data gets big fast, and since most GIS systems don't support interactive visualization of millions of objects, a new platform was needed. The BOP is loaded with the latest billion geo-tweets and is fed a real-time stream of about 1 million tweets per day. The CGA …


Processing Conservation Indicators With Open Source Tools: Lessons Learned From The Digital Observatory For Protected Areas, Lucy Bastin, Andrea Mandrici, Luca Battistella, Grégoire Dubois Sep 2017

Free and Open Source Software for Geospatial (FOSS4G) Conference Proceedings

The European Commission has a commitment to open data and the support of open source software and standards. We present lessons learnt while populating and supporting the web and map services that underlie the Joint Research Centre's Digital Observatory for Protected Areas. Challenges include: large datasets with highly complex geometries; topological inconsistencies, compounded by reprojection for equal-area calculations; multiple different representations of the same geographical entities, for example coastlines; and a licensing requirement to continuously update indicators to respond to monthly changes in the authoritative data. In order to compute and publish an array of indicators, we used a range of open …


Optimizing Spatiotemporal Analysis Using Multidimensional Indexing With GeoWave, Richard Fecher, Michael A. Whitby Sep 2017

Free and Open Source Software for Geospatial (FOSS4G) Conference Proceedings

The open source software GeoWave bridges the gap between geographic information systems and distributed computing. This is done by preserving locality of multidimensional data when indexing it into a single-dimensional key-value store, using space filling curves. This means that like values in each dimension are stored physically close together in the datastore. We demonstrate the efficiencies and benefits of the GeoWave indexing algorithm to store and query billions of spatiotemporal data points. We show how this indexing strategy can be used to reduce query and processing times by multiple orders of magnitude using publicly available taxi trip data published by …
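The idea of mapping multidimensional data to a single-dimensional key while preserving locality can be sketched with a Z-order (Morton) curve. The abstract does not say which space-filling curve GeoWave uses, so the curve choice, the 16-bit resolution, and all function names below are assumptions made for illustration, not GeoWave's API.

```python
def interleave_bits(x: int, y: int, bits: int = 16) -> int:
    """Interleave the low `bits` bits of x and y into one Z-order (Morton) key."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)       # x bits occupy even positions
        key |= ((y >> i) & 1) << (2 * i + 1)   # y bits occupy odd positions
    return key


def quantize(value: float, lo: float, hi: float, bits: int = 16) -> int:
    """Map a float in [lo, hi] to an integer cell index in [0, 2**bits - 1]."""
    cells = (1 << bits) - 1
    clamped = min(max(value, lo), hi)
    return int((clamped - lo) / (hi - lo) * cells)


def z_order_key(lon: float, lat: float, bits: int = 16) -> int:
    """Single-dimensional key for a longitude/latitude pair."""
    return interleave_bits(
        quantize(lon, -180.0, 180.0, bits),
        quantize(lat, -90.0, 90.0, bits),
        bits,
    )


# Sorting points by key gives the row order for a sorted key-value store;
# points that share high-order key bits fall in the same spatial cell.
points = [(-73.99, 40.73), (-73.98, 40.75), (139.69, 35.68)]
ordered = sorted(points, key=lambda p: z_order_key(*p))
```

Because the key is built from interleaved coordinate bits, a range scan over a key prefix corresponds to a rectangular region, which is what lets a key-value store answer spatial queries without scanning every row.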