Open Access. Powered by Scholars. Published by Universities.®

Databases and Information Systems Commons

Doctoral Dissertations

Articles 1 - 28 of 28

Full-Text Articles in Databases and Information Systems

Neural Approaches For Language-Agnostic Search And Recommendation, Hamed Rezanejad Asl Bonab Oct 2022

There are significant efforts toward developing better neural approaches for information retrieval problems. However, the vast majority of these studies are conducted using English-only data. In fact, trends and statistics of non-English content and users on the Internet show exponential growth and that novel information retrieval systems need to be language-agnostic; they need to bridge the language barrier between users and content, leverage data from high-resource settings for lower-resourced settings, and be able to extend to new languages and local markets easily. To this end, we focus on search and recommendation as two vital components of information systems. We explore …


Scalable Data Analytics For Relational Databases, Graphs And Videos, Fubao Wu Jun 2022

Data analytics is to analyze raw data and mine insights, trends, and patterns from them. Due to the dramatic increase in data volume and size in recent years with the development of big data and cloud storage, big data analytics algorithms and techniques have been faced with more challenges. Moreover, there are various types of data formats, such as relational databases, text data, audio data, and image/video data. It is challenging to generate a unified framework or algorithm for data analytics on various data formats. Different data formats still need refined and scalable algorithms. In this dissertation, we explore three …


Practical Methods For High-Dimensional Data Publication With Differential Privacy, Ryan H. Mckenna Jun 2022

In recent years, differential privacy has seen significant growth, and has been widely embraced as the dominant privacy definition by the research community. Much progress has been made on designing theoretically principled and practically sound privacy mechanisms. There have even been some real-world deployments of differential privacy, although it has not yet seen widespread adoption. One challenge is that for some problems, there is a gap between the privacy budget required to have a meaningful privacy guarantee and to retain data utility. A second challenge is that many privacy mechanisms have trouble scaling to high-dimensional data, limiting their applicability to …


Nonparametric Contextual Reasoning For Question Answering Over Large Knowledge Bases, Rajarshi Das Jun 2022

Question answering (QA) over knowledge bases provides a user-friendly way of accessing the massive amount of information stored in them. We have experienced tremendous progress in the performance of QA systems, thanks to the recent advancements in representation learning by deep neural models. However, such deep models function as black boxes with an opaque reasoning process, are brittle, and offer very limited control (e.g. for debugging an erroneous model prediction). It is also unclear how to reliably add or update knowledge stored in their model parameters. This thesis proposes nonparametric models for question answering that disentangle logic from knowledge. For …


Enhancing Usability And Explainability Of Data Systems, Anna Fariha Oct 2021

The recent growth of data science expanded its reach to an ever-growing user base of nonexperts, increasing the need for usability, understandability, and explainability in these systems. Enhancing usability makes data systems accessible to people with different skills and backgrounds alike, leading to democratization of data systems. Furthermore, proper understanding of data and data-driven systems is necessary for the users to trust the function of the systems that learn from data. Finally, data systems should be transparent: when a data system behaves unexpectedly or malfunctions, the users deserve proper explanation of what caused the observed incident. Unfortunately, …


History Modeling For Conversational Information Retrieval, Chen Qu Oct 2021

Conversational search is an embodiment of an iterative and interactive approach to information retrieval (IR) that has been studied for decades. Due to the recent rise of intelligent personal assistants, such as Siri, Alexa, AliMe, Cortana, and Google Assistant, a growing part of the population is moving their information-seeking activities to voice- or text-based conversational interfaces. One of the major challenges of conversational search is to leverage the conversation history to understand and fulfill the users' information needs. In this dissertation work, we investigate history modeling approaches for conversational information retrieval. We start from history modeling for user intent prediction. …


Enabling Declarative And Scalable Prescriptive Analytics In Relational Data, Matteo Brucato Oct 2021

Constrained optimization problems are at the heart of significant applications in a broad range of domains, including finance, transportation, manufacturing, and healthcare. They are often found at the final step of business analytics, namely prescriptive analytics, to allow businesses to transform a rich understanding of data, typically provided by advanced predictive models, into actionable decisions. Modeling and solving these problems has relied on application-specific solutions, which are often complex, error-prone, and do not generalize. Our goal is to create a domain-independent, declarative approach, supported and powered by the system where the data relevant to these problems typically resides: the database. …


Neural Approaches To Feedback In Information Retrieval, Keping Bi Oct 2021

Relevance feedback on search results indicates users' search intent and preferences. Extensive studies have shown that incorporating relevance feedback (RF) on the top k (usually 10) ranked results significantly improves the performance of re-ranking. However, most existing research on user feedback focuses on words-based retrieval models. Recently, neural retrieval models have shown their efficacy in capturing relevance matching in retrieval but little research has been conducted on neural approaches to feedback. This leads us to study different aspects of feedback with neural approaches in the dissertation. RF techniques are seldom used in real search scenarios since they can require significant …


Towards Practical Differentially Private Mechanism Design And Deployment, Dan Zhang Jul 2021

As the collection of personal data has increased, many institutions face an urgent need for reliable protection of sensitive data. Among the emerging privacy protection mechanisms, differential privacy offers a persuasive and provable assurance to individuals and has become the dominant model in the research community. However, despite growing adoption, the complexity of designing differentially private algorithms and effectively deploying them in real-world applications remains high. In this thesis, we address two main questions: 1) how can we aid programmers in developing private programs with high utility? and 2) how can we deploy differentially private algorithms to visual analytics systems? …


Neural Methods For Answer Passage Retrieval Over Sparse Collections, Daniel Cohen Apr 2021

Recent advances in machine learning have allowed information retrieval (IR) techniques to advance beyond the stage of handcrafting domain specific features. Specifically, deep neural models incorporate varying levels of features to learn whether a document answers the information need of a query. However, these neural models rely on a large number of parameters to successfully learn a relation between a query and a relevant document.

This reliance on a large number of parameters, combined with the current methods of optimization relying on small updates necessitates numerous samples to allow the neural model to converge on an effective relevance function. This …


Neural Models For Information Retrieval Without Labeled Data, Hamed Zamani Oct 2019

Recent developments of machine learning models, and in particular deep neural networks, have yielded significant improvements on several computer vision, natural language processing, and speech recognition tasks. Progress with information retrieval (IR) tasks has been slower, however, due to the lack of large-scale training data as well as neural network models specifically designed for effective information retrieval. In this dissertation, we address these two issues by introducing task-specific neural network architectures for a set of IR tasks and proposing novel unsupervised or weakly supervised solutions for training the models. The proposed learning solutions do not require labeled training data. Instead, …


Response Retrieval In Information-Seeking Conversations, Liu Yang Oct 2019

The increasing popularity of mobile Internet has led to several crucial changes in the way that people use search engines compared with traditional Web search on desktops. On one hand, there is limited output bandwidth with the small screen sizes of most mobile devices. Mobile Internet users prefer direct answers on the search engine result page (SERP). On the other hand, voice-based and text-based conversational interfaces are becoming increasingly popular, as shown in the wide adoption of intelligent assistant services and devices such as Amazon Echo, Microsoft Cortana and Google Assistant around the world. These important changes have triggered several …


Neural Generative Models And Representation Learning For Information Retrieval, Qingyao Ai Oct 2019

Information Retrieval (IR) concerns the structure, analysis, organization, storage, and retrieval of information. Among different retrieval models proposed in the past decades, generative retrieval models, especially those under the statistical probabilistic framework, are among the most popular techniques and have been widely applied to Information Retrieval problems. While they are famous for their well-grounded theory and good empirical performance in text retrieval, their applications in IR are often limited by their complexity and low extendability in the modeling of high-dimensional information. Recently, advances in deep learning techniques provide new opportunities for representation learning and generative models for information …


Probabilistic Models For Identifying And Explaining Controversy, Myungha Jang Jul 2019

Navigating controversial topics on the Web encourages social awareness, supports civil discourse, and promotes critical literacy. While search of controversial topics particularly requires users to use their critical literacy skills on the content, educating people to be more critical readers is known to be a complex and long-term process. Therefore, we are in need of search engines that are equipped with techniques to help users to understand controversial topics by identifying them and explaining why they are controversial. A few approaches for identifying controversy have worked reasonably well in practice, but they are narrow in scope and exhibit limited performance. …


Supporting Scientific Analytics Under Data Uncertainty And Query Uncertainty, Liping Peng Mar 2018

Data management is becoming increasingly important in many applications, in particular, in large scientific databases where (1) data can be naturally modeled by continuous random variables, and (2) queries can involve complex predicates and/or be difficult for users to express explicitly. My thesis work aims to provide efficient support to both the "data uncertainty" and the "query uncertainty". When data is uncertain, an important class of queries requires query answers to be returned if their existence probabilities pass a threshold. I start with optimizing such threshold query processing for continuous uncertain data in the relational model by (i) expediting selections …


Database Usability Enhancement In Data Exploration, Yue Wang Nov 2017

Database usability has become an important research topic over the last decade. In the early days, database management systems were maintained by sophisticated users like database administrators. Today, due to the availability of data and computing resources, more non-expert users are involved in database computation. From their point of view, database systems lack ease of use. So researchers believe that usability is as important as the performance and functionality of databases and therefore developed many techniques such as natural language interface to enhance the ease of use of databases. In this thesis, we find some deeper technical issues in database …


Controversy Analysis And Detection, Shiri Dori-Hacohen Nov 2017

Seeking information on a controversial topic is often a complex task. Alerting users about controversial search results can encourage critical literacy, promote healthy civic discourse and counteract the "filter bubble" effect, and therefore would be a useful feature in a search engine or browser extension. Additionally, presenting information to the user about the different stances or sides of the debate can help her navigate the landscape of search results beyond a simple "list of 10 links". This thesis has made strides in the emerging niche of controversy detection and analysis. The body of work in this thesis revolves around two …


The Complexity Of Resilience, Cibele Matos Freire Nov 2017

One focus area in data management research is to understand how changes in the data can affect the output of a view or standing query. Example applications are explaining query results and propagating updates through views. In this thesis we study the complexity of the Resilience problem, which is the problem of finding the minimum number of tuples that need to be deleted from the database in order to change the result of a query. We will see that resilience is closely related to the well-studied problems of deletion propagation and causal responsibility, and that analyzing its complexity offers important …


High-Performance Complex Event Processing For Decision Analytics, Haopeng Zhang Jul 2017

Complex Event Processing (CEP) systems are becoming increasingly popular in domains for decision analytics such as financial services, transportation, cluster monitoring, supply chain management, business process management, and health care. These systems collect or create high-volume event streams, and often require such event streams to be processed in real-time. To this end, CEP queries are applied for filtering, correlation, aggregation, and transformation, to derive high-level, actionable information. Tasks for CEP systems fall into two categories: passive monitoring and proactive monitoring. For passive monitoring, users know their exact needs and express them in CEP queries, then CEP engines …


Extending Faceted Search To The Open-Domain Web, Weize Kong Jul 2016

Faceted search enables users to navigate a multi-dimensional information space by combining keyword search with drill-down options in each facet. For example, when searching “computer monitor” in an e-commerce site, users can select brands and monitor types from the provided facets {“Samsung”, “Dell”, “Acer”, ...} and {“LET-Lit”, “LCD”, “OLED”, ...}. It has been used successfully for many vertical applications, including e-commerce and digital libraries. However, this idea is not well explored for general web search in an open-domain setting, even though it holds great potential for assisting multi-faceted queries and exploratory search. The goal of this work is to …


A Platform For Scalable Low-Latency Analytics Using MapReduce, Boduo Li Aug 2015

Today, the ability to process "big data" has become crucial to the information needs of many enterprise businesses, scientific applications, and governments. Recently, there has been an increasing need to process data that is not only "big" but also "fast". Here "fast data" refers to high-speed real-time and near real-time data streams, such as Twitter feeds, search query streams, click streams, impressions, and system logs. To handle both historical data and real-time data, many companies have to maintain multiple systems. However, recent real-world case studies show that maintaining multiple systems causes not only code duplication, but also intensive manual work to …


Epistemological Databases For Probabilistic Knowledge Base Construction, Michael Louis Wick Mar 2015

Knowledge bases (KB) facilitate real world decision making by providing access to structured relational information that enables pattern discovery and semantic queries. Although there is a large amount of data available for populating a KB, the data must first be gathered and assembled. Traditionally, this integration is performed automatically by storing the output of an information extraction pipeline directly into a database as if this prediction were the "truth." However, the resulting KB is often not reliable because (a) errors accumulate in the integration pipeline, and (b) they persist in the KB even after new information arrives that could rectify …


Privacy-Preserving Sanitization In Data Sharing, Wentian Lu Nov 2014

In the era of big data, the prospect of analyzing, monitoring and investigating all sources of data starts to stand out in every aspect of our life. The benefit of such practices becomes concrete only when analysts or investigators have the information shared from data owners. However, privacy is one of the main barriers that disrupt the sharing behavior, due to the fear of disclosing sensitive information. This dissertation describes data sanitization methods that disguise the sensitive information before sharing a dataset and our criteria are always protecting privacy while preserving utility as much as possible. In particular, we provide …


Entity-Based Enrichment For Information Extraction And Retrieval, Jeffrey Dalton Aug 2014

The goal of this work is to leverage cross-document entity relationships for improved understanding of queries and documents. We define an entity to be a thing or concept that exists in the world, such as a politician, a battle, a film, or a color. Entity-based enrichment (EBE) is a new expansion model for both queries and documents using features from similar entity mentions in the document collection and external knowledge resources. It uses task-specific features from entities beyond words that include: name aliases, fine-grained entity types, categories, and relationships to other entities. EBE addresses the problem of sparse or noisy local …


Adaptive Grid Based Localized Learning For Multidimensional Data, Sheetal Saini Oct 2012

Rapid advances in data-rich domains of science, technology, and business have amplified the computational challenges of "Big Data" synthesis necessary to slow the widening gap between the rate at which the data is being collected and analyzed for knowledge. This has led to the renewed need for efficient and accurate algorithms, frameworks, and algorithmic mechanisms essential for knowledge discovery, especially in the domains of clustering, classification, dimensionality reduction, feature ranking, and feature selection. However, data mining algorithms are frequently challenged by the sparseness due to the high dimensionality of the datasets in such domains which is particularly detrimental to the …


A Visual Approach To Automated Text Mining And Knowledge Discovery, Andrey A. Puretskiy Dec 2010

The focus of this dissertation has been on improving the non-negative tensor factorization technique of text mining. The improvements have been made in both pre-processing and post-processing stages, with the goal of making the non-negative tensor factorization algorithm accessible to the casual user. The improved implementation allows the user to construct and modify the contents of the tensor, experiment with relative term weights and trust measures, and experiment with the total number of algorithm output features. Non-negative tensor factorization output feature production is closely integrated with a visual post-processing tool, FutureLens, that allows the user to perform in depth analysis …


Asp-Pricing: A Black-Scholes Option Pricing Formulation, Chaitanya Singh Apr 2002

The Applications Service Provider (ASP) arrangement has engendered a revolution in the area of corporate information technology (IT) by transforming software from a packaged off-the-shelf product to an on-line virtual service.

The focus of this study is to establish a sound mathematical foundation for evaluating software rental agreements (embedding exit flexibility) by incorporating a real options framework (based upon the Black-Scholes approach) into the traditional capital budgeting technique. The static discounted cash flow or net present value analysis may not adequately serve as a ‘barometer’ of outsourcing value due to its inherent weaknesses. On the other hand, the options approach …


The Dynamics Of Cyberspace: Examining And Modelling Online Social Structure, Brian S. Butler '89 Apr 1999

It has been proposed that online social structures represent new forms of organizing which are fundamentally different from traditional social structures. However, while there is a growing body of empirical research that considers behavioral aspects of online activity, research on online social structure remains largely anecdotal. This work consists of three papers that combine previous studies of traditional social structures, empirical analysis of longitudinal data from a sample of Internet listservs, and computational modeling to examine the dynamics of social structure development in networked environments.

The first paper (Title: When is a Group not a Group: An Empirical Examination …