Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons


Databases and Information Systems

Information retrieval


Articles 1 - 30 of 46

Full-Text Articles in Physical Sciences and Mathematics

Non-Monotonic Generation Of Knowledge Paths For Context Understanding, Pei-Chi Lo, Ee-Peng Lim Mar 2024


Research Collection School Of Computing and Information Systems

Knowledge graphs can be used to enhance text search and access by augmenting textual content with relevant background knowledge. While many large knowledge graphs are available, using them to make semantic connections between entities mentioned in the textual content remains a difficult task. In this work, we therefore introduce contextual path generation (CPG), the task of generating knowledge paths, called contextual paths, that explain the semantic connections between entities mentioned in textual documents, given a knowledge graph. To perform the CPG task well, one has to address its three challenges, namely path relevance, incomplete knowledge graph, and …


Contextual Path Retrieval: A Contextual Entity Relation Embedding-Based Approach, Pei-Chi Lo, Ee-Peng Lim Jan 2023


Research Collection School Of Computing and Information Systems

Contextual path retrieval (CPR) refers to the task of finding contextual path(s) between a pair of entities in a knowledge graph that explains the connection between them in a given context. For this novel retrieval task, we propose the Embedding-based Contextual Path Retrieval (ECPR) framework. ECPR is based on a three-component structure that includes a context encoder and path encoder that encode query context and path, respectively, and a path ranker that assigns a ranking score to each candidate path to determine the one that should be the contextual path. For context encoding, we propose two novel context encoding methods, …


Evaluation Of Geo-Spebh Algorithm Based On Bandwidth For Big Data Retrieval In Cloud Computing, Abubakar Usman Othman, Moses Timothy, Aisha Yahaya Umar, Abdullahi Salihu Audu, Boukari Souley, Abdulsalam Ya’U Gital Sep 2022


Al-Bahir Journal for Engineering and Pure Sciences

The fast increase in volume and speed of information created by mobile devices, along with the availability of web-based applications, has considerably contributed to the massive collection of data. Approximate Nearest Neighbor (ANN) search is essential in large databases for similarity search, returning the nearest neighbor of a given query in the fields of computer vision and pattern recognition. Many hashing algorithms have been developed to improve data management and retrieval accuracy in huge databases. However, none of these algorithms took bandwidth into consideration, which is a significant aspect of information retrieval and pattern recognition. As a result, our …


Structure-Aware Visualization Retrieval, Haotian Li, Yong Wang, Aoyu Wu, Huan Wei, Huamin Qu May 2022


Research Collection School Of Computing and Information Systems

With the wide usage of data visualizations, a huge number of Scalable Vector Graphic (SVG)-based visualizations have been created and shared online. Accordingly, there has been an increasing interest in exploring how to retrieve perceptually similar visualizations from a large corpus, since it can benefit various downstream applications such as visualization recommendation. Existing methods mainly focus on the visual appearance of visualizations by regarding them as bitmap images. However, the structural information intrinsically existing in SVG-based visualizations is ignored. Such structural information can delineate the spatial and hierarchical relationship among visual elements, and characterize visualizations thoroughly from a new perspective. …


Codematcher: Searching Code Based On Sequential Semantics Of Important Query Words, Chao Liu, Xin Xia, David Lo, Zhiwei Liu, Ahmed E. Hassan, Shanping Li Jan 2022


Research Collection School Of Computing and Information Systems

To accelerate software development, developers frequently search and reuse existing code snippets from a large-scale codebase, e.g., GitHub. Over the years, researchers proposed many information retrieval (IR)-based models for code search, but they fail to bridge the semantic gap between query and code. An early successful deep learning (DL)-based model DeepCS solved this issue by learning the relationship between pairs of code methods and corresponding natural language descriptions. Two major advantages of DeepCS are the capability of understanding irrelevant/noisy keywords and capturing sequential relationships between words in query and code. In this article, we propose an IR-based model CodeMatcher that …


Exploratory Search With Archetype-Based Language Models, Brent D. Davis Aug 2021


Electronic Thesis and Dissertation Repository

This dissertation explores how machine learning, natural language processing and information retrieval may assist the exploratory search task. Exploratory search is a search where the ideal outcome of the search is unknown, and thus the ideal language to use in a retrieval query to match it is unavailable. Three algorithms represent the contribution of this work. Archetype-based Modeling and Search provides a way to use previously identified archetypal documents relevant to an archetype to form a notion of similarity and find related documents that match the defined archetype. This is beneficial for exploratory search as it can generalize beyond standard …


Neural Methods For Answer Passage Retrieval Over Sparse Collections, Daniel Cohen Apr 2021


Doctoral Dissertations

Recent advances in machine learning have allowed information retrieval (IR) techniques to advance beyond the stage of handcrafting domain specific features. Specifically, deep neural models incorporate varying levels of features to learn whether a document answers the information need of a query. However, these neural models rely on a large number of parameters to successfully learn a relation between a query and a relevant document.

This reliance on a large number of parameters, combined with the current methods of optimization relying on small updates necessitates numerous samples to allow the neural model to converge on an effective relevance function. This …


Building And Using Digital Libraries For Etds, Edward A. Fox Mar 2021


The Journal of Electronic Theses and Dissertations

Despite the high value of electronic theses and dissertations (ETDs), the global collection has seen limited use. To extend such use, a new approach to building digital libraries (DLs) is needed. Fortunately, recent decades have seen that a vast amount of “gray literature” has become available through a diverse set of institutional repositories as well as regional and national libraries and archives. Most of the works in those collections include ETDs and are often freely available in keeping with the open-access movement, but such access is limited by the services of supporting information systems. As explained through a set of …


Neural Generative Models And Representation Learning For Information Retrieval, Qingyao Ai Oct 2019


Doctoral Dissertations

Information Retrieval (IR) concerns the structure, analysis, organization, storage, and retrieval of information. Among different retrieval models proposed in the past decades, generative retrieval models, especially those under the statistical probabilistic framework, are among the most popular techniques that have been widely applied to Information Retrieval problems. While they are famous for their well-grounded theory and good empirical performance in text retrieval, their applications in IR are often limited by their complexity and low extendability in the modeling of high-dimensional information. Recently, advances in deep learning techniques provide new opportunities for representation learning and generative models for information …


Fostering The Retrieval Of Suitable Web Resources In Response To Children's Educational Search Tasks, Oghenemaro Deborah Anuyah Aug 2018


Boise State University Theses and Dissertations

Children regularly turn to search engines (SEs) to locate school-related materials. Unfortunately, research has shown that when utilizing SEs, children do not always access resources that specifically target them. To support children, popular and child-oriented SEs make available a safe search filter, which is meant to eliminate inappropriate resources. Safe search is, however, not always the perfect deterrent as pornographic and hate-based resources may slip through the filter, while resources relevant to an educational search context may be misconstrued and filtered out. Moreover, filtering inappropriate resources in response to children's searches is just one perspective to consider in offering them …


On The Effectiveness Of Virtualization Based Memory Isolation On Multicore Platforms, Siqi Zhao, Xuhua Ding Apr 2017


Research Collection School Of Computing and Information Systems

Virtualization based memory isolation has been widely used as a security primitive in many security systems. This paper firstly provides an in-depth analysis of its effectiveness in the multicore setting; a first in the literature. Our study reveals that memory isolation by itself is inadequate for security. Due to the fundamental design choices in hardware, it faces several challenging issues including page table maintenance, address mapping validation and thread identification. As demonstrated by our attacks implemented on XMHF and BitVisor, these issues undermine the security of memory isolation. Next, we propose a new isolation approach that is immune to the aforementioned problems. In our design, the hypervisor constructs a fully isolated micro …


Identifying Relationships Between Scientific Datasets, Abdussalam Alawini May 2016


Dissertations and Theses

Scientific datasets associated with a research project can proliferate over time as a result of activities such as sharing datasets among collaborators, extending existing datasets with new measurements, and extracting subsets of data for analysis. As such datasets begin to accumulate, it becomes increasingly difficult for a scientist to keep track of their derivation history, which complicates data sharing, provenance tracking, and scientific reproducibility. Understanding what relationships exist between datasets can help scientists recall their original derivation history. For instance, if dataset A is contained in dataset B, then the connection between A and B could be that A was …


A Cooperative Coevolution Framework For Parallel Learning To Rank, Shuaiqiang Wang, Yun Wu, Byron J. Gao, Ke Wang, Hady W. Lauw, Jun Ma Dec 2015


Research Collection School Of Computing and Information Systems

We propose CCRank, the first parallel framework for learning to rank based on evolutionary algorithms (EA), aiming to significantly improve learning efficiency while maintaining accuracy. CCRank is based on cooperative coevolution (CC), a divide-and-conquer framework that has demonstrated high promise in function optimization for problems with large search space and complex structures. Moreover, CC naturally allows parallelization of sub-solutions to the decomposed sub-problems, which can substantially boost learning efficiency. With CCRank, we investigate parallel CC in the context of learning to rank. We implement CCRank with three EA-based learning to rank algorithms for demonstration. Extensive experiments on benchmark datasets in …


Information Filtering By Multiple Examples, Mingzhu Zhu May 2015


Dissertations

A key to successfully satisfying an information need lies in how users express it using keywords as queries. However, for many users, expressing their information needs using keywords is difficult, especially when the information need is complex. Search By Multiple Examples (SBME), a promising method for overcoming this problem, allows users to specify their information needs as a set of relevant documents rather than as a set of keywords.

Most of the studies on SBME adopt Positive Unlabeled learning (PU learning) techniques by treating the user's provided examples (denoted as query examples) as the positive set and the entire data …


The Symbiotic Relationship Between Information Retrieval And Informetrics, Dietmar Wolfram Mar 2015


School of Information Studies Faculty Articles

Informetrics and information retrieval (IR) represent fundamental areas of study within information science. Historically, researchers have not fully capitalized on the potential research synergies that exist between these two areas. Data sources used in traditional informetrics studies have their analogues in IR, with similar types of empirical regularities found in IR system content and use. Methods for data collection and analysis used in informetrics can help to inform IR system development and evaluation. Areas of application have included automatic indexing, index term weighting and understanding user query and session patterns through the quantitative analysis of user transaction logs. Similarly, developments …


The Symbiotic Relationship Between Information Retrieval And Informetrics, Dietmar Wolfram Jan 2015


Dietmar Wolfram

Informetrics and information retrieval (IR) represent fundamental areas of study within information science. Historically, researchers have not fully capitalized on the potential research synergies that exist between these two areas. Data sources used in traditional informetrics studies have their analogues in IR, with similar types of empirical regularities found in IR system content and use. Methods for data collection and analysis used in informetrics can help to inform IR system development and evaluation. Areas of application have included automatic indexing, index term weighting and understanding user query and session patterns through the quantitative analysis of user transaction logs. Similarly, developments …


The Partial Evaluation Approach To Information Personalization, Naren Ramakrishnan, Saverio Perugini Dec 2014


Saverio Perugini

Information personalization refers to the automatic adjustment of information content, structure, and presentation tailored to an individual user. By reducing information overload and customizing information access, personalization systems have emerged as an important segment of the Internet economy. This paper presents a systematic modeling methodology— PIPE (‘Personalization is Partial Evaluation’) — for personalization. Personalization systems are designed and implemented in PIPE by modeling an information-seeking interaction in a programmatic representation. The representation supports the description of information-seeking activities as partial information and their subsequent realization by partial evaluation, a technique for specializing programs. We describe the modeling methodology at a …


Implementation Of A Segmented, Transactional Database Caching System, Benjamin J. Sandmann Aug 2014


Journal of Undergraduate Research at Minnesota State University, Mankato

Research on algorithms and concepts regarding memory-based data caching can help solve the performance bottleneck in current Database Management Systems. Problems such as data concurrency, persistent storage, and transaction management have limited the capabilities of most memory caches. It has also been difficult to develop a proper user-oriented and business-friendly way of implementing such a system. The research of this project focused on code implementation, abstract methodologies and how to best prepare such an application for common business usage.


Search Queries In An Information Retrieval System For Arabic-Language Texts, Zainab Majeed Albujasim Jan 2014


Theses and Dissertations--Computer Science

Information retrieval aims to extract from a large collection of data a subset of information that is relevant to the user’s needs. In this study, we are interested in information retrieval in Arabic-language text documents. We focus on the Arabic language, its morphological features that potentially impact the implementation and performance of an information retrieval system, and its unique characters that are absent in the Latin alphabet and require specialized approaches. Specifically, we report on the design, implementation and evaluation of the search functionality using the Vector Space Model with several weighting schemes. Our implementation uses the ISRI stemming algorithms as …
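
As a rough illustration of the Vector Space Model mentioned in this abstract, the sketch below ranks documents by cosine similarity over TF-IDF weights. It is not the thesis's implementation: the TF-IDF variant is just one possible weighting scheme, the function names (`tfidf_vectors`, `search`) and the sample tokens are hypothetical, and the tokens are assumed to be already stemmed (the stemming step is omitted).

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (idf, vectors) for a dict of doc_id -> list of stemmed tokens."""
    n = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for tokens in docs.values() for term in set(tokens))
    # One common IDF variant (an assumption); the +1 keeps shared terms non-zero.
    idf = {term: math.log(n / count) + 1.0 for term, count in df.items()}
    vectors = {
        doc_id: {t: tf * idf[t] for t, tf in Counter(tokens).items()}
        for doc_id, tokens in docs.items()
    }
    return idf, vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    num = sum(w * v.get(t, 0.0) for t, w in u.items())
    den = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(
        sum(w * w for w in v.values())
    )
    return num / den if den else 0.0


def search(query_tokens, idf, vectors):
    """Return doc ids with non-zero similarity to the query, best match first."""
    q = {t: tf * idf[t] for t, tf in Counter(query_tokens).items() if t in idf}
    ranked = sorted(((cosine(q, v), d) for d, v in vectors.items()), reverse=True)
    return [doc_id for score, doc_id in ranked if score > 0]


# Hypothetical, already-stemmed tokens standing in for real documents.
docs = {
    "d1": ["book", "library"],
    "d2": ["library", "school"],
    "d3": ["river"],
}
idf, vectors = tfidf_vectors(docs)
print(search(["library", "school"], idf, vectors))
```

Swapping the `idf` formula or the term-frequency scaling is where alternative weighting schemes would plug in.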


Moved But Not Gone: An Evaluation Of Real-Time Methods For Discovering Replacement Web Pages, Martin Klein, Michael L. Nelson Jan 2014


Computer Science Faculty Publications

Inaccessible Web pages and 404 “Page Not Found” responses are a common Web phenomenon and a detriment to the user’s browsing experience. The rediscovery of missing Web pages is, therefore, a relevant research topic in the digital preservation as well as in the Information Retrieval realm. In this article, we bring these two areas together by analyzing four content- and link-based methods to rediscover missing Web pages. We investigate the retrieval performance of the methods individually as well as their combinations and give an insight into how effective these methods are over time. As the main result of this work, …


Query-Document-Dependent Fusion: A Case Study Of Multimodal Music Retrieval, Zhonghua Li, Bingjun Zhang, Yi Yu, Jialie Shen, Ye Wang Dec 2013


Research Collection School Of Computing and Information Systems

In recent years, multimodal fusion has emerged as a promising technology for effective multimedia retrieval. Developing the optimal fusion strategy for different modalities (e.g. content, metadata) has been the subject of intensive research. Given a query, existing methods derive a unified fusion strategy for all documents with the underlying assumption that the relative significance of a modality remains the same across all documents. However, this assumption is often invalid. We thus propose a general multimodal fusion framework, query-document-dependent fusion (QDDF), which derives the optimal fusion strategy for each query-document pair via intelligent content analysis of both queries and documents. By …


Based On Repeated Experience, System For Modification Of Expression And Negating Overload From Media And Optimizing Referential Efficiency, Peter R. Badovinatz, Veronika M. Megler Jun 2013


Computer Science Faculty Publications and Presentations

Content items are revealed to a user based on whether they have been previously reviewed by the user. A number of content items are thus received over time. The content items may be discrete content items, or may be portions of a content stream, and may be received over different media. For each content item, it is determined whether the content item was previously reviewed by a user. Where the content item was not previously reviewed, the item is revealed to the user, such as by being displayed or announced to the user. Where the content item was previously reviewed, …


Data Near Here: Bringing Relevant Data Closer To Scientists, Veronika M. Megler, David Maier May 2013


Computer Science Faculty Publications and Presentations

Large scientific repositories run the risk of losing value as their holdings expand, if it means increased effort for a scientist to locate particular datasets of interest. We discuss the challenges that scientists face in locating relevant data, and present our work in applying Information Retrieval techniques to dataset search, as embodied in the Data Near Here application.


K-Partite Graph Reinforcement And Its Application In Multimedia Information Retrieval, Yue Gao, Meng Wang, Rongrong Ji, Zheng-Jun Zha, Jialie Shen Jul 2012


Research Collection School Of Computing and Information Systems

In many example-based information retrieval tasks, the example query actually contains multiple sub-queries. For example, in 3D object retrieval, the query is an object described by multiple views. In content-based video retrieval, the query is a video clip that contains multiple frames. Without prior knowledge, the most intuitive approach is to treat the sub-queries equally. In this paper, we propose a k-partite graph reinforcement approach to fuse these sub-queries based on the to-be-retrieved database. The approach first collects the top retrieved results. These results are regarded as pseudo-relevant samples and then a k-partite graph reinforcement is performed on these …


Suffix Trees For Document Retrieval, Ryan Reck Jun 2012


Master's Theses

This thesis presents a look at the suitability of suffix trees for full-text indexing and retrieval. Typically, suffix trees are built on a character level, where the tree records which characters follow each other character. By building suffix trees for documents based on words instead of characters, the resulting tree effectively indexes every word or sequence of words that occurs in any of the documents. Ukkonen's algorithm is adapted to build word-level suffix trees. But the primary focus is on developing algorithms for searching the suffix tree for exact and approximate, or fuzzy, matches to arbitrary query strings. A …
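
To make the word-level indexing idea concrete, here is a minimal sketch that inserts every word-level suffix of each document into a nested-dictionary trie and answers exact phrase queries. It is deliberately naive (quadratic construction per document) and is not Ukkonen's algorithm or the thesis's code; the function names and sample documents are hypothetical.

```python
def build_word_suffix_trie(docs):
    """Index every word-level suffix of every document.

    Each trie node is a dict mapping a word to a pair
    (child_node, set_of_doc_ids_containing_the_path_so_far).
    """
    root = {}
    for doc_id, text in docs.items():
        words = text.lower().split()
        for start in range(len(words)):
            node = root
            for word in words[start:]:
                child, doc_ids = node.setdefault(word, ({}, set()))
                doc_ids.add(doc_id)
                node = child
    return root


def find_phrase(root, phrase):
    """Return the ids of documents containing `phrase` as consecutive words."""
    node, doc_ids = root, set()
    for word in phrase.lower().split():
        if word not in node:
            return set()
        node, doc_ids = node[word]
    return doc_ids


# Hypothetical two-document collection.
docs = {"d1": "the quick brown fox", "d2": "a quick brown dog"}
trie = build_word_suffix_trie(docs)
print(find_phrase(trie, "quick brown"))
```

Because every suffix is indexed, a phrase lookup costs one dictionary step per query word regardless of where the phrase occurs in a document. Approximate matching, which the abstract names as the primary focus, is not attempted here.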


Parallel Learning To Rank For Information Retrieval, Shuaiqiang Wang, Byron J. Gao, Ke Wang, Hady W. Lauw Jul 2011


Research Collection School Of Computing and Information Systems

Learning to rank represents a category of effective ranking methods for information retrieval. While the primary concern of existing research has been accuracy, learning efficiency is becoming an important issue due to the unprecedented availability of large-scale training data and the need for continuous update of ranking functions. In this paper, we investigate parallel learning to rank, targeting simultaneous improvement in accuracy and efficiency.


Continuous Nearest Neighbor Monitoring In Road Networks, Kyriakos Mouratidis, Man Lung Yiu, Dimitris Papadias, Nikos Mamoulis Dec 2010


Kyriakos MOURATIDIS

Recent research has focused on continuous monitoring of nearest neighbors (NN) in highly dynamic scenarios, where the queries and the data objects move frequently and arbitrarily. All existing methods, however, assume the Euclidean distance metric. In this paper we study k-NN monitoring in road networks, where the distance between a query and a data object is determined by the length of the shortest path connecting them. We propose two methods that can handle arbitrary object and query moving patterns, as well as fluctuations of edge weights. The first one maintains the query results by processing only updates that may invalidate …
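
The network distance described in this abstract can be illustrated with a one-off k-NN query that expands the road graph in Dijkstra order and stops once k data objects are settled. This sketch covers only a single snapshot query, not the continuous monitoring methods the paper proposes; the graph, node names and `network_knn` function are hypothetical.

```python
import heapq


def network_knn(graph, query, objects, k):
    """Return up to k (distance, node) pairs nearest to `query` by path length.

    graph:   dict mapping node -> list of (neighbour, non-negative edge weight)
    objects: set of nodes at which data objects are located
    """
    dist = {query: 0.0}
    heap = [(0.0, query)]
    settled = set()
    result = []
    while heap and len(result) < k:
        d, node = heapq.heappop(heap)
        if node in settled:
            continue  # stale heap entry for an already-settled node
        settled.add(node)
        if node in objects:
            result.append((d, node))
        for neighbour, weight in graph.get(node, []):
            candidate = d + weight
            if candidate < dist.get(neighbour, float("inf")):
                dist[neighbour] = candidate
                heapq.heappush(heap, (candidate, neighbour))
    return result


# Hypothetical undirected road network; weights are segment lengths.
roads = {
    "q": [("a", 2), ("b", 5)],
    "a": [("q", 2), ("b", 1), ("c", 7)],
    "b": [("q", 5), ("a", 1), ("c", 2)],
    "c": [("a", 7), ("b", 2)],
}
print(network_knn(roads, "q", {"b", "c"}, k=2))
```

Note that object `b` is reached at distance 3 via `a` rather than 5 along the direct edge, which is the kind of difference from a straight-line metric that the abstract points to. If an edge weight changes, this sketch simply has to be re-run from scratch.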


Merging Schemas In A Collaborative Faceted Classification System, Jianxiang Li Aug 2010


Computer Science Theses & Dissertations

We have developed a system that improves access to a large, growing image collection by allowing users to collaboratively build a global faceted (multi-perspective) classification schema. We are extending our system to support both global and local schemas, where the global schema provides a complete and uniform view of the collection whereas a local schema provides a personal, possibly incomplete and idiosyncratic view of the collection. We argue that although users usually focus on their personal schemas, it is still desirable to have a global schema for the entire collection even if such local schemas are available. In order to keep the …


Effects Of Similarity Metrics On Document Clustering, Rushikesh Veni Jan 2009


UNLV Theses, Dissertations, Professional Papers, and Capstones

Document clustering or unsupervised document classification is an automated process of grouping documents with similar content. A typical technique uses a similarity function to compare documents. In the literature, many similarity functions such as dot product or cosine measures are proposed for the comparison operator.

For the thesis, we evaluate the effects a similarity function may have on clustering. We start by representing a document and a query each as a vector in a high-dimensional space whose dimensions correspond to the keywords, and then use an appropriate distance measure in k-means to compute similarity between the document vector and the query vector to …
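
The two similarity functions named in this abstract behave differently on documents of unequal length, which a small sketch can show. The vectors below are hypothetical keyword counts chosen for illustration, not data from the thesis.

```python
import math


def dot(u, v):
    """Dot product of two equal-length keyword-weight vectors."""
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v):
    """Cosine similarity: the dot product normalised by both vector lengths."""
    norm = math.sqrt(dot(u, u)) * math.sqrt(dot(v, v))
    return dot(u, v) / norm if norm else 0.0


# Hypothetical keyword-count vectors over the same three keywords.
short_doc = [1, 1, 0]
long_doc = [10, 10, 0]   # same keyword mix as short_doc, ten times longer
other_doc = [0, 3, 4]    # overlaps with short_doc on one keyword only

print(dot(short_doc, long_doc), cosine(short_doc, long_doc))
print(dot(short_doc, other_doc), cosine(short_doc, other_doc))
```

The dot product grows with document length, while the cosine measure depends only on the direction of the vectors, so a clustering run may group documents differently depending on which one is used.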


An Infrastructure For Performance Measurement And Comparison Of Information Retrieval Solutions, Gary Saunders Aug 2008


Theses and Dissertations

The amount of information available on both public and private networks continues to grow at a phenomenal rate. This information is contained within a wide variety of objects, including documents, e-mail archives, medical records, manuals, pictures and music. To be of any value, this data must be easily searchable and accessible. Information Retrieval (IR) is concerned with the ability to find and gain access to relevant information. As electronic data repositories continue to proliferate, so too, grows the variety of methods used to locate and access the information contained therein. Similarly, the introduction of innovative retrieval strategies—and the optimization of …