Open Access. Powered by Scholars. Published by Universities.®

Databases and Information Systems Commons

Indexing


Articles 1 - 13 of 13

Full-Text Articles in Databases and Information Systems

Evaluation Of Geo-Spebh Algorithm Based On Bandwidth For Big Data Retrieval In Cloud Computing, Abubakar Usman Othman, Moses Timothy, Aisha Yahaya Umar, Abdullahi Salihu Audu, Boukari Souley, Abdulsalam Ya’U Gital Sep 2022

Al-Bahir Journal for Engineering and Pure Sciences

The rapid increase in the volume and speed of information created by mobile devices, along with the availability of web-based applications, has contributed considerably to the massive collection of data. Approximate Nearest Neighbor (ANN) search is essential for similarity search in large databases, returning the nearest neighbor of a given query, in fields such as computer vision and pattern recognition. Many hashing algorithms have been developed to improve data management and retrieval accuracy in huge databases. However, none of these algorithms took bandwidth into consideration, which is a significant aspect of information retrieval and pattern recognition. As a result, our …


Approximate k-NN Graph Construction: A Generic Online Approach, Wan-Lei Zhao, Hui Wang, Chong-Wah Ngo Jan 2022

Research Collection School Of Computing and Information Systems

Nearest neighbor search and k-nearest neighbor graph construction are two fundamental issues that arise in many disciplines such as multimedia information retrieval, data mining, and machine learning. They have become increasingly pressing with the emergence of big data in various fields in recent years. In this paper, a simple but effective solution for both approximate k-nearest neighbor search and approximate k-nearest neighbor graph construction is presented. These two issues are addressed jointly in our solution. On one hand, the approximate k-nearest neighbor graph construction is treated as a search task. Each sample along with its k-nearest neighbors is joined into the …


Multilateration Index., Chip Lynch Aug 2021

Electronic Theses and Dissertations

We present an alternative method for pre-processing and storing point data, particularly for Geospatial points, by storing multilateration distances to fixed points rather than coordinates such as Latitude and Longitude. We explore the use of this data to improve query performance for some distance related queries such as nearest neighbor and query-within-radius (i.e. “find all points in a set P within distance d of query point q”). Further, we discuss the problem of “Network Adequacy” common to medical and communications businesses, to analyze questions such as “are at least 90% of patients living within 50 miles of a covered emergency …
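The query-within-radius idea in this abstract can be illustrated with a minimal sketch. It is not taken from the thesis: it assumes plain 2-D Euclidean points instead of geospatial coordinates, and the names `build_index`, `within_radius`, and `anchors` are hypothetical. Each point is stored as its distances to a few fixed anchor points, and the triangle inequality is used to discard points without computing their distance to the query.

```python
import math

def dist(a, b):
    """Euclidean distance between two 2-D points (a simplifying assumption)."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def build_index(points, anchors):
    """For every point, store its distance to each fixed anchor."""
    return [tuple(dist(p, a) for a in anchors) for p in points]

def within_radius(points, index, anchors, q, d):
    """Return all points within distance d of q, pruning with stored distances."""
    q_dists = [dist(q, a) for a in anchors]
    hits = []
    for p, p_dists in zip(points, index):
        # Triangle inequality: dist(p, q) <= d implies
        # |dist(p, a) - dist(q, a)| <= d for every anchor a.
        if any(abs(pd - qd) > d for pd, qd in zip(p_dists, q_dists)):
            continue  # pruned without computing dist(p, q)
        if dist(p, q) <= d:  # verify the surviving candidates exactly
            hits.append(p)
    return hits

points = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0), (9.0, 1.0)]
anchors = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]  # hypothetical fixed points
index = build_index(points, anchors)
result = within_radius(points, index, anchors, q=(0.5, 0.5), d=1.0)
```

The pruning test never discards a true match, so the result equals a brute-force scan; the saving comes from skipping exact distance computations for points that fail the test.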


Building And Using Digital Libraries For ETDs, Edward A. Fox Mar 2021

The Journal of Electronic Theses and Dissertations

Despite the high value of electronic theses and dissertations (ETDs), the global collection has seen limited use. To extend such use, a new approach to building digital libraries (DLs) is needed. Fortunately, recent decades have seen that a vast amount of “gray literature” has become available through a diverse set of institutional repositories as well as regional and national libraries and archives. Most of the works in those collections include ETDs and are often freely available in keeping with the open-access movement, but such access is limited by the services of supporting information systems. As explained through a set of …


Building A Library Search Infrastructure With Elasticsearch, Kim Pham, Fernando Reyes, Jeff Rynhart May 2020

University Libraries: Faculty Scholarship

This article discusses our implementation of an Elastic cluster to address our search, search administration, and indexing needs, and how it integrates into our technology infrastructure. Finally, it takes a close look at the way we built a reusable, dynamic search engine that powers our digital repository search. We cover the lessons learned from our early implementations and how to address them to lay the groundwork for a scalable, networked search environment that can also be applied to alternative search engines such as Solr.


Improved User News Feed Customization For An Open Source Search Engine, Timothy Chow May 2020

Master's Projects

Yioop is an open source search engine project hosted on the site of the same name. It offers several features outside of searching, with one such feature being a news feed. The current news feed system aggregates articles from a curated list of news sites determined by the owner. However, in its current state, the feed list is limited in size, constrained by the hardware that the aggregator is run on. The goal of my project was to overcome this limit by improving the current storage method used. The solution was derived by making use of IndexArchiveBundles and IndexShards, both of …


Indexable Bayesian Personalized Ranking For Efficient Top-K Recommendation, Dung D. Le, Hady W. Lauw Nov 2017

Research Collection School Of Computing and Information Systems

Top-k recommendation seeks to deliver a personalized recommendation list of k items to a user. The dual objectives are (1) accuracy in identifying the items a user is likely to prefer, and (2) efficiency in constructing the recommendation list in real time. One direction towards retrieval efficiency is to formulate retrieval as approximate k nearest neighbor (kNN) search aided by indexing schemes, such as locality-sensitive hashing, spatial trees, and inverted index. These schemes, applied on the output representations of recommendation algorithms, speed up the retrieval process by automatically discarding a large number of potentially irrelevant items when given a user …
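As a rough illustration of how a locality-sensitive hashing index can discard most items before scoring, the sketch below uses sign random projections. It is not the method of the paper: the function names, the seed, and the toy vectors are all hypothetical. Sign random projections group vectors by angle, so the sketch assumes that ranking by inner product roughly agrees with angular closeness, which need not hold in general.

```python
import random

def make_hyperplanes(num_bits, dim, seed=0):
    """Draw random hyperplane normals; the seed only makes the sketch repeatable."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def signature(vec, hyperplanes):
    """One bit per hyperplane: which side of the plane the vector falls on."""
    return tuple(int(dot(h, vec) >= 0) for h in hyperplanes)

def build_buckets(items, hyperplanes):
    """Group item ids by their hash signature."""
    buckets = {}
    for item_id, vec in items.items():
        buckets.setdefault(signature(vec, hyperplanes), []).append(item_id)
    return buckets

def top_k(user_vec, items, buckets, hyperplanes, k):
    """Score only items in the user's bucket; all other items are skipped."""
    candidates = buckets.get(signature(user_vec, hyperplanes), [])
    ranked = sorted(candidates, key=lambda i: dot(user_vec, items[i]), reverse=True)
    return ranked[:k]

items = {"a": (1.0, 2.0, -1.0), "b": (-1.0, -2.0, 1.0), "c": (0.5, -3.0, 2.0)}
planes = make_hyperplanes(num_bits=4, dim=3)
buckets = build_buckets(items, planes)
user = (2.0, 4.0, -2.0)  # same direction as item "a"
recommended = top_k(user, items, buckets, planes, k=3)
```

Because `user` is a positive multiple of item "a", both receive the same signature, so "a" is always among the candidates. Items in other buckets are never scored, which is the source of both the speed-up and the approximation.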


A Scalable Graph-Coarsening Based Index For Dynamic Graph Databases, Akshay Kansal Aug 2017

Boise State University Theses and Dissertations

Graphs are a commonly used data structure for modeling complex data such as chemical molecules, images, social networks, and XML documents. This complex data is stored using a set of graphs, known as graph database D. To speed up query answering on graph databases, indexes are commonly used. State-of-the-art graph database indexes do not adapt or scale well to dynamic graph database use; they are static, and their ability to prune possible search responses to meet user needs worsens over time as databases change and grow. Users can re-mine indexes to gain some improvement, but this is time-consuming. Users …


The Symbiotic Relationship Between Information Retrieval And Informetrics, Dietmar Wolfram Mar 2015

School of Information Studies Faculty Articles

Informetrics and information retrieval (IR) represent fundamental areas of study within information science. Historically, researchers have not fully capitalized on the potential research synergies that exist between these two areas. Data sources used in traditional informetrics studies have their analogues in IR, with similar types of empirical regularities found in IR system content and use. Methods for data collection and analysis used in informetrics can help to inform IR system development and evaluation. Areas of application have included automatic indexing, index term weighting and understanding user query and session patterns through the quantitative analysis of user transaction logs. Similarly, developments …


Symphony: A Platform For Search-Driven Applications, John C. Shafer, Rakesh Agrawal, Hady W. Lauw Mar 2010

Research Collection School Of Computing and Information Systems

We present the design of Symphony, a platform that enables non-developers to build and deploy a new class of search-driven applications that combine their data and domain expertise with content from search engines and other web services. The Symphony prototype has been built on top of Microsoft's Bing infrastructure. While Symphony naturally makes use of the customization capabilities exposed by Bing, its distinguishing feature is the capability it provides to the application creator to combine their proprietary data and domain expertise with content obtained from Bing. They can also integrate specialized data obtained from web services to enhance the richness …


On Searching Continuous Nearest Neighbors In Wireless Data Broadcast Systems, Baihua Zheng, Wang-Chien Lee, Dik Lun Lee Jul 2007

Research Collection School Of Computing and Information Systems

A continuous nearest neighbor (CNN) search, which retrieves the nearest neighbors corresponding to every point in a given query line segment, is important for location-based services such as vehicular navigation and tourist guides. It is infeasible to answer a CNN search by issuing a traditional nearest neighbor query at every point of the line segment due to the large number of queries generated and the overhead on bandwidth. Algorithms have been proposed recently to support CNN search in the traditional client-server systems but not in the environment of wireless data broadcast, where uplink communication channels from mobile devices to the …


An Energy-Efficient And Access Latency Optimized Indexing Scheme For Wireless Data Broadcast, Yuxia Yao, Xueyan Tang, Ee Peng Lim, Aixin Sun Aug 2006

Research Collection School Of Computing and Information Systems

Data broadcast is an attractive data dissemination method in mobile environments. To improve energy efficiency, existing air indexing schemes for data broadcast have focused on reducing tuning time only, i.e., the duration that a mobile client stays active in data accesses. On the other hand, existing broadcast scheduling schemes have aimed at reducing access latency through nonflat data broadcast to improve responsiveness only. Not much work has addressed the energy efficiency and responsiveness issues concurrently. This paper proposes an energy-efficient indexing scheme called MHash that optimizes tuning time and access latency in an integrated fashion. MHash reduces tuning time by …


DSIM: A Distance-Based Indexing Method For Genomic Sequences, Xia Cao, Beng-Chin Ooi, Hwee Hwa Pang, Kian-Lee Tan, Anthony K. H. Tung Oct 2005

Research Collection School Of Computing and Information Systems

In this paper, we propose a Distance-based Sequence Indexing Method (DSIM) for indexing and searching genome databases. Borrowing the idea of video compression, we compress the genomic sequence database around a set of automatically selected reference words, formed from high-frequency data substrings and substrings in past queries. The compression captures the distance of each non-reference word in the database to some reference word. At runtime, a query is processed by comparing its substrings with the compressed data strings, through their distances to the reference words. We also propose an efficient scheme to incrementally update the reference words and the compressed …
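The idea of answering a query through distances to reference words can be sketched as follows. This is an illustration under stated assumptions, not the DSIM scheme itself: it uses Hamming distance on equal-length words (the abstract does not name the distance function), keeps each word uncompressed for the final check, and the names `compress` and `search` are hypothetical.

```python
def hamming(a, b):
    """Number of differing positions between two equal-length words."""
    return sum(x != y for x, y in zip(a, b))

def compress(words, references):
    """Map each word to (index of its closest reference word, distance to it)."""
    encoded = {}
    for w in words:
        ref_id = min(range(len(references)), key=lambda i: hamming(w, references[i]))
        encoded[w] = (ref_id, hamming(w, references[ref_id]))
    return encoded

def search(query, encoded, references, max_dist):
    """Find words within max_dist of the query, pruning via reference distances."""
    q_to_ref = [hamming(query, r) for r in references]
    matches = []
    for w, (ref_id, w_to_ref) in encoded.items():
        # Triangle inequality gives a lower bound on hamming(query, w).
        if abs(q_to_ref[ref_id] - w_to_ref) > max_dist:
            continue  # cannot be within max_dist, skip the exact comparison
        if hamming(query, w) <= max_dist:
            matches.append(w)
    return matches

references = ["ACGT", "TTTT"]  # hypothetical reference words
words = ["ACGA", "ACCT", "TTTA", "GGGG"]
encoded = compress(words, references)
matches = search("ACGT", encoded, references, max_dist=1)
```

In this toy run, "TTTA" and "GGGG" are rejected from their stored reference distances alone, and only "ACGA" and "ACCT" are compared with the query directly.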