Open Access. Powered by Scholars. Published by Universities.®

Databases and Information Systems Commons

Digital libraries

Articles 1 - 21 of 21

Full-Text Articles in Databases and Information Systems

D-Lib Magazine Pioneered Web-Based Scholarly Communication, Michael L. Nelson, Herbert Van De Sompel Jan 2022

Computer Science Faculty Publications

The web began with a vision of, as stated by Tim Berners-Lee in 1991, “that much academic information should be freely available to anyone”. For many years, the development of the web and the development of digital libraries and other scholarly communications infrastructure proceeded in tandem. A milestone occurred in July, 1995, when the first issue of D-Lib Magazine was published as an online, HTML-only, open access magazine, serving as the focal point for the then emerging digital library research community. In 2017 it ceased publication, in part due to the maturity of the community it served as well as …


Machine Learning In Requirements Elicitation: A Literature Review, Cheligeer Cheligeer, Jingwei Huang, Guosong Wu, Nadia Bhuiyan, Yuan Xu, Yong Zeng Jan 2022

Engineering Management & Systems Engineering Faculty Publications

A growing trend in requirements elicitation is the use of machine learning (ML) techniques to automate the cumbersome requirement handling process. This literature review summarizes and analyzes studies that incorporate ML and natural language processing (NLP) into demand elicitation. We answer the following research questions: (1) What requirement elicitation activities are supported by ML? (2) What data sources are used to build ML-based requirement solutions? (3) What technologies, algorithms, and tools are used to build ML-based requirement elicitation? (4) How to construct an ML-based requirements elicitation method? (5) What are the available tools to support ML-based requirements elicitation methodology? Keywords …


Automatic Metadata Extraction Incorporating Visual Features From Scanned Electronic Theses And Dissertations, Muntabir Hasan Choudhury, Himarsha R. Jayanetti, Jian Wu, William A. Ingram, Edward A. Fox Jan 2021

Computer Science Faculty Publications

Electronic Theses and Dissertations (ETDs) contain domain knowledge that can be used for many digital library tasks, such as analyzing citation networks and predicting research trends. Automatic metadata extraction is important to build scalable digital library search engines. Most existing methods are designed for born-digital documents, so they often fail to extract metadata from scanned documents such as ETDs. Traditional sequence tagging methods mainly rely on text-based features. In this paper, we propose a conditional random field (CRF) model that combines text-based and visual features. To verify the robustness of our model, we extended an existing corpus and created a …


Creating A Reproducible Metadata Transformation Pipeline Using Technology Best Practices, Cara Key, Mike Waugh Apr 2018

Digital Initiatives Symposium

Over the course of two years, a team of librarians and programmers from LSU Libraries migrated the 186 collections of the Louisiana Digital Library from OCLC's CONTENTdm platform over to the open-source Islandora platform.

Early in the process, the team understood the value of creating a reproducible metadata transformation pipeline, because there were so many unknowns at the beginning of the process along with the certainty that mistakes would be made. This presentation will describe how the team used innovative and collaborative tools, such as Trello, Ansible, Vagrant, VirtualBox, git and GitHub to accomplish the task.


Client-Assisted Memento Aggregation Using The Prefer Header, Mat Kelly, Sawood Alam, Michael L. Nelson, Michele C. Weigle Jan 2018

Computer Science Faculty Publications

[First paragraph] Preservation of the Web ensures that future generations have a picture of how the web was. Web archives like Internet Archive's Wayback Machine, WebCite, and archive.is allow individuals to submit URIs to be archived, but the captures they preserve then reside at the archives. Traversing these captures in time as preserved by multiple archive sources (using Memento [8]) provides a more comprehensive picture of the past Web than relying on a single archive. Some content on the Web, such as content behind authentication, may be unsuitable or inaccessible for preservation by these organizations. Furthermore, this content may be …


Swimming In A Sea Of JavaScript Or: How I Learned To Stop Worrying And Love High-Fidelity Replay, John A. Berlin, Michael L. Nelson, Michele C. Weigle Jan 2018

Computer Science Faculty Publications

[First paragraph] Preserving and replaying modern web pages in high fidelity has become an increasingly difficult task due to the increased usage of JavaScript. Reliance on server-side rewriting alone results in live-leakage and/or the inability to replay a page due to the preserved JavaScript performing an action not permissible from the archive. The current state-of-the-art high-fidelity archival preservation and replay solutions rely on handcrafted client-side URL rewriting libraries specifically tailored for the archive, namely Webrecorder's and Pywb's wombat.js [12]. Web archives not utilizing client-side rewriting rely on server-side rewriting that misses URLs used in a manner not accounted for …


Avoiding Zombies In Archival Replay Using ServiceWorker, Sawood Alam, Mat Kelly, Michele C. Weigle, Michael L. Nelson Jan 2017

Computer Science Faculty Publications

[First paragraph] A Composite Memento is an archived representation of a web page with all the page requisites such as images and stylesheets. All embedded resources have their own URIs, hence, they are archived independently. For a meaningful archival replay, it is important to load all the page requisites from the archive within the temporal neighborhood of the base HTML page. To achieve this goal, archival replay systems try to rewrite all the resource references to appropriate archived versions before serving HTML, CSS, or JS. However, an effective server-side URL rewriting is difficult when URLs are generated dynamically using JavaScript. …


SVMAUD: Using Textual Information To Predict The Audience Level Of Written Works Using Support Vector Machines, Todd Will Jan 2014

Dissertations

Information retrieval systems should seek to match resources with the reading ability of the individual user; similarly, an author must choose vocabulary and sentence structures appropriate for his or her audience. Traditional readability formulas, including the popular Flesch-Kincaid Reading Age and the Dale-Chall Reading Ease Score, rely on numerical representations of text characteristics, including syllable counts and sentence lengths, to suggest audience level of resources. However, the author’s chosen vocabulary, sentence structure, and even the page formatting can alter the predicted audience level by several levels, especially in the case of digital library resources. For these reasons, the performance of …


Moved But Not Gone: An Evaluation Of Real-Time Methods For Discovering Replacement Web Pages, Martin Klein, Michael L. Nelson Jan 2014

Computer Science Faculty Publications

Inaccessible Web pages and 404 “Page Not Found” responses are a common Web phenomenon and a detriment to the user’s browsing experience. The rediscovery of missing Web pages is, therefore, a relevant research topic in the digital preservation as well as in the Information Retrieval realm. In this article, we bring these two areas together by analyzing four content- and link-based methods to rediscover missing Web pages. We investigate the retrieval performance of the methods individually as well as their combinations and give an insight into how effective these methods are over time. As the main result of this work, …


Linked Data Demystified: Practical Efforts To Transform CONTENTdm Metadata For The Linked Data Cloud, Silvia B. Southwick, Cory K. Lampert Nov 2012

Library Faculty Presentations

The library literature and events like the ALA Annual Conference have been inundated with presentations and articles on linked data. At UNLV Libraries, we understand the importance of linked data in helping to better service our users. We have designed and initiated a pilot project to apply linked data concepts to the practical task of transforming a sample set of our CONTENTdm digital collections data into future-oriented linked data. This presentation will outline rationale for beginning work in linked data and detail the phases we will undertake in the proof of concept project. We hope through this research experiment to …


Synchronization And Multiple Group Server Support For Kepler, K. Maly, M. Zubair, H. Siripuram, S. Zunjarwad, Yannis Manolopoulos (Ed.), Joaquim Filipe (Ed.), Panos Constantopoulos (Ed.), José Cordeiro (Ed.) Jan 2006

Computer Science Faculty Publications

In the last decade, literally thousands of digital libraries have emerged, but one of the biggest obstacles for dissemination of information to a user community is that many digital libraries use different, proprietary technologies that inhibit interoperability. The Kepler framework addresses interoperability and gives publication control to individual publishers. In Kepler, OAI-PMH is used to support "personal data providers" or "archivelets". In our vision, individual publishers can be integrated with an institutional repository like DSpace by means of a Kepler Group Digital Library (GDL). The GDL aggregates metadata and full text from archivelets and can act as an OAI-compliant data provider …


Digital Libraries And Middleware Technology, Muhammad Umar Qasim Jan 2005

Theses

Digital libraries deliver personalized knowledge directly to their users, without being restricted to the contents of a physical library. In a digital library information from any online source can be managed and shared, making more knowledge available to users than before. The information sharing is achieved by integrating many autonomous heterogeneous systems available. The challenge is to provide users with the ability to transparently access digital library contents in spite of the heterogeneity among the information sources.

Research communities have proposed several approaches to accomplish the system integration in digital libraries. In this thesis, the working of currently employed approaches …


An Interactive Learning Environment For A Dynamic Educational Digital Library, Ee Peng Lim, Dion Hoe-Lian Goh, Yin-Leng Theng, Eng-Kai Suen Jul 2004

Research Collection School Of Computing and Information Systems

GeogDL is a digital library of geography examination resources designed to assist students in preparing for a national geography examination in Singapore. We describe an interactive learning environment built into GeogDL that consists of four major components. The practice and review module allows students to attempt individual examination questions, the mock exam provides a simulation of the actual geography examination, the trends analysis tool provides an overview of the types of questions asked in previous examinations, while the contributions module allows students and teachers to create and share knowledge within the digital library.


Event Based Retrieval From Digital Libraries Containing Data Streams, Mohamed Hamed Kholief Jul 2003

Computer Science Theses & Dissertations

The objective of this research is to study the issues involved in building a digital library that contains data streams and allows event-based retrieval. “Digital Libraries are storehouses of information available through the Internet that provide ways to collect, store, and organize data and make it accessible for search, retrieval, and processing” [29]. Data streams are sources of information for applications such as news-on-demand, weather services, and scientific research, to name a few. A data stream is a sequence of data units produced over a period of time. Examples of data streams are video streams, audio streams, and sensor readings. …


GeneScene: Biomedical Text And Data Mining, Gondy Leroy, Hsinchun Chen, Jesse D. Martinez, Shauna Eggers, Ryan R. Falsey, Kerri L. Kislin, Zan Huang, Jiexun Li, Jie Xu, Daniel M. McDonald, Gavin Ng May 2003

CGU Faculty Publications and Research

To access the content of digital texts efficiently, it is necessary to provide more sophisticated access than keyword based searching. GeneScene provides biomedical researchers with research findings and background relations automatically extracted from text and experimental data. These provide a more detailed overview of the information available. The extracted relations were evaluated by qualified researchers and are precise. A qualitative ongoing evaluation of the current online interface indicates that this method to search the literature is more useful and efficient than keyword based searching.


Resource Annotation Framework In A Georeferenced And Geospatial Digital Library, Zehua Liu, Ee Peng Lim, Dion Hoe-Lian Goh Dec 2002

Research Collection School Of Computing and Information Systems

G-Portal is a georeferenced and geospatial digital library that aims to identify, classify and organize geospatial and georeferenced resources on the web and to provide digital library services for these resources. Annotation service is supported in G-Portal to enable users to contribute content to the digital library. In this paper, we present a resource annotation framework for georeferenced and geospatial digital libraries and discuss its application in G-Portal. The framework is flexible for managing annotations of heterogeneous web resources. It allows users to contribute not only the annotation content but also the schema of the annotations. Meanwhile, other digital library …


G-Portal: A Map-Based Digital Library For Distributed Geospatial And Georeferenced Resources, Ee Peng Lim, Dion Hoe-Lian Goh Jul 2002

Research Collection School Of Computing and Information Systems

As the World Wide Web evolves into an immense information network, it is tempting to build new digital library services and expand existing digital library services to make use of web content. In this paper, we present the design and implementation of G-Portal, a web portal that aims to provide digital library services over geospatial and georeferenced content found on the World Wide Web. G-Portal adopts a map-based user interface to visualize and manipulate the distributed geospatial and georeferenced content. Annotation capabilities are supported, allowing users to contribute geospatial and georeferenced objects as well as their associated metadata. The other …


Federating Heterogeneous Digital Libraries By Metadata Harvesting, Xiaoming Liu Jan 2002

Computer Science Theses & Dissertations

This dissertation studies the challenges and issues faced in federating heterogeneous digital libraries (DLs) by metadata harvesting. The objective of federation is to provide high-level services (e.g. transparent search across all DLs) on the collective metadata from different digital libraries. There are two main approaches to federating DLs: the distributed searching approach and the harvesting approach. As the distributed searching approach relies on executing queries to digital libraries in real time, it has problems with scalability. The difficulty of creating a distributed searching service for a large federation is the motivation behind the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH). OAI-PMH supports …


Enhancing A Virtual Distributed Library User Interface Via Server-Side User Profile Caching, Jason T. Ward Mar 2000

Theses and Dissertations

Various Department of Defense (DoD) agencies archive terabytes of intelligence imagery and electro-optical signature data. The Air Force Research Laboratory, Sensors Directorate (AFRL/SN), is tasked with creating and managing a virtual distributed library that facilitates secure, detailed queries across these distributed holdings using the internally developed Advanced Query Tool (AQT). In this research, a methodology is proposed to utilize user profiling techniques to augment a digital library. As part of this methodology, product-oriented usability analysis metrics are introduced that quantitatively verify the usability of an interface. The methodology is applied to the AFRL/SN's Virtual Distributed Laboratory AQT and subsequently analyzed …


The UPS Prototype: An Experimental End-User Service Across E-Print Archives, Herbert Van De Sompel, Thomas Krichel, Michael L. Nelson, Patrick Hochstenbach, Victor Lyapunov, Kurt Maly, Mohammad Zubair, Mohamed Kholief, Xiaoming Liu, Heath O'Connell Jan 2000

Computer Science Faculty Publications

A meeting was held in Santa Fe, New Mexico, October 21-22, 1999, to generate discussion and consensus about interoperability of publicly available scholarly information archives. The invitees represented several well known e-print and report archive initiatives, as well as organizations with interests in digital libraries and the transformation of scholarly communication. The central goal of the meeting was to agree on recommendations that would make the creation of end-user services -- such as scientific search engines and linking systems -- for data originating from distributed and dissimilar archives easier. The Universal Preprint Service (UPS) Prototype was developed in preparation for …


Distributed Query Processing For Structured And Bibliographic Databases, Ee Peng Lim, Ying Lu Apr 1997

Research Collection School Of Computing and Information Systems

To support future digital library systems which draw information from different sources on the internet, we have to provide integrated queries to pre-existing database servers which contain structured, semi-structured and unstructured data. In this paper, we specifically examine the problem of querying both existing structured relational databases and bibliographic databases. By adopting the well-accepted Z39.50 standard protocol to access bibliographic databases in different legacy library systems, we have developed an extended SQL model, known as HarpSQL, to support integrated queries to both SQL databases and bibliographic databases. Using HarpSQL, one can not only query bibliographic databases in an SQL manner, …