Open Access. Powered by Scholars. Published by Universities.®

Databases and Information Systems Commons


Articles 1 - 10 of 10

Full-Text Articles in Databases and Information Systems

D-Lib Magazine Pioneered Web-Based Scholarly Communication, Michael L. Nelson, Herbert Van De Sompel Jan 2022

Computer Science Faculty Publications

The web began with a vision, articulated by Tim Berners-Lee in 1991, that "much academic information should be freely available to anyone." For many years, the development of the web and the development of digital libraries and other scholarly communications infrastructure proceeded in tandem. A milestone occurred in July 1995, when the first issue of D-Lib Magazine was published as an online, HTML-only, open access magazine, serving as the focal point for the then-emerging digital library research community. In 2017 it ceased publication, in part due to the maturity of the community it served as well as …


Automatic Metadata Extraction Incorporating Visual Features From Scanned Electronic Theses And Dissertations, Muntabir Hasan Choudhury, Himarsha R. Jayanetti, Jian Wu, William A. Ingram, Edward A. Fox Jan 2021

Computer Science Faculty Publications

Electronic Theses and Dissertations (ETDs) contain domain knowledge that can be used for many digital library tasks, such as analyzing citation networks and predicting research trends. Automatic metadata extraction is important to build scalable digital library search engines. Most existing methods are designed for born-digital documents, so they often fail to extract metadata from scanned documents such as ETDs. Traditional sequence tagging methods mainly rely on text-based features. In this paper, we propose a conditional random field (CRF) model that combines text-based and visual features. To verify the robustness of our model, we extended an existing corpus and created a …
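The core idea, sequence tagging over tokens whose features mix text cues with layout cues, can be sketched briefly. The snippet below is a minimal illustration using the sklearn-crfsuite library; the feature names (font size, vertical position, boldness), the toy cover-page tokens, and the labels are assumptions for illustration, not the authors' actual feature set or corpus.

```python
# A minimal sketch of sequence tagging with combined text and visual
# features, using the sklearn-crfsuite library. Feature names, tokens,
# and labels below are illustrative assumptions, not the paper's setup.
import sklearn_crfsuite

def token_features(tok):
    """Text cues plus visual cues (as an OCR engine could report them)."""
    return {
        "lower": tok["text"].lower(),
        "is_title_case": tok["text"].istitle(),
        "is_digit": tok["text"].isdigit(),
        "font_size": tok["font_size"],   # large fonts often mark titles
        "y_pos": round(tok["y"], 1),     # vertical position on the page
        "is_bold": tok["bold"],
    }

# One scanned cover page as a token sequence (toy example).
page = [
    {"text": "Automatic", "font_size": 18.0, "y": 0.10, "bold": True},
    {"text": "Metadata",  "font_size": 18.0, "y": 0.10, "bold": True},
    {"text": "Jane",      "font_size": 12.0, "y": 0.40, "bold": False},
    {"text": "Doe",       "font_size": 12.0, "y": 0.40, "bold": False},
]
labels = ["B-TITLE", "I-TITLE", "B-AUTHOR", "I-AUTHOR"]

X, y = [[token_features(t) for t in page]], [labels]
crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=50)
crf.fit(X, y)
print(crf.predict(X)[0])
```

The visual features let the model separate, say, a large bold token near the top of the page (likely title) from a same-cased token lower down (likely author), which text features alone cannot do on scanned pages.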


Creating A Reproducible Metadata Transformation Pipeline Using Technology Best Practices, Cara Key, Mike Waugh Apr 2018

Digital Initiatives Symposium

Over the course of two years, a team of librarians and programmers from LSU Libraries migrated the 186 collections of the Louisiana Digital Library from OCLC's CONTENTdm platform to the open-source Islandora platform.

Early in the process, the team recognized the value of creating a reproducible metadata transformation pipeline: the project began with many unknowns and with the certainty that mistakes would be made, so any step had to be correctable and repeatable. This presentation will describe how the team used innovative and collaborative tools such as Trello, Ansible, Vagrant, VirtualBox, Git, and GitHub to accomplish the task.
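As a rough illustration of what "reproducible" means here, the sketch below writes the transformation step as a pure function over exported records, so the whole run can be repeated byte-for-byte after a mapping error is fixed. The CONTENTdm field names, the MODS-like output layout, and the JSON Lines framing are hypothetical; the team's actual pipeline and tooling are described in the presentation.

```python
# A minimal sketch of a reproducible transformation step: a pure function
# from a CONTENTdm-style export record to MODS-like fields. Field names
# and the JSON Lines layout are hypothetical; the point is determinism,
# so the whole migration can be rerun after a mapping error is fixed.
import json
import sys

def transform(record: dict) -> dict:
    """Map exported CONTENTdm fields to MODS-like elements."""
    return {
        "titleInfo": {"title": record.get("Title", "").strip()},
        "name": [n.strip() for n in record.get("Creator", "").split(";") if n.strip()],
        "originInfo": {"dateIssued": record.get("Date", "")},
    }

if __name__ == "__main__":
    # Usage: python transform.py < export.jsonl > mods.jsonl
    for line in sys.stdin:
        print(json.dumps(transform(json.loads(line)), sort_keys=True))
```

Because the script has no hidden state and sorts its output keys, two runs over the same export are identical, which is what makes version-controlling the pipeline in Git meaningful.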


Swimming In A Sea Of Javascript Or: How I Learned To Stop Worrying And Love High-Fidelity Replay, John A. Berlin, Michael L. Nelson, Michele C. Weigle Jan 2018

Computer Science Faculty Publications

[First paragraph] Preserving and replaying modern web pages in high fidelity has become an increasingly difficult task due to the increased usage of JavaScript. Reliance on server-side rewriting alone results in live leakage and/or the inability to replay a page because the preserved JavaScript performs an action not permissible from the archive. The current state-of-the-art high-fidelity archival preservation and replay solutions rely on handcrafted client-side URL rewriting libraries specifically tailored for the archive, namely Webrecorder's and Pywb's wombat.js [12]. Web archives not utilizing client-side rewriting rely on server-side rewriting that misses URLs used in a manner not accounted for …
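To see why server-side rewriting alone leaks, consider the minimal sketch below: a naive rewriter that catches statically visible URLs but cannot see a URL assembled by JavaScript at runtime. The archive prefix and the regex are simplified stand-ins; production rewriters such as Pywb are far more thorough, yet the dynamic case still motivates client-side libraries like wombat.js.

```python
# A minimal sketch of why server-side rewriting alone leaks to the live
# web. The archive prefix and regex are simplified stand-ins for what a
# real replay system does; they are not Pywb's actual implementation.
import re

ARCHIVE_PREFIX = "https://archive.example/web/20180101000000/"  # hypothetical

def rewrite_html(html: str) -> str:
    """Rewrite only statically visible absolute URLs in src/href attributes."""
    return re.sub(
        r'(src|href)="(https?://[^"]+)"',
        lambda m: f'{m.group(1)}="{ARCHIVE_PREFIX}{m.group(2)}"',
        html,
    )

page = '''
<img src="https://example.com/logo.png">
<script>
  // Built at runtime, invisible to the server-side regex above:
  var img = new Image();
  img.src = "https://exam" + "ple.com/tracker.gif";
</script>
'''
print(rewrite_html(page))
# The <img> URL is rewritten into the archive; the JavaScript-assembled
# URL is not, so on replay it would fetch from the live web (leakage).
```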


Client-Assisted Memento Aggregation Using The Prefer Header, Mat Kelly, Sawood Alam, Michael L. Nelson, Michele C. Weigle Jan 2018

Computer Science Faculty Publications

[First paragraph] Preservation of the Web ensures that future generations have a picture of how the Web was. Web archives like Internet Archive's Wayback Machine, WebCite, and archive.is allow individuals to submit URIs to be archived, but the captures they preserve then reside at the archives. Traversing these captures in time as preserved by multiple archive sources (using Memento [8]) provides a more comprehensive picture of the past Web than relying on a single archive. Some content on the Web, such as content behind authentication, may be unsuitable or inaccessible for preservation by these organizations. Furthermore, this content may be …
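The mechanism in the title is the HTTP Prefer header (RFC 7240), which lets a client state how it would like a request handled. The sketch below shows a client hinting its aggregation preferences when requesting a TimeMap; the aggregator URL and the preference token are hypothetical placeholders, not values defined by the paper.

```python
# A minimal sketch of a client expressing aggregation preferences via the
# HTTP Prefer header (RFC 7240). The aggregator URL and the preference
# token are hypothetical; see the paper for the actual vocabulary.
import requests

timemap = "https://aggregator.example/timemap/link/http://example.com/"
resp = requests.get(
    timemap,
    headers={
        # Hypothetical preference asking the aggregator to consult only
        # publicly accessible archives for this URI.
        "Prefer": "archives=public",
        "Accept": "application/link-format",
    },
)
print(resp.status_code)
print(resp.headers.get("Preference-Applied"))  # echoes honored preferences
print(resp.text[:500])
```

The appeal of Prefer is that it rides on an ordinary GET: an aggregator that does not understand the preference simply ignores it, so the request degrades gracefully.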


Avoiding Zombies In Archival Replay Using Serviceworker, Sawood Alam, Mat Kelly, Michele C. Weigle, Michael L. Nelson Jan 2017

Computer Science Faculty Publications

[First paragraph] A Composite Memento is an archived representation of a web page with all the page requisites such as images and stylesheets. All embedded resources have their own URIs, hence they are archived independently. For a meaningful archival replay, it is important to load all the page requisites from the archive within the temporal neighborhood of the base HTML page. To achieve this goal, archival replay systems try to rewrite all the resource references to appropriate archived versions before serving HTML, CSS, or JS. However, effective server-side URL rewriting is difficult when URLs are generated dynamically using JavaScript. …
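The paper's solution is client-side, a ServiceWorker that intercepts fetches during replay, but the "zombie" problem it targets can be illustrated offline. The sketch below is a standalone check, not the authors' approach: it scans a replayed composite memento for embedded resources whose URIs point at the live web instead of the archive. The archive prefix is a hypothetical placeholder.

```python
# A minimal sketch that flags "zombie" resources in a replayed page:
# embedded URIs that escape the archive and would load from the live web.
# The archive prefix is hypothetical; the paper's actual fix intercepts
# these requests in the browser with a ServiceWorker.
from html.parser import HTMLParser

ARCHIVE_PREFIX = "https://archive.example/web/"  # hypothetical

class ZombieFinder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.zombies = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("src", "href") and value and value.startswith("http") \
                    and not value.startswith(ARCHIVE_PREFIX):
                self.zombies.append((tag, value))

finder = ZombieFinder()
finder.feed('<img src="https://example.com/live.png">'
            '<link href="https://archive.example/web/20170101/style.css">')
print(finder.zombies)  # [('img', 'https://example.com/live.png')]
```

A static scan like this misses URLs composed by JavaScript at runtime, which is precisely why the paper moves the interception into the client.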


Moved But Not Gone: An Evaluation Of Real-Time Methods For Discovering Replacement Web Pages, Martin Klein, Michael L. Nelson Jan 2014

Computer Science Faculty Publications

Inaccessible Web pages and 404 “Page Not Found” responses are a common Web phenomenon and a detriment to the user’s browsing experience. The rediscovery of missing Web pages is, therefore, a relevant research topic in the digital preservation realm as well as in information retrieval. In this article, we bring these two areas together by analyzing four content- and link-based methods to rediscover missing Web pages. We investigate the retrieval performance of the methods individually as well as in combination and give insight into how effective these methods are over time. As the main result of this work, …
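One of the content-based methods in this line of work is the lexical signature: the top TF-IDF terms of an archived copy of the missing page, submitted as a search query to find where the page moved. The sketch below computes such a signature with scikit-learn over a toy background corpus; the corpus, the term count, and the final search step are illustrative assumptions, not the article's experimental setup.

```python
# A minimal sketch of a lexical signature: the top TF-IDF terms of an
# archived copy of a missing page, usable as a search-engine query to
# rediscover the page at its new URI. Corpus and term count are toy
# assumptions, not the article's experimental setup.
from sklearn.feature_extraction.text import TfidfVectorizer

cached_copy = "memento web archive replay fidelity javascript rewriting"
background = [
    "digital library metadata extraction",
    "information retrieval ranking evaluation",
    cached_copy,
]

vec = TfidfVectorizer()
tfidf = vec.fit_transform(background)
row = tfidf[2].toarray()[0]                # the cached copy's TF-IDF row
terms = vec.get_feature_names_out()
signature = [t for t, _ in sorted(zip(terms, row), key=lambda p: -p[1])[:5]]
print(" ".join(signature))  # paste into a search engine to find the moved page
```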


Linked Data Demystified: Practical Efforts To Transform Contentdm Metadata For The Linked Data Cloud, Silvia B. Southwick, Cory K. Lampert Nov 2012

Library Faculty Presentations

The library literature and events like the ALA Annual Conference have been inundated with presentations and articles on linked data. At UNLV Libraries, we understand the importance of linked data in helping us better serve our users. We have designed and initiated a pilot project to apply linked data concepts to the practical task of transforming a sample set of our CONTENTdm digital collections data into future-oriented linked data. This presentation will outline the rationale for beginning work in linked data and detail the phases we will undertake in the proof-of-concept project. We hope through this research experiment to …
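At its simplest, the transformation step amounts to minting a URI for each digital object and expressing its descriptive fields as RDF triples. The sketch below does this for one CONTENTdm-style record with rdflib; the subject URI base, the Dublin Core field mapping, and the sample record are hypothetical, not the project's actual model.

```python
# A minimal sketch of turning one CONTENTdm-style record into RDF with
# rdflib. The subject URI base, the Dublin Core field mapping, and the
# sample record are hypothetical, not the UNLV project's actual model.
from rdflib import Graph, Literal, URIRef
from rdflib.namespace import DCTERMS

record = {"id": "pho001", "Title": "Sample photograph", "Creator": "Unknown"}

g = Graph()
subject = URIRef(f"https://library.example/items/{record['id']}")  # hypothetical base
g.add((subject, DCTERMS.title, Literal(record["Title"])))
g.add((subject, DCTERMS.creator, Literal(record["Creator"])))

print(g.serialize(format="turtle"))
```

Once records are triples with stable URIs, they can be linked to external vocabularies (name authorities, place gazetteers), which is what moves flat CONTENTdm metadata into the linked data cloud.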


An Interactive Learning Environment For A Dynamic Educational Digital Library, Ee Peng Lim, Dion Hoe-Lian Goh, Yin-Leng Theng, Eng-Kai Suen Jul 2004

Research Collection School Of Computing and Information Systems

GeogDL is a digital library of geography examination resources designed to assist students in preparing for a national geography examination in Singapore. We describe an interactive learning environment built into GeogDL that consists of four major components: the practice and review module allows students to attempt individual examination questions, the mock exam provides a simulation of the actual geography examination, the trends analysis tool provides an overview of the types of questions asked in previous examinations, and the contributions module allows students and teachers to create and share knowledge within the digital library.


G-Portal: A Map-Based Digital Library For Distributed Geospatial And Georeferenced Resources, Ee Peng Lim, Dion Hoe-Lian Goh Jul 2002

Research Collection School Of Computing and Information Systems

As the World Wide Web evolves into an immense information network, it is tempting to build new digital library services and expand existing digital library services to make use of web content. In this paper, we present the design and implementation of G-Portal, a web portal that aims to provide digital library services over geospatial and georeferenced content found on the World Wide Web. G-Portal adopts a map-based user interface to visualize and manipulate the distributed geospatial and georeferenced content. Annotation capabilities are supported, allowing users to contribute geospatial and georeferenced objects as well as their associated metadata. The other …
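As a rough idea of what a user-contributed georeferenced object plus metadata could look like, the sketch below models one contribution as a GeoJSON-style feature. The schema is a hypothetical illustration; G-Portal's actual resource format is defined in the paper.

```python
# A minimal sketch of a user-contributed georeferenced object with
# attached metadata, in the spirit of G-Portal's annotation support.
# The GeoJSON-style schema here is a hypothetical illustration, not
# G-Portal's actual resource format.
import json

contribution = {
    "type": "Feature",
    "geometry": {"type": "Point", "coordinates": [103.8198, 1.3521]},  # Singapore
    "properties": {
        "title": "Field observation site",
        "creator": "student01",
        "description": "Mangrove regrowth noted along the shoreline.",
        "source": "http://example.com/fieldnotes/42",  # hypothetical URL
    },
}
print(json.dumps(contribution, indent=2))
```

Pairing a geometry with free-form descriptive properties is what lets a map-based interface both plot the object and surface its metadata when a user selects it.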