Open Access. Powered by Scholars. Published by Universities.®

Library and Information Science Commons

Articles 1 - 25 of 25

Full-Text Articles in Library and Information Science

Supporting Account-Based Queries For Archived Instagram Posts, Himarsha R. Jayanetti May 2023

Computer Science Theses & Dissertations

Social media has become one of the primary modes of communication in recent times, with popular platforms such as Facebook, Twitter, and Instagram leading the way. Despite its popularity, Instagram has not received as much attention in academic research compared to Facebook and Twitter, and its significant role in contemporary society is often overlooked. Web archives are making efforts to preserve social media content despite the challenges posed by the dynamic nature of these sites. The goal of our research is to facilitate the easy discovery of archived copies, or mementos, of all posts belonging to a specific Instagram account …


D-Lib Magazine Pioneered Web-Based Scholarly Communication, Michael L. Nelson, Herbert Van De Sompel Jan 2022

Computer Science Faculty Publications

The web began with a vision of, as stated by Tim Berners-Lee in 1991, “that much academic information should be freely available to anyone”. For many years, the development of the web and the development of digital libraries and other scholarly communications infrastructure proceeded in tandem. A milestone occurred in July, 1995, when the first issue of D-Lib Magazine was published as an online, HTML-only, open access magazine, serving as the focal point for the then emerging digital library research community. In 2017 it ceased publication, in part due to the maturity of the community it served as well as …


Automatic Metadata Extraction Incorporating Visual Features From Scanned Electronic Theses And Dissertations, Muntabir Hasan Choudhury, Himarsha R. Jayanetti, Jian Wu, William A. Ingram, Edward A. Fox Jan 2021

Computer Science Faculty Publications

Electronic Theses and Dissertations (ETDs) contain domain knowledge that can be used for many digital library tasks, such as analyzing citation networks and predicting research trends. Automatic metadata extraction is important to build scalable digital library search engines. Most existing methods are designed for born-digital documents, so they often fail to extract metadata from scanned documents such as ETDs. Traditional sequence tagging methods mainly rely on text-based features. In this paper, we propose a conditional random field (CRF) model that combines text-based and visual features. To verify the robustness of our model, we extended an existing corpus and created a …
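
The Python sketch below shows what combining text-based and visual features can look like. It does per-token feature extraction in the list-of-dicts form that linear-chain CRF toolkits such as sklearn-crfsuite accept. The `Token` fields (font size, bounding-box top, page height) and the feature names are hypothetical assumptions, not the feature set used in the paper. Training the CRF itself is out of scope.

```python
from dataclasses import dataclass


@dataclass
class Token:
    """Hypothetical OCR token carrying its text plus layout attributes."""
    text: str
    font_size: float    # assumed glyph height in pixels, reported by the OCR engine
    y_top: float        # top of the token's bounding box, in pixels
    page_height: float  # height of the scanned page, in pixels


def token_features(tokens: list[Token], i: int) -> dict:
    """Feature dict for token i, mixing text-based and visual cues."""
    tok = tokens[i]
    feats = {
        # Text-based features
        "lower": tok.text.lower(),
        "is_title": tok.text.istitle(),
        "is_upper": tok.text.isupper(),
        "is_digit": tok.text.isdigit(),
        # Visual features (assumed): font size and relative vertical position
        "font_size": tok.font_size,
        "rel_y": round(tok.y_top / tok.page_height, 3),
    }
    # Neighboring-token context, as is typical for linear-chain CRFs
    if i > 0:
        feats["prev_lower"] = tokens[i - 1].text.lower()
    else:
        feats["BOS"] = True
    if i < len(tokens) - 1:
        feats["next_lower"] = tokens[i + 1].text.lower()
    else:
        feats["EOS"] = True
    return feats


def sequence_features(tokens: list[Token]) -> list[dict]:
    """Features for every token in one sequence (e.g. one cover-page line)."""
    return [token_features(tokens, i) for i in range(len(tokens))]


line = [Token("A", 24.0, 300.0, 3300.0), Token("STUDY", 24.0, 300.0, 3300.0)]
X = [sequence_features(line)]  # one training sequence; a CRF would pair it with labels
```

Visual cues such as large fonts near the top of a cover page can separate a title from surrounding text even when OCR output alone is ambiguous. That is the intuition behind adding them to the text features.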


A Vertical Cooperation Model To Manage Digital Collections And Institutional Resources, Jack M. Maness, Kim Pham, Fernando Reyes, Jeff Rynhart Apr 2020

University Libraries: Faculty Scholarship

The technology space of the University of Denver Libraries for managing digital collections and institutional resources isn't relegated to one department on campus; rather, it is distributed across a network of collaborators with the skills and expertise to provide that support. The infrastructure, which comprises an archival metadata management system (ArchivesSpace), a digital repository (Node.js + Elasticsearch), preservation storage (ArchivesDirect), and a streaming server (Kaltura), is independently but cooperatively managed across IT, library departments, and vendors. The coordinated effort of digital curation activities still allows each group to focus on the service they have the most vested interest …


Digital Libraries, Intelligent Data Analytics, And Augmented Description: A Demonstration Project, Elizabeth Lorang, Leen-Kiat Soh, Yi Liu, Chulwoo Pack Jan 2020

UNL Libraries: Faculty Publications

From July 16 to November 8, 2019, the Aida digital libraries research team at the University of Nebraska-Lincoln collaborated with the Library of Congress on “Digital Libraries, Intelligent Data Analytics, and Augmented Description: A Demonstration Project.” This demonstration project sought to (1) develop and investigate the viability and feasibility of textual and image-based data analytics approaches to support and facilitate discovery; (2) understand technical tools and requirements for the Library of Congress to improve access to and discovery of its digital collections; and (3) enable the Library of Congress to plan for future possibilities. In pursuit of these goals, we focused our …


Final Presentation To The Library Of Congress On Digital Libraries, Intelligent Data Analytics, And Augmented Description, Elizabeth Lorang, Leen-Kiat Soh, Yi Liu, Chulwoo Pack Jan 2020

University of Nebraska-Lincoln Libraries: Conference Presentations and Speeches

This presentation to Library of Congress staff, delivered onsite on January 10, 2020, presents a tour through the demonstration project pursued by the Aida digital libraries research team with the Library of Congress in 2019-2020. In addition to providing an overview and analysis of the specific machine learning projects scoped and explored, this presentation includes a number of high-level takeaways and recommendations designed to influence and inform the Library of Congress's machine learning efforts going forward.


A Heuristic Baseline Method For Metadata Extraction From Scanned Electronic Theses And Dissertations, Muntabir H. Choudhury, Jian Wu, William A. Ingram, Edward A. Fox Jan 2020

Computer Science Faculty Publications

Extracting metadata from scholarly papers is an important text mining problem. Widely used open-source tools such as GROBID are designed for born-digital scholarly papers but often fail for scanned documents, such as Electronic Theses and Dissertations (ETDs). Here we present a preliminary baseline work with a heuristic model to extract metadata from the cover pages of scanned ETDs. The process started with converting scanned pages into images and then text files by applying OCR tools. Then a series of carefully designed regular expressions for each field is applied, capturing patterns for seven metadata fields: titles, authors, years, degrees, academic programs, …
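
The Python sketch below illustrates the per-field regular-expression approach. The patterns, the assumption that a standalone "by" line separates title from author, and the sample cover page are all hypothetical. The paper's actual expressions cover seven fields and far more layout variation.

```python
import re
from typing import Optional

# Hypothetical patterns; real cover pages need many more variants per field
YEAR_RE = re.compile(r"\b(?:19|20)\d{2}\b")
DEGREE_RE = re.compile(
    r"\b(Doctor of Philosophy|Master of Science|Master of Arts)\b", re.IGNORECASE
)


def extract_fields(ocr_text: str) -> dict[str, Optional[str]]:
    """Apply simple per-field heuristics to OCR text from a cover page."""
    lines = [ln.strip() for ln in ocr_text.splitlines() if ln.strip()]
    fields: dict[str, Optional[str]] = {
        "title": None, "author": None, "year": None, "degree": None,
    }
    # Assumed layout: title lines, then a standalone "by", then the author's name
    for i, ln in enumerate(lines):
        if ln.lower() == "by":
            fields["title"] = " ".join(lines[:i]) or None
            if i + 1 < len(lines):
                fields["author"] = lines[i + 1]
            break
    if (m := YEAR_RE.search(ocr_text)):
        fields["year"] = m.group(0)
    if (m := DEGREE_RE.search(ocr_text)):
        fields["degree"] = m.group(1)
    return fields


# Made-up cover page text standing in for OCR output
cover = """A STUDY OF DIGITAL LIBRARIES
IN THE SOUTHEAST
by
Jane Doe
A Thesis Submitted for the Degree of
Master of Science
May 1998"""
print(extract_fields(cover))
```

Each field gets its own independent rule, so a failure on one field (say, an unusual degree name) leaves the others intact. That robustness is also why such a heuristic makes a reasonable baseline.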


Virtual Wrap-Up Presentation: Digital Libraries, Intelligent Data Analytics, And Augmented Description, Elizabeth Lorang, Leen-Kiat Soh, Yi Liu, Chulwoo Pack Nov 2019

CSE Conference and Workshop Papers

Includes framing, overview, and discussion of the explorations pursued as part of the Digital Libraries, Intelligent Data Analytics, and Augmented Description demonstration project, pursued by members of the Aida digital libraries research team at the University of Nebraska-Lincoln through a research services contract with the Library of Congress. This presentation covered: Aida research team and background for the demonstration project; broad outlines of “Digital Libraries, Intelligent Data Analytics, and Augmented Description”; what changed for us as a research team over the collaboration and why; deliverables of our work; thoughts toward “What next”; and deep-dives into the explorations. The machine learning …


Document Images And Machine Learning: A Collaboratory Between The Library Of Congress And The Image Analysis For Archival Discovery (Aida) Lab At The University Of Nebraska, Lincoln, NE, Yi Liu, Chulwoo Pack, Leen-Kiat Soh, Elizabeth Lorang Aug 2019

CSE Conference and Workshop Papers

This presentation summarized and presented preliminary results from the first weeks of work conducted by the Aida research team in response to Library of Congress funding notice ID 030ADV19Q0274, “The Library of Congress – Pre-processing Pilot.” It includes overviews of projects on historic document segmentation, document classification, document quality assessment, figure and graph extraction from historic documents, text-line extraction from figures, subjective and objective quality assessments, and digitization type differentiation.


Work-In-Progress Reports Submitted To The Library Of Congress As Part Of Digital Libraries, Intelligent Data Analytics, And Augmented Description, Chulwoo Pack, Yi Liu, Leen-Kiat Soh, Elizabeth Lorang Jan 2019

CSE Technical Reports

This document includes work-in-progress reports submitted to the Library of Congress as part of the Aida digital libraries research team's work on Digital Libraries, Intelligent Data Analytics, and Augmented Description: A Demonstration Project. These work-in-progress reports provide a snapshot glimpse, as well as underlying rationale and decision-making, at various points in the development of the project and its machine learning explorations. Reports cover explorations on historic newspapers, minimally-processed manuscript collections, materials digitized from physical originals and those digitized from microform surrogates, and investigate challenges related to image segmentation and document zoning, classification, document image quality analysis, metadata generation, and more.


Using Chronicling America’s Images To Explore Digitized Historic Newspapers & Imagine Alternative Futures, Elizabeth Lorang, Leen-Kiat Soh Sep 2018

University of Nebraska-Lincoln Libraries: Conference Presentations and Speeches

This presentation situates the work of the Aida team broadly and also anchors this work in some very specific challenges for digital libraries. In doing so, it demonstrates the many types of questions and domains to be explored in digitized newspapers.


Creating A Reproducible Metadata Transformation Pipeline Using Technology Best Practices, Cara Key, Mike Waugh Apr 2018

Digital Initiatives Symposium

Over the course of two years, a team of librarians and programmers from LSU Libraries migrated the 186 collections of the Louisiana Digital Library from OCLC's CONTENTdm platform over to the open-source Islandora platform.

Early in the process, the team understood the value of creating a reproducible metadata transformation pipeline, because there were so many unknowns at the beginning of the process along with the certainty that mistakes would be made. This presentation will describe how the team used innovative and collaborative tools, such as Trello, Ansible, Vagrant, VirtualBox, git and GitHub to accomplish the task.
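
A metadata transformation step is easier to reproduce when it is a pure function of its inputs, so that reruns can be checked byte for byte. The Python sketch below applies a version-controlled field mapping and fingerprints the result with SHA-256. The CONTENTdm field labels and MODS-style target paths are hypothetical. This is not the LSU team's code, whose workflow centered on tools such as Ansible, Vagrant, and git.

```python
import hashlib
import json

# Hypothetical CONTENTdm-label -> MODS-style path mapping, kept under version control
FIELD_MAP = {
    "Title": "titleInfo/title",
    "Creator": "name/namePart",
    "Date": "originInfo/dateCreated",
}


def transform(records: list[dict[str, str]]) -> list[dict[str, str]]:
    """Pure function: the same input records always yield the same output."""
    return [
        {FIELD_MAP[k]: v.strip() for k, v in rec.items() if k in FIELD_MAP}
        for rec in records
    ]


def fingerprint(records: list[dict[str, str]]) -> str:
    """SHA-256 of a canonical JSON serialization (keys sorted)."""
    payload = json.dumps(records, sort_keys=True, ensure_ascii=False).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# Made-up sample records; unmapped fields such as "Notes" are dropped
run1 = transform([{"Title": " Sample Map ", "Date": "1885", "Notes": "x"}])
run2 = transform([{"Date": "1885", "Title": "Sample Map"}])
print(fingerprint(run1) == fingerprint(run2))  # True
```

Comparing fingerprints across runs catches accidental changes to the mapping or to the input export. This helps when, as the presenters note, mistakes are certain to be made and steps must be rerun.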


Increasing Our Vision For 21st-Century Digital Libraries, Elizabeth M. Lorang, Leen-Kiat Soh Jan 2018

University of Nebraska-Lincoln Libraries: Conference Presentations and Speeches

This presentation

  1. Reads digital library interfaces—or their "main door" interfaces—as glimpses into what we have thus far valued in the development of digital libraries
  2. Frames a visual way of thinking about textual materials
  3. Introduces the work of our research team—where we are now, and where we're headed
  4. Draws some connections between the parts

This presentation is very much a look into thinking in process and work in progress and proposes the following ideas:

  1. As a community, we can do much more with the digital images we're creating of textual materials than we've heretofore done.
  2. We aspire to have additional layers …


Client-Assisted Memento Aggregation Using The Prefer Header, Mat Kelly, Sawood Alam, Michael L. Nelson, Michele C. Weigle Jan 2018

Computer Science Faculty Publications

[First paragraph] Preservation of the Web ensures that future generations have a picture of how the web was. Web archives like Internet Archive's Wayback Machine, WebCite, and archive.is allow individuals to submit URIs to be archived, but the captures they preserve then reside at the archives. Traversing these captures in time as preserved by multiple archive sources (using Memento [8]) provides a more comprehensive picture of the past Web than relying on a single archive. Some content on the Web, such as content behind authentication, may be unsuitable or inaccessible for preservation by these organizations. Furthermore, this content may be …
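
Memento (RFC 7089) clients ask a TimeGate for a past version by sending an `Accept-Datetime` header. RFC 7240 defines the general-purpose `Prefer` request header. The sketch below uses Python's standard library to build such a request without sending it. The TimeGate URI and the `Prefer` value are hypothetical placeholders; the preference syntax the paper actually proposes is not reproduced here.

```python
import urllib.request
from datetime import datetime, timezone
from email.utils import format_datetime
from typing import Optional


def build_timegate_request(
    timegate_uri: str, when: datetime, prefer: Optional[str] = None
) -> urllib.request.Request:
    """Build (not send) a datetime-negotiation request for a Memento TimeGate."""
    headers = {
        # RFC 7089 expects an HTTP-date in GMT
        "Accept-Datetime": format_datetime(when.astimezone(timezone.utc), usegmt=True),
    }
    if prefer is not None:
        # Hypothetical client preference forwarded to an aggregator (RFC 7240 syntax)
        headers["Prefer"] = prefer
    return urllib.request.Request(timegate_uri, headers=headers)


req = build_timegate_request(
    "https://aggregator.example/timegate/http://example.com/",  # hypothetical endpoint
    datetime(2018, 1, 1, tzinfo=timezone.utc),
    prefer='archives="https://private-archive.example/"',  # hypothetical preference
)
# urllib normalizes header names with str.capitalize(), hence "Accept-datetime"
print(req.get_header("Accept-datetime"))  # Mon, 01 Jan 2018 00:00:00 GMT
```

Because `Prefer` is a standard, extensible header, a client can express aggregation wishes without a new protocol. Servers that do not understand a preference are expected to ignore it.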


Swimming In A Sea Of JavaScript Or: How I Learned To Stop Worrying And Love High-Fidelity Replay, John A. Berlin, Michael L. Nelson, Michele C. Weigle Jan 2018

Computer Science Faculty Publications

[First paragraph] Preserving and replaying modern web pages with high fidelity has become an increasingly difficult task due to the increased usage of JavaScript. Reliance on server-side rewriting alone results in live-leakage and/or the inability to replay a page due to the preserved JavaScript performing an action not permissible from the archive. The current state-of-the-art high-fidelity archival preservation and replay solutions rely on handcrafted client-side URL rewriting libraries specifically tailored for the archive, namely Webrecorder's and pywb's wombat.js [12]. Web archives not utilizing client-side rewriting rely on server-side rewriting that misses URLs used in a manner not accounted for …
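
A deliberately naive Python sketch shows why server-side rewriting alone misses URLs. It rewrites absolute URLs found in `src`/`href` attributes to a replay prefix. A URL assembled at runtime inside a script never appears as an attribute, so at replay it would be requested from the live web. The replay prefix and page are hypothetical. This is not how wombat.js works: client-side libraries like it intercept such URLs in the browser as scripts execute.

```python
import re

# Hypothetical replay prefix: archive host plus a 14-digit capture timestamp
REPLAY_PREFIX = "https://archive.example/web/20180101000000/"

# Matches absolute URLs that appear literally in src="..." or href="..." attributes
ATTR_URL_RE = re.compile(r'(\b(?:src|href)=")(https?://[^"]+)(")')


def rewrite_static_urls(html: str) -> str:
    """Prefix literal attribute URLs with the replay prefix (server-side style)."""
    return ATTR_URL_RE.sub(
        lambda m: m.group(1) + REPLAY_PREFIX + m.group(2) + m.group(3), html
    )


page = (
    '<img src="http://example.com/a.png">'
    '<script>var u = "http://" + "example.com/b.png"; fetch(u);</script>'
)
rewritten = rewrite_static_urls(page)
print(rewritten)
# The <img> URL now points into the archive, but the script still builds a live
# URL at runtime, which the regex cannot see: a source of live-leakage.
```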


Avoiding Zombies In Archival Replay Using ServiceWorker, Sawood Alam, Mat Kelly, Michele C. Weigle, Michael L. Nelson Jan 2017

Computer Science Faculty Publications

[First paragraph] A Composite Memento is an archived representation of a web page with all the page requisites such as images and stylesheets. All embedded resources have their own URIs; hence, they are archived independently. For a meaningful archival replay, it is important to load all the page requisites from the archive within the temporal neighborhood of the base HTML page. To achieve this goal, archival replay systems try to rewrite all the resource references to appropriate archived versions before serving HTML, CSS, or JS. However, effective server-side URL rewriting is difficult when URLs are generated dynamically using JavaScript. …
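
Loading page requisites "within the temporal neighborhood" of the base HTML page can be sketched in Python. For each embedded resource, pick the capture whose timestamp is closest to the base page's. The 14-digit timestamps and the nearest-capture policy are illustrative assumptions; replay systems may apply other selection rules.

```python
from datetime import datetime

TS_FORMAT = "%Y%m%d%H%M%S"  # 14-digit archive timestamp, e.g. 20170101120000


def closest_capture(base_ts: str, candidate_ts: list[str]) -> str:
    """Return the candidate capture time nearest to the base page's capture time.

    Raises ValueError if there are no candidates (min() of an empty sequence).
    """
    base = datetime.strptime(base_ts, TS_FORMAT)

    def distance(ts: str) -> float:
        return abs((datetime.strptime(ts, TS_FORMAT) - base).total_seconds())

    return min(candidate_ts, key=distance)


base_page = "20170101120000"
image_captures = ["20161231000000", "20170101130000", "20170105000000"]
print(closest_capture(base_page, image_captures))  # 20170101130000
```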


Scripts In A Frame: A Framework For Archiving Deferred Representations, Justin F. Brunelle Apr 2016

Computer Science Theses & Dissertations

Web archives provide a view of the Web as seen by Web crawlers. Because of rapid advancements and adoption of client-side technologies like JavaScript and Ajax, coupled with the inability of crawlers to execute these technologies effectively, Web resources become harder to archive as they become more interactive. At Web scale, we cannot capture client-side representations using the current state-of-the-art toolsets because of the migration from Web pages to Web applications. Web applications increasingly rely on JavaScript and other client-side programming languages to load embedded resources and change client-side state. We demonstrate that Web crawlers and other automatic archival …


Developing An Image-Based Classifier For Detecting Poetic Content In Historic Newspaper Collections, Elizabeth M. Lorang, Leen-Kiat Soh, Maanas Varma Datla, Spencer Kulwicki Mar 2015

UNL Libraries: Faculty Publications

"Developing an Image-Based Classifier for Detecting Poetic Content in Historic Newspaper Collections" details and analyzes the first stage of work of the Image Analysis for Archival Discovery project team. Our team is investigating the use of image analysis to identify poetic content in historic newspapers. The project seeks both to augment the study of literary history, by drawing attention to the magnitude of poetry published in newspapers and by making that poetry more readily available for study, and to advance work on the use of digital images in facilitating discovery in digital libraries and other digitized collections. …


Moved But Not Gone: An Evaluation Of Real-Time Methods For Discovering Replacement Web Pages, Martin Klein, Michael L. Nelson Jan 2014

Computer Science Faculty Publications

Inaccessible Web pages and 404 “Page Not Found” responses are a common Web phenomenon and a detriment to the user’s browsing experience. The rediscovery of missing Web pages is, therefore, a relevant research topic in the digital preservation as well as in the Information Retrieval realm. In this article, we bring these two areas together by analyzing four content- and link-based methods to rediscover missing Web pages. We investigate the retrieval performance of the methods individually as well as their combinations and give an insight into how effective these methods are over time. As the main result of this work, …
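
One content-based technique studied in this line of research is the lexical signature. It is a handful of terms that best distinguish a page, and it can be submitted to a search engine to find where the content now lives. The Python sketch below ranks terms with a simple TF-IDF weighting against a small background corpus. The tokenizer, the IDF smoothing, and the choice of k are assumptions, not the article's exact procedure.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Deliberately simple: lowercase, whitespace split, alphabetic tokens only
    return [t for t in text.lower().split() if t.isalpha()]


def lexical_signature(doc: str, corpus: list[str], k: int = 5) -> list[str]:
    """Return the k terms of `doc` with the highest TF-IDF weight against `corpus`."""
    tf = Counter(tokenize(doc))
    df: Counter = Counter()
    for other in corpus:
        df.update(set(tokenize(other)))
    n_docs = len(corpus)

    def idf(term: str) -> float:
        # Smoothed IDF so terms absent from the corpus score highest without dividing by zero
        return math.log((n_docs + 1) / (df[term] + 1)) + 1

    scores = {term: count * idf(term) for term, count in tf.items()}
    # Highest score first; break ties alphabetically so the output is deterministic
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


background = ["the cat sat", "the dog ran", "the bird flew"]
print(lexical_signature("the archive the archive memento", background, k=2))
# ['archive', 'memento']
```

Common words like "the" are pushed down because they appear in every background document. This is what lets a short signature work as a targeted search query.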


Linked Data Demystified: Practical Efforts To Transform CONTENTdm Metadata For The Linked Data Cloud, Silvia B. Southwick, Cory K. Lampert Nov 2012

Library Faculty Presentations

The library literature and events like the ALA Annual Conference have been inundated with presentations and articles on linked data. At UNLV Libraries, we understand the importance of linked data in helping us better serve our users. We have designed and initiated a pilot project to apply linked data concepts to the practical task of transforming a sample set of our CONTENTdm digital collections data into future-oriented linked data. This presentation will outline the rationale for beginning work in linked data and detail the phases we will undertake in the proof of concept project. We hope through this research experiment to …
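
The Python sketch below gives a rough illustration of turning CONTENTdm-style records into linked data. It maps a few field labels to Dublin Core terms properties and emits N-Triples. The field labels, the item base URI, and the semicolon-splitting rule are hypothetical assumptions. A real project would also reconcile strings such as subjects to controlled-vocabulary URIs rather than leaving them as literals.

```python
DCTERMS = "http://purl.org/dc/terms/"

# Hypothetical mapping from CONTENTdm field labels to Dublin Core terms properties
FIELD_MAP = {"Title": "title", "Creator": "creator", "Date": "date", "Subject": "subject"}


def escape_literal(value: str) -> str:
    """Escape backslashes, quotes, and newlines for an N-Triples string literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def record_to_ntriples(record_id: str, record: dict[str, str], base_uri: str) -> list[str]:
    subject = f"<{base_uri}{record_id}>"
    triples = []
    for field, value in record.items():
        prop = FIELD_MAP.get(field)
        if prop is None:
            continue  # unmapped fields are skipped in this sketch
        # Assumption: multiple values in one field are separated by semicolons
        for part in (v.strip() for v in value.split(";")):
            if part:
                triples.append(f'{subject} <{DCTERMS}{prop}> "{escape_literal(part)}" .')
    return triples


record = {"Title": "Hoover Dam", "Subject": "Dams; Nevada", "Format": "image/jpeg"}
for line in record_to_ntriples("42", record, "https://example.org/item/"):
    print(line)
```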


The Data Conservancy: Science-Driven Information Science, Christine L. Borgman, Carole L. Palmer Jun 2010

Christine L. Borgman

The Data Conservancy, a National Science Foundation-funded DataNet project with a diverse array of partners, embraces a shared vision: data curation is not an end, but rather a means to collect, organize, validate, and preserve data to address grand research challenges that face society. Key to the Data Conservancy approach is information science research on the data practices of the science domains. Three teams are conducting social studies of individual science domains. Prof. Carole Palmer of the University of Illinois will report on their comparative studies of multiple biosciences domains. Prof. Christine Borgman of the University …


Recommender Systems For Multimedia Libraries: An Evaluation Of Different Models For Datamining Usage Data, Raquel Oliveira Araujo Dec 2004

Computer Science Theses & Dissertations

Many recommender systems exist today to help users deal with the large growth in the amount of information available on the Internet. Most of these recommender systems use collaborative filtering or content-based techniques to present new material that would be of interest to a user. While these methods have proven to be effective, they have not been designed specifically for multimedia collections. In this study we present a new method to find recommendations that is not dependent on traditional Information Retrieval (IR) methods and compare it to algorithms that do rely on traditional IR methods. We evaluated these algorithms using …
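
A recommender can be driven purely by usage data rather than by text-based IR methods. The Python sketch below counts how often items co-occur in the same user session and recommends the most frequent companions. It is a generic item-to-item co-occurrence model, not the new method evaluated in the thesis. The session data are made up.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_cooccurrence(sessions: list[list[str]]) -> dict[str, Counter]:
    """Count, for every pair of items, how many sessions contain both."""
    co: dict[str, Counter] = defaultdict(Counter)
    for session in sessions:
        for a, b in combinations(sorted(set(session)), 2):
            co[a][b] += 1
            co[b][a] += 1
    return co


def recommend(co: dict[str, Counter], item: str, k: int = 3) -> list[str]:
    """Return up to k items most often co-used with `item` (ties broken by name)."""
    ranked = sorted(co.get(item, Counter()).items(), key=lambda kv: (-kv[1], kv[0]))
    return [other for other, _ in ranked[:k]]


# Made-up usage log: each inner list is the set of items viewed in one session
sessions = [["v1", "v2", "v3"], ["v1", "v2"], ["v2", "v4"]]
co = build_cooccurrence(sessions)
print(recommend(co, "v1"))  # ['v2', 'v3']
```

Because it never inspects item content, this style of model applies equally to video, audio, or images. That property is what makes usage-based approaches attractive for multimedia collections.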


Buckets: Smart Objects For Digital Libraries, Michael L. Nelson Jul 2000

Computer Science Theses & Dissertations

Discussion of digital libraries (DLs) is often dominated by the merits of various archives, repositories, search engines, search interfaces and database systems. While these technologies are necessary for information management, information content and information retrieval systems should progress on independent paths and each should make limited assumptions about the status or capabilities of the other. Information content is more important than the systems used for its storage and retrieval. Digital information should have the same long-term survivability prospects as traditional hardcopy information and should not be impacted by evolving search engine technologies or vendor vagaries in database management systems.

Digital …


Architectural Optimization Of Digital Libraries, Aileen O. Biser Aug 1998

Computer Science Theses & Dissertations

This work investigates performance and scaling issues relevant to large scale distributed digital libraries. Presently, performance and scaling studies focus on specific implementations of production or prototype digital libraries. Although useful information is gained to aid these designers and other researchers with insights to performance and scaling issues, the broader issues relevant to very large scale distributed libraries are not addressed. Specifically, no current studies look at the extreme or worst case possibilities in digital library implementations. A survey of digital library research issues is presented. Scaling and performance issues are mentioned frequently in the digital library literature but are …


Building Multi-Discipline, Multi-Format Digital Libraries Using Clusters And Buckets, Michael L. Nelson Aug 1997

Computer Science Theses & Dissertations

Our objective was to study the feasibility of extending the Dienst protocol to enable a multi-discipline, multi-format digital library. We implemented two new technologies: cluster functionality and publishing buckets. We have designed a possible implementation of clusters and buckets, and have prototyped some aspects of the resultant digital library.

Currently, digital libraries are segregated by the disciplines they serve (computer science, aeronautics, etc.), and by the format of their holdings (reports, software, datasets, etc.). NCSTRL+ is a multi-discipline, multi-format digital library (DL) prototype created to explore the feasibility of the design and implementation issues involved with creating a …