Open Access. Powered by Scholars. Published by Universities.®

Library and Information Science Commons


Articles 1 - 19 of 19

Full-Text Articles in Library and Information Science

Fair Signposting Profile, Herbert Van De Sompel, Martin Klein, Shawn Jones, Michael L. Nelson, Simeon Warner, Anusuriya Devaraju, Robert Huber, Wilko Steinhoff, Vyacheslav Tykhonov, Luc Boruta, Enno Meijers, Stian Soiland-Reyes, Mark Wilkinson May 2023


Computer Science Faculty Publications

[First paragraph] This page details concrete recipes that platforms hosting research outputs (e.g., data repositories, institutional repositories, publisher platforms) can follow to implement Signposting, a lightweight yet powerful approach to increasing the FAIRness of scholarly objects.
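
Signposting conveys these recipes through typed links in the HTTP Link header (relations such as cite-as, describedby, item, and type). As a rough illustration, the sketch below issues a HEAD request to a hypothetical landing page with the standard fetch API and picks out Signposting-style relations; the URL, the simplified parser, and the exact set of relations checked are assumptions for illustration, not the profile text itself.

```typescript
// Minimal sketch: discover Signposting-style typed links on a landing page.
// The landing-page URL is hypothetical; the relation names follow common
// Signposting usage (cite-as, describedby, item, type, license).

type TypedLink = { target: string; rel: string };

function parseLinkHeader(header: string): TypedLink[] {
  // Very simplified Link-header parsing: `<url>; rel="x", <url2>; rel="y"`.
  return header.split(",").flatMap((part) => {
    const target = part.match(/<([^>]+)>/)?.[1];
    const rel = part.match(/rel="?([^";]+)"?/)?.[1];
    return target && rel ? [{ target, rel }] : [];
  });
}

async function discoverSignposting(landingPage: string): Promise<TypedLink[]> {
  const res = await fetch(landingPage, { method: "HEAD" });
  const header = res.headers.get("link") ?? "";
  const wanted = new Set(["cite-as", "describedby", "item", "type", "license"]);
  return parseLinkHeader(header).filter((l) => wanted.has(l.rel));
}

// Usage (hypothetical repository landing page):
// discoverSignposting("https://example.org/record/123").then(console.log);
```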


Progenitor Cell Isolation From Mouse Epididymal Adipose Tissue And Sequencing Library Construction, Qianglin Liu, Chaoyang Li, Yuxia Li, Leshan Wang, Xujia Zhang, Buhao Deng, Peidong Gao, Mohammad Shiri, Fozi Alkaifi, Junxing Zhao, Jacqueline M. Stephens, Constantine A. Simintiras, Joseph Francis, Jiangwen Sun, Xing Fu Jan 2023


Computer Science Faculty Publications

Here, we present a protocol to isolate progenitor cells from mouse epididymal visceral adipose tissue and construct bulk RNA sequencing (RNA-seq) and assay for transposase-accessible chromatin with sequencing (ATAC-seq) libraries. We describe steps for adipose tissue collection, cell isolation, and cell staining and sorting. We then detail procedures for both ATAC-seq and RNA sequencing library construction. This protocol can also be applied to other tissues and cell types directly or with minor modifications.

For complete details on the use and execution of this protocol, please refer to Liu et al. (2023).1

1. Liu, Q., Li, C., Deng, B., Gao, P., …


Streaminghub: Interactive Stream Analysis Workflows, Yasith Jayawardana, Vikas G. Ashok, Sampath Jayarathna Jan 2022


Computer Science Faculty Publications

Reusable data/code and reproducible analyses are foundational to quality research. This aspect, however, is often overlooked when designing interactive stream analysis workflows for time-series data (e.g., eye-tracking data). A mechanism to transmit informative metadata alongside data may allow such workflows to intelligently consume data, propagate metadata to downstream tasks, and thereby auto-generate reusable, reproducible analytic outputs with zero supervision. Moreover, a visual programming interface to design, develop, and execute such workflows may allow rapid prototyping for interdisciplinary research. Capitalizing on these ideas, we propose StreamingHub, a framework to build metadata-propagating, interactive stream analysis workflows using visual programming. We conduct …
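
As a rough sketch of the metadata-propagation idea (not the StreamingHub API itself, whose interfaces are not shown in this excerpt), the example below wraps each sample of a time-series stream with its metadata and carries that metadata through a downstream transform unchanged; all names and values are illustrative.

```typescript
// Illustrative only: carry metadata alongside stream samples so downstream
// tasks can consume it without re-specifying it. Not the StreamingHub API.

interface StreamMeta { device: string; samplingRateHz: number; unit: string; }
interface Sample<T> { t: number; value: T; meta: StreamMeta; }

// A toy source: a sine wave tagged with eye-tracker-like metadata.
async function* source(meta: StreamMeta): AsyncGenerator<Sample<number>> {
  for (let i = 0; i < 5; i++) {
    yield { t: i / meta.samplingRateHz, value: Math.sin(i), meta };
  }
}

// A downstream task that transforms values but propagates metadata as-is.
async function* scale(
  stream: AsyncGenerator<Sample<number>>, factor: number,
): AsyncGenerator<Sample<number>> {
  for await (const s of stream) {
    yield { ...s, value: s.value * factor }; // meta travels with the sample
  }
}

async function main() {
  const meta = { device: "eye-tracker-01", samplingRateHz: 60, unit: "a.u." };
  for await (const s of scale(source(meta), 2)) {
    console.log(s.t.toFixed(3), s.value.toFixed(3), s.meta.device);
  }
}
main();
```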


D-Lib Magazine Pioneered Web-Based Scholarly Communication, Michael L. Nelson, Herbert Van De Sompel Jan 2022


Computer Science Faculty Publications

The web began with a vision, as stated by Tim Berners-Lee in 1991, “that much academic information should be freely available to anyone”. For many years, the development of the web and the development of digital libraries and other scholarly communications infrastructure proceeded in tandem. A milestone occurred in July 1995, when the first issue of D-Lib Magazine was published as an online, HTML-only, open access magazine, serving as the focal point for the then-emerging digital library research community. In 2017 it ceased publication, in part due to the maturity of the community it served as well as …


Automatic Metadata Extraction Incorporating Visual Features From Scanned Electronic Theses And Dissertations, Muntabir Hasan Choudhury, Himarsha R. Jayanetti, Jian Wu, William A. Ingram, Edward A. Fox Jan 2021


Computer Science Faculty Publications

Electronic Theses and Dissertations (ETDs) contain domain knowledge that can be used for many digital library tasks, such as analyzing citation networks and predicting research trends. Automatic metadata extraction is important to build scalable digital library search engines. Most existing methods are designed for born-digital documents, so they often fail to extract metadata from scanned documents such as ETDs. Traditional sequence tagging methods mainly rely on text-based features. In this paper, we propose a conditional random field (CRF) model that combines text-based and visual features. To verify the robustness of our model, we extended an existing corpus and created a …
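
The key idea in the excerpt is that each sequence element receives both text-based and visual features before being fed to a CRF tagger. The sketch below shows one hypothetical feature-extraction step for a scanned ETD cover-page line; the field names, thresholds, and example values are illustrative assumptions, and the CRF training itself (typically done with an external CRF library) is omitted.

```typescript
// Hypothetical feature extraction for sequence tagging of scanned ETD
// cover-page lines: combine text-based cues with visual cues (position,
// font size) as they might feed a CRF. All names/thresholds are illustrative.

interface OcrLine {
  text: string;
  fontSize: number;   // points, from OCR layout output
  yRatio: number;     // vertical position, 0 = top of page, 1 = bottom
  bold: boolean;
}

function lineFeatures(line: OcrLine): Record<string, string | number | boolean> {
  const words = line.text.trim().split(/\s+/);
  return {
    // Text-based features
    numWords: words.length,
    hasDegreeWord: /\b(thesis|dissertation|doctor|master)\b/i.test(line.text),
    hasYear: /\b(19|20)\d{2}\b/.test(line.text),
    isTitleCase: words.every((w) => /^[A-Z]/.test(w)),
    // Visual features (the addition over text-only tagging)
    largeFont: line.fontSize >= 16,
    nearTop: line.yRatio < 0.25,
    bold: line.bold,
  };
}

// Example: a cover-page line that a tagger might label as a title.
console.log(lineFeatures({
  text: "Automatic Metadata Extraction From Scanned Documents",
  fontSize: 18, yRatio: 0.15, bold: true,
}));
```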


Bootstrapping Web Archive Collections From Micro-Collections In Social Media, Alexander C. Nwala Aug 2020


Computer Science Theses & Dissertations

In a Web plagued by disappearing resources, Web archive collections provide a valuable means of preserving Web resources important to the study of past events. These archived collections start with seed URIs (Uniform Resource Identifiers) hand-selected by curators. Curators produce high quality seeds by removing non-relevant URIs and adding URIs from credible and authoritative sources, but this ability comes at a cost: it is time consuming to collect these seeds. The result of this is a shortage of curators, a lack of Web archive collections for various important news events, and a need for an automatic system for generating seeds. …


Opening Books And The National Corpus Of Graduate Research, William A. Ingram, Edward A. Fox, Jian Wu Jan 2020


Computer Science Faculty Publications

Virginia Tech University Libraries, in collaboration with Virginia Tech Department of Computer Science and Old Dominion University Department of Computer Science, request $505,214 in grant funding for a 3-year project, the goal of which is to bring computational access to book-length documents, demonstrating that with Electronic Theses and Dissertations (ETDs). The project is motivated by the following library and community needs. (1) Despite huge volumes of book-length documents in digital libraries, there is a lack of models offering effective and efficient computational access to these long documents. (2) Nationwide open access services for ETDs generally function at the metadata level. …


Web Archives At The Nexus Of Good Fakes And Flawed Originals, Michael L. Nelson Jan 2019


Computer Science Faculty Publications

[Summary] The authenticity, integrity, and provenance of resources we encounter on the web are increasingly in question. While many people are inured to the possibility of altered images, the easy accessibility of powerful software tools that synthesize audio and video will unleash a torrent of convincing “deepfakes” into our social discourse. Archives will no longer be monopolized by a countable number of institutions such as governments and publishers, but will become a competitive space filled with social engineers, propagandists, conspiracy theorists, and aspiring Hollywood directors. While the historical record has never been singular nor unmalleable, current technologies empower an unprecedented …


205.3 The Many Shapes Of Archive-It, Shawn Jones, Michael L. Nelson, Alexander Nwala, Michele C. Weigle Jan 2018


Computer Science Faculty Publications

Web archives, a key area of digital preservation, meet the needs of journalists, social scientists, historians, and government organizations. The use cases for these groups often require that they guide the archiving process themselves, selecting their own original resources, or seeds, and creating their own web archive collections. We focus on the collections within Archive-It, a subscription service started by the Internet Archive in 2005 for the purpose of allowing organizations to create their own collections of archived web pages, or mementos. Understanding these collections could be done via their user-supplied metadata or via text analysis, but the metadata is …


A Survey Of Archival Replay Banners, Sawood Alam, Mat Kelly, Michele C. Weigle, Michael L. Nelson Jan 2018


Computer Science Faculty Publications

We surveyed various archival systems to compare and contrast different techniques used to implement an archival replay banner. We found that inline plain HTML injection is the most common approach, but prone to style conflicts. Iframe-based banners are also very common and while they do not have style conflicts, they suffer from screen real estate wastage and limited design choices. Custom Elements-based banners are promising, but due to being a new web standard, these are not yet widely deployed.
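
To make the third option concrete, here is a minimal sketch of a Custom Elements-based replay banner that uses Shadow DOM for style isolation; the element name, text, and styling are placeholders, not any particular archive's implementation.

```typescript
// Minimal sketch of a Custom Elements-based archival replay banner.
// Shadow DOM keeps the banner's styles from clashing with the replayed page,
// which is the advantage over inline HTML injection noted above.

class ReplayBanner extends HTMLElement {
  connectedCallback(): void {
    const shadow = this.attachShadow({ mode: "open" });
    const capturedAt = this.getAttribute("captured-at") ?? "unknown date";
    shadow.innerHTML = `
      <style>
        .banner { position: fixed; top: 0; left: 0; right: 0;
                  background: #333; color: #fff; padding: 0.5em; }
      </style>
      <div class="banner">Archived copy, captured ${capturedAt}.</div>
    `;
  }
}

// The tag name is a placeholder; a real archive would pick its own.
customElements.define("replay-banner", ReplayBanner);

// A replay system would then inject, e.g.:
//   <replay-banner captured-at="2018-01-15"></replay-banner>
```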


Swimming In A Sea Of Javascript Or: How I Learned To Stop Worrying And Love High-Fidelity Replay, John A. Berlin, Michael L. Nelson, Michele C. Weigle Jan 2018


Computer Science Faculty Publications

[First paragraph] Preserving and replaying modern web pages in high fidelity has become an increasingly difficult task due to the increased usage of JavaScript. Reliance on server-side rewriting alone results in live-leakage and/or the inability to replay a page due to the preserved JavaScript performing an action not permissible from the archive. The current state-of-the-art high-fidelity archival preservation and replay solutions rely on handcrafted client-side URL rewriting libraries specifically tailored for the archive, namely Webrecorder's and pywb's wombat.js [12]. Web archives not utilizing client-side rewriting rely on server-side rewriting that misses URLs used in a manner not accounted for …
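
The excerpt contrasts server-side rewriting with client-side rewriting libraries such as Webrecorder's/pywb's wombat.js. The sketch below is not wombat.js; it is a heavily simplified illustration of the client-side idea: patch fetch inside the replayed page so that URLs constructed at runtime by JavaScript are redirected to the archive's replay endpoint. The replay prefix and datetime are assumptions.

```typescript
// Heavily simplified illustration of client-side URL rewriting (not wombat.js).
// Patch window.fetch so runtime-constructed URLs resolve inside the archive
// instead of "leaking" to the live web. Replay prefix/datetime are assumptions.

const REPLAY_PREFIX = "https://archive.example.org/web"; // hypothetical
const DATETIME = "20180115120000";                        // 14-digit timestamp

function rewrite(url: string): string {
  if (url.startsWith(REPLAY_PREFIX)) return url; // already rewritten
  const absolute = new URL(url, document.baseURI).href;
  return `${REPLAY_PREFIX}/${DATETIME}/${absolute}`;
}

const originalFetch = window.fetch.bind(window);
window.fetch = (input: RequestInfo | URL, init?: RequestInit) => {
  const url = typeof input === "string" || input instanceof URL
    ? String(input)
    : input.url;
  return originalFetch(rewrite(url), init);
};

// XMLHttpRequest, DOM attribute setters, etc. would need similar patching;
// covering every such path is what makes high-fidelity replay hard.
```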


It Is Hard To Compute Fixity On Archived Web Pages, Mohamed Aturban, Michael L. Nelson, Michele C. Weigle Jan 2018


Computer Science Faculty Publications

[Introduction] Checking fixity in web archives is performed to ensure that archived resources, or mementos (denoted by URI-M), have remained unaltered since they were captured. The final report of the PREMIS Working Group [2] defines fixity information as "information used to verify whether an object has been altered in an undocumented or unauthorized way." The common technique for checking fixity is to generate a current hash value (i.e., a message digest or a checksum) for a file using a cryptographic hash function (e.g., SHA-256) and compare it to the hash value generated originally. If they have different hash …
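
As a small illustration of the basic fixity check described above (not of the paper's findings about why this is hard for archived pages), the sketch below hashes a memento's bytes with SHA-256 using the Web Crypto API and compares the result to a previously recorded digest; the URI-M and stored hash are placeholders.

```typescript
// Basic fixity check: hash a memento's current bytes and compare with a
// previously recorded digest. The URI-M and digest below are placeholders.
// (The paper's point is that replay-time transformations make this hard.)

async function sha256Hex(bytes: ArrayBuffer): Promise<string> {
  const digest = await crypto.subtle.digest("SHA-256", bytes);
  return Array.from(new Uint8Array(digest))
    .map((b) => b.toString(16).padStart(2, "0"))
    .join("");
}

async function checkFixity(uriM: string, recordedHash: string): Promise<boolean> {
  const res = await fetch(uriM);
  const currentHash = await sha256Hex(await res.arrayBuffer());
  return currentHash === recordedHash;
}

// Usage (hypothetical memento and digest):
// checkFixity(
//   "https://archive.example.org/web/20180115120000/https://example.com/",
//   "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
// ).then((ok) => console.log(ok ? "unchanged" : "changed"));
```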


Client-Assisted Memento Aggregation Using The Prefer Header, Mat Kelly, Sawood Alam, Michael L. Nelson, Michele C. Weigle Jan 2018


Computer Science Faculty Publications

[First paragraph] Preservation of the Web ensures that future generations have a picture of how the web was. Web archives like Internet Archive's Wayback Machine, WebCite, and archive.is allow individuals to submit URIs to be archived, but the captures they preserve then reside at the archives. Traversing these captures in time as preserved by multiple archive sources (using Memento [8]) provides a more comprehensive picture of the past Web than relying on a single archive. Some content on the Web, such as content behind authentication, may be unsuitable or inaccessible for preservation by these organizations. Furthermore, this content may be …
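
To make the mechanism concrete: the idea is for a Memento client to express archive preferences via the HTTP Prefer header when querying an aggregator. The sketch below sends such a request with fetch; the aggregator URL and the exact preference syntax are illustrative assumptions rather than the paper's normative wording.

```typescript
// Illustrative sketch: a client asks a Memento TimeMap endpoint to take its
// archive preferences into account via the HTTP Prefer header. The endpoint
// URL and the preference value's syntax are assumptions for illustration.

async function fetchTimeMap(originalUri: string, preferredArchives: string[]) {
  const aggregator = "https://aggregator.example.org/timemap/link/"; // hypothetical
  const res = await fetch(aggregator + encodeURIComponent(originalUri), {
    headers: {
      // Hypothetical preference token listing archives the client trusts.
      Prefer: `archives="${preferredArchives.join(" ")}"`,
      Accept: "application/link-format",
    },
  });
  // A cooperating server would echo what it honored in Preference-Applied.
  console.log("Preference-Applied:", res.headers.get("preference-applied"));
  return res.text(); // link-format TimeMap listing mementos
}

// fetchTimeMap("https://example.com/", ["archive.org", "archive.example.net"]);
```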


Avoiding Zombies In Archival Replay Using Serviceworker, Sawood Alam, Mat Kelly, Michele C. Weigle, Michael L. Nelson Jan 2017


Computer Science Faculty Publications

[First paragraph] A Composite Memento is an archived representation of a web page along with all of its page requisites, such as images and stylesheets. All embedded resources have their own URIs and hence are archived independently. For a meaningful archival replay, it is important to load all the page requisites from the archive within the temporal neighborhood of the base HTML page. To achieve this goal, archival replay systems try to rewrite all resource references to appropriate archived versions before serving HTML, CSS, or JS. However, effective server-side URL rewriting is difficult when URLs are generated dynamically using JavaScript. …
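
As a rough sketch of the ServiceWorker approach (illustrative, not the authors' implementation), the script below intercepts every request issued by a replayed page and reroutes any request that escapes the archive back through a replay prefix, which is what stops "zombie" requests from hitting the live web; the prefix and datetime are placeholders.

```typescript
// ServiceWorker sketch (illustrative, not the authors' code): intercept all
// requests from a replayed page and reroute live-web "zombies" back into the
// archive. Replay prefix and datetime are placeholders.

const REPLAY_PREFIX = "https://archive.example.org/web"; // hypothetical
const DATETIME = "20170601000000";

// In a ServiceWorker script, `self` is the ServiceWorkerGlobalScope.
self.addEventListener("fetch", (event: any) => {
  const url: string = event.request.url;
  if (url.startsWith(REPLAY_PREFIX)) {
    return; // already an archived URL; let it through unchanged
  }
  // A request that escaped rewriting: send it back into the archive.
  const rerouted = `${REPLAY_PREFIX}/${DATETIME}/${url}`;
  event.respondWith(fetch(rerouted));
});
```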


Profiling Web Archives For Efficient Memento Query Routing, Sawood Alam, Michael L. Nelson, Herbert Van De Sompel, Lyudmila L. Balakireva, Harihar Shankar, David S. H. Rosenthal Jan 2015


Computer Science Faculty Publications

No abstract provided.


Moved But Not Gone: An Evaluation Of Real-Time Methods For Discovering Replacement Web Pages, Martin Klein, Michael L. Nelson Jan 2014


Computer Science Faculty Publications

Inaccessible Web pages and 404 “Page Not Found” responses are a common Web phenomenon and a detriment to the user’s browsing experience. The rediscovery of missing Web pages is, therefore, a relevant research topic in digital preservation as well as in information retrieval. In this article, we bring these two areas together by analyzing four content- and link-based methods to rediscover missing Web pages. We investigate the retrieval performance of the methods individually as well as their combinations and give insight into how effective these methods are over time. As the main result of this work, …
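
One classic content-based approach in this line of work is to derive a short "lexical signature" of salient terms from a cached copy of the missing page and submit it as a search-engine query; the sketch below builds such a signature by simple term frequency, while real systems typically weight terms with TF-IDF against a background corpus. The term count and stopword list are illustrative assumptions.

```typescript
// Illustrative sketch: build a small "lexical signature" from the text of a
// (cached) copy of a missing page; the signature can then be submitted as a
// search-engine query to find a replacement page. Term count and stopword
// list are illustrative; real systems typically weight terms by TF-IDF.

const STOPWORDS = new Set(["the", "and", "of", "to", "a", "in", "is", "for"]);

function lexicalSignature(pageText: string, terms = 5): string[] {
  const counts = new Map<string, number>();
  for (const raw of pageText.toLowerCase().split(/[^a-z0-9]+/)) {
    if (raw.length < 3 || STOPWORDS.has(raw)) continue;
    counts.set(raw, (counts.get(raw) ?? 0) + 1);
  }
  return [...counts.entries()]
    .sort((a, b) => b[1] - a[1])
    .slice(0, terms)
    .map(([term]) => term);
}

// Usage: the resulting terms would be joined into a web search query.
console.log(lexicalSignature(
  "Digital preservation of web pages requires discovering replacement pages " +
  "when web pages move or disappear from the web.",
));
```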


Warcreate And Wail: Warc, Wayback, And Heritrix Made Easy, Mat Kelly, Michael L. Nelson, Michele C. Weigle Jan 2013


Computer Science Faculty Publications

[First slide]

The Problem: Institutional Tools, Personal Archivists

  • On your machine:
    - Complex to operate
    - Require infrastructure
  • Delegated to institutions:
    - $$$
    - Lose original perspective:
      • Locale content tailoring (DC vs. San Francisco)
      • Observation medium (PC web browser vs. crawler)


Warcreate - Create Wayback-Consumable Warc Files From Any Webpage, Mat Kelly, Michele C. Weigle, Michael L. Nelson Jan 2012


Computer Science Faculty Publications

[First Slide]

What is WARCreate?

  • Google Chrome extension
  • Creates WARC files (a simplified record sketch follows below)
  • Enables preservation by users from their browser
  • First steps in bringing Institutional Archiving facilities to the PC
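
WARCreate's output format is the standard WARC container named in the title. As a rough, incomplete sketch of what a single WARC response record looks like (field values are placeholders; a real record also needs a payload digest and byte-accurate lengths over the full captured HTTP message), the snippet below assembles one record as a string.

```typescript
// Rough sketch of a single WARC/1.0 "response" record, assembled as a string.
// Field values are placeholders; real tools also add digests and must get the
// byte counts exactly right for the archived HTTP message.

function buildWarcResponseRecord(targetUri: string, httpMessage: string): string {
  const block = httpMessage; // raw HTTP response (headers + body)
  const blockBytes = new TextEncoder().encode(block).length;
  const headers = [
    "WARC/1.0",
    "WARC-Type: response",
    `WARC-Target-URI: ${targetUri}`,
    `WARC-Date: ${new Date().toISOString().replace(/\.\d{3}Z$/, "Z")}`,
    `WARC-Record-ID: <urn:uuid:${crypto.randomUUID()}>`,
    "Content-Type: application/http; msgtype=response",
    `Content-Length: ${blockBytes}`,
  ].join("\r\n");
  return `${headers}\r\n\r\n${block}\r\n\r\n`;
}

// Example with a tiny captured HTTP response (placeholder content):
const record = buildWarcResponseRecord(
  "http://example.com/",
  "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<html>hi</html>",
);
console.log(record);
```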


Object Reuse And Exchange, Michael L. Nelson, Carl Lagoze, Herbert Van De Sompel, Pete Johnston, Robert Sanderson, Simeon Warner, Jürgen Sieck (Ed.), Michael A. Herzog (Ed.) Jan 2009


Computer Science Faculty Publications

The Open Archives Object Reuse and Exchange (OAI-ORE) project defines standards for the description and exchange of aggregations of Web resources. The OAI-ORE abstract data model is conformant with the Architecture of the World Wide Web and leverages concepts from the Semantic Web, including RDF descriptions and Linked Data. In this paper we provide a brief review of a motivating example and its serialization in Atom.
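
The abstract mentions serializing an aggregation in Atom. As a hedged sketch only (the relation URIs follow the published ORE Atom serialization as best recalled here, the resource URIs are placeholders, and the OAI-ORE specification remains the normative reference), the snippet below assembles a minimal Atom entry acting as a Resource Map that describes an Aggregation and lists its aggregated resources.

```typescript
// Hedged sketch: a minimal Atom entry acting as an ORE Resource Map that
// describes an Aggregation and lists aggregated resources. URIs are
// placeholders; consult the OAI-ORE specification for the normative mapping.

interface AggregatedResource { uri: string; mime: string; }

function oreAtomEntry(reMapUri: string, aggregationUri: string,
                      resources: AggregatedResource[]): string {
  const aggregates = resources.map((r) =>
    `  <link rel="http://www.openarchives.org/ore/terms/aggregates"\n` +
    `        href="${r.uri}" type="${r.mime}"/>`).join("\n");
  return `<?xml version="1.0" encoding="UTF-8"?>
<entry xmlns="http://www.w3.org/2005/Atom">
  <id>${reMapUri}</id>
  <title>Resource Map (sketch)</title>
  <updated>${new Date().toISOString()}</updated>
  <link rel="self" href="${reMapUri}"/>
  <link rel="http://www.openarchives.org/ore/terms/describes"
        href="${aggregationUri}"/>
${aggregates}
</entry>`;
}

// Example with placeholder URIs for a two-resource aggregation:
console.log(oreAtomEntry(
  "http://example.org/rem/atom/article-1",
  "http://example.org/aggregation/article-1",
  [{ uri: "http://example.org/article-1.pdf", mime: "application/pdf" },
   { uri: "http://example.org/article-1.html", mime: "text/html" }],
));
```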