Open Access. Powered by Scholars. Published by Universities.®

Archival Science Commons

Physical Sciences and Mathematics

Articles 1 - 30 of 35

Full-Text Articles in Archival Science

Robots Still Outnumber Humans In Web Archives In 2019, But Less Than In 2015 And 2012, Himarsha R. Jayanetti, Kritika Garg, Sawood Alam, Michael L. Nelson, Michele C. Weigle Jan 2024

Computer Science Faculty Publications

The significance of the web and the crucial role of web archives in its preservation highlight the necessity of understanding how users, both human and robot, access web archive content, and how best to satisfy the disparate needs of both types of users. To identify robots and humans in web archives and analyze their respective access patterns, we used the Internet Archive’s (IA) Wayback Machine access logs from 2012, 2015, and 2019, as well as Arquivo.pt’s (Portuguese Web Archive) access logs from 2019. We identified user sessions in the access logs and classified those sessions as human or robot based …


ChatGPT As Metamorphosis Designer For The Future Of Artificial Intelligence (AI): A Conceptual Investigation, Amarjit Kumar Singh (Library Assistant), Dr. Pankaj Mathur (Deputy Librarian) Mar 2023

Library Philosophy and Practice (e-journal)

Abstract

Purpose: The purpose of this research paper is to explore ChatGPT’s potential as an innovative designer tool for the future development of artificial intelligence. Specifically, this conceptual investigation aims to analyze ChatGPT’s capabilities as a tool for designing and developing near-human intelligent systems for future use and development in the field of Artificial Intelligence (AI). The paper also analyzes the strengths and weaknesses of ChatGPT as a tool and identifies possible areas for improvement in its development and implementation. This investigation focused on the various features and functions of ChatGPT that …


Hashes Are Not Suitable To Verify Fixity Of The Public Archived Web, Mohamed Aturban, Martin Klein, Herbert Van De Sompel, Sawood Alam, Michael L. Nelson, Michele C. Weigle Jan 2023

Computer Science Faculty Publications

Web archives, such as the Internet Archive, preserve the web and allow access to prior states of web pages. We implicitly trust their versions of archived pages, but as their role moves from preserving curios of the past to facilitating present day adjudication, we are concerned with verifying the fixity of archived web pages, or mementos, to ensure they have always remained unaltered. A widely used technique in digital preservation to verify the fixity of an archived resource is to periodically compute a cryptographic hash value on a resource and then compare it with a previous hash value. If the …


A Call For The Library Community To Deploy Best Practices Toward A Database For Biocultural Knowledge Relating To Climate Change, Martha B. Lerski Jan 2022

Publications and Research

Abstract

Purpose – This paper presents a call to the library and information science community to support documentation and conservation of cultural and biocultural heritage.

Design/methodology/approach – Based on existing literature, this proposal is generative and descriptive, rather than prescriptive, regarding precisely how libraries should collaborate to employ technical and ethical best practices to provide access to vital data, research and cultural narratives relating to climate.

Findings – COVID-19 and climate destruction signal urgent global challenges. Library best practices are positioned to respond to climate change. Literature indicates how libraries preserve, share and cross-link cultural and scientific knowledge. …


Campus Mobile History Application, Drew Adan, Christine Sears Jan 2022

Summer Community of Scholars (RCEU and HCR) Project Proposals

No abstract provided.


The DSA Toolkit Shines Light Into Dark And Stormy Archives, Shawn Morgan Jones, Himarsha R. Jayanetti, Alex Osborne, Paul Koerbin, Martin Klein, Michele C. Weigle, Michael L. Nelson Jan 2022

Computer Science Faculty Publications

Web archive collections are created with a particular purpose in mind. A curator selects seeds, or original resources, which are then captured by an archiving system and stored as archived web pages, or mementos. The systems that build web archive collections are often configured to revisit the same original resource multiple times. This is incredibly useful for understanding an unfolding news story or the evolution of an organization. Unfortunately, over time, some of these original resources can go off-topic and no longer suit the purpose for which the collection was originally created. They can go off-topic due to web site …


Law Library Blog (January 2021): Legal Beagle's Blog Archive, Roger Williams University School Of Law Jan 2021

Law Library Newsletters/Blog

No abstract provided.


Automatic Metadata Extraction Incorporating Visual Features From Scanned Electronic Theses And Dissertations, Muntabir Hasan Choudhury, Himarsha R. Jayanetti, Jian Wu, William A. Ingram, Edward A. Fox Jan 2021

Computer Science Faculty Publications

Electronic Theses and Dissertations (ETDs) contain domain knowledge that can be used for many digital library tasks, such as analyzing citation networks and predicting research trends. Automatic metadata extraction is important to build scalable digital library search engines. Most existing methods are designed for born-digital documents, so they often fail to extract metadata from scanned documents such as ETDs. Traditional sequence tagging methods mainly rely on text-based features. In this paper, we propose a conditional random field (CRF) model that combines text-based and visual features. To verify the robustness of our model, we extended an existing corpus and created a …


The Complicated History Of Environmental Racism, Victoria Peña-Parr Aug 2020

Black History at UNM

University of New Mexico Honors College Assistant Professor Myrriah Gómez defines and explores environmental racism, specifically its effects in New Mexico.


AI For Archives: Using Facial Recognition To Enhance Metadata, Rebecca Bakker, Kelley Rowan, Liting Hu, Boyuan Guan, Pinchao Liu, Zhongzhou Li, Ruizhe He, Christine Monge Jul 2020

Works of the FIU Libraries

The goal of this research project was to determine the most effective facial recognition applications that could be implemented into digital archive image collections from libraries, museums, and cultural heritage institutions. Computer scientists and librarians at Florida International University collaborated to conduct qualitative assessments of both face detection and face search using photographs from FIU’s digital collections. Specifically, the facial recognition platforms OpenCV, Face++, and Amazon AWS were analyzed. This project seeks to assist LYRASIS community members who wish to incorporate facial recognition and other artificial intelligence technology into their digital collections and repositories as a method to reduce research …


Scraping BePress: Downloading Dissertations For Preservation, Stephen Zweibel Feb 2020

Copyright, Fair Use, Scholarly Communication, etc.

This article will describe our process of developing a script to automate downloading of documents and secondary materials from our library’s BePress repository. Our objective was to collect the full archive of dissertations and associated files from our repository onto a local disk for potential future applications and to build out a preservation system.

Unlike at some institutions, our students submit directly into BePress, so we did not have a separate repository of the files; and the backup of BePress content that we had access to was not in an ideal format (for example, it included “withdrawn” items and did not …


Implementing Facial Recognition Technology In A Municipal Archives Digitization Project, Rebecca Bakker Sep 2019

Works of the FIU Libraries

This poster at the 2019 annual meeting of the South Florida Archivists highlights a project where the facial recognition technology of Adobe Lightroom CC is used to identify individuals in photographs held by a local municipal archive. The photographs contain hundreds of images showing unnamed commissioners and city workers from the 1970s to the 1990s, with most of the images lacking metadata or information. Various strategies are employed to identify key city officials in the photographs, allowing their names to be added to the metadata of the records hosted in a digital repository. The poster demonstrates the potential and limitations …


Web Archives At The Nexus Of Good Fakes And Flawed Originals, Michael L. Nelson Jan 2019

Computer Science Faculty Publications

[Summary] The authenticity, integrity, and provenance of resources we encounter on the web are increasingly in question. While many people are inured to the possibility of altered images, the easy accessibility of powerful software tools that synthesize audio and video will unleash a torrent of convincing “deepfakes” into our social discourse. Archives will no longer be monopolized by a countable number of institutions such as governments and publishers, but will become a competitive space filled with social engineers, propagandists, conspiracy theorists, and aspiring Hollywood directors. While the historical record has never been singular nor unmalleable, current technologies empower an unprecedented …


Surveying Digital Collections Stewardship In Nebraska [Original Survey Form], Jennifer L. Thoegersen, Blake Graham Apr 2018

University of Nebraska-Lincoln Data Repository

No abstract provided.


Client-Assisted Memento Aggregation Using The Prefer Header, Mat Kelly, Sawood Alam, Michael L. Nelson, Michele C. Weigle Jan 2018

Computer Science Faculty Publications

[First paragraph] Preservation of the Web ensures that future generations have a picture of how the web was. Web archives like Internet Archive's Wayback Machine, WebCite, and archive.is allow individuals to submit URIs to be archived, but the captures they preserve then reside at the archives. Traversing these captures in time as preserved by multiple archive sources (using Memento [8]) provides a more comprehensive picture of the past Web than relying on a single archive. Some content on the Web, such as content behind authentication, may be unsuitable or inaccessible for preservation by these organizations. Furthermore, this content may be …
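Why combining captures from several archives gives a fuller picture than any single archive can be illustrated with a small sketch. It does not implement the Memento protocol or the Prefer-header mechanism named in the title; the archive names, the capture datetimes, and the `aggregate` helper are all invented for illustration.

```python
from datetime import datetime
from heapq import merge

# Invented capture datetimes for one URI, each list sorted oldest first.
captures = {
    "archive-a": [datetime(2015, 3, 1), datetime(2017, 6, 9)],
    "archive-b": [datetime(2016, 1, 20)],
    "archive-c": [datetime(2014, 11, 5), datetime(2018, 2, 2)],
}


def aggregate(per_archive):
    """Merge per-archive sorted capture lists into one chronological list."""
    tagged = (
        [(when, name) for when in times] for name, times in per_archive.items()
    )
    return list(merge(*tagged))


# Hypothetical usage: print the combined timeline, oldest capture first.
timeline = aggregate(captures)
for when, name in timeline:
    print(when.date().isoformat(), name)
```

In this toy data no single archive covers more than two of the five capture dates, while the merged timeline covers all of them.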


Swimming In A Sea Of JavaScript Or: How I Learned To Stop Worrying And Love High-Fidelity Replay, John A. Berlin, Michael L. Nelson, Michele C. Weigle Jan 2018

Computer Science Faculty Publications

[First paragraph] Preserving and replaying modern web pages in high fidelity has become an increasingly difficult task due to the increased usage of JavaScript. Reliance on server-side rewriting alone results in live-leakage and/or the inability to replay a page due to the preserved JavaScript performing an action not permissible from the archive. The current state-of-the-art high fidelity archival preservation and replay solutions rely on handcrafted client-side URL rewriting libraries specifically tailored for the archive, namely Webrecorder's and Pywb's wombat.js [12]. Web archives not utilizing client-side rewriting rely on server-side rewriting that misses URLs used in a manner not accounted for …


It Is Hard To Compute Fixity On Archived Web Pages, Mohamed Aturban, Michael L. Nelson, Michele C. Weigle Jan 2018

Computer Science Faculty Publications

[Introduction] Checking fixity in web archives is performed to ensure archived resources, or mementos (denoted by URI-M), have remained unaltered since they were captured. The final report of the PREMIS Working Group [2] defines information used for fixity as "information used to verify whether an object has been altered in an undocumented or unauthorized way." The common technique for checking fixity is to generate a current hash value (i.e., a message digest or a checksum) for a file using a cryptographic hash function (e.g., SHA-256) and compare it to the hash value generated originally. If they have different hash …
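The hash comparison described in this introduction can be sketched in a few lines. This is a minimal illustration of the general technique, not the authors' tooling: the helper names `sha256_hex` and `fixity_ok` and the sample bytes are hypothetical, and a real check would read the archived file from storage.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def fixity_ok(current: bytes, original_digest: str) -> bool:
    """True when the current bytes hash to the digest recorded earlier."""
    return sha256_hex(current) == original_digest


# Hypothetical content standing in for an archived file.
captured = b"<html><body>archived page</body></html>"
recorded = sha256_hex(captured)  # digest stored at capture time

print(fixity_ok(captured, recorded))                  # unchanged bytes
print(fixity_ok(captured + b"<!-- x -->", recorded))  # altered bytes
```

Any change to the bytes, even one added comment, produces a different digest, which is why the comparison is strict equality.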


205.3 The Many Shapes Of Archive-It, Shawn Jones, Michael L. Nelson, Alexander Nwala, Michele C. Weigle Jan 2018

Computer Science Faculty Publications

Web archives, a key area of digital preservation, meet the needs of journalists, social scientists, historians, and government organizations. The use cases for these groups often require that they guide the archiving process themselves, selecting their own original resources, or seeds, and creating their own web archive collections. We focus on the collections within Archive-It, a subscription service started by the Internet Archive in 2005 for the purpose of allowing organizations to create their own collections of archived web pages, or mementos. Understanding these collections could be done via their user-supplied metadata or via text analysis, but the metadata is …


Infographics: A Practical Guide For Librarians, Darren Sweeper Feb 2017

Sprague Library Scholarship and Creative Works

No abstract provided.


Avoiding Zombies In Archival Replay Using ServiceWorker, Sawood Alam, Mat Kelly, Michele C. Weigle, Michael L. Nelson Jan 2017

Computer Science Faculty Publications

[First paragraph] A Composite Memento is an archived representation of a web page with all the page requisites such as images and stylesheets. All embedded resources have their own URIs, hence, they are archived independently. For a meaningful archival replay, it is important to load all the page requisites from the archive within the temporal neighborhood of the base HTML page. To achieve this goal, archival replay systems try to rewrite all the resource references to appropriate archived versions before serving HTML, CSS, or JS. However, effective server-side URL rewriting is difficult when URLs are generated dynamically using JavaScript. …
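The limitation of server-side rewriting mentioned here can be shown with a toy rewriter. This is a sketch under stated assumptions, not how any particular replay system works: the regular expression, the `rewrite_html` helper, and the Wayback-style prefix with its timestamp are chosen only for illustration.

```python
import re

# Wayback-style replay prefix; the 14-digit timestamp is a made-up example.
ARCHIVE_PREFIX = "https://web.archive.org/web/20170101000000/"

# Matches src="..." or href="..." holding an absolute http(s) URL.
ATTR_URL = re.compile(r'\b(src|href)="(https?://[^"]+)"')


def rewrite_html(html: str, prefix: str = ARCHIVE_PREFIX) -> str:
    """Prefix every absolute URL found in a src or href attribute."""
    return ATTR_URL.sub(lambda m: f'{m.group(1)}="{prefix}{m.group(2)}"', html)


# Hypothetical page: one static reference and one URL built at run time.
page = (
    '<img src="http://example.com/logo.png">'
    '<script>var u = "http://" + "example.com" + "/data.json"; fetch(u);</script>'
)
rewritten = rewrite_html(page)
print(rewritten)
```

The static `src` attribute is pointed at the archive, but the URL assembled by string concatenation inside the script is left untouched, so on replay that request would go to the live web rather than the archive.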


Databrarianship: The Academic Data Librarian In Theory And Practice, Darren Sweeper Dec 2016

Sprague Library Scholarship and Creative Works

No abstract provided.


Data Visualizations And Infographics, Darren Sweeper Sep 2016

Sprague Library Scholarship and Creative Works

No abstract provided.


Friends Of Musselman Library Newsletter Spring 2016, Musselman Library Apr 2016

Friends of Musselman Library Newsletter

From the Dean (Robin Wagner)

Library Receives 9/11 Commission Papers (Fred Fielding '16)

Library News

Digital Scholarship Fellows

From Paupers to Presidents

Fair Use Week

Reading About Race

Student Workers Save the Day (Nadia Romero Nardelli '19)

Life in the Fishbowl (Brittany Barry '17)

In Memory of Douglas R. Price; Former Aide to Eisenhower

Special Purchases

From the Piano Bench (Jay P. Brown ’51, Doug Brouder ’83, Julie Caterson ’84 and Mr. & Mrs. Michael Fiery)

Research Reflections: The Spirit of Gettysburg (Timothy Sestrick)

Gift of Art

Old Gettysburg Back to Thee (Jenna Fleming '16, Avery Fox '16, Melanie Fernandes …


Comparing Institutional Repository Software: Pampering Metadata Uploaders, Craighton Hippenhammer Apr 2016

Faculty Scholarship – Library Science

This article highlights the key concepts of institutional repositories and identifies the strengths of Digital Commons and Wesleyan Holiness Digital Library products. Special attention is given to software structures and features, support systems, and factors that impact quality. Parts of this article were presented as a workshop at the Association of Christian Librarians annual national conference at Carson-Newman University, Jefferson City, Tennessee, June 11, 2015.


Profiling Web Archives For Efficient Memento Query Routing, Sawood Alam, Michael L. Nelson, Herbert Van De Sompel, Lyudmila L. Balakireva, Harihar Shankar, David S. H. Rosenthal Jan 2015

Computer Science Faculty Publications

No abstract provided.


Moved But Not Gone: An Evaluation Of Real-Time Methods For Discovering Replacement Web Pages, Martin Klein, Michael L. Nelson Jan 2014

Computer Science Faculty Publications

Inaccessible Web pages and 404 “Page Not Found” responses are a common Web phenomenon and a detriment to the user’s browsing experience. The rediscovery of missing Web pages is, therefore, a relevant research topic in digital preservation as well as in information retrieval. In this article, we bring these two areas together by analyzing four content- and link-based methods to rediscover missing Web pages. We investigate the retrieval performance of the methods individually as well as their combinations and give an insight into how effective these methods are over time. As the main result of this work, …


WARCreate And WAIL: WARC, Wayback, And Heritrix Made Easy, Mat Kelly, Michael L. Nelson, Michele C. Weigle Jan 2013

Computer Science Faculty Publications

[First slide]

The Problem

Institutional Tools, Personal Archivists

  • ON YOUR MACHINE
      - Complex to Operate
      - Require Infrastructure
  • DELEGATED TO INSTITUTIONS
      - $$$
      - Lose original perspective
          • Locale content tailoring (DC vs. San Francisco)
          • Observation Medium (PC web browser vs. Crawler)


Reducing Barriers To Wesleyan Thought: Olivet Nazarene University And The Wesleyan Holiness Library, Craighton T. Hippenhammer Jan 2013

Faculty Scholarship – Library Science

Olivet Nazarene University’s recent move to start publishing academic scholarship in a digital institutional repository, Digital Commons, is a smart move to not only highlight and preserve Olivet scholarship, but also to support the worldwide open access movement that is widely expected to rescue the current failing model of academic publishing. The traditional methods for publishing faculty scholarship have been inadequate for some time, and the financial structures that sustain them are collapsing due to skyrocketing journal prices. What faculty members want most for their research is that it be as accessible, available and useful to other researchers and to …


WARCreate - Create Wayback-Consumable WARC Files From Any Webpage, Mat Kelly, Michele C. Weigle, Michael L. Nelson Jan 2012

Computer Science Faculty Publications

[First Slide]

What is WARCreate?

  • Google Chrome extension
  • Creates WARC files
  • Enables preservation by users from their browser
  • First steps in bringing Institutional Archiving facilities to the PC


A Synthetic Document Image Dataset For Developing And Evaluating Historical Document Processing Methods, Daniel Walker, William Lund, Eric Ringger Jan 2012

Faculty Publications

Document images accompanied by OCR output text and ground truth transcriptions are useful for developing and evaluating document recognition and processing methods, especially for historical document images. Additionally, research into improving the performance of such methods often requires further annotation of training and test data (e.g., topical document labels). However, transcribing and labeling historical documents is expensive. As a result, existing real-world document image datasets with such accompanying resources are rare and often relatively small. We introduce synthetic document image datasets of varying levels of noise that have been created from standard (English) text corpora using an existing document degradation …