Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Open Access. Powered by Scholars. Published by Universities.®

Old Dominion University

Computer Sciences

Computer Science Faculty Publications

Web archives

Articles 1 - 10 of 10

Full-Text Articles in Physical Sciences and Mathematics

The Dsa Toolkit Shines Light Into Dark And Stormy Archives, Shawn Morgan Jones, Himarsha R. Jayanetti, Alex Osborne, Paul Koerbin, Klein Martin, Michele C. Weigle, Michael L. Nelson Jan 2022

The Dsa Toolkit Shines Light Into Dark And Stormy Archives, Shawn Morgan Jones, Himarsha R. Jayanetti, Alex Osborne, Paul Koerbin, Klein Martin, Michele C. Weigle, Michael L. Nelson

Computer Science Faculty Publications

Web archive collections are created with a particular purpose in mind. A curator selects seeds, or original resources, which are then captured by an archiving system and stored as archived web pages, or mementos. The systems that build web archive collections are often configured to revisit the same original resource multiple times. This is incredibly useful for understanding an unfolding news story or the evolution of an organization. Unfortunately, over time, some of these original resources can go off-topic and no longer suit the purpose for which the collection was originally created. They can go off-topic due to web site …


Shari- An Integration Of Tools To Visualize The Story Of The Day, Shawn M. Jones, Alexander C. Nwala, Martin Klein, Michele C. Weigle, Michael L. Nelson Jan 2020

Shari- An Integration Of Tools To Visualize The Story Of The Day, Shawn M. Jones, Alexander C. Nwala, Martin Klein, Michele C. Weigle, Michael L. Nelson

Computer Science Faculty Publications

Tools such as google news and flipboard exist to convey daily news, but what about the news of the past? In this paper, we describe how to combine several existing tools and web archive holdings to convey the “biggest story” for a given date in the past. StoryGraph clusters news articles together to identify a common news story. Hypercane leverages ArchiveNow to store URLs produced by Story-Graph in web archives. Hypercane analyzes these URLs to identify the most common terms, entities, and highest quality images for social media storytelling. Raintale then takes the output of these tools to produce a …


Mementoembed And Raintale For Web Archive Storytelling, Shawn M. Jones, Martin Klein, Michele C. Weigle, Michael L. Nelson Jan 2020

Mementoembed And Raintale For Web Archive Storytelling, Shawn M. Jones, Martin Klein, Michele C. Weigle, Michael L. Nelson

Computer Science Faculty Publications

For traditional library collections, archivists can select a representative sample from a collection and display it in a featured physical or digital library space. Web archive collections may consist of thousands of archived pages, or mementos. How should an archivist display this sample to drive visitors to their collection? Search engines and social media platforms often represent web pages as cards consisting of text snippets, titles, and images. Web storytelling is a popular method for grouping these cards in order to summarize a topic. Unfortunately, social media platforms are not archive-aware and fail to consistently create a good experience for …


Mementomap: An Archive Profile Dissemination Framework, Sawood Alam, Michele C. Weigle, Michael L. Nelson Jun 2019

Mementomap: An Archive Profile Dissemination Framework, Sawood Alam, Michele C. Weigle, Michael L. Nelson

Computer Science Faculty Publications

We introduce MementoMap, a framework to express and disseminate holdings of web archives (archive profiles) by themselves or third parties. The framework allows arbitrary, flexible, and dynamic levels of details in its entries that fit the needs of archives of different scales. This enables Memento aggregators to significantly reduce wasted traffic to web archives.


Impact Of Http Cookie Violations In Web Archives, Sawood Alam, Michele C. Weigle, Michael L. Nelson Jun 2019

Impact Of Http Cookie Violations In Web Archives, Sawood Alam, Michele C. Weigle, Michael L. Nelson

Computer Science Faculty Publications

Certain HTTP Cookies on certain sites can be a source of content bias in archival crawls. Accommodating Cookies at crawl time, but not utilizing them at replay time may cause cookie violations, resulting in defaced composite mementos that never existed on the live web. To address these issues, we propose that crawlers store Cookies with short expiration time and archival replay systems account for values in the Vary header along with URIs.


Web Archives At The Nexus Of Good Fakes And Flawed Originals, Michael L. Nelson Jan 2019

Web Archives At The Nexus Of Good Fakes And Flawed Originals, Michael L. Nelson

Computer Science Faculty Publications

[Summary] The authenticity, integrity, and provenance of resources we encounter on the web are increasingly in question. While many people are inured to the possibility of altered images, the easy accessibility of powerful software tools that synthesize audio and video will unleash a torrent of convincing “deepfakes” into our social discourse. Archives will no longer be monopolized by a countable number of institutions such as governments and publishers, but will become a competitive space filled with social engineers, propagandists, conspiracy theorists, and aspiring Hollywood directors. While the historical record has never been singular nor unmalleable, current technologies empower an unprecedented …


It Is Hard To Compute Fixity On Archived Web Pages, Mohamed Aturban, Michael L. Nelson, Michele C. Weigle Jan 2018

It Is Hard To Compute Fixity On Archived Web Pages, Mohamed Aturban, Michael L. Nelson, Michele C. Weigle

Computer Science Faculty Publications

[Introduction] Checking fixity in web archives is performed to ensure archived resources, or mementos (denoted by URI-M) have remained unaltered since when they were captured. The final report of the PREMIS Working Group [2] defines information used for fixity as "information used to verify whether an object has been altered in an undocumented or unauthorized way." The common technique for checking fixity is to generate a current hash value (i.e., a message digest or a checksum) for a file using a cryptographic hash function (e.g., SHA-256) and compare it to the hash value generated originally. If they have different hash …


Moved But Not Gone: An Evaluation Of Real-Time Methods For Discovering Replacement Web Pages, Martin Klein, Michael L. Nelson Jan 2014

Moved But Not Gone: An Evaluation Of Real-Time Methods For Discovering Replacement Web Pages, Martin Klein, Michael L. Nelson

Computer Science Faculty Publications

Inaccessible Web pages and 404 “Page Not Found” responses are a common Web phenomenon and a detriment to the user’s browsing experience. The rediscovery of missing Web pages is, therefore, a relevant research topic in the digital preservation as well as in the Information Retrieval realm. In this article, we bring these two areas together by analyzing four content- and link-based methods to rediscover missing Web pages. We investigate the retrieval performance of the methods individually as well as their combinations and give an insight into how effective these methods are over time. As the main result of this work, …


A Method For Identifying Personalized Representations In Web Archives, Mat Kelly, Justin F. Brunelle, Michele C. Weigle, Michael L. Nelson Jan 2013

A Method For Identifying Personalized Representations In Web Archives, Mat Kelly, Justin F. Brunelle, Michele C. Weigle, Michael L. Nelson

Computer Science Faculty Publications

Web resources are becoming increasingly personalized — two different users clicking on the same link at the same time can see content customized for each individual user. These changes result in multiple representations of a resource that cannot be canonicalized in Web archives. We identify characteristics of this problem by presenting a potential solution to generalize personalized representations in archives. We also present our proof-of-concept prototype that analyzes WARC (Web ARChive) format files, inserts metadata establishing relationships, and provides archive users the ability to navigate on the additional dimension of environment variables in a modified Wayback Machine.


Lessons Learned With Arc, An Oai-Pmh Service Provider, Xiaoming Liu, Kurt Maly, Michael L. Nelson Jan 2005

Lessons Learned With Arc, An Oai-Pmh Service Provider, Xiaoming Liu, Kurt Maly, Michael L. Nelson

Computer Science Faculty Publications

Web-based digital libraries have historically been built in isolation utilizing different technologies, protocols, and metadata. These differences hindered the development of digital library services that enable users to discover information from multiple libraries through a single unified interface. The Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) is a major, international effort to address technical interoperability among distributed repositories. Arc debuted in 2000 as the first end-user OAI-PMH service provider. Since that time, Arc has grown to include nearly 7,000,000 metadata records. Arc has been deployed in a number of environments and has served as the basis for many other …