Open Access. Powered by Scholars. Published by Universities.®

Computer Engineering Commons

Open Access. Powered by Scholars. Published by Universities.®

Articles 1 - 4 of 4

Full-Text Articles in Computer Engineering

Hashes Are Not Suitable To Verify Fixity Of The Public Archived Web, Mohamed Aturban, Martin Klein, Herbert Van De Sompel, Sawood Alam, Michael L. Nelson, Michele C. Weigle Jan 2023

Hashes Are Not Suitable To Verify Fixity Of The Public Archived Web, Mohamed Aturban, Martin Klein, Herbert Van De Sompel, Sawood Alam, Michael L. Nelson, Michele C. Weigle

Computer Science Faculty Publications

Web archives, such as the Internet Archive, preserve the web and allow access to prior states of web pages. We implicitly trust their versions of archived pages, but as their role moves from preserving curios of the past to facilitating present day adjudication, we are concerned with verifying the fixity of archived web pages, or mementos, to ensure they have always remained unaltered. A widely used technique in digital preservation to verify the fixity of an archived resource is to periodically compute a cryptographic hash value on a resource and then compare it with a previous hash value. If the …


Visualizing Digital Collections At Archive-It, Kalpesh Padia Jul 2012

Visualizing Digital Collections At Archive-It, Kalpesh Padia

Computer Science Theses & Dissertations

Archive-It, a subscription service from the Internet Archive, allows users to create,maintain, and view digital collections of web resources. The current interface of Archive-It is largely text-based, supporting drill-down navigation using lists of URIs.While this interface provides good searching capabilities, it is not efficient for browsing. In the absence of keywords, a user has to spend large amount of time trying to locate a web page of interest. In order to provide a better visual experience to the user, we have studied the underlying characteristics of Archive-It collections and implemented six different visualizations (treemap, time cloud, bubble chart, image plot, …


Creating Preservation-Ready Web Resources, Joan A. Smith, Michael L. Nelson Jan 2008

Creating Preservation-Ready Web Resources, Joan A. Smith, Michael L. Nelson

Computer Science Faculty Publications

There are innumerable departmental, community, and personal web sites worthy of long-term preservation but proportionally fewer archivists available to properly prepare and process such sites. We propose a simple model for such everyday web sites which takes advantage of the web server itself to help prepare the site's resources for preservation. This is accomplished by having metadata utilities analyze the resource at the time of dissemination. The web server responds to the archiving repository crawler by sending both the resource and the just-in-time generated metadata as a straight-forward XML-formatted response. We call this complex object (resource + metadata) a CRATE. …


Opal: In Vivo Based Preservation Framework For Locating Lost Web Pages, Terry L. Harrison Jul 2005

Opal: In Vivo Based Preservation Framework For Locating Lost Web Pages, Terry L. Harrison

Computer Science Theses & Dissertations

We present Opal, a framework for interactively locating missing web pages (http status code 404). Opal is an example of "in vivo" preservation: harnessing the collective behavior of web archives, commercial search engines, and research projects for the purpose of preservation. Opal servers learn from their experiences and are able to share their knowledge with other Opal servers using the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH). Using cached copies that can be found on the web, Opal creates lexical signatures which are then used to search for similar versions of the web page. Using the OAI-PMH to facilitate …