Open Access. Powered by Scholars. Published by Universities.®
Physical Sciences and Mathematics Commons™
Articles 1 - 5 of 5
Full-Text Articles in Physical Sciences and Mathematics
Lazy Preservation: Reconstructing Websites From The Web Infrastructure, Frank McCown
Computer Science Theses & Dissertations
Backup or preservation of websites is often not considered until after a catastrophic event has occurred. In the face of complete website loss, webmasters or concerned third parties have attempted to recover some of their websites from the Internet Archive. Still others have sought to retrieve missing resources from the caches of commercial search engines. Inspired by these post hoc reconstruction attempts, this dissertation introduces the concept of lazy preservation: digital preservation performed as a result of the normal operations of the Web Infrastructure (web archives, search engines and caches). First, the Web Infrastructure (WI) is characterized by its preservation …
Factors Affecting Website Reconstruction From The Web Infrastructure, Frank McCown, Norou Diawara, Michael L. Nelson
Computer Science Faculty Publications
When a website is suddenly lost without a backup, it may be reconstituted by probing web archives and search engine caches for missing content. In this paper we describe an experiment in which we crawled and reconstructed 300 randomly selected websites on a weekly basis for 14 weeks. The reconstructions were performed using our web-repository crawler, Warrick, which recovers missing resources from the Web Infrastructure (WI), the collective preservation effort of web archives and search engine caches. We examine several characteristics of the websites over time, including birth rate, decay and age of resources. We evaluate the reconstructions when compared …
The Open Archives Initiative, Michael L. Nelson
Computer Science Presentations
PDF of a PowerPoint presentation from the Open Archives Initiative DRIADE (Digital Repository of Information and Data for Evolution) Workshop, Durham, North Carolina, May 16-17, 2007. Also available on Slideshare.
Crate: A Simple Model For Self-Describing Web Resources, Joan A. Smith, Michael L. Nelson
Computer Science Faculty Publications
If not for the Internet Archive’s efforts to store periodic snapshots of the web, many sites would not have any preservation prospects at all. The barrier to entry is too high for everyday web sites, which may have skilled webmasters managing them, but which lack skilled archivists to preserve them. Digital preservation is not easy. One problem is the complexity of preservation models, which have specific meta-data and structural requirements. Another problem is the time and effort it takes to properly prepare digital resources for preservation in the chosen model. In this paper, we propose a simple preservation model called …
Brass: A Queueing Manager For Warrick, Frank McCown, Amine Benjelloun, Michael L. Nelson
Computer Science Faculty Publications
When an individual loses their website and a backup cannot be found, they can download and run Warrick, a web-repository crawler which will recover their lost website by crawling the holdings of the Internet Archive and several search engine caches. Running Warrick locally requires some technical know-how, so we have created an on-line queueing system called Brass which simplifies the task of recovering lost websites. We discuss the technical aspects of reconstructing websites and the implementation of Brass. Our newly developed system allows anyone to recover a lost website with a few mouse clicks and allows us to track which …