Open Access. Powered by Scholars. Published by Universities.®

Computer Engineering Commons

Computer Sciences

Computer Science Theses & Dissertations

Web archiving

Full-Text Articles in Computer Engineering

To Relive The Web: A Framework For The Transformation And Archival Replay Of Web Pages, John Andrew Berlin Apr 2018

Computer Science Theses & Dissertations

When replaying an archived web page (known as a memento), the fundamental expectation is that the page should be viewable and function exactly as it did at archival time. However, meeting this expectation requires web archives to modify the page and its embedded resources so that they reference (link to) the archive rather than the original server(s) they were archived from. Although these modifications necessarily change the state of the representation, it is understood that without them the replay of mementos from the archive would not be possible. Unfortunately, because the replay of mementos and the modifications ...


Web Archive Services Framework For Tighter Integration Between The Past And Present Web, Ahmed Alsum Apr 2014

Computer Science Theses & Dissertations

Web archives have contained the cultural history of the web for many years, but their access capabilities are still limited. Most web archiving research has focused on crawling and preservation activities, with little focus on delivery methods. Current access methods are tightly coupled with web archive infrastructure, are hard to replicate or integrate with other web archives, and do not meet all users' needs. In this dissertation, we focus on access methods for archived web data that enable users, third-party developers, researchers, and others to gain knowledge from web archives. We build ...


An Extensible Framework For Creating Personal Archives Of Web Resources Requiring Authentication, Matthew Ryan Kelly Jul 2012

Computer Science Theses & Dissertations

The key factors in the success of the World Wide Web are its large size and the lack of centralized control over its contents. In recent years, many advances have been made in preserving web content, but much of this content (namely, social media content) was not archived, or to this day is still not being archived, for various reasons. Tools built to accomplish this frequently break because of the dynamic structure of social media websites. Because many social media websites exhibit a commonality in the hierarchy of their content, it would be worthwhile to set up a means to reference ...


Using The Web Infrastructure For Real Time Recovery Of Missing Web Pages, Martin Klein Jul 2011

Computer Science Theses & Dissertations

Given the dynamic nature of the World Wide Web, missing web pages, or "404 Page Not Found" responses, are part of our web browsing experience. It is our intuition that information on the web is rarely completely lost; it is just missing. In whole or in part, content often moves from one URI to another and hence just needs to be (re-)discovered. We evaluate several methods for a "just-in-time" approach to web page preservation. We investigate the suitability of lexical signatures and web page titles for rediscovering missing content. It is understood that web pages change over ...


Lazy Preservation: Reconstructing Websites From The Web Infrastructure, Frank McCown Oct 2007

Computer Science Theses & Dissertations

Backup or preservation of websites is often not considered until after a catastrophic event has occurred. In the face of complete website loss, webmasters or concerned third parties have attempted to recover some of their websites from the Internet Archive. Still others have sought to retrieve missing resources from the caches of commercial search engines. Inspired by these post hoc reconstruction attempts, this dissertation introduces the concept of lazy preservation: digital preservation performed as a result of the normal operations of the Web Infrastructure (web archives, search engines, and caches). First, the Web Infrastructure (WI) is characterized by its preservation ...