Full-Text Articles in Computer Engineering
Assessing The Prevalence And Archival Rate Of Uris To Git Hosting Platforms In Scholarly Publications, Emily Escamilla
Computer Science Theses & Dissertations
The definition of scholarly content has expanded to include the data and source code that contribute to a publication. While major archiving efforts to preserve conventional scholarly content, typically in PDFs (e.g., LOCKSS, CLOCKSS, Portico), are underway, no analogous effort has yet emerged to preserve the data and code referenced in those PDFs, particularly the scholarly code hosted online on Git Hosting Platforms (GHPs). Similarly, Software Heritage is working to archive public source code, but there is value in archiving the surrounding ephemera that provide important context to the code while maintaining their original URIs. In current implementations, source code …
A Framework For Web Object Self-Preservation, Charles L. Cartledge
Computer Science Theses & Dissertations
We propose and develop a framework based on emergent behavior principles for the long-term preservation of digital data using the web infrastructure. We present the development of the framework, called unsupervised small-world (USW), which is at the nexus of emergent behavior, graph theory, and digital preservation. The USW algorithm creates graph-based structures on the Web used for preservation of web objects (WOs). Emergent behavior activities, based on Craig Reynolds’ “boids” concept, are used to preserve WOs without the need for a central archiving authority. Graph theory is extended by developing an algorithm that incrementally creates small-world graphs. Graph theory …
Using The Web Infrastructure For Real Time Recovery Of Missing Web Pages, Martin Klein
Computer Science Theses & Dissertations
Given the dynamic nature of the World Wide Web, missing web pages, or "404 Page not Found" responses, are part of our web browsing experience. It is our intuition that information on the web is rarely completely lost; it is just missing. In whole or in part, content often moves from one URI to another and hence it just needs to be (re-)discovered. We evaluate several methods for a "just-in-time" approach to web page preservation. We investigate the suitability of lexical signatures and web page titles to rediscover missing content. It is understood that web pages change over time …
Integrating Preservation Functions Into The Web Server, Joan A. Smith
Computer Science Theses & Dissertations
Digital preservation of the World Wide Web poses unique challenges, different from the preservation issues facing professional Digital Libraries. The complete list of a website’s resources cannot be cited with confidence, and the descriptive metadata available for the resources is so minimal that it is sometimes insufficient for a browser to recognize. In short, the Web suffers from a counting problem and a representation problem. Refreshing the bits, migrating from an obsolete file format to a newer format, and other classic digital preservation problems also affect the Web. As digital collections devise solutions to these problems, the Web will also benefit. But the core …
Lazy Preservation: Reconstructing Websites From The Web Infrastructure, Frank McCown
Computer Science Theses & Dissertations
Backup or preservation of websites is often not considered until after a catastrophic event has occurred. In the face of complete website loss, webmasters or concerned third parties have attempted to recover some of their websites from the Internet Archive. Still others have sought to retrieve missing resources from the caches of commercial search engines. Inspired by these post hoc reconstruction attempts, this dissertation introduces the concept of lazy preservation: digital preservation performed as a result of the normal operations of the Web Infrastructure (web archives, search engines and caches). First, the Web Infrastructure (WI) is characterized by its preservation …
Opal: In Vivo Based Preservation Framework For Locating Lost Web Pages, Terry L. Harrison
Computer Science Theses & Dissertations
We present Opal, a framework for interactively locating missing web pages (HTTP status code 404). Opal is an example of "in vivo" preservation: harnessing the collective behavior of web archives, commercial search engines, and research projects for the purpose of preservation. Opal servers learn from their experiences and are able to share their knowledge with other Opal servers using the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH). Using cached copies that can be found on the web, Opal creates lexical signatures which are then used to search for similar versions of the web page. Using the OAI-PMH to facilitate …
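As a rough illustration of the lexical-signature idea mentioned in the abstract above, the sketch below picks the few terms of a cached page that score highest under a simple TF-IDF weighting. This is a minimal sketch under stated assumptions, not Opal's actual implementation: the function names, the `corpus_texts` background collection used to estimate document frequency, and the default of five terms are all hypothetical choices made for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def lexical_signature(page_text, corpus_texts, k=5):
    """Return the k terms of page_text with the highest TF-IDF score.

    corpus_texts is a hypothetical background collection used only to
    estimate document frequency (an assumption of this sketch; a real
    system would need another source for those counts).
    """
    term_counts = Counter(tokenize(page_text))
    background = [set(tokenize(t)) for t in corpus_texts]
    n_docs = len(background) + 1  # the page itself counts as one document

    scores = {}
    for term, tf in term_counts.items():
        # Document frequency: the page plus every background text with the term.
        df = 1 + sum(1 for doc in background if term in doc)
        scores[term] = tf * math.log(n_docs / df)

    # Highest score first; ties are broken alphabetically for a stable result.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:k]
```

The returned terms could then be joined into a search-engine query to look for a relocated copy of the page. Terms that appear in every background text receive a score of zero, so common words fall to the end of the ranking.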