Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Open Access. Powered by Scholars. Published by Universities.®

Articles 1 - 13 of 13

Full-Text Articles in Physical Sciences and Mathematics

Assessing The Prevalence And Archival Rate Of Uris To Git Hosting Platforms In Scholarly Publications, Emily Escamilla Aug 2023

Assessing The Prevalence And Archival Rate Of Uris To Git Hosting Platforms In Scholarly Publications, Emily Escamilla

Computer Science Theses & Dissertations

The definition of scholarly content has expanded to include the data and source code that contribute to a publication. While major archiving efforts to preserve conventional scholarly content, typically in PDFs (e.g., LOCKSS, CLOCKSS, Portico), are underway, no analogous effort has yet emerged to preserve the data and code referenced in those PDFs, particularly the scholarly code hosted online on Git Hosting Platforms (GHPs). Similarly, Software Heritage is working to archive public source code, but there is value in archiving the surrounding ephemera that provide important context to the code while maintaining their original URIs. In current implementations, source code …


Supporting Account-Based Queries For Archived Instagram Posts, Himarsha R. Jayanetti May 2023

Supporting Account-Based Queries For Archived Instagram Posts, Himarsha R. Jayanetti

Computer Science Theses & Dissertations

Social media has become one of the primary modes of communication in recent times, with popular platforms such as Facebook, Twitter, and Instagram leading the way. Despite its popularity, Instagram has not received as much attention in academic research compared to Facebook and Twitter, and its significant role in contemporary society is often overlooked. Web archives are making efforts to preserve social media content despite the challenges posed by the dynamic nature of these sites. The goal of our research is to facilitate the easy discovery of archived copies, or mementos, of all posts belonging to a specific Instagram account …


Mementomap: A Web Archive Profiling Framework For Efficient Memento Routing, Sawood Alam Dec 2020

Mementomap: A Web Archive Profiling Framework For Efficient Memento Routing, Sawood Alam

Computer Science Theses & Dissertations

With the proliferation of public web archives, it is becoming more important to better profile their contents, both to understand their immense holdings as well as to support routing of requests in Memento aggregators. A memento is a past version of a web page and a Memento aggregator is a tool or service that aggregates mementos from many different web archives. To save resources, the Memento aggregator should only poll the archives that are likely to have a copy of the requested Uniform Resource Identifier (URI). Using the Crawler Index (CDX), we generate profiles of the archives that summarize their …


Bootstrapping Web Archive Collections From Micro-Collections In Social Media, Alexander C. Nwala Aug 2020

Bootstrapping Web Archive Collections From Micro-Collections In Social Media, Alexander C. Nwala

Computer Science Theses & Dissertations

In a Web plagued by disappearing resources, Web archive collections provide a valuable means of preserving Web resources important to the study of past events. These archived collections start with seed URIs (Uniform Resource Identifiers) hand-selected by curators. Curators produce high quality seeds by removing non-relevant URIs and adding URIs from credible and authoritative sources, but this ability comes at a cost: it is time consuming to collect these seeds. The result of this is a shortage of curators, a lack of Web archive collections for various important news events, and a need for an automatic system for generating seeds. …


A Framework For Verifying The Fixity Of Archived Web Resources, Mohamed Aturban Aug 2020

A Framework For Verifying The Fixity Of Archived Web Resources, Mohamed Aturban

Computer Science Theses & Dissertations

The number of public and private web archives has increased, and we implicitly trust content delivered by these archives. Fixity is checked to ensure that an archived resource has remained unaltered (i.e., fixed) since the time it was captured. Currently, end users do not have the ability to easily verify the fixity of content preserved in web archives. For instance, if a web page is archived in 1999 and replayed in 2019, how do we know that it has not been tampered with during those 20 years? In order for the users of web archives to verify that archived web …


Aggregating Private And Public Web Archives Using The Mementity Framework, Matthew R. Kelly Jul 2019

Aggregating Private And Public Web Archives Using The Mementity Framework, Matthew R. Kelly

Computer Science Theses & Dissertations

Web archives preserve the live Web for posterity, but the content on the Web one cares about may not be preserved. The ability to access this content in the future requires the assurance that those sites will continue to exist on the Web until the content is requested and that the content will remain accessible. It is ultimately the responsibility of the individual to preserve this content, but attempting to replay personally preserved pages segregates archived pages by individuals and organizations of personal, private, and public Web content. This is misrepresentative of the Web as it was. While the Memento …


Expanding The Usage Of Web Archives By Recommending Archived Webpages Using Only The Uri, Lulwah M. Alkwai Apr 2019

Expanding The Usage Of Web Archives By Recommending Archived Webpages Using Only The Uri, Lulwah M. Alkwai

Computer Science Theses & Dissertations

Web archives are a window to view past versions of webpages. When a user requests a webpage on the live Web, such as http://tripadvisor.com/where_to_t ravel/, the webpage may not be found, which results in an HyperText Transfer Protocol (HTTP) 404 response. The user then may search for the webpage in a Web archive, such as the Internet Archive. Unfortunately, if this page had never been archived, the user will not be able to view the page, nor will the user gain any information on other webpages that have similar content in the archive, such as the archived webpage http://classy-travel.net. Similarly, …


To Relive The Web: A Framework For The Transformation And Archival Replay Of Web Pages, John Andrew Berlin Apr 2018

To Relive The Web: A Framework For The Transformation And Archival Replay Of Web Pages, John Andrew Berlin

Computer Science Theses & Dissertations

When replaying an archived web page (known as a memento), the fundamental expectation is that the page should be viewable and function exactly as it did at archival time. However, this expectation requires web archives to modify the page and its embedded resources, so that they no longer reference (link to) the original server(s) they were archived from but back to the archive. Although these modifications necessarily change the state of the representation, it is understood that without them the replay of mementos from the archive would not be possible. Unfortunately, because the replay of mementos and the modifications made …


Scripts In A Frame: A Framework For Archiving Deferred Representations, Justin F. Brunelle Apr 2016

Scripts In A Frame: A Framework For Archiving Deferred Representations, Justin F. Brunelle

Computer Science Theses & Dissertations

Web archives provide a view of the Web as seen by Web crawlers. Because of rapid advancements and adoption of client-side technologies like JavaScript and Ajax, coupled with the inability of crawlers to execute these technologies effectively, Web resources become harder to archive as they become more interactive. At Web scale, we cannot capture client-side representations using the current state-of-the art toolsets because of the migration from Web pages to Web applications. Web applications increasingly rely on JavaScript and other client-side programming languages to load embedded resources and change client-side state. We demonstrate that Web crawlers and other automatic archival …


Web Archive Services Framework For Tighter Integration Between The Past And Present Web, Ahmed Alsum Apr 2014

Web Archive Services Framework For Tighter Integration Between The Past And Present Web, Ahmed Alsum

Computer Science Theses & Dissertations

Web archives have contained the cultural history of the web for many years, but they still have a limited capability for access. Most of the web archiving research has focused on crawling and preservation activities, with little focus on the delivery methods. The current access methods are tightly coupled with web archive infrastructure, hard to replicate or integrate with other web archives, and do not cover all the users' needs. In this dissertation, we focus on the access methods for archived web data to enable users, third-party developers, researchers, and others to gain knowledge from the web archives. We build …


An Extensible Framework For Creating Personal Archives Of Web Resources Requiring Authentication, Matthew Ryan Kelly Jul 2012

An Extensible Framework For Creating Personal Archives Of Web Resources Requiring Authentication, Matthew Ryan Kelly

Computer Science Theses & Dissertations

The key factors for the success of the World Wide Web are its large size and the lack of a centralized control over its contents. In recent years, many advances have been made in preserving web content but much of this content (namely, social media content) was not archived, or still to this day is not being archived,for various reasons. Tools built to accomplish this frequently break because of the dynamic structure of social media websites. Because many social media websites exhibit a commonality in hierarchy of the content, it would be worthwhile to setup a means to reference this …


Using The Web Infrastructure For Real Time Recovery Of Missing Web Pages, Martin Klein Jul 2011

Using The Web Infrastructure For Real Time Recovery Of Missing Web Pages, Martin Klein

Computer Science Theses & Dissertations

Given the dynamic nature of the World Wide Web, missing web pages, or "404 Page not Found" responses, are part of our web browsing experience. It is our intuition that information on the web is rarely completely lost, it is just missing. In whole or in part, content often moves from one URI to another and hence it just needs to be (re-)discovered. We evaluate several methods for a \justin- time" approach to web page preservation. We investigate the suitability of lexical signatures and web page titles to rediscover missing content. It is understood that web pages change over time …


Lazy Preservation: Reconstructing Websites From The Web Infrastructure, Frank Mccown Oct 2007

Lazy Preservation: Reconstructing Websites From The Web Infrastructure, Frank Mccown

Computer Science Theses & Dissertations

Backup or preservation of websites is often not considered until after a catastrophic event has occurred. In the face of complete website loss, webmasters or concerned third parties have attempted to recover some of their websites from the Internet Archive. Still others have sought to retrieve missing resources from the caches of commercial search engines. Inspired by these post hoc reconstruction attempts, this dissertation introduces the concept of lazy preservation{ digital preservation performed as a result of the normal operations of the Web Infrastructure (web archives, search engines and caches). First, the Web Infrastructure (WI) is characterized by its preservation …