Open Access. Powered by Scholars. Published by Universities.®

Library and Information Science Commons

Open Access. Powered by Scholars. Published by Universities.®

Articles 1 - 7 of 7

Full-Text Articles in Library and Information Science

Improving Collection Understanding For Web Archives With Storytelling: Shining Light Into Dark And Stormy Archives, Shawn M. Jones Jul 2021

Improving Collection Understanding For Web Archives With Storytelling: Shining Light Into Dark And Stormy Archives, Shawn M. Jones

Computer Science Theses & Dissertations

Collections are the tools that people use to make sense of an ever-increasing number of archived web pages. As collections themselves grow, we need tools to make sense of them. Tools that work on the general web, like search engines, are not a good fit for these collections because search engines do not currently represent multiple document versions well. Web archive collections are vast, some containing hundreds of thousands of documents. Thousands of collections exist, many of which cover the same topic. Few collections include standardized metadata. Too many documents from too many collections with insufficient metadata makes collection understanding …


Bootstrapping Web Archive Collections From Micro-Collections In Social Media, Alexander C. Nwala Aug 2020

Bootstrapping Web Archive Collections From Micro-Collections In Social Media, Alexander C. Nwala

Computer Science Theses & Dissertations

In a Web plagued by disappearing resources, Web archive collections provide a valuable means of preserving Web resources important to the study of past events. These archived collections start with seed URIs (Uniform Resource Identifiers) hand-selected by curators. Curators produce high quality seeds by removing non-relevant URIs and adding URIs from credible and authoritative sources, but this ability comes at a cost: it is time consuming to collect these seeds. The result of this is a shortage of curators, a lack of Web archive collections for various important news events, and a need for an automatic system for generating seeds. …


A Framework For Verifying The Fixity Of Archived Web Resources, Mohamed Aturban Aug 2020

A Framework For Verifying The Fixity Of Archived Web Resources, Mohamed Aturban

Computer Science Theses & Dissertations

The number of public and private web archives has increased, and we implicitly trust content delivered by these archives. Fixity is checked to ensure that an archived resource has remained unaltered (i.e., fixed) since the time it was captured. Currently, end users do not have the ability to easily verify the fixity of content preserved in web archives. For instance, if a web page is archived in 1999 and replayed in 2019, how do we know that it has not been tampered with during those 20 years? In order for the users of web archives to verify that archived web …


Aggregating Private And Public Web Archives Using The Mementity Framework, Matthew R. Kelly Jul 2019

Aggregating Private And Public Web Archives Using The Mementity Framework, Matthew R. Kelly

Computer Science Theses & Dissertations

Web archives preserve the live Web for posterity, but the content on the Web one cares about may not be preserved. The ability to access this content in the future requires the assurance that those sites will continue to exist on the Web until the content is requested and that the content will remain accessible. It is ultimately the responsibility of the individual to preserve this content, but attempting to replay personally preserved pages segregates archived pages by individuals and organizations of personal, private, and public Web content. This is misrepresentative of the Web as it was. While the Memento …


Using Web Archives To Enrich The Live Web Experience Through Storytelling, Yasmin Alnoamany Jul 2016

Using Web Archives To Enrich The Live Web Experience Through Storytelling, Yasmin Alnoamany

Computer Science Theses & Dissertations

Much of our cultural discourse occurs primarily on the Web. Thus, Web preservation is a fundamental precondition for multiple disciplines. Archiving Web pages into themed collections is a method for ensuring these resources are available for posterity. Services such as Archive-It exists to allow institutions to develop, curate, and preserve collections of Web resources. Understanding the contents and boundaries of these archived collections is a challenge for most people, resulting in the paradox of the larger the collection, the harder it is to understand. Meanwhile, as the sheer volume of data grows on the Web, "storytelling" is becoming a popular …


Scripts In A Frame: A Framework For Archiving Deferred Representations, Justin F. Brunelle Apr 2016

Scripts In A Frame: A Framework For Archiving Deferred Representations, Justin F. Brunelle

Computer Science Theses & Dissertations

Web archives provide a view of the Web as seen by Web crawlers. Because of rapid advancements and adoption of client-side technologies like JavaScript and Ajax, coupled with the inability of crawlers to execute these technologies effectively, Web resources become harder to archive as they become more interactive. At Web scale, we cannot capture client-side representations using the current state-of-the art toolsets because of the migration from Web pages to Web applications. Web applications increasingly rely on JavaScript and other client-side programming languages to load embedded resources and change client-side state. We demonstrate that Web crawlers and other automatic archival …


Buckets: Smart Objects For Digital Libraries, Michael L. Nelson Jul 2000

Buckets: Smart Objects For Digital Libraries, Michael L. Nelson

Computer Science Theses & Dissertations

Discussion of digital libraries (DLs) is often dominated by the merits of various archives, repositories, search engines, search interfaces and database systems. While these technologies are necessary for information management, information content and information retrieval systems should progress on independent paths and each should make limited assumptions about the status or capabilities of the other. Information content is more important than the systems used for its storage and retrieval. Digital information should have the same long-term survivability prospects as traditional hardcopy information and should not be impacted by evolving search engine technologies or vendor vagaries in database management systems.

Digital …