Open Access. Powered by Scholars. Published by Universities.®

Library and Information Science Commons

Articles 1 - 14 of 14

Full-Text Articles in Library and Information Science

Surfacing Text Changes In Archived Webpages, Lesley Frew Jul 2024

Computer Science Theses & Dissertations

Webpages change over time, and web archives hold copies of historical versions of webpages. Users of web archives, such as journalists, want to find and view changes on webpages over time. However, the current search interfaces for web archives do not adequately support this task. For the web archives that include a full-text search feature, multiple versions of the same webpage that match the search query are shown individually without enumerating changes, or are grouped together in a way that hides changes. We present a change text search engine that allows users to find changes in webpages. We describe the …


Assessing The Prevalence And Archival Rate Of Uris To Git Hosting Platforms In Scholarly Publications, Emily Escamilla Aug 2023

Computer Science Theses & Dissertations

The definition of scholarly content has expanded to include the data and source code that contribute to a publication. While major archiving efforts to preserve conventional scholarly content, typically in PDFs (e.g., LOCKSS, CLOCKSS, Portico), are underway, no analogous effort has yet emerged to preserve the data and code referenced in those PDFs, particularly the scholarly code hosted online on Git Hosting Platforms (GHPs). Similarly, Software Heritage is working to archive public source code, but there is value in archiving the surrounding ephemera that provide important context to the code while maintaining their original URIs. In current implementations, source code …


Supporting Account-Based Queries For Archived Instagram Posts, Himarsha R. Jayanetti May 2023

Computer Science Theses & Dissertations

Social media has become one of the primary modes of communication in recent times, with popular platforms such as Facebook, Twitter, and Instagram leading the way. Despite its popularity, Instagram has not received as much attention in academic research compared to Facebook and Twitter, and its significant role in contemporary society is often overlooked. Web archives are making efforts to preserve social media content despite the challenges posed by the dynamic nature of these sites. The goal of our research is to facilitate the easy discovery of archived copies, or mementos, of all posts belonging to a specific Instagram account …


Improving Collection Understanding For Web Archives With Storytelling: Shining Light Into Dark And Stormy Archives, Shawn M. Jones Jul 2021

Computer Science Theses & Dissertations

Collections are the tools that people use to make sense of an ever-increasing number of archived web pages. As collections themselves grow, we need tools to make sense of them. Tools that work on the general web, like search engines, are not a good fit for these collections because search engines do not currently represent multiple document versions well. Web archive collections are vast, some containing hundreds of thousands of documents. Thousands of collections exist, many of which cover the same topic. Few collections include standardized metadata. Too many documents from too many collections with insufficient metadata makes collection understanding …


A Framework For Verifying The Fixity Of Archived Web Resources, Mohamed Aturban Aug 2020

Computer Science Theses & Dissertations

The number of public and private web archives has increased, and we implicitly trust content delivered by these archives. Fixity is checked to ensure that an archived resource has remained unaltered (i.e., fixed) since the time it was captured. Currently, end users do not have the ability to easily verify the fixity of content preserved in web archives. For instance, if a web page is archived in 1999 and replayed in 2019, how do we know that it has not been tampered with during those 20 years? In order for the users of web archives to verify that archived web …


Bootstrapping Web Archive Collections From Micro-Collections In Social Media, Alexander C. Nwala Aug 2020

Computer Science Theses & Dissertations

In a Web plagued by disappearing resources, Web archive collections provide a valuable means of preserving Web resources important to the study of past events. These archived collections start with seed URIs (Uniform Resource Identifiers) hand-selected by curators. Curators produce high-quality seeds by removing non-relevant URIs and adding URIs from credible and authoritative sources, but this ability comes at a cost: it is time-consuming to collect these seeds. The result of this is a shortage of curators, a lack of Web archive collections for various important news events, and a need for an automatic system for generating seeds. …


Aggregating Private And Public Web Archives Using The Mementity Framework, Matthew R. Kelly Jul 2019

Computer Science Theses & Dissertations

Web archives preserve the live Web for posterity, but the content on the Web one cares about may not be preserved. The ability to access this content in the future requires the assurance that those sites will continue to exist on the Web until the content is requested and that the content will remain accessible. It is ultimately the responsibility of the individual to preserve this content, but attempting to replay personally preserved pages segregates archived pages by individuals and organizations of personal, private, and public Web content. This is misrepresentative of the Web as it was. While the Memento …


Using Web Archives To Enrich The Live Web Experience Through Storytelling, Yasmin Alnoamany Jul 2016

Computer Science Theses & Dissertations

Much of our cultural discourse occurs primarily on the Web. Thus, Web preservation is a fundamental precondition for multiple disciplines. Archiving Web pages into themed collections is a method for ensuring these resources are available for posterity. Services such as Archive-It exist to allow institutions to develop, curate, and preserve collections of Web resources. Understanding the contents and boundaries of these archived collections is a challenge for most people, resulting in a paradox: the larger the collection, the harder it is to understand. Meanwhile, as the sheer volume of data grows on the Web, "storytelling" is becoming a popular …


Scripts In A Frame: A Framework For Archiving Deferred Representations, Justin F. Brunelle Apr 2016

Computer Science Theses & Dissertations

Web archives provide a view of the Web as seen by Web crawlers. Because of the rapid advancement and adoption of client-side technologies like JavaScript and Ajax, coupled with the inability of crawlers to execute these technologies effectively, Web resources become harder to archive as they become more interactive. At Web scale, we cannot capture client-side representations using the current state-of-the-art toolsets because of the migration from Web pages to Web applications. Web applications increasingly rely on JavaScript and other client-side programming languages to load embedded resources and change client-side state. We demonstrate that Web crawlers and other automatic archival …


Xpath-Based Template Language For Describing The Placement Of Metadata Within A Document, Vijay Kumar Musham Dec 2010

Computer Science Theses & Dissertations

In recent years, there has been a tremendous growth in Internet and online resources that had previously been restricted to paper archives. OCR (Optical Character Recognition) tools can be used for digitizing an existing corpus and making it available online. A number of federal agencies, universities, laboratories, and companies are placing their collections online and making them searchable via metadata fields such as author, title, and publishing organization. Manually creating metadata for a large collection is an extremely time-consuming task, and is difficult to automate, particularly for collections consisting of documents with diverse layout and structure. The Extract project …


Recommender Systems For Multimedia Libraries: An Evaluation Of Different Models For Datamining Usage Data, Raquel Oliveira Araujo Dec 2004

Computer Science Theses & Dissertations

Many recommender systems exist today to help users deal with the large growth in the amount of information available on the Internet. Most of these recommender systems use collaborative filtering or content-based techniques to present new material that would be of interest to a user. While these methods have proven to be effective, they have not been designed specifically for multimedia collections. In this study we present a new method to find recommendations that is not dependent on traditional Information Retrieval (IR) methods and compare it to algorithms that do rely on traditional IR methods. We evaluated these algorithms using …


Buckets: Smart Objects For Digital Libraries, Michael L. Nelson Jul 2000

Computer Science Theses & Dissertations

Discussion of digital libraries (DLs) is often dominated by the merits of various archives, repositories, search engines, search interfaces and database systems. While these technologies are necessary for information management, information content and information retrieval systems should progress on independent paths and each should make limited assumptions about the status or capabilities of the other. Information content is more important than the systems used for its storage and retrieval. Digital information should have the same long-term survivability prospects as traditional hardcopy information and should not be impacted by evolving search engine technologies or vendor vagaries in database management systems.

Digital …


Architectural Optimization Of Digital Libraries, Aileen O. Biser Aug 1998

Computer Science Theses & Dissertations

This work investigates performance and scaling issues relevant to large scale distributed digital libraries. Presently, performance and scaling studies focus on specific implementations of production or prototype digital libraries. Although useful information is gained to aid these designers and other researchers with insights into performance and scaling issues, the broader issues relevant to very large scale distributed libraries are not addressed. Specifically, no current studies look at the extreme or worst case possibilities in digital library implementations. A survey of digital library research issues is presented. Scaling and performance issues are mentioned frequently in the digital library literature but are …


Building Multi-Discipline, Multi-Format Digital Libraries Using Clusters And Buckets, Michael L. Nelson Aug 1997

Computer Science Theses & Dissertations

Our objective was to study the feasibility of extending the Dienst protocol to enable a multi-discipline, multi-format digital library. We implemented two new technologies: cluster functionality and publishing buckets. We have designed a possible implementation of clusters and buckets, and have prototyped some aspects of the resultant digital library.

Currently, digital libraries are segregated by the disciplines they serve (computer science, aeronautics, etc.), and by the format of their holdings (reports, software, datasets, etc.). NCSTRL+ is a multi-discipline, multi-format digital library (DL) prototype created to explore the feasibility of the design and implementation issues involved with creating a …