Open Access. Powered by Scholars. Published by Universities.®
Library and Information Science Commons™
Keyword
- Digital libraries (6)
- Web archiving (6)
- Framework (3)
- Information retrieval (3)
- Archives (2)
- Digital preservation (2)
- Memento (2)
- Social media (2)
- Storytelling (2)
- Summarization (2)
- Web archives (2)
- Archived web pages (1)
- Bucket brigade devices (1)
- Buckets (1)
- Cluster analysis (1)
- Collections (1)
- Computer programs (1)
- Document imaging systems (1)
- Fuzzy algorithms (1)
- GitHub (1)
- Instagram (1)
- Intelligent sampling (1)
- JavaScript (1)
- Machine learning (1)
- Micro-collection (1)
- Multimedia systems (1)
- News (1)
- Off-topic (1)
- Open source software (1)
- Optical character recognition (1)
Articles 1 - 14 of 14
Full-Text Articles in Library and Information Science
Surfacing Text Changes In Archived Webpages, Lesley Frew
Computer Science Theses & Dissertations
Webpages change over time, and web archives hold copies of historical versions of webpages. Users of web archives, such as journalists, want to find and view changes on webpages over time. However, the current search interfaces for web archives do not adequately support this task. For the web archives that include a full-text search feature, multiple versions of the same webpage that match the search query are shown individually without enumerating changes, or are grouped together in a way that hides changes. We present a change text search engine that allows users to find changes in webpages. We describe the …
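To make "enumerating changes" concrete, here is a minimal Python sketch. It is not the change text search engine described in the thesis. It diffs the plain text of two archived versions of a page line by line with the standard-library `difflib.ndiff` and keeps only the added and removed lines, so that a query matches a change rather than a whole version. The function names `text_changes` and `matching_changes` and the sample text are hypothetical.

```python
import difflib

def text_changes(old_text: str, new_text: str) -> list[tuple[str, str]]:
    """List lines added ('+') or removed ('-') between two versions of a page.

    Assumes each version has already been reduced to plain text, one block per line.
    """
    changes = []
    for line in difflib.ndiff(old_text.splitlines(), new_text.splitlines()):
        # ndiff prefixes: "+ " added, "- " removed, "  " unchanged, "? " intraline hints
        if line.startswith(("+ ", "- ")):
            changes.append((line[0], line[2:]))
    return changes

def matching_changes(changes: list[tuple[str, str]], query: str) -> list[tuple[str, str]]:
    """Keep only the changed lines that contain the query term (case-insensitive)."""
    q = query.lower()
    return [(kind, text) for kind, text in changes if q in text.lower()]

# Hypothetical text of two archived versions of the same page
old = "Mayor announces budget\nParks funding cut"
new = "Mayor announces budget\nParks funding restored"
changes = text_changes(old, new)
print(changes)                                # [('-', 'Parks funding cut'), ('+', 'Parks funding restored')]
print(matching_changes(changes, "restored"))  # [('+', 'Parks funding restored')]
```

Filtering on changed lines, rather than on whole versions, means an unchanged line such as "Mayor announces budget" never produces a hit, which is the behavior the abstract contrasts with existing full-text search.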
Assessing The Prevalence And Archival Rate Of URIs To Git Hosting Platforms In Scholarly Publications, Emily Escamilla
Computer Science Theses & Dissertations
The definition of scholarly content has expanded to include the data and source code that contribute to a publication. While major archiving efforts to preserve conventional scholarly content, typically in PDFs (e.g., LOCKSS, CLOCKSS, Portico), are underway, no analogous effort has yet emerged to preserve the data and code referenced in those PDFs, particularly the scholarly code hosted online on Git Hosting Platforms (GHPs). Similarly, Software Heritage is working to archive public source code, but there is value in archiving the surrounding ephemera that provide important context to the code while maintaining their original URIs. In current implementations, source code …
Supporting Account-Based Queries For Archived Instagram Posts, Himarsha R. Jayanetti
Computer Science Theses & Dissertations
Social media has become one of the primary modes of communication in recent times, with popular platforms such as Facebook, Twitter, and Instagram leading the way. Despite its popularity, Instagram has not received as much attention in academic research compared to Facebook and Twitter, and its significant role in contemporary society is often overlooked. Web archives are making efforts to preserve social media content despite the challenges posed by the dynamic nature of these sites. The goal of our research is to facilitate the easy discovery of archived copies, or mementos, of all posts belonging to a specific Instagram account …
Improving Collection Understanding For Web Archives With Storytelling: Shining Light Into Dark And Stormy Archives, Shawn M. Jones
Computer Science Theses & Dissertations
Collections are the tools that people use to make sense of an ever-increasing number of archived web pages. As collections themselves grow, we need tools to make sense of them. Tools that work on the general web, like search engines, are not a good fit for these collections because search engines do not currently represent multiple document versions well. Web archive collections are vast, some containing hundreds of thousands of documents. Thousands of collections exist, many of which cover the same topic. Few collections include standardized metadata. Too many documents from too many collections with insufficient metadata make collection understanding …
A Framework For Verifying The Fixity Of Archived Web Resources, Mohamed Aturban
Computer Science Theses & Dissertations
The number of public and private web archives has increased, and we implicitly trust content delivered by these archives. Fixity is checked to ensure that an archived resource has remained unaltered (i.e., fixed) since the time it was captured. Currently, end users do not have the ability to easily verify the fixity of content preserved in web archives. For instance, if a web page is archived in 1999 and replayed in 2019, how do we know that it has not been tampered with during those 20 years? In order for the users of web archives to verify that archived web …
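A fixity check typically compares a cryptographic digest recorded when a resource is captured with a digest computed when it is replayed. The sketch below assumes SHA-256 via Python's `hashlib` and a digest stored alongside the memento. It illustrates the general idea only and is not the framework proposed in the thesis; `fixity_digest` and `verify_fixity` are hypothetical names.

```python
import hashlib

def fixity_digest(content: bytes) -> str:
    """Hex-encoded SHA-256 digest of a memento's raw bytes."""
    return hashlib.sha256(content).hexdigest()

def verify_fixity(recorded_digest: str, replayed_content: bytes) -> bool:
    """True if the bytes replayed now hash to the digest recorded at capture time."""
    return fixity_digest(replayed_content) == recorded_digest

# Hypothetical capture in 1999 and two later replays
captured = b"<html><body>Election results</body></html>"
recorded = fixity_digest(captured)            # stored alongside the memento at capture time
print(verify_fixity(recorded, captured))      # True: content unchanged
print(verify_fixity(recorded, b"<html><body>Altered results</body></html>"))  # False: content differs
```

Digests match only if the same bytes are hashed each time. A replay that rewrites links or injects a banner would therefore change the digest even when nothing was tampered with.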
Bootstrapping Web Archive Collections From Micro-Collections In Social Media, Alexander C. Nwala
Computer Science Theses & Dissertations
In a Web plagued by disappearing resources, Web archive collections provide a valuable means of preserving Web resources important to the study of past events. These archived collections start with seed URIs (Uniform Resource Identifiers) hand-selected by curators. Curators produce high quality seeds by removing non-relevant URIs and adding URIs from credible and authoritative sources, but this ability comes at a cost: it is time-consuming to collect these seeds. The result of this is a shortage of curators, a lack of Web archive collections for various important news events, and a need for an automatic system for generating seeds. …
Aggregating Private And Public Web Archives Using The Mementity Framework, Matthew R. Kelly
Computer Science Theses & Dissertations
Web archives preserve the live Web for posterity, but the content on the Web one cares about may not be preserved. The ability to access this content in the future requires the assurance that those sites will continue to exist on the Web until the content is requested and that the content will remain accessible. It is ultimately the responsibility of the individual to preserve this content, but attempting to replay personally preserved pages segregates archived pages by individuals and organizations of personal, private, and public Web content. This is misrepresentative of the Web as it was. While the Memento …
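Aggregating private and public archives can be pictured as merging several per-archive lists of mementos into one chronological list and then selecting the capture closest to a requested datetime. This is a simplified in-memory sketch. It is not the Mementity framework and not the HTTP-level Memento TimeMap/TimeGate exchange; the `Memento` record, the helper names, and the example URIs are hypothetical.

```python
import heapq
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class Memento:
    uri_m: str          # URI of the archived copy
    captured: datetime  # its Memento-Datetime
    source: str         # which archive it came from, e.g. "public" or "private"

def aggregate_timemaps(*timemaps: list[Memento]) -> list[Memento]:
    """Merge per-archive memento lists, each already sorted by capture time, into one chronological list."""
    return list(heapq.merge(*timemaps, key=lambda m: m.captured))

def closest_memento(mementos: list[Memento], accept_datetime: datetime) -> Memento:
    """Pick the memento captured nearest to the requested datetime (simplified datetime negotiation)."""
    return min(mementos, key=lambda m: abs(m.captured - accept_datetime))

UTC = timezone.utc
# Hypothetical URI-Ms; real ones would come from each archive's TimeMap
public = [
    Memento("https://archive.example/20190101000000/http://example.com/", datetime(2019, 1, 1, tzinfo=UTC), "public"),
    Memento("https://archive.example/20210301000000/http://example.com/", datetime(2021, 3, 1, tzinfo=UTC), "public"),
]
private = [
    Memento("http://localhost:8080/20200615120000/http://example.com/", datetime(2020, 6, 15, 12, tzinfo=UTC), "private"),
]
merged = aggregate_timemaps(public, private)
print([m.source for m in merged])  # ['public', 'private', 'public']
print(closest_memento(merged, datetime(2020, 5, 1, tzinfo=UTC)).source)  # private
```

Merging into a single timeline, instead of replaying each archive separately, reflects the abstract's concern that segregating personal, private, and public captures misrepresents the Web as it was.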
Using Web Archives To Enrich The Live Web Experience Through Storytelling, Yasmin Alnoamany
Computer Science Theses & Dissertations
Much of our cultural discourse occurs primarily on the Web. Thus, Web preservation is a fundamental precondition for multiple disciplines. Archiving Web pages into themed collections is a method for ensuring these resources are available for posterity. Services such as Archive-It exist to allow institutions to develop, curate, and preserve collections of Web resources. Understanding the contents and boundaries of these archived collections is a challenge for most people, resulting in the paradox that the larger the collection, the harder it is to understand. Meanwhile, as the sheer volume of data grows on the Web, "storytelling" is becoming a popular …
Scripts In A Frame: A Framework For Archiving Deferred Representations, Justin F. Brunelle
Computer Science Theses & Dissertations
Web archives provide a view of the Web as seen by Web crawlers. Because of rapid advancements and adoption of client-side technologies like JavaScript and Ajax, coupled with the inability of crawlers to execute these technologies effectively, Web resources become harder to archive as they become more interactive. At Web scale, we cannot capture client-side representations using the current state-of-the-art toolsets because of the migration from Web pages to Web applications. Web applications increasingly rely on JavaScript and other client-side programming languages to load embedded resources and change client-side state. We demonstrate that Web crawlers and other automatic archival …
XPath-Based Template Language For Describing The Placement Of Metadata Within A Document, Vijay Kumar Musham
Computer Science Theses & Dissertations
In recent years, there has been a tremendous growth in Internet and online resources that had previously been restricted to paper archives. OCR (Optical Character Recognition) tools can be used for digitizing an existing corpus and making it available online. A number of federal agencies, universities, laboratories, and companies are placing their collections online and making them searchable via metadata fields such as author, title, and publishing organization. Manually creating metadata for a large collection is an extremely time-consuming task, and is difficult to automate, particularly for collections consisting of documents with diverse layout and structure. The Extract project …
Recommender Systems For Multimedia Libraries: An Evaluation Of Different Models For Datamining Usage Data, Raquel Oliveira Araujo
Computer Science Theses & Dissertations
Many recommender systems exist today to help users deal with the large growth in the amount of information available on the Internet. Most of these recommender systems use collaborative filtering or content-based techniques to present new material that would be of interest to a user. While these methods have proven to be effective, they have not been designed specifically for multimedia collections. In this study we present a new method to find recommendations that is not dependent on traditional Information Retrieval (IR) methods and compare it to algorithms that do rely on traditional IR methods. We evaluated these algorithms using …
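For contrast with the models evaluated in the study, a very small usage-based recommender can count how often two items are accessed in the same session and recommend an item's most frequent companions. This item co-occurrence sketch is a generic illustration, not the method proposed in the thesis; the session data and item identifiers are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations

def cooccurrence_counts(sessions: list[list[str]]) -> dict[str, Counter]:
    """For every item, count how many sessions also used each other item."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for session in sessions:
        # De-duplicate within a session so repeated views count once
        for a, b in combinations(sorted(set(session)), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts

def recommend(counts: dict[str, Counter], item: str, k: int = 3) -> list[str]:
    """Return up to k items most often used in the same sessions as `item`."""
    return [other for other, _ in counts.get(item, Counter()).most_common(k)]

# Hypothetical usage logs from a multimedia library: one list of item IDs per session
sessions = [
    ["lecture1", "slides1", "video1"],
    ["lecture1", "video1"],
    ["lecture1", "slides2"],
]
counts = cooccurrence_counts(sessions)
print(recommend(counts, "lecture1", k=1))  # ['video1']
```

Because it uses only access logs, this kind of model needs no text analysis of the items themselves, which is one reason usage data is attractive for multimedia collections.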
Buckets: Smart Objects For Digital Libraries, Michael L. Nelson
Computer Science Theses & Dissertations
Discussion of digital libraries (DLs) is often dominated by the merits of various archives, repositories, search engines, search interfaces and database systems. While these technologies are necessary for information management, information content and information retrieval systems should progress on independent paths and each should make limited assumptions about the status or capabilities of the other. Information content is more important than the systems used for its storage and retrieval. Digital information should have the same long-term survivability prospects as traditional hardcopy information and should not be impacted by evolving search engine technologies or vendor vagaries in database management systems.
Digital …
Architectural Optimization Of Digital Libraries, Aileen O. Biser
Computer Science Theses & Dissertations
This work investigates performance and scaling issues relevant to large scale distributed digital libraries. Presently, performance and scaling studies focus on specific implementations of production or prototype digital libraries. Although useful information is gained to aid these designers and other researchers with insights to performance and scaling issues, the broader issues relevant to very large scale distributed libraries are not addressed. Specifically, no current studies look at the extreme or worst case possibilities in digital library implementations. A survey of digital library research issues is presented. Scaling and performance issues are mentioned frequently in the digital library literature but are …
Building Multi-Discipline, Multi-Format Digital Libraries Using Clusters And Buckets, Michael L. Nelson
Computer Science Theses & Dissertations
Our objective was to study the feasibility of extending the Dienst protocol to enable a multi-discipline, multi-format digital library. We implemented two new technologies: cluster functionality and publishing buckets. We have designed a possible implementation of clusters and buckets, and have prototyped some aspects of the resultant digital library.
Currently, digital libraries are segregated by the disciplines they serve (computer science, aeronautics, etc.), and by the format of their holdings (reports, software, datasets, etc.). NCSTRL+ is a multi-discipline, multi-format digital library (DL) prototype created to explore the feasibility of the design and implementation issues involved with creating a …