Open Access. Powered by Scholars. Published by Universities.®

Library and Information Science Commons

Articles 1 - 30 of 63

Full-Text Articles in Library and Information Science

Robots Still Outnumber Humans In Web Archives In 2019, But Less Than In 2015 And 2012, Himarsha R. Jayanetti, Kritika Garg, Sawood Alam, Michael L. Nelson, Michele C. Weigle Jan 2024

Computer Science Faculty Publications

The significance of the web and the crucial role of web archives in its preservation highlight the necessity of understanding how users, both human and robot, access web archive content, and how best to satisfy the disparate needs of both types of users. To identify robots and humans in web archives and analyze their respective access patterns, we used the Internet Archive’s (IA) Wayback Machine access logs from 2012, 2015, and 2019, as well as Arquivo.pt’s (Portuguese Web Archive) access logs from 2019. We identified user sessions in the access logs and classified those sessions as human or robot based …


Assessing The Prevalence And Archival Rate Of URIs To Git Hosting Platforms In Scholarly Publications, Emily Escamilla Aug 2023

Computer Science Theses & Dissertations

The definition of scholarly content has expanded to include the data and source code that contribute to a publication. While major archiving efforts to preserve conventional scholarly content, typically in PDFs (e.g., LOCKSS, CLOCKSS, Portico), are underway, no analogous effort has yet emerged to preserve the data and code referenced in those PDFs, particularly the scholarly code hosted online on Git Hosting Platforms (GHPs). Similarly, Software Heritage is working to archive public source code, but there is value in archiving the surrounding ephemera that provide important context to the code while maintaining their original URIs. In current implementations, source code …


Supporting Account-Based Queries For Archived Instagram Posts, Himarsha R. Jayanetti May 2023

Computer Science Theses & Dissertations

Social media has become one of the primary modes of communication in recent times, with popular platforms such as Facebook, Twitter, and Instagram leading the way. Despite its popularity, Instagram has not received as much attention in academic research compared to Facebook and Twitter, and its significant role in contemporary society is often overlooked. Web archives are making efforts to preserve social media content despite the challenges posed by the dynamic nature of these sites. The goal of our research is to facilitate the easy discovery of archived copies, or mementos, of all posts belonging to a specific Instagram account …


Hashes Are Not Suitable To Verify Fixity Of The Public Archived Web, Mohamed Aturban, Martin Klein, Herbert Van De Sompel, Sawood Alam, Michael L. Nelson, Michele C. Weigle Jan 2023

Computer Science Faculty Publications

Web archives, such as the Internet Archive, preserve the web and allow access to prior states of web pages. We implicitly trust their versions of archived pages, but as their role moves from preserving curios of the past to facilitating present day adjudication, we are concerned with verifying the fixity of archived web pages, or mementos, to ensure they have always remained unaltered. A widely used technique in digital preservation to verify the fixity of an archived resource is to periodically compute a cryptographic hash value on a resource and then compare it with a previous hash value. If the …


The DSA Toolkit Shines Light Into Dark And Stormy Archives, Shawn Morgan Jones, Himarsha R. Jayanetti, Alex Osborne, Paul Koerbin, Martin Klein, Michele C. Weigle, Michael L. Nelson Jan 2022

Computer Science Faculty Publications

Web archive collections are created with a particular purpose in mind. A curator selects seeds, or original resources, which are then captured by an archiving system and stored as archived web pages, or mementos. The systems that build web archive collections are often configured to revisit the same original resource multiple times. This is incredibly useful for understanding an unfolding news story or the evolution of an organization. Unfortunately, over time, some of these original resources can go off-topic and no longer suit the purpose for which the collection was originally created. They can go off-topic due to web site …


This Old Vase: Ancient Art And Primary Source Instruction In The Archives, Laraann Canner Nov 2021

Libraries Faculty & Staff Publications

No abstract provided.


Information Activism: A Queer History Of Lesbian Media Technologies, Dawn Betts-Green Jan 2021

STEMPS Faculty Publications

No abstract provided.


Automatic Metadata Extraction Incorporating Visual Features From Scanned Electronic Theses And Dissertations, Muntabir Hasan Choudhury, Himarsha R. Jayanetti, Jian Wu, William A. Ingram, Edward A. Fox Jan 2021

Computer Science Faculty Publications

Electronic Theses and Dissertations (ETDs) contain domain knowledge that can be used for many digital library tasks, such as analyzing citation networks and predicting research trends. Automatic metadata extraction is important to build scalable digital library search engines. Most existing methods are designed for born-digital documents, so they often fail to extract metadata from scanned documents such as ETDs. Traditional sequence tagging methods mainly rely on text-based features. In this paper, we propose a conditional random field (CRF) model that combines text-based and visual features. To verify the robustness of our model, we extended an existing corpus and created a …


A Crisis Of Erasure: Transgender And Gender-Nonconforming Populations Navigating Breast Cancer Health Information, Curtis Shane Tenney, Karl J. Surkan, Lynette Hammond Gerido, Dawn Betts-Green Jan 2021

STEMPS Faculty Publications

In this paper, we use the topic of breast cancer as an example of health crisis erasure in both informational and institutional contexts, particularly within the transgender and gender-nonconforming population. Breast cancer health information conforms and defaults to conventional cultural associations with femininity, as is the case with pregnancy and other “single-sex” conditions (Surkan, 2015). Many health information and research practices normalize sexualities, pathologize non-normative gender (Drescher et al., 2012; Fish, 2008; Müller, 2018), and fail to recognize gender-nonconforming categories (Frohard‐Dourlent et al., 2017). Because breast cancer health information is sexually normalized, an information boundary exists for the LGBTQ+ community, …


Afterlives Of Indigenous Archives: Essays In Honor Of "The Occom Circle" [Book Review], Drew Lopenzina Nov 2020

English Faculty Publications

(First paragraph) Afterlives of Indigenous Archives takes its title from Anishinaabe author Gerald Vizenor who is, in turn, repurposing a quote from French theorist Jacques Derrida who, in his 1995 work, Archive Fever, referred to the archive as that which gestures toward “an excess of life,” something that “resists annihilation” (183). This excess, or “afterlife,” of the archive remains, for Vizenor at least, an unexpected location of Indigenous survivance—a site from which, despite every violent attempt to colonially contain and collapse Native presence, it is still possible to carry something forward from the ruins of representation. With this in mind, …


Legal And Technical Issues For Text And Data Mining In Greece, Maria Kanellopoulou - Botti, Marinos Papadopoulos, Christos Zampakolas, Paraskevi Ganatsiou May 2019

Computer Ethics - Philosophical Enquiry (CEPE) Proceedings

Web harvesting and archiving pertains to the processes of collecting works that reside on the Web and archiving them. Web harvesting and archiving is one of the most attractive applications for libraries which plan ahead for their future operation. When works retrieved from the Web are turned into archived and documented material to be found in a library, the number of works that can be found in said library can be far greater than the number of works harvested from the Web. The proposed participation in the 2019 CEPE Conference aims at presenting certain issues related to …


Shakespeare's Globe Archive: Theatres, Players & Performance, Rob Tench Jan 2019

Libraries Faculty & Staff Publications

No abstract provided.


Web Archives At The Nexus Of Good Fakes And Flawed Originals, Michael L. Nelson Jan 2019

Computer Science Faculty Publications

[Summary] The authenticity, integrity, and provenance of resources we encounter on the web are increasingly in question. While many people are inured to the possibility of altered images, the easy accessibility of powerful software tools that synthesize audio and video will unleash a torrent of convincing “deepfakes” into our social discourse. Archives will no longer be monopolized by a countable number of institutions such as governments and publishers, but will become a competitive space filled with social engineers, propagandists, conspiracy theorists, and aspiring Hollywood directors. While the historical record has never been singular nor unmalleable, current technologies empower an unprecedented …


Subjectivity And Methodology In The Arch'i'Ve, Elizabeth J. Vincelette Jul 2018

English Faculty Publications

This article explores methodologies from the fields of library archival science, human geography, composition and rhetoric, and established editorial practices in English studies. By elaborating on the role of a researcher’s subjectivity in archival creation, this work expands the conversation regarding methodology and archives, especially how archives present us with new ways of seeing and making narratives during the editorial decision-making involved in their creation. Writing about my own experience, I privilege the researcher’s point of view with a narrative about my construction of a digital archive. With archival research, we should promote the revelation of methods and methodology to …


Client-Assisted Memento Aggregation Using The Prefer Header, Mat Kelly, Sawood Alam, Michael L. Nelson, Michele C. Weigle Jan 2018

Computer Science Faculty Publications

[First paragraph] Preservation of the Web ensures that future generations have a picture of how the web was. Web archives like Internet Archive's Wayback Machine, WebCite, and archive.is allow individuals to submit URIs to be archived, but the captures they preserve then reside at the archives. Traversing these captures in time as preserved by multiple archive sources (using Memento [8]) provides a more comprehensive picture of the past Web than relying on a single archive. Some content on the Web, such as content behind authentication, may be unsuitable or inaccessible for preservation by these organizations. Furthermore, this content may be …


Swimming In A Sea Of JavaScript Or: How I Learned To Stop Worrying And Love High-Fidelity Replay, John A. Berlin, Michael L. Nelson, Michele C. Weigle Jan 2018

Computer Science Faculty Publications

[First paragraph] Preserving and replaying modern web pages in high fidelity has become an increasingly difficult task due to the increased usage of JavaScript. Reliance on server-side rewriting alone results in live-leakage and/or the inability to replay a page due to the preserved JavaScript performing an action not permissible from the archive. The current state-of-the-art high-fidelity archival preservation and replay solutions rely on handcrafted client-side URL rewriting libraries specifically tailored for the archive, namely Webrecorder's and Pywb's wombat.js [12]. Web archives not utilizing client-side rewriting rely on server-side rewriting that misses URLs used in a manner not accounted for …


It Is Hard To Compute Fixity On Archived Web Pages, Mohamed Aturban, Michael L. Nelson, Michele C. Weigle Jan 2018

Computer Science Faculty Publications

[Introduction] Checking fixity in web archives is performed to ensure archived resources, or mementos (denoted by URI-M), have remained unaltered since they were captured. The final report of the PREMIS Working Group [2] defines information used for fixity as "information used to verify whether an object has been altered in an undocumented or unauthorized way." The common technique for checking fixity is to generate a current hash value (i.e., a message digest or a checksum) for a file using a cryptographic hash function (e.g., SHA-256) and compare it to the hash value generated originally. If they have different hash …


205.3 The Many Shapes Of Archive-It, Shawn Jones, Michael L. Nelson, Alexander Nwala, Michele C. Weigle Jan 2018

Computer Science Faculty Publications

Web archives, a key area of digital preservation, meet the needs of journalists, social scientists, historians, and government organizations. The use cases for these groups often require that they guide the archiving process themselves, selecting their own original resources, or seeds, and creating their own web archive collections. We focus on the collections within Archive-It, a subscription service started by the Internet Archive in 2005 for the purpose of allowing organizations to create their own collections of archived web pages, or mementos. Understanding these collections could be done via their user-supplied metadata or via text analysis, but the metadata is …


Avoiding Zombies In Archival Replay Using ServiceWorker, Sawood Alam, Mat Kelly, Michele C. Weigle, Michael L. Nelson Jan 2017

Computer Science Faculty Publications

[First paragraph] A Composite Memento is an archived representation of a web page with all the page requisites such as images and stylesheets. All embedded resources have their own URIs; hence, they are archived independently. For a meaningful archival replay, it is important to load all the page requisites from the archive within the temporal neighborhood of the base HTML page. To achieve this goal, archival replay systems try to rewrite all the resource references to appropriate archived versions before serving HTML, CSS, or JS. However, effective server-side URL rewriting is difficult when URLs are generated dynamically using JavaScript. …


Using Web Archives To Enrich The Live Web Experience Through Storytelling, Yasmin Alnoamany Jul 2016

Computer Science Theses & Dissertations

Much of our cultural discourse occurs primarily on the Web. Thus, Web preservation is a fundamental precondition for multiple disciplines. Archiving Web pages into themed collections is a method for ensuring these resources are available for posterity. Services such as Archive-It exist to allow institutions to develop, curate, and preserve collections of Web resources. Understanding the contents and boundaries of these archived collections is a challenge for most people, resulting in the paradox of the larger the collection, the harder it is to understand. Meanwhile, as the sheer volume of data grows on the Web, "storytelling" is becoming a popular …


Combining Heritrix And PhantomJS For Better Crawling Of Pages With JavaScript, Justin F. Brunelle, Michele C. Weigle, Michael L. Nelson Apr 2016

Computer Science Presentations

PDF of a PowerPoint presentation from the International Internet Preservation Consortium (IIPC) 2016 Conference in Reykjavik, Iceland, April 11, 2016. Also available on Slideshare.


Storytelling For Summarizing Collections In Web Archives, Yasmin Alnoamany, Michele C. Weigle, Michael L. Nelson Apr 2016

Computer Science Presentations

PDF of a PowerPoint presentation from the Coalition for Networked Information (CNI) Spring 2016 Membership Meeting in San Antonio, Texas, April 5, 2016. Also available on Slideshare.


Why We Need Multiple Archives, Michael L. Nelson, Herbert Van De Sompel Apr 2016

Computer Science Presentations

PDF of a PowerPoint presentation from the Coalition for Networked Information (CNI) Spring 2016 Membership Meeting in San Antonio, Texas, April 3, 2016. Also available on Slideshare.


Combining Storytelling And Web Archives, Yasmin Alnoamany, Michele C. Weigle, Michael L. Nelson Nov 2015

Computer Science Presentations

PDF of a PowerPoint presentation from an Old Dominion University Electrical & Computer Engineering (ECE) Department Colloquium, November 13, 2015. Also available on Slideshare.


Tools Managing Seed URLs (Detecting Off-Topic Pages), Yasmin Alnoamany, Michele C. Weigle, Michael L. Nelson Jun 2015

Computer Science Presentations

PDF of a PowerPoint presentation from the Columbia University Web Archiving Collaboration: New Tools and Models Conference, in New York, New York, June 4-5, 2015. Also available on Slideshare.


Evaluating The Temporal Coherence Of Archived Pages, Scott G. Ainsworth, Michael L. Nelson, Herbert Van De Sompel Apr 2015

Computer Science Presentations

PDF of a PowerPoint presentation from the International Internet Preservation Consortium (IIPC) 2015 Conference at Stanford University, April 28, 2015. Also available on Slideshare.


Tools For Managing The Past Web, Michele C. Weigle Feb 2015

Computer Science Presentations

PDF of a PowerPoint presentation from an Old Dominion University - ECE Department Seminar, February 20, 2015. Also available on Slideshare.


Profiling Web Archives For Efficient Memento Query Routing, Sawood Alam, Michael L. Nelson, Herbert Van De Sompel, Lyudmila L. Balakireva, Harihar Shankar, David S. H. Rosenthal Jan 2015

Computer Science Faculty Publications

No abstract provided.


Tools For Managing The Past Web, Michele C. Weigle, Michael L. Nelson, Yasmin Alnoamany, Ahmed Alsum, Justin Brunelle, Mat Kelly, Hany Salaheldeen Nov 2014

Computer Science Presentations

PDF of a PowerPoint presentation from the Archive-It Partners Meeting in Montgomery, Alabama, November 18, 2014. Also available on Slideshare.


"Archive What I See Now" Bringing Institutional Web Archiving Tools To The Individual Researcher, Michele C. Weigle, Michael L. Nelson, Liza Potts Sep 2014

Computer Science Presentations

PDF of a PowerPoint presentation from the 2014 National Endowment for the Humanities (NEH) Office of Digital Humanities (ODH) Project Directors' Meeting in Washington, D.C., September 15, 2014. Also available from Slideshare.