Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Full-Text Articles in Physical Sciences and Mathematics

Extracting Information From Twitter Screenshots, Tarannum Zaki, Michael L. Nelson, Michele C. Weigle Apr 2023

Modeling, Simulation and Visualization Student Capstone Conference

Screenshots are a common way to share information on social media. Users rarely verify whether a screenshot is real or fake before sharing it. Sharing fake screenshots can contribute substantially to the spread of misinformation and disinformation on social media. Services of the live web and web archives could be used to validate the content of a screenshot. We plan to develop a tool that uses these services to automatically estimate the probability that a screenshot is fake.


The DSA Toolkit Shines Light Into Dark And Stormy Archives, Shawn Morgan Jones, Himarsha R. Jayanetti, Alex Osborne, Paul Koerbin, Martin Klein, Michele C. Weigle, Michael L. Nelson Jan 2022

Computer Science Faculty Publications

Web archive collections are created with a particular purpose in mind. A curator selects seeds, or original resources, which are then captured by an archiving system and stored as archived web pages, or mementos. The systems that build web archive collections are often configured to revisit the same original resource multiple times. This is incredibly useful for understanding an unfolding news story or the evolution of an organization. Unfortunately, over time, some of these original resources can go off-topic and no longer suit the purpose for which the collection was originally created. They can go off-topic due to web site …


Improving Collection Understanding For Web Archives With Storytelling: Shining Light Into Dark And Stormy Archives, Shawn M. Jones Jul 2021

Computer Science Theses & Dissertations

Collections are the tools that people use to make sense of an ever-increasing number of archived web pages. As collections themselves grow, we need tools to make sense of them. Tools that work on the general web, like search engines, are not a good fit for these collections because search engines do not currently represent multiple document versions well. Web archive collections are vast, some containing hundreds of thousands of documents. Thousands of collections exist, many of which cover the same topic. Few collections include standardized metadata. Too many documents from too many collections with insufficient metadata makes collection understanding …


SHARI - An Integration Of Tools To Visualize The Story Of The Day, Shawn M. Jones, Alexander C. Nwala, Martin Klein, Michele C. Weigle, Michael L. Nelson Jan 2020

Computer Science Faculty Publications

Tools such as Google News and Flipboard exist to convey daily news, but what about the news of the past? In this paper, we describe how to combine several existing tools and web archive holdings to convey the “biggest story” for a given date in the past. StoryGraph clusters news articles together to identify a common news story. Hypercane leverages ArchiveNow to store URLs produced by StoryGraph in web archives. Hypercane analyzes these URLs to identify the most common terms, entities, and highest quality images for social media storytelling. Raintale then takes the output of these tools to produce a …


MementoEmbed And Raintale For Web Archive Storytelling, Shawn M. Jones, Martin Klein, Michele C. Weigle, Michael L. Nelson Jan 2020

Computer Science Faculty Publications

For traditional library collections, archivists can select a representative sample from a collection and display it in a featured physical or digital library space. Web archive collections may consist of thousands of archived pages, or mementos. How should an archivist display this sample to drive visitors to their collection? Search engines and social media platforms often represent web pages as cards consisting of text snippets, titles, and images. Web storytelling is a popular method for grouping these cards in order to summarize a topic. Unfortunately, social media platforms are not archive-aware and fail to consistently create a good experience for …


MementoMap: An Archive Profile Dissemination Framework, Sawood Alam, Michele C. Weigle, Michael L. Nelson Jun 2019

Computer Science Faculty Publications

We introduce MementoMap, a framework to express and disseminate holdings of web archives (archive profiles) by themselves or third parties. The framework allows arbitrary, flexible, and dynamic levels of details in its entries that fit the needs of archives of different scales. This enables Memento aggregators to significantly reduce wasted traffic to web archives.


Impact Of HTTP Cookie Violations In Web Archives, Sawood Alam, Michele C. Weigle, Michael L. Nelson Jun 2019

Computer Science Faculty Publications

Certain HTTP Cookies on certain sites can be a source of content bias in archival crawls. Accommodating Cookies at crawl time, but not utilizing them at replay time may cause cookie violations, resulting in defaced composite mementos that never existed on the live web. To address these issues, we propose that crawlers store Cookies with short expiration time and archival replay systems account for values in the Vary header along with URIs.
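The replay-side proposal above can be illustrated with a minimal sketch. It assumes a replay index that looks up captures by the URI together with the values of the request headers named in the response's Vary header. The function name `replay_key` and the example URI are hypothetical; the abstract does not describe an implementation.

```python
def replay_key(uri, vary_header, request_headers):
    """Build a lookup key from the URI plus the request header values
    that the response's Vary header names (hypothetical helper)."""
    # Header names are case-insensitive, so normalize them to lowercase.
    names = sorted(n.strip().lower() for n in vary_header.split(",") if n.strip())
    lowered = {k.lower(): v for k, v in request_headers.items()}
    # A header that is absent from the request contributes an empty value.
    return (uri, tuple((name, lowered.get(name, "")) for name in names))


# Two captures of the same URI made with different cookies get different keys,
# so a replay system keyed this way would not mix them into one composite page.
key_a = replay_key("http://example.com/", "Cookie", {"Cookie": "lang=en"})
key_b = replay_key("http://example.com/", "Cookie", {"Cookie": "lang=fr"})
```

With a key of this shape, a capture is only reused at replay time when both the URI and the varying header values match.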


Web Archives At The Nexus Of Good Fakes And Flawed Originals, Michael L. Nelson Jan 2019

Computer Science Faculty Publications

[Summary] The authenticity, integrity, and provenance of resources we encounter on the web are increasingly in question. While many people are inured to the possibility of altered images, the easy accessibility of powerful software tools that synthesize audio and video will unleash a torrent of convincing “deepfakes” into our social discourse. Archives will no longer be monopolized by a countable number of institutions such as governments and publishers, but will become a competitive space filled with social engineers, propagandists, conspiracy theorists, and aspiring Hollywood directors. While the historical record has never been singular nor unmalleable, current technologies empower an unprecedented …


It Is Hard To Compute Fixity On Archived Web Pages, Mohamed Aturban, Michael L. Nelson, Michele C. Weigle Jan 2018

Computer Science Faculty Publications

[Introduction] Checking fixity in web archives is performed to ensure archived resources, or mementos (denoted by URI-M), have remained unaltered since they were captured. The final report of the PREMIS Working Group [2] defines information used for fixity as "information used to verify whether an object has been altered in an undocumented or unauthorized way." The common technique for checking fixity is to generate a current hash value (i.e., a message digest or a checksum) for a file using a cryptographic hash function (e.g., SHA-256) and compare it to the hash value generated originally. If they have different hash …
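The common technique described above (hash the current bytes and compare against the originally recorded hash) can be sketched as follows. This is a minimal illustration for a single byte string, using SHA-256 from Python's standard library; the function names and the sample content are assumptions, not taken from the paper.

```python
import hashlib


def sha256_digest(content: bytes) -> str:
    """Return the SHA-256 hex digest of the given bytes."""
    return hashlib.sha256(content).hexdigest()


def fixity_unchanged(content: bytes, recorded_digest: str) -> bool:
    """True when the current digest equals the digest recorded earlier."""
    return sha256_digest(content) == recorded_digest


# Hypothetical content: record a digest at capture time, then check it later.
original = b"<html><body>archived page</body></html>"
recorded = sha256_digest(original)
```

Any change to the bytes, even a single added space, produces a different digest and makes the check fail.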


Using Web Archives To Enrich The Live Web Experience Through Storytelling, Yasmin Alnoamany Jul 2016

Computer Science Theses & Dissertations

Much of our cultural discourse occurs primarily on the Web. Thus, Web preservation is a fundamental precondition for multiple disciplines. Archiving Web pages into themed collections is a method for ensuring these resources are available for posterity. Services such as Archive-It exist to allow institutions to develop, curate, and preserve collections of Web resources. Understanding the contents and boundaries of these archived collections is a challenge for most people, resulting in the paradox of the larger the collection, the harder it is to understand. Meanwhile, as the sheer volume of data grows on the Web, "storytelling" is becoming a popular …


Tools Managing Seed URLs (Detecting Off-Topic Pages), Yasmin Alnoamany, Michele C. Weigle, Michael L. Nelson Jun 2015

Computer Science Presentations

PDF of a PowerPoint presentation from the Columbia University Web Archiving Collaboration: New Tools and Models Conference, in New York, New York, June 4-5, 2015. Also available on Slideshare.


Tools For Managing The Past Web, Michele C. Weigle, Michael L. Nelson, Yasmin Alnoamany, Ahmed Alsum, Justin Brunelle, Mat Kelly, Hany Salaheldeen Nov 2014

Computer Science Presentations

PDF of a PowerPoint presentation from the Archive-It Partners Meeting in Montgomery, Alabama, November 18, 2014. Also available on Slideshare.


"Archive What I See Now" Bringing Institutional Web Archiving Tools To The Individual Researcher, Michele C. Weigle, Michael L. Nelson, Liza Potts Sep 2014

Computer Science Presentations

PDF of a PowerPoint presentation from the 2014 National Endowment for the Humanities (NEH) Office of Digital Humanities (ODH) Project Directors' Meeting in Washington, D.C., September 15, 2014. Also available from Slideshare.


Bits Of Research, Michele C. Weigle Jun 2014

Computer Science Presentations

PDF of a PowerPoint presentation that provides an overview of digital preservation, web archiving, and information visualization research; dated June 26, 2014. Also available on Slideshare.


Moved But Not Gone: An Evaluation Of Real-Time Methods For Discovering Replacement Web Pages, Martin Klein, Michael L. Nelson Jan 2014

Computer Science Faculty Publications

Inaccessible Web pages and 404 “Page Not Found” responses are a common Web phenomenon and a detriment to the user’s browsing experience. The rediscovery of missing Web pages is, therefore, a relevant research topic in the digital preservation as well as in the Information Retrieval realm. In this article, we bring these two areas together by analyzing four content- and link-based methods to rediscover missing Web pages. We investigate the retrieval performance of the methods individually as well as their combinations and give an insight into how effective these methods are over time. As the main result of this work, …


Telling Stories With Web Archives, Michele C. Weigle Nov 2013

Computer Science Presentations

PDF of a PowerPoint presentation from the Southeast Women in Computing Conference in Lake Guntersville State Park, Alabama, November 16, 2013. Also available on Slideshare.


A Method For Identifying Personalized Representations In Web Archives, Mat Kelly, Justin F. Brunelle, Michele C. Weigle, Michael L. Nelson Jan 2013

Computer Science Faculty Publications

Web resources are becoming increasingly personalized — two different users clicking on the same link at the same time can see content customized for each individual user. These changes result in multiple representations of a resource that cannot be canonicalized in Web archives. We identify characteristics of this problem by presenting a potential solution to generalize personalized representations in archives. We also present our proof-of-concept prototype that analyzes WARC (Web ARChive) format files, inserts metadata establishing relationships, and provides archive users the ability to navigate on the additional dimension of environment variables in a modified Wayback Machine.


Visualizing Digital Collections At Archive-It, Kalpesh Padia Jul 2012

Computer Science Theses & Dissertations

Archive-It, a subscription service from the Internet Archive, allows users to create, maintain, and view digital collections of web resources. The current interface of Archive-It is largely text-based, supporting drill-down navigation using lists of URIs. While this interface provides good searching capabilities, it is not efficient for browsing. In the absence of keywords, a user has to spend a large amount of time trying to locate a web page of interest. In order to provide a better visual experience to the user, we have studied the underlying characteristics of Archive-It collections and implemented six different visualizations (treemap, time cloud, bubble chart, image plot, …


Why Care About The Past?, Michael L. Nelson, Michele C. Weigle Jan 2012

Computer Science Presentations

A set of slides used in various presentations by the authors to show that replaying an experience via archived web pages is more compelling than reading a summary of the event. Also available on Slideshare.


My Point Of View, Michael L. Nelson Sep 2010

Computer Science Presentations

PDF of a PowerPoint presentation from the Web Archiving Cooperative (WAC) Meeting, Stanford University, September 9, 2010. Also available on Slideshare.


(Re-) Discovering Lost Web Pages, Martin Klein, Michael L. Nelson Oct 2009

Computer Science Presentations

PDF of a PowerPoint presentation from a Mathematics & Computer Science Seminar at Emory University, Atlanta, Georgia, October 2, 2009. Also available on Slideshare.


Synchronicity: Just-In-Time Discovery Of Lost Web Pages, Martin Klein, Michael L. Nelson Jun 2009

Computer Science Presentations

PDF of a PowerPoint presentation from the National Digital Information Infrastructure and Preservation Program (NDIIPP) Partners Meeting, Washington, D.C., June 24-25, 2009. Also available on Slideshare.


Can't Find Your 404s?, Martin Klein, Frank McCown, Joan Smith, Michael L. Nelson Mar 2009

Computer Science Presentations

PDF of a PowerPoint presentation at the Santa Fe Complex, Santa Fe, New Mexico, March 13, 2009. Also available on Slideshare.


Tools For A Preservation-Ready Web, Joan A. Smith, Michael L. Nelson Jul 2008

Computer Science Presentations

PDF of a PowerPoint presentation from the National Digital Information Infrastructure and Preservation Program (NDIIPP) Partners Meeting, Washington, D.C., July 9, 2008. Also available on Slideshare.


Opal: In Vivo Based Preservation Framework For Locating Lost Web Pages, Terry L. Harrison Jul 2005

Computer Science Theses & Dissertations

We present Opal, a framework for interactively locating missing web pages (HTTP status code 404). Opal is an example of "in vivo" preservation: harnessing the collective behavior of web archives, commercial search engines, and research projects for the purpose of preservation. Opal servers learn from their experiences and are able to share their knowledge with other Opal servers using the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH). Using cached copies that can be found on the web, Opal creates lexical signatures which are then used to search for similar versions of the web page. Using the OAI-PMH to facilitate …
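As a rough illustration of the lexical-signature idea mentioned above, the sketch below picks the few highest-weighted terms from a cached copy so they can be submitted as a search query. It assumes TF-IDF weighting over a small reference corpus; the abstract does not say how Opal weights terms, and the function names and sample texts are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def lexical_signature(document, corpus, k=5):
    """Return the k terms of `document` with the highest TF-IDF score.

    TF-IDF is an assumed weighting, used here only for illustration.
    """
    term_counts = Counter(tokenize(document))
    doc_term_sets = [set(tokenize(d)) for d in corpus]
    n_docs = len(corpus)
    scores = {}
    for term, count in term_counts.items():
        doc_freq = sum(1 for terms in doc_term_sets if term in terms)
        # Smoothed inverse document frequency; avoids division by zero.
        idf = math.log((1 + n_docs) / (1 + doc_freq)) + 1
        scores[term] = count * idf
    # Highest score first; ties are broken alphabetically for stable output.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:k]


# Hypothetical reference corpus and cached copy of a missing page.
corpus = [
    "the web page is here",
    "the archive of the web",
    "the opal framework finds lost pages",
]
cached_copy = "opal opal opal the web"
signature = lexical_signature(cached_copy, corpus, k=2)
```

The resulting terms would then be joined into a query string for a search engine, in the spirit of the search step the abstract describes.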


Lessons Learned With Arc, An OAI-PMH Service Provider, Xiaoming Liu, Kurt Maly, Michael L. Nelson Jan 2005

Computer Science Faculty Publications

Web-based digital libraries have historically been built in isolation utilizing different technologies, protocols, and metadata. These differences hindered the development of digital library services that enable users to discover information from multiple libraries through a single unified interface. The Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) is a major, international effort to address technical interoperability among distributed repositories. Arc debuted in 2000 as the first end-user OAI-PMH service provider. Since that time, Arc has grown to include nearly 7,000,000 metadata records. Arc has been deployed in a number of environments and has served as the basis for many other …
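For readers unfamiliar with OAI-PMH harvesting, the sketch below builds the kind of request URL a service provider sends to a repository. `ListRecords`, `metadataPrefix`, `resumptionToken`, and the `oai_dc` prefix come from the OAI-PMH specification; the base URL and the helper name are hypothetical, and nothing here reflects how Arc itself is implemented.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build an OAI-PMH ListRecords request URL (hypothetical helper).

    In OAI-PMH, resumptionToken is an exclusive argument: when it is
    present, it is sent with the verb alone to fetch the next batch.
    """
    if resumption_token is not None:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"


# Hypothetical repository endpoint.
first_request = list_records_url("https://repository.example.org/oai")
next_request = list_records_url(
    "https://repository.example.org/oai", resumption_token="token123"
)
```

A harvester would issue the first request, read any resumption token from the response, and repeat with that token until the repository returns no further token.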