IIPC TSS Webinar: Dark and Stormy Archives & Bloom Filters for Web Archives’ Holdings
The IIPC Technical Speaker Series (TSS) facilitates knowledge sharing and fosters conversations and collaborations among IIPC members around web archiving technical work. During this webinar, we will provide an update on work funded by two different grants from IIPC’s Discretionary Funding Programme (DFP):
- Improving the Dark and Stormy Archives Framework by Summarizing the Collections of the National Library of Australia, a collaboration between the Old Dominion University (ODU) Department of Computer Science, the Los Alamos National Laboratory (LANL) Research Library, and the National Library of Australia, and
- Developing Bloom Filters for Web Archives’ Holdings, a collaboration between Los Alamos National Laboratory (LANL) Research Library and the National and University Library in Zagreb (NSK).
In the first part of this session, Martin Klein, Luda Balakireva (LANL) and Karolina Holub (NSK) will give an overview of the Bloom Filter project. The goal of this effort is to provide web archives with a technical solution to convey their archival holdings without actively communicating the URLs of the web resources they have archived. Bloom Filters operate on hashed URLs, enabling a simple and fast query service to inquire whether archival copies (mementos) of particular web resources exist in an archive: a negative answer is always correct, while a positive answer carries a small, tunable false-positive probability. During this presentation, the PIs will give an overview of initial observations from testing various Bloom Filter implementations, demo a pilot implementation of a query service, and discuss a potential solution to cover the entire Croatian Web Archive index for the foreseeable future. A further objective of this presentation is to solicit feedback on how the project can support other web archives that are interested in testing this technology for internal or external use.
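To illustrate the general idea (not the project's implementation), here is a minimal Bloom Filter sketch in Python. An archive adds each archived URL; afterwards, only the bit array needs to be shared, and anyone can ask whether a URL is "probably archived" or "definitely not archived." The class name `UrlBloomFilter`, the SHA-256 double-hashing scheme, and the assumption that URLs are already canonicalized before hashing are all illustrative choices, not details from the project.

```python
import hashlib
import math


class UrlBloomFilter:
    """Hypothetical minimal Bloom Filter over (already canonicalized) URL strings."""

    def __init__(self, expected_items: int, false_positive_rate: float = 0.01):
        n = max(1, expected_items)
        # Standard sizing: m = -n * ln(p) / (ln 2)^2 bits, k = (m / n) * ln 2 hash functions
        self.size = max(1, math.ceil(-n * math.log(false_positive_rate) / (math.log(2) ** 2)))
        self.num_hashes = max(1, round(self.size / n * math.log(2)))
        self.bits = bytearray((self.size + 7) // 8)

    def _positions(self, url: str):
        # Double hashing: derive k bit positions from two 64-bit slices of one SHA-256 digest
        digest = hashlib.sha256(url.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # ensure a nonzero step
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.size

    def add(self, url: str) -> None:
        # Set every bit position derived from the URL's hash
        for pos in self._positions(url):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, url: str) -> bool:
        # False: definitely never added. True: probably added (false positives possible).
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(url))


if __name__ == "__main__":
    bf = UrlBloomFilter(expected_items=1000)
    bf.add("http://www.nsk.hr/")
    print(bf.might_contain("http://www.nsk.hr/"))  # True
```

The filter never stores the URLs themselves, which is what allows an archive to publish it without listing its holdings; the trade-off is that a "yes" answer should be confirmed against the archive itself.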
In the second part, Shawn Jones (LANL) and Himarsha Jayanetti (ODU) will walk through improvements to the Dark and Stormy Archives (DSA) toolkit. The DSA toolkit provides storytelling capability with web archives. With it, a user can discover archived pages, also known as mementos, and then visualize them as a social media story. This talk will focus on Hypercane and Raintale. Hypercane is a software package for selecting exemplar mementos from a web archive collection, primarily for summarization and storytelling. Raintale visualizes a set of mementos as a story. Raintale provides templating that allows users to control the look and feel of their story and to distribute it as HTML or as a series of social media posts. This talk will highlight recent updates to the DSA toolkit developed while piloting it with the National Library of Australia (NLA).
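As a rough illustration of template-driven story rendering, the sketch below fills a per-memento HTML template and wraps the results in a story template. It uses Python's standard `string.Template`; Raintale's actual template language, field names, and output formats may differ. The fields `urim`, `title`, and `mdatetime`, the function `render_story`, and the example archive URL are hypothetical.

```python
from html import escape
from string import Template

# Hypothetical per-memento template; Raintale's real templates and fields may differ.
MEMENTO_TEMPLATE = Template(
    '<div class="memento">\n'
    '  <a href="$urim">$title</a>\n'
    "  <p>Archived: $mdatetime</p>\n"
    "</div>"
)

# Hypothetical story wrapper that holds all rendered mementos.
STORY_TEMPLATE = Template("<h1>$story_title</h1>\n$body")


def render_story(story_title, mementos):
    """Render a list of memento dicts into a single HTML story string."""
    parts = [
        MEMENTO_TEMPLATE.substitute(
            urim=escape(m["urim"], quote=True),  # escape values before inserting into HTML
            title=escape(m["title"]),
            mdatetime=escape(m["mdatetime"]),
        )
        for m in mementos
    ]
    return STORY_TEMPLATE.substitute(story_title=escape(story_title), body="\n".join(parts))


if __name__ == "__main__":
    story = render_story(
        "Example Story",
        [{
            "urim": "https://archive.example.org/20200101000000/https://example.com/",
            "title": "Example page",
            "mdatetime": "2020-01-01",
        }],
    )
    print(story)
```

Swapping in a different template (for example, one that emits plain-text post bodies instead of HTML) changes the story's look and distribution format without touching the memento selection step.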
- Martin Klein, Luda Balakireva, and Karolina Holub: Bloom Filter overview
- Shawn M. Jones: DSA overview, Hypercane presentation
- Himarsha Jayanetti: Raintale presentation
The presentations will be followed by a Q&A session chaired by Lauren Ko, University of North Texas Libraries.