The IIPC funds technical, educational and curatorial projects based on the goals outlined in the Strategic Action Plan. Projects can be submitted by Working Groups and Portfolios or, for dedicated programs, through the Request for Proposals. The consortium also collaborates on research and development projects by sharing data and testing tools. Task forces are formed to study and make recommendations on specific issues or problems. In the annual schedule, the IIPC also supports the General Assembly and Web Archiving Conference, online event series, training workshops, hackathons, and other events by providing direct funding or co-organizing them with members.
CURRENT PROJECTS
COMPLETED PROJECTS
- Discretionary Funding Program (DFP):
- Tools Development Projects 2020-2023
- Membership Engagement Projects 2017-2024
- Training Working Group Projects 2017-2020
- 2009-2017
IIPC TOOLS DEVELOPMENT PROJECTS
WEB ARCHIVING TOOLS MAINTENANCE (2025)
Project Coordinators: Tools Development Portfolio Leads & IIPC SPO
Funding: 20,000 USD
This project focuses on addressing issues in web archiving tools that are of priority to IIPC members. The budget is primarily for maintenance in the playback category but also for addressing issues pertaining to capture tools, which will become clear as draft work packages are finalized.
PYWB WORK PACKAGES (2024-2025)
Project Coordinators: Tools Development Portfolio Leads & IIPC SPO
Project developers: Webrecorder.net
Funding: 10,000 USD
Work Packages for the maintenance of pywb, an open-source tool developed by Webrecorder.
Deliverables:
- WP 1: Integration of client-side replay
- WP 2: Implementation of High-Priority Bug Fixes
- Follow-up bug fixes based on the deliverables of WP 1 and WP 2, or additional maintenance
IIPC WORKING GROUP PROJECTS
COLLABORATIVE COLLECTIONS
Project coordinators: Nicola Bingham, British Library and Shereen Tay, National Library Board Singapore, Content Development Group (CDG) Co-chairs
The IIPC has supported the creation of collaborative collections since 2015, and they are now accessible through Archive-It and Bibliotheca Alexandrina. Current work includes new crawls for the ongoing Intergovernmental Organizations and War in Ukraine collections. The CDG will confirm their plans for other 2025 collections at the General Assembly.
To subscribe to the CDG mailing list, please email communications@iipc.simplelists.com
IIPC SPEAKER SERIES
IIPC TECHNICAL SPEAKER SERIES
Project coordinators: Jefferson Bailey, Internet Archive and Olga Holownia, IIPC SPO
The IIPC Technical Speaker Series (TSS) facilitates knowledge sharing and fosters conversations and collaborations among IIPC members around web archiving technical work. IIPC members present 30-60 minute online webinars on new, recent, or innovative technical projects within their organisations. The series is not intended to be training- or workshop-oriented; instead, it provides an opportunity for members to disseminate information and showcase their work on internal technical projects that have relevance to the broader IIPC community. Speakers are selected through direct recruitment and a forthcoming open call for proposals.
IIPC RESEARCH SPEAKER SERIES
Project coordinator: IIPC SPO
The IIPC Research Speaker Series (RSS) focuses on the research use of web archives. The webinars feature presentations of use cases, current collaborative projects and new tools for researchers.
DISCRETIONARY FUNDING PROGRAMME
2021-2022
GAME WALKTHROUGHS AND WEB ARCHIVING
Project lead: Michael L. Nelson, Old Dominion University Department of Computer Science
Project partners: Martin Klein, Los Alamos National Laboratory (LANL) Research Library
Funding: 10,000 USD
The goal of this project is to explore possible synergy between gaming concepts, platforms, and technologies and those of web archiving.
2020-2021
DEVELOPING BLOOM FILTERS FOR WEB ARCHIVES’ HOLDINGS
Project lead: Martin Klein, Los Alamos National Laboratory (LANL) Research Library
Project partners: National and University Library in Zagreb (NSK)
Funding: 24,741 USD
The aim of the project is to develop a framework for web archives to create Bloom filters based on their holdings of archived web resources. A Bloom filter can be thought of as a sitemap for web archives, listing all (or a subset of) URLs of which an archive has one or more archival copies.
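As an illustration of the idea described above, here is a minimal sketch of a Bloom filter over URL strings. It is not the framework developed by the project: the class name, the SHA-256 double-hashing scheme, and the sizing parameters are assumptions made for this example only. A Bloom filter answers "possibly held" or "definitely not held", so a positive answer may occasionally be a false positive.

```python
import hashlib
import math


class BloomFilter:
    """Illustrative Bloom filter for URL membership tests (hypothetical helper)."""

    def __init__(self, expected_items: int, false_positive_rate: float = 0.01):
        # Standard sizing formulas: m = -n * ln(p) / (ln 2)^2 and k = (m / n) * ln 2.
        self.size = max(1, math.ceil(
            -expected_items * math.log(false_positive_rate) / (math.log(2) ** 2)
        ))
        self.num_hashes = max(1, round(self.size / expected_items * math.log(2)))
        self.bits = bytearray((self.size + 7) // 8)

    def _positions(self, url: str):
        # Double hashing: derive all bit positions from two halves of one digest.
        digest = hashlib.sha256(url.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd step
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.size

    def add(self, url: str) -> None:
        # Set every bit position derived from the URL.
        for pos in self._positions(url):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, url: str) -> bool:
        # True means "possibly present"; False means "definitely absent".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(url)
        )


if __name__ == "__main__":
    holdings = BloomFilter(expected_items=1000)
    holdings.add("http://example.org/")
    print("http://example.org/" in holdings)       # an added URL is always reported
    print("http://example.org/missing" in holdings)  # usually False, never guaranteed
```

In this sketch an archive would publish only the bit array and its parameters, so a client can rule out archives that definitely hold no copy of a URL without the archive exposing its full URL list.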
IMPROVING THE DARK AND STORMY ARCHIVES FRAMEWORK BY SUMMARIZING THE COLLECTIONS OF THE NATIONAL LIBRARY OF AUSTRALIA
Project lead: Michael L. Nelson, Old Dominion University Department of Computer Science
Project partners: Los Alamos National Laboratory Research Library & National Library of Australia
Funding: 50,000 USD
The Dark and Stormy Archives (DSA) project provides storytelling solutions to improve the understanding of web archive collections. Our goal is to provide a summary of a collection in the form of social media storytelling that describes a web archive collection sufficiently for a user to decide if that collection will likely contain pages of interest.
2019-2020
ARCHIVES UNLEASHED DATATHON AT THE BNF (CANCELED)
Lead Institution: Bibliothèque nationale de France (BnF)
Project partners (IIPC): KBR – Royal Library of Belgium and National Library of Luxembourg (BnL)
Project partner: Archives Unleashed Project
Funding: 6,000 USD
The aim of the project is to promote the use of web archive collections among researchers. To achieve this goal, the partner institutions will organise a datathon on web archive collections coming from francophone national libraries with a legal deposit mission. The datathon will be led by Archives Unleashed and will use datasets from BnF, KBR and BnL.
ASKING QUESTIONS WITH WEB ARCHIVES – INTRODUCTORY NOTEBOOKS FOR HISTORIANS
Project lead (IIPC member institution): Andrew Jackson, British Library
Project co-lead & developer: Tim Sherratt, University of Canberra
Project partners: National Library of Australia & National Library of New Zealand
Funding: 3,500 USD
This project aims to create a set of Jupyter notebooks that will demonstrate how specific historical research questions can be explored by analysing data from web archives. The notebooks will be targeted at researchers who have limited understanding of, or interest in, the technology of web archives, but want to do more than simply browse snapshots.
LINKGATE: CORE FUNCTIONALITY AND FUTURE USE CASES
Project lead: Youssef Eldakar, Bibliotheca Alexandrina
Project partner: National Library of New Zealand
Funding: 24,439 USD
This project aims to develop the core functionality of a scalable link visualization environment and to document potential research use cases within the domain of web archiving for future development. While tools such as Gephi exist for visualizing linked data, they lack the ability to operate on data that goes beyond the typical capacity of a standalone computing device. This new link visualization environment would operate on data kept in a remote data store, enabling it to scale up to the magnitude of a web archive with tens of billions of web resources.
COMPLETED PROJECTS
BROWSER-BASED CRAWLING SYSTEM FOR ALL (2022-2023)
Project Coordinators: Tools Development Portfolio Leads & IIPC SPO
Project leads: Anders Klindt Myrvoll, Royal Danish Library; Andrew Jackson, British Library; Ben O’Brien, National Library of New Zealand; Lauren Ko, University of North Texas
Project developer: Ilya Kreymer, Webrecorder.net
Funding: 30,000 USD
Development of the “User-Friendly High Fidelity Browser-Based Crawling System for All”, a flexible, browser-based high fidelity crawling system driven by a full-featured user interface and accessible to curators and web archivists at any institution. The crawling system will focus on enabling the capture of complex, dynamic websites.
SUPPORT FOR TRANSITIONING TO PYWB (2020-2021)
Project Coordinators: Tools Development Portfolio Leads & IIPC SPO
Project developer: Ilya Kreymer, Webrecorder.net
Funding: 30,000 EUR
The goal of this project was to assist institutions transitioning to pywb and to provide documentation to facilitate future migrations. This was accomplished in three phases: 1) providing support for migrating from common replay scenarios in OpenWayback to pywb, 2) developing APIs to support multilingualism, modularity around index and WARC store solutions, and various access controls, and 3) providing guides on styling and embedding pywb along with banner navigation enhancements and a calendar display.
MEMBERSHIP SURVEY, 2023-2024
Project coordinators: Abbie Grotke, Library of Congress (Membership Engagement Portfolio Lead), Olga Holownia (IIPC SPO); Karolina Holub, National and University Library in Zagreb and Vladimir Tybin, National Library of France
Funding: 300 USD
MEMBERSHIP SURVEY, 2017-2018
Project coordinators: Barbara Sierman, National Library of the Netherlands (Membership Engagement Portfolio Lead); Emmanuelle Bermès, National Library of France; Abbie Grotke, Library of Congress; Aija Vahtola, National Library of Finland; Peter Webster, Webster Research & Consulting
Funding: 3,475 USD
The Membership Engagement Survey, “Where can I find my IIPC friends”, was intended to foster collaboration between IIPC members, based on information related to their web archiving activities, staff and techniques. The results were presented at and used as input into the General Assembly in Wellington and in Zagreb. The survey was designed by Barbara Sierman, KB, and Birgit Nordsmark Henriksen, the Royal Danish Library, with inputs from the IIPC Steering Committee members and PCO.
TRAINING VIDEO CASE STUDIES 2019-2020
Project coordinators: Abbie Grotke, Library of Congress; Maria Praetzellis, Internet Archive; Maria Ryan, National Library of Ireland; and Claire Newing, National Archives, UK; Training Working Group Co-chairs
Funding: 3,869 USD
The video case studies were created to complement the training materials produced by the Training Working Group (TWG). They were filmed at the IIPC Web Archiving Conference in Zagreb in June 2019.
TRAINING CURRICULUM DEVELOPMENT 2017-2020
Project coordinators: Tom Cramer, Stanford University Libraries, Abbie Grotke, Library of Congress and Maria Praetzellis, Internet Archive, Training Working Group Co-chairs
Project partners: Sharon McMeekin and Sara Day Thomson, Digital Preservation Coalition
Funding: 6,898 USD
The Training Working Group created a web archiving training curriculum for beginners. This course contains eight sessions and comprises presentation slides and speakers’ notes. Each module starts with an introduction which outlines the learning objectives and target audience and includes information about the way the slides can be customised as well as a comprehensive list of related resources and tools. Published under a CC licence, the training materials can be fully customised and modified by the users. The beginner’s training materials were produced in partnership with the Digital Preservation Coalition (DPC).
2009 – 2017
PRESERVATION WORKING GROUP’S DATABASES
Project coordinators: Tobias Steinke, German National Library and Grace Thomas, Library of Congress, Preservation Working Group Co-chairs
The Preservation Working Group maintained a database of work packages on formats, software, web environmental scans and relevant bibliographies.
CROWDSOURCING WORKSHOP & USE CASES
The project aimed to investigate how crowdsourcing web archiving activities may begin to redress that balance and increase the amount of manpower available throughout all stages of the web archiving workflow in member institutions.
DOMAIN CRAWL REPORT
The IIPC Harvesting Practices Survey was developed to understand, analyze and collate the Internet archiving processes and experiences of IIPC members. The objective was to encourage and support memory institutions everywhere to address archiving and preservation of web resources by providing a benchmark and giving an overview of current web archiving practices.
EVALUATING TWITTERVANE
The primary goal of the project was to evaluate Twittervane, a prototype application capable of analyzing Twitter feeds and determining which websites are shared most frequently around a given theme over a given time period.
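The kind of analysis described above can be sketched as follows. This is not Twittervane's actual implementation: the function name, the input format of (timestamp, text) pairs, and the simple keyword match for the theme are all assumptions made for illustration.

```python
import re
from collections import Counter
from datetime import datetime
from urllib.parse import urlsplit

# Hypothetical, deliberately simple pattern for URLs embedded in post text.
URL_PATTERN = re.compile(r"https?://[^\s]+")


def most_shared_sites(posts, theme, start, end, top_n=3):
    """Count hosts linked from posts that mention `theme` within [start, end)."""
    counts = Counter()
    for posted_at, text in posts:
        if not (start <= posted_at < end):
            continue  # outside the time period of interest
        if theme.lower() not in text.lower():
            continue  # post is not about the theme
        for url in URL_PATTERN.findall(text):
            # Drop trailing punctuation that the pattern may have swallowed.
            host = urlsplit(url.rstrip(".,;:!?)")).hostname
            if host:
                counts[host] += 1
    return counts.most_common(top_n)


if __name__ == "__main__":
    sample = [
        (datetime(2012, 5, 1, 10), "Olympics schedule https://example.org/schedule"),
        (datetime(2012, 5, 2, 9), "More olympics news: https://example.org/news"),
    ]
    print(most_shared_sites(sample, "olympics", datetime(2012, 5, 1), datetime(2012, 6, 1)))
```

The hosts ranked highest by such a count would be candidates for a curator to review before adding them to a themed collection.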
HOW TO FIT IN? INTEGRATE A WEB ARCHIVING PROGRAM IN YOUR ORGANIZATION
This IIPC-sponsored workshop was held at the Bibliothèque nationale de France (26-30 Nov. 2012). The aim was to investigate the challenges and methods involved in implementing web archiving in all mainstream activities of a heritage institution: general institution strategy, acquisition practices, IT operations, preservation, access.
JHONAS
The overall goal of the project was to enhance existing tools in order to ease the adoption of WARC as the preferred archiving format for digital preservation. To accomplish this, two applications were chosen that would cover the entire digital preservation workflow: JHove2 and NetarchiveSuite.
LIVE ARCHIVING HTTP PROXY
The Live Archiving Proxy (LAP) project was a collaboration between Ina and Netarkivet.dk to build an HTTP proxy that would be able to capture the traffic that flows through it and delegate the handling of the captured data to a writer using a simple network protocol. The goal was to be able to write the captured traffic into any kind of archive format using any computer language.
MEMENTO
The goal of the project was to aggregate the metadata of the distributed archives of the IIPC, and to provide 1) Memento-based access to the holdings of open archives, 2) knowledge of the holdings of restricted archives and 3) knowledge to IIPC members of the holdings of totally closed archives.
PHD SPONSORSHIP
The University of North Texas College of Information sponsored a 3-year award to support doctoral studies in its Interdisciplinary Information Science Ph.D. Program.
STAFF EXCHANGE
The purpose of the project was to gather expert advice, assistance and guidance in the processes of migration from Heritrix 1 to Heritrix 3 and setting up distributed crawls with Heritrix 3.
STATISTICS AND QUALITY INDICATORS FOR WEB ARCHIVING
In 2009, the ISO Technical Committee 46 (Information and Documentation) decided to set up a working group on “Statistics and Quality Indicators for Web Archiving”. The group delivered a Draft Technical Report (PDF) in 2013.
TWITTERVANE
Prototype/Investigatory project by the British Library to use Twitter to build a web archive collection.
WARC TOOLS PROJECT
The main goal of the WARC Tools project was to facilitate and promote the adoption of the WARC file format for storing web archives by the mainstream web development community by providing an open source software library, a set of command line tools, web server plug-ins and technical documentation for manipulation and management of web archive files, or WARC files.
WAYBACK, HERITRIX AND NUTCHWAX DOCUMENTATION
2009 project led by the Internet Archive that documented NutchWAX, Heritrix, and Wayback.
WEB ARCHIVE PROFILING VIA SAMPLING
Research project looking at how archives respond to queries for archived content and over time build up a profile of the top-level domains (TLDs), Uniform Resource Identifiers (URIs), content language, and temporal spread of the archive’s holdings.
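A minimal sketch of how such a profile might be accumulated from sampled queries is shown below. It is not the project's method: the class name, the input format, and the use of 14-digit capture timestamps (yyyymmddhhmmss) are assumptions for illustration, and content language is left out for brevity.

```python
from collections import Counter
from urllib.parse import urlsplit


class ArchiveProfile:
    """Hypothetical running profile of one archive, built from sampled lookups."""

    def __init__(self):
        self.sampled = 0        # number of URIs queried
        self.hits = 0           # number of URIs with at least one capture
        self.tlds = Counter()   # top-level domains of URIs the archive holds
        self.years = Counter()  # temporal spread of the captures seen

    def record(self, uri, capture_timestamps):
        """Record the archive's response (a list of timestamps) for one URI."""
        self.sampled += 1
        if not capture_timestamps:
            return  # the archive reported no copies of this URI
        self.hits += 1
        host = urlsplit(uri).hostname or ""
        self.tlds[host.rsplit(".", 1)[-1]] += 1
        for timestamp in capture_timestamps:
            self.years[timestamp[:4]] += 1

    def hit_rate(self):
        # Share of sampled URIs for which the archive held at least one capture.
        return self.hits / self.sampled if self.sampled else 0.0


if __name__ == "__main__":
    profile = ArchiveProfile()
    profile.record("http://example.org/", ["20100101000000", "20120505120000"])
    profile.record("http://example.fr/a", [])
    print(profile.tlds, profile.years, profile.hit_rate())
```

Each additional sampled query refines the counters, so the profile becomes more representative of the archive's holdings as more responses are recorded.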