SARA AUBRY

Bibliothèque nationale de France (BnF)

The WARC file format: update and exchange on latest works

The WARC (Web ARChive) file format was defined to support the web archiving community in harvesting web resources, accessing web archives in a variety of ways, and preserving large numbers of born-digital files over the long term. It was initially released as an ISO international standard in May 2009 and was first revised, with a minor scope, in August 2017. The next revision vote is currently scheduled for 2022, with publication planned for 2025.

This discussion aims to gather IIPC members who expressed interest in introducing changes and evolutions to WARC during the workshop “The WARC file format: preparing next steps”, held at the IIPC GA and WAC in Wellington in November 2018.

The objective is to share the first use cases, tests and practical implementations for the topics identified so far (related resources, possible extensions for HTTP/2, identifying provenance headers, keeping track of dynamic history, clarifying WARC file naming and compression), and to go beyond them if needed.

Exchanges on the IIPC GitHub (http://iipc.github.io/warc-specifications/) and Slack (https://iipc.slack.com) channels will be used to prepare and structure the discussion before the face-to-face meeting.

Workshop: WARC format