There are a number of workshops and tutorials in the conference programme this year, including a full-day post-conference workshop on Friday November 16. As spaces are limited for some of these sessions, please sign up during your registration process if you wish to attend.
Tue 15:30-17:00 WARCs: Archives Unleashed Toolkit
Wed 16:00-17:30 Webrecorder
Wed 16:00-17:30 Cobweb
Wed 15:30-16:30 Web Curator Tool
Wed 16:30-17:30 The WARC file format
Fri 09:00-15:00 Documenting the Now
|What can you do with WARCs?
Time: Tuesday, 16:00-17:30, Venue: Tiakiwai
Number of participants: 20
Workshop coordinators: Andrew N. Jackson, Sara Aubry, Ian Milligan and Olga Holownia
This workshop will introduce a range of tools for full-text indexing and analysis of web archived material. For full-text search and visualisation, this will be based on the webarchive-discovery indexing system, and the Shine and Warclight user interfaces that enable the exploration of the archived data.
For general analysis, we will look at the Archives Unleashed Toolkit and its front end, the Archives Unleashed Cloud. In this workshop, we will go through the following process on sample data (or on a selection of attendees' own WARCs, if they bring a few):
- Discovering the frequency of domains within a collection;
- Extracting plain text of HTML pages from a web archive based on:
  - Particular domains (e.g. all pages from archive.org);
  - Date (e.g. all pages from 2009); and
  - Language (e.g. French or English-language pages as detected by Tika);
- Extracting and visualizing a hyperlink network.
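The first and last steps of the process above, together with the domain and date filters, can be sketched in plain Python. This is only an illustration of the idea and does not use the Archives Unleashed Toolkit API: the `RECORDS` sample data and the helper names `domain`, `domain_frequency`, `filter_records` and `link_network` are all hypothetical, and text extraction and language detection are left out.

```python
from collections import Counter
from urllib.parse import urlparse

# Hypothetical sample records standing in for pages read from a WARC file:
# (crawl date as YYYYMMDD, page URL, outgoing link URLs).
RECORDS = [
    ("20090312", "http://archive.org/about/", ["http://www.loc.gov/"]),
    ("20090401", "http://archive.org/web/",
     ["http://www.loc.gov/", "http://example.com/"]),
    ("20101105", "http://example.com/news", ["http://archive.org/"]),
]


def domain(url):
    """Return the host of a URL, lower-cased and without a leading 'www.'."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def domain_frequency(records):
    """Count how many captured pages belong to each domain."""
    return Counter(domain(url) for _, url, _ in records)


def filter_records(records, only_domain=None, year=None):
    """Keep records matching a domain and/or a four-digit crawl year."""
    return [
        r for r in records
        if (only_domain is None or domain(r[1]) == only_domain)
        and (year is None or r[0].startswith(year))
    ]


def link_network(records):
    """Build weighted (source domain, target domain) edges from page links."""
    edges = Counter()
    for _, url, links in records:
        for target in links:
            edges[(domain(url), domain(target))] += 1
    return edges
```

For example, `filter_records(RECORDS, only_domain="archive.org", year="2009")` keeps the two archive.org pages captured in 2009, and the edge counts returned by `link_network(RECORDS)` could be passed to a graph visualisation tool.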
|Human scale web collecting for individuals and institutions (Webrecorder tutorial)
Time: Wednesday, 15:30-17:30, Venue: Tiakiwai
Number of participants: 25
Tutorial Coordinator: Anna Perricci
This tutorial on Webrecorder will give participants a working knowledge of how to build, maintain and share web archives with Webrecorder.io and use Webrecorder Player, a desktop application for accessing WARC files, to interact with web archives offline. Participants will benefit from this tutorial by gaining the ability to create high-fidelity captures to make collections that can be managed and shared within Webrecorder.io or downloaded and added to larger collections created using web crawlers. This tutorial on Webrecorder’s suite of tools and features will provide an important perspective on the current state and future of an emerging approach to web collecting.
This tutorial will be a mix of demos and hands-on activities accompanied by discussions. Materials can be delivered in units customized for audiences (e.g. those with experience with web archiving or participants who are new to web archiving).
|The WARC file format: preparing next steps
Time: Wednesday, 16:30-17:30, Venue: Tiakiwai
Number of participants: Uncapped
Workshop coordinator: Sara Aubry
The WARC file format was first published as an international standard, ISO 28500:2009 (also known as WARC 1.0), in May 2009. As with all ISO standards, the WARC standard is periodically reviewed to ensure that it continues to meet the changing needs that emerge from practice. The first revision, supported by an IIPC task force and the subcommittee in charge of technical interoperability within the ISO information and documentation technical committee (ISO/TC46/SC4), was published in August 2017 as ISO 28500:2017 (also known as WARC 1.1). The next regular ISO vote to start another revision process is currently scheduled for 2020.
This discussion aims to gather IIPC members who are interested in and working with the WARC format, in order to inventory needs for small or larger evolutions, share them within the group to identify common interests, and start shaping the scope of the upcoming revision. Exchanges on the IIPC GitHub and Slack channels will be used to prepare and structure the discussion before the face-to-face meeting.
|The Web Curator Tool relaunch|
Time: Wednesday, 15:30-16:30, Venue: Tiakiwai
Number of participants: Uncapped
Workshop coordinators: Ben O’Brien and Jeffrey van der Hoeven
This tutorial will highlight the new features of the Web Curator Tool (WCT), added from January 2018 onwards through collaboration between the National Library of New Zealand (NLNZ) and the Koninklijke Bibliotheek – National Library of the Netherlands (KB-NL). One of the themes of the collaboration has been to future-proof the WCT. This involves learning the lessons from the previous development and recognising the advancements and trends occurring in the web archiving community. The objective is to get the WCT to a platform where it can keep pace with the requirements of archiving the modern web. The first step in that process was decoupling the integration with the old Heritrix 1.x web crawler and allowing the WCT to harvest using the more modern Heritrix 3.x version. A proof of concept for this change was successfully developed and deployed by the NLNZ, and has been the basis for a joint development work plan. While it will primarily be a demonstration, the tutorial is intended to be an interactive session with the audience and a showcase of how to work collaboratively on opposite sides of the world.
|Using Cobweb to manage collaborative or complementary web archive collecting projects|
Time: Wednesday, 16:00-17:30, Venue: Tiakiwai
Number of participants: 20
Workshop coordinators: Kathryn Stine, Stephen Abrams and Peter Broadwell
Cobweb supports three key functions of collaborative collection development: suggesting nominations, asserting claims, and reporting holdings. Curators establish thematic collecting projects in Cobweb and encourage nominators to suggest relevant seed web sites as candidates for archiving. For any given collecting project, archival programs can claim their intention to capture a subset of nominated seeds. Once they have successfully captured seeds included in a given collecting project, descriptions of these holdings will become part of the Cobweb holdings registry. Cobweb interacts with external data sources to populate this registry, which curators can then search and browse to inform their planning for future collecting activity and which researchers can use to explore descriptions of archived web resources useful to their research.
Participants can expect orientations to setting up Cobweb accounts; establishing and updating collecting projects; determining and setting approaches for soliciting nominations to their projects; assigning descriptive metadata to projects, nominations, and holdings; understanding metadata flows into and out of Cobweb; and advanced searching within and across the Cobweb registry. Some time will also be spent on exploring how Cobweb supports multi-participant communication within and across the activities involved in establishing and managing collecting projects. The tutorial facilitator will provide overviews of Cobweb documentation, how Cobweb relates to or interacts with complementary web archiving systems and tools, and the roadmap for continued maintenance and enhancement of the Cobweb platform.
|Ethical social media archiving through community collaboration|
Time: Friday, 09:00-16:00, Venue: Auditorium
Number of participants: 30
Workshop coordinators: Jessica Moran, Matariki Williams, Bergis Jules, Edward Summers, Alexandra Dolan-Mescal and Francis Kayiwa
This workshop will bring together community activists, archivists, librarians, scholars, developers, and designers to discuss how we can create richer, non-oppressive web archives—archives that will better serve their publics and the historical record.
It will address issues that live at the intersection of archival practice and the expressions of community and culture on the web and social media.
The workshop will be organised in three parts:
- Introducing workshop participants to the Documenting the Now project and other social media web archiving tools.
- Invited speakers from New Zealand will discuss their experience in online spaces, their current archival or collecting practices, and their aspirations for the future.
- Structured group conversation around what ethical and collaborative community-led social media archiving might look like.