Workshops & tutorials

The conference programme this year includes a number of workshops and tutorials, including a full-day post-conference workshop on Friday, November 16. As spaces are limited for some of these sessions, please sign up during registration if you wish to attend.

Overview

Tue 15:30-17:00 WARCs: Archives Unleashed Toolkit
Wed 16:00-17:30 Webrecorder
Wed 16:00-17:30 Cobweb
Wed 15:30-16:30 Web Curator Tool
Wed 16:30-17:30 The WARC file format
Fri 09:00-13:00 Documenting the Now
Fri 09:00-12:00 Crowdsourcing requirements for discovery and access (NEW)

What can you do with WARCs?

Time: Tuesday, 16:00-17:30, Venue: Tiakiwai
Number of participants: 45
Workshop coordinators: Andrew N. Jackson, Ian Milligan and Olga Holownia

This workshop will introduce a range of tools for full-text indexing and analysis of web archived material. For full-text search and visualisation, this will be based on the webarchive-discovery indexing system, and the Shine and Warclight user interfaces that enable the exploration of the archived data.

For general analysis, we will look at the Archives Unleashed Toolkit and its front end, the Archives Unleashed Cloud. In this workshop, we will go through the following process on sample data (or a selection of attendees' own WARCs if they bring a few):

- Discovering the frequency of domains within a collection;
- Extracting plain text of HTML pages from a web archive based on:
  - Particular domains (e.g. all pages from archive.org);
  - Date (e.g. all pages from 2009); and
  - Language (e.g. French- or English-language pages as detected by Tika);
- Extracting and visualizing a hyperlink network.
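
As a rough illustration of the steps listed above, here is a minimal Python sketch. It does not use the Archives Unleashed Toolkit API; it works on a small in-memory list of hypothetical page records, and the field names (`url`, `crawl_date`, `language`, `text`, `links`) and function names are assumptions made for this example only. In the workshop, the equivalent data would come from WARC files processed by the toolkit.

```python
from collections import Counter
from urllib.parse import urlparse

# Hypothetical sample records standing in for pages parsed from WARC files.
# All field names and values are illustrative assumptions.
RECORDS = [
    {"url": "http://archive.org/about", "crawl_date": "20090312",
     "language": "en", "text": "About the archive",
     "links": ["http://example.org/"]},
    {"url": "http://example.org/fr/accueil", "crawl_date": "20090401",
     "language": "fr", "text": "Bienvenue",
     "links": ["http://archive.org/about"]},
    {"url": "http://example.org/news", "crawl_date": "20101105",
     "language": "en", "text": "News",
     "links": ["http://archive.org/about", "http://example.org/fr/accueil"]},
]


def domain(url):
    """Return the lower-cased host part of a URL."""
    return urlparse(url).netloc.lower()


def domain_frequency(records):
    """Count how many captured pages belong to each domain."""
    return Counter(domain(r["url"]) for r in records)


def extract_text(records, domains=None, year=None, languages=None):
    """Return (url, text) pairs, optionally filtered by domain, year, language."""
    selected = []
    for r in records:
        if domains is not None and domain(r["url"]) not in domains:
            continue
        if year is not None and not r["crawl_date"].startswith(str(year)):
            continue
        if languages is not None and r["language"] not in languages:
            continue
        selected.append((r["url"], r["text"]))
    return selected


def link_network(records):
    """Count hyperlinks between domains as (source, target) edges."""
    edges = Counter()
    for r in records:
        source = domain(r["url"])
        for target in r["links"]:
            edges[(source, domain(target))] += 1
    return edges


if __name__ == "__main__":
    print(domain_frequency(RECORDS).most_common())
    print(extract_text(RECORDS, year=2009, languages={"fr", "en"}))
    print(link_network(RECORDS).most_common())
```

The edge counts returned by `link_network` could then be exported to a graph tool for visualisation.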

Human scale web collecting for individuals and institutions (Webrecorder tutorial)

Time: Wednesday, 15:30-17:30, Venue: Tiakiwai
Number of participants: 25
Tutorial Coordinator: Anna Perricci

This tutorial on Webrecorder will give participants a working knowledge of how to build, maintain and share web archives with Webrecorder.io and how to use Webrecorder Player, a desktop application for accessing WARC files, to interact with web archives offline. Participants will gain the ability to create high-fidelity captures and build collections that can be managed and shared within Webrecorder.io or downloaded and added to larger collections created using web crawlers. This tutorial on Webrecorder’s suite of tools and features will provide an important perspective on the current state and future of an emerging approach to web collecting.

This tutorial will be a mix of demos and hands-on activities accompanied by discussions. Materials can be delivered in units customized for audiences (e.g. those with experience with web archiving or participants who are new to web archiving).

The WARC file format: preparing next steps

Time: Wednesday, 16:30-17:30, Venue: Tiakiwai
Number of participants: Uncapped
Workshop coordinator: Sara Aubry

The WARC file format was initially released as an ISO international standard in May 2009 as ISO 28500:2009 (also known as WARC 1.0). As with all ISO standards, the WARC standard is periodically reviewed to ensure that it continues to meet the changing needs that emerge from practice. The first revision, supported by an IIPC task force and the subcommittee in charge of technical interoperability within the ISO information and documentation technical committee (ISO/TC46/SC4), was published in August 2017 as ISO 28500:2017 (also known as WARC 1.1). The next regular ISO vote to start another revision process is currently scheduled for 2020.

This discussion aims to bring together IIPC members interested in and working with the WARC format to inventory needs for either small or larger evolutions, share them within the group to identify common interests, and start shaping the scope of the upcoming revision. Exchanges on the IIPC GitHub and Slack channels will be used to prepare and structure the discussion before the face-to-face meeting.

The Web Curator Tool relaunch

Time: Wednesday, 15:30-16:30, Venue: Tiakiwai
Number of participants: Uncapped
Workshop coordinators: Ben O’Brien and Hanna Koppelaar

This tutorial will highlight the new features of the Web Curator Tool (WCT), added from January 2018 onwards through collaboration between the National Library of New Zealand (NLNZ) and the Koninklijke Bibliotheek – National Library of the Netherlands (KB-NL). One of the themes of the collaboration has been to future-proof the WCT. This involves learning the lessons from previous development and recognising the advancements and trends occurring in the web archiving community. The objective is to get the WCT to a platform where it can keep pace with the requirements of archiving the modern web. The first step in that process was decoupling the integration with the old Heritrix 1.x web crawler, and allowing the WCT to harvest using the more modern Heritrix 3.x version. A proof of concept for this change was successfully developed and deployed by the NLNZ, and has been the basis for a joint development work plan. While it will primarily be a demonstration, the tutorial is intended to be an interactive session with the audience and a showcase of how to work collaboratively on opposite sides of the world.

Using Cobweb to manage collaborative or complementary web archive collecting projects

Time: Wednesday, 16:00-17:30, Venue: Pipitea Street 1:14/1:15 (You will be guided to these rooms)
Number of participants: 20
Workshop coordinators: Kathryn Stine, Stephen Abrams and Peter Broadwell

Cobweb supports three key functions of collaborative collection development: suggesting nominations, asserting claims, and reporting holdings. Curators establish thematic collecting projects in Cobweb and encourage nominators to suggest relevant seed web sites as candidates for archiving. For any given collecting project, archival programs can claim their intention to capture a subset of nominated seeds. Once they have successfully captured seeds included in a given collecting project, descriptions of these holdings will become part of the Cobweb holdings registry. Cobweb interacts with external data sources to populate this registry, which curators can then search and browse to inform their planning for future collecting activity and which researchers can use to explore descriptions of archived web resources useful to their research.

Participants can expect orientations to setting up Cobweb accounts; establishing and updating collecting projects; determining and setting approaches for soliciting nominations to their projects; assigning descriptive metadata to projects, nominations, and holdings; understanding metadata flows into and out of Cobweb; and advanced searching within and across the Cobweb registry. Some time will also be spent on exploring how Cobweb supports multi-participant communication within and across the activities involved in establishing and managing collecting projects. The tutorial facilitator will provide overviews of Cobweb documentation, how Cobweb relates to or interacts with complementary web archiving systems and tools, and the roadmap for continued maintenance and enhancement of the Cobweb platform.

Ethical social media archiving through community collaboration

Time: Friday, 09:00-13:00, Venue: Tiakiwai
Number of participants: 30
Workshop coordinators: Jessica Moran, Matariki Williams, Bergis Jules, Edward Summers, Alexandra Dolan-Mescal and Francis Kayiwa

This workshop will bring together community activists, archivists, librarians, scholars, developers, and designers to discuss how we can create richer, non-oppressive web archives—archives that will better serve their publics and the historical record.

This workshop will address these issues that live at the intersection of archival practice and the expressions of community and culture on the web and social media.

The workshop will be organised in three parts:

- Introducing workshop participants to the Documenting the Now project and other social media web archiving tools.

- Invited speakers from New Zealand will discuss their experience in online spaces, their current archival or collecting practices, and their aspirations for the future.

- Structured group conversation around what ethical and collaborative community-led social media archiving might look like.

What do researchers want? Crowdsourcing requirements for discovery and access

Time: Friday, 09:00-12:00, Venue: National Library, Room 3:19/21 (You will be guided to these rooms)
Number of participants: Uncapped
Workshop coordinators: Andrea Goethals and Ben O’Brien

How can we make our Web archive collections as usable as possible to researchers, students, teachers and the general public? What do we need to provide in terms of technical infrastructure, features, metadata, or support? What are the requirements for different user groups? What else do we need to provide so that there is enough context around our Web archives to make them understandable and fit for use in scholarship? How can we promote creative uses of our Web archives?

If you are interested in discussing these questions, please join us. This is an opportunity to share ideas and lessons learned so far (successes and failures!), and to demo any platforms you are willing to share. Our goal is for this interactive session to collectively identify and document the features, key considerations, potential challenges and high-level requirements for a platform that enables researchers and other users to make full use of our Web archives.

As this workshop is a late addition, please RSVP directly to Andrea.Goethals@dia.govt.nz. Indicate in your RSVP if you would like to provide a brief demo of your platform.