Archives Unleashed 4.0: Web Archive Datathon

British Library, 11 – 13 June, 2017

The fourth in the Archives Unleashed workshop series is an opportunity to collaboratively unleash our web collections, exploring cutting-edge research tools while fostering a broad-based consensus on future directions in web archive analysis.

The event is planned as a premier opportunity for the development of multi-disciplinary international collaborations pertaining to research with large-scale Web archive data.

This event is sponsored by the British Library, Rutgers University, University of Waterloo, the National Science Foundation and the International Internet Preservation Consortium (IIPC).

Organizers:

  • Matthew Weber, Communication, Rutgers University
  • Ian Milligan, History, University of Waterloo
  • Jimmy Lin, Computer Science, University of Waterloo
  • Olga Holownia, British Library / IIPC

Schedule:

Sunday, June 11

17:00 – 19:00: Welcome Reception (The Parcel Yard, London King’s Cross)

Monday, June 12

09:00 – 09:15: Registration and Arrival (British Library, Knowledge Centre)
09:15 – 09:30: Opening comments, stating the problem, logistical comments (Matthew Weber, Ian Milligan, and Jason Webber)
09:30 – 10:30: Current Efforts:

  • Warcbase (Ian Milligan)
  • Internet Archive APIs (Jefferson Bailey, Internet Archive)
  • National Archives Datasets (Tom Storrar, The National Archives)
  • UK Web Archive (Andy Jackson, The British Library)

10:30 – 12:30: Needs identification revisited; group formation and starting to work
12:30 – 14:00: Lunch
13:00 – 15:30: Datathon!
15:30 – 16:00: Coffee Break and Lightning Talks
16:00 – 17:00: Datathon!
19:00 – End: Dinner (Albertini)

Tuesday, June 13

09:00 – 10:30: Datathon!
10:30 – 11:00: Coffee Break
11:00 – 12:30: Datathon!
12:30 – 14:00: Lunch
14:00 – 15:30: Datathon!
15:30 – 16:00: Coffee Break and Lightning Talks
16:00 – 17:00: Closing Session and Final Presentations

Wednesday, June 14 (optional for attendees)

16:30 – 17:00: Sharing of Datathon Results at the RESAW Conference

Projects:

  • winner: Link Ranking Team (relative vs hard links): Gregory Wiedeman, Kees Teszelszky, Mindaugas Vidmantas, Peter Webster & Richard Deswarte
  • runner-up: Team Robots (impact of robots.txt exclusion): Emily Maemura, Graham Seaman, Toke Eskildsen & Yves Maurer
  • runner-up: Team Intersect (overlap across Occupy Wall Street data): Dawn Walk, Gil Hoggarth, Jessica Ogden, Mat Kelly, Sawood Alam & Shawn Walker