IIPC RSS webinar: Web archiving social media and news websites
The IIPC Research Speaker Series (RSS) focuses on the research use of web archives and features presentations of use cases, collaborative projects and new tools for researchers. This webinar will introduce four recent projects which focus on different aspects of capturing social media and frequently updated news websites, particularly in the context of rapid response and event-based collections.
Session two (5pm ET / 2pm PT / Thu 9am Wellington / 7am Canberra)
Agenda:
– Introduction
– The BESOCIAL Team: BESOCIAL – towards a sustainable social media archiving strategy for Belgium
– Gillian Lee & Ben O’Brien: Archiving Twitter in New Zealand
– Ilya Kreymer: Automated scripts to capture social media – a collaborative community effort?
– Q&A
BESOCIAL – towards a sustainable social media archiving strategy for Belgium
KBR, the Royal Library of Belgium, launched the BESOCIAL project in summer 2020. The aim of this two-year project (2020-2022) is to set up a sustainable strategy for archiving and preserving social media in Belgium. Initially, the objective is to archive social media content related to certain Belgian events selected during the project, as well as social media content related to KBR’s newspaper collections. Furthermore, the project will explore how to open up the social media archive for use. These collections will complement the collections of websites archived during the PROMISE project (2017-2019). The BESOCIAL project team is led by KBR and includes: CENTAL (Centre for Natural Language Processing) at Université Catholique de Louvain; CRIDS (Research Centre in Information, Law and Society) at Namur University; and GhentCDH (Ghent Centre for Digital Humanities), IDLab (Internet Technology & Data Science Lab) and MICT (Research Group for Media, Innovation and Communication Technologies) at Ghent University.
Archiving Twitter in New Zealand – Gillian Lee & Ben O’Brien
The National Library of New Zealand has been collecting tweets via the public Twitter API since late 2016. Our collecting has centered on NZ-based events. On March 15th, 2019, we started rapidly collecting tweets in response to the Christchurch mosque attacks. This presentation will cover our rapid response collecting, workflows for post-processing Twitter data, and working with subsequent researcher requests.
Automated scripts to capture social media – Ilya Kreymer
Archiving social media through the web browser can be difficult, especially due to the rapidly changing nature of social media sites. Webrecorder has developed a set of behavior scripts that run in a browser on a single page and automate interaction with the site. (These scripts form the core of the Autopilot in Conifer and Webrecorder.) The scripts can be small, some under 300 lines of JavaScript, and can be used with any browser-based crawler (or just pasted into a browser). However, the scripts can break at any time and require maintenance to keep running. The goal is to share what Webrecorder is doing now and to discuss a community approach to maintaining these scripts, and adding new ones, to ensure the availability of high-fidelity social media capture for all.
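As a rough illustration of the kind of logic a single-page behavior script might contain, the sketch below scrolls a feed until no new content appears. It is not Webrecorder's actual code: the names ScrollablePage, AutoScrollOptions, autoScroll and fakeFeed are hypothetical, and the page is abstracted behind a small interface so the loop can be exercised without a real browser.

```typescript
// Hypothetical minimal view of a page, as seen by the behavior.
interface ScrollablePage {
  height(): number;         // current total height of the loaded content
  scrollToBottom(): void;   // interaction that may trigger loading more content
  settle(): Promise<void>;  // wait for any newly requested content to arrive
}

// Hypothetical limits so the behavior always terminates.
interface AutoScrollOptions {
  maxRounds: number;      // hard cap on scroll attempts
  maxIdleRounds: number;  // stop after this many consecutive rounds with no growth
}

// Scroll repeatedly until the page stops growing or a limit is hit.
// Returns the number of scroll rounds performed.
async function autoScroll(
  page: ScrollablePage,
  opts: AutoScrollOptions
): Promise<number> {
  let rounds = 0;
  let idle = 0;
  let lastHeight = page.height();

  while (rounds < opts.maxRounds && idle < opts.maxIdleRounds) {
    page.scrollToBottom();
    await page.settle();
    rounds += 1;

    const height = page.height();
    if (height > lastHeight) {
      // New content was loaded: reset the idle counter.
      idle = 0;
      lastHeight = height;
    } else {
      idle += 1;
    }
  }
  return rounds;
}

// In-memory stand-in for an infinite-scroll feed that serves a fixed
// number of extra batches and then stops growing.
function fakeFeed(batches: number): ScrollablePage {
  let height = 1000;
  let remaining = batches;
  return {
    height: () => height,
    scrollToBottom: () => {
      if (remaining > 0) {
        remaining -= 1;
        height += 1000;
      }
    },
    settle: () => Promise.resolve(),
  };
}
```

In a real browser, an implementation of this interface could plausibly be backed by window.scrollTo and document.documentElement.scrollHeight, with settle waiting for a short fixed delay. The explicit limits reflect the maintenance concern raised above: when a site changes and the stop condition no longer holds, the script should still end rather than loop indefinitely.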