National web archives

14:30 – 14:50

Ivy Lee Huey Shin: Sharing by the National Library Singapore on the journey towards collecting digital materials

National Library Board Singapore

The National Library, Singapore is a knowledge institution under the National Library Board (NLB). It has a mandate to preserve the published heritage of the nation through legal deposit and has been collecting works published in Singapore for the last 60 years. As this mandate was limited to physical items, NLB updated its legislation in 2018 so that it could collect digital publications, including websites, and keep up with technological changes. This will strengthen its national collection and create a lasting legacy for future generations of Singaporeans.

The legislative review included a major reassessment of policies and processes. NLB studied the legislation of other national libraries, researched copyright issues, and conducted extensive public consultation to address stakeholders’ concerns. In July 2018, the Singapore Parliament passed the Bill to amend the NLB Act, which, amongst other revisions, empowered NLB to collect, preserve and provide access to Singapore websites and electronic publications. The changes are slated to take effect in early 2019.

As part of the preparation for the legislative review, NLB invested in enhancing its systems and infrastructure to better process and support the web archive collection. The Web Archive Singapore (WAS) (eresources.nlb.gov.sg/webarchives) is a portal that hosts NLB’s collection of archived Singapore websites. When it was first launched in 2006, the portal had only two functions: keyword search and subject browsing. In August 2018, the WAS portal was revamped with a new interface that added five functions: curation, full-text search, public nomination of websites, a data visualiser, and rights management. The supporting infrastructure was also enhanced with an in-house Task Management System that manages the selection, crawling and quality assessment of archived websites.

NLB adopts a multi-pronged approach to archiving the nation’s published works online. First, NLB will conduct domain archiving of the more than 180,000 registered .sg websites. Next, it will selectively archive non-.sg websites through consent-seeking. Both will be done systematically to ensure that the National Library does not miss websites of heritage and research value to Singapore. Given the prevalence of social media, NLB has also been experimenting with the collection of social media content, and more resources will be allocated to social media archiving in the coming years.

This presentation will highlight NLB’s efforts to update the legislation and revamp its web archive portal, as well as the planning that went into the .sg domain crawl and other web archiving activities in the pipeline.

14:50 – 15:10

Friedel Geeraert & Sébastien Soyez: The first steps towards a Belgian web archive

Friedel Geeraert, State Archives and Royal Library of Belgium
Sébastien Soyez, State Archives of Belgium

This paper focuses on the research project PROMISE, which aims to set up a long-term web archiving strategy for Belgium.[1]

The project was initiated by the State Archives and the Royal Library in 2017 and will run until December 2019. The goals of the project are to 1) identify (inter)national best practices in the field of web archiving, 2) define and develop a Belgian web archiving strategy and policy, 3) pilot the web archiving service and 4) make recommendations for a sustainable web archiving service in Belgium. The State Archives and the Royal Library partnered with the universities of Ghent and Namur and the university college Bruxelles-Brabant to form an interdisciplinary team encompassing information professionals and legal and technical experts.

Cooperation is at the heart of the PROMISE project. Even though the State Archives and the Royal Library each work within their own legal framework (the Law on Archives and the Law on Legal Deposit, respectively), they wish to create a Belgian web archive together and to share technical infrastructure and know-how. They have developed a shared strategy based on the Open Archival Information System (OAIS) reference model in order to cover the entire web archiving workflow.

With regard to selection and curation, a dual approach has been chosen: selective crawls on the one hand and broad crawls on the other. The Royal Library and the State Archives have each created their own seed lists for the selective crawls. The State Archives focused on the websites of public institutions, while the Royal Library selected (parts of) websites on themes related to its core functions and missions, such as Belgian comics or e-magazines. For these selective collections, both institutions use a shared descriptive metadata model based on OCLC recommendations. This choice ensures that the metadata are interoperable and can be integrated into a shared access platform, while each institution can still use specific metadata models based on archival or library principles. The broad crawl, on the other hand, is managed jointly by both institutions and consists of taking a representative sample of the Belgian web. Defining what counts as the ‘Belgian web’ is one of the cornerstones of this task.

Given that web archiving is a new activity for both institutions, interesting lessons can be drawn from these first experiences with regard to organisational approaches and (training in) selection and curation. For example, curating collections of websites required a significant change in mindset for the cataloguers who worked on the Royal Library’s seed list.

In conclusion, this paper will provide insight into the organisation of the pilot of the ‘Belgian web archive’, the collaborative strategy, the selection and curation of the first web collections and the lessons learnt.

[1] PROMISE (Preserving online multiple information: towards a Belgian strategy) is a BRAIN project financed by the Belgian Science Policy Office.

15:10 – 15:30

Corey Davis, Carole Gagné & Nicholas Worby: True North: the current state of web archiving in Canada

Corey Davis, Council of Prairie and Pacific University Libraries (COPPUL) & the University of Victoria
Carole Gagné, Bibliothèque et Archives nationales du Québec
Nicholas Worby, University of Toronto


Under the auspices of the Canadian Association of Research Libraries (CARL), the Canadian Web Archiving Coalition (CWAC) is an inclusive community of practice within Canadian libraries, archives, and other memory institutions engaged or otherwise interested in web archiving. The Coalition’s mission is to identify gaps and opportunities that could be addressed by nationally coordinated strategies, actions, and services, including collaborative collection development, training, infrastructure development, and support for practitioners and researchers. In this session, members of the Coalition, including the Chair, will provide the international community with an update on national projects and initiatives underway in Canada, with a special focus on three key areas: the evolving environment for collaborative collection development, the development of infrastructure for the repatriation and long-term preservation of web archive data in Canada, and the development of a Canadian code of best practices for copyright in web archiving.

15:30 – 15:40

Q&A
