European Union & GDPR

16:10 – 16:30

Alejandra Michel: The legal framework for web archiving: focussing on GDPR and copyright exceptions

University of Namur

Web archiving is intimately linked to the freedom of expression protected by Article 10 of the European Convention on Human Rights. Article 10 also protects the right to information as such. This right to information has two facets: on the one hand, an active component allowing the public to search for information and, on the other hand, a passive component allowing everyone to receive information. This explains the link between the “right to information” and the existence of web archiving initiatives. Indeed, these initiatives help to guarantee this right by facilitating research and access to information for the general public and society at large. In the Times Newspapers case, the European Court of Human Rights has already stated that web archives are protected by Article 10 of the Convention.

Although the importance of web archiving is widely recognised, web archives raise many legal issues. These include, among others, the respective missions and responsibilities of the national cultural heritage institutions in charge of web archiving, the delimitation of the national scope of jurisdiction for web archiving activities, copyright rules, the sui generis right related to databases, data protection law, the probative value of web archives and illegal content.

In this presentation, after discussing the right to information and exploring the legal issues raised by web archiving, we will focus on two specific issues. Firstly, we will look at the specific regime that the GDPR provides for archiving in the public interest and for historical and scientific research or statistical purposes. Indeed, when personal data are processed in these specific contexts, the GDPR gives Member States the possibility to put in place a softened regime in terms of the principles to follow, the obligations to be respected and the rights to be implemented. Secondly, we will examine the copyright exceptions relevant to web archiving that are considered in the proposed Directive on copyright in the Digital Single Market. The analysis of the copyright legal framework will take the form of a mapping of all relevant considerations for establishing a policy of selection and access to web archives.

16:30 – 16:50

Tom Storrar & Chris Doyle: Creating an archive of European Union law for Brexit

Tom Storrar, The National Archives (UK)
Chris Doyle, MirrorWeb Ltd

The UK is due to exit the EU on 29 March 2019. The Web Archiving team at The National Archives (UK) was given the job of producing a new publicly available, comprehensive archive of European law for Exit Day. This project was a vital part of the UK government’s plans for Brexit. Leaving the European Union is a fundamental constitutional and legal change that affects millions of people and businesses. The European Union (Withdrawal) Act 2018 makes The National Archives responsible for publishing European legislation and other relevant documents that will continue to be the law in the UK after it exits the European Union.

Creating a comprehensive archive of European law for Brexit involved harvesting the relevant parts of the EUR-Lex website (https://eurlex.europa.eu/), one of the largest and most complicated multilingual websites available online. This archive was created in partnership with MirrorWeb and involved both deploying existing technologies in new ways and developing some entirely new technologies and techniques.

We had a number of motivations for creating a web archive of this content, not least in order to demonstrate exactly what the law was, and when, in its original form, along with other content such as the extensive body of European case law, which provides important context for the collection.

The challenge called for a highly focused project and innovative approaches to capturing, verifying and replaying the content. Over the course of 15 months we performed two complete data-driven crawls of the target content, archiving over 20 million resources. Between January 2019 and Exit Day we have continued to capture all newly published and modified content so that the archive will reflect EUR-Lex as it stands on Exit Day.

Every archived resource was identified using various data sources before being captured and quality assured through multiple checks. We developed new approaches to quality assurance, as we had to be sure the collection was as complete as possible within our chosen scope. Finally, the archive was indexed for public access through a customised replay platform and a sophisticated full-text search service.

This presentation will detail the purpose of the collection, the challenges encountered, the archiving strategies we employed to build it, our approach to quality assurance and how we developed the public-facing service from an initial “alpha” to a mature collection within our web archives. We will also describe our approach to preserving the web archive content, alongside accompanying files. Finally, we will reflect on whether similar approaches can be employed to successfully and confidently archive other large, complicated, multilingual websites.

16:50 – 17:10

Els Breedstraet: Setting up an EU web preservation service for the long-term – tales of a (sometimes) bumpy road

Publications Office of the European Union

The EU web archive contains the main websites of the EU institutions, which are hosted on the europa.eu domain and its subdomains. Its aim is to preserve EU web content in the long term and to keep it accessible to the public.

In 2016, IIPC gave us the opportunity to give a short, general presentation about our archive. The presentation was received enthusiastically by the audience, who were interested to hear more about our activities in this field. Since then, we have come a long way towards providing a more mature web preservation service for the EU institutions. We therefore feel that now is a good moment to share, during a 30-minute presentation, the lessons learned on our journey from a pilot project to a fully-fledged, durable, long-term service.

The presentation will address the following topics:

Introduction to the EU web archive and its history.
Description of how the EU web preservation service looks today: what we do, why we do it, how we do it and for whom.
Lessons learned, plans and challenges ahead.

These will be presented in a practical way, in order to give other practitioners in the audience ideas for tools to use “at home”.

By telling our story, we hope to provide other participants with useful tips and tricks. At the end of the presentation, the audience will be invited to share questions, thoughts, suggestions and/or similar experiences. In this way, we hope to learn from their know-how in return.

As the aim of the presentation is to tell the tale of our way to a mature web preservation service, it fits well within the general theme of the conference (Maturing Practice Together).

17:10 – 17:30

Marinos Papadopoulos, Charalampos Bratsas, Michalis Gerolimos, Konstantinos Vavoussis, Eliza Makridou & Dimitra Chioti: Text and data mining for the national library of Greece in consideration of GDPR

Marinos Papadopoulos, Attorney-at-Law
Charalampos Bratsas, Open Knowledge Foundation
Eliza Makridou, Michalis Gerolimos & Dimitra Chioti, National Library of Greece
Konstantinos Vavoussis, TRUST-IT Ltd.

Text and Data Mining (TDM) is commonly used by large libraries worldwide in web harvesting and web archiving to collect, download, archive and preserve content and works available on the Internet. TDM is used to index, analyze, evaluate and interpret mass quantities of works, including texts, sounds, images or data, through an automated “tracking and pulling” process of online material. Access to web content and works available online is subject to legal restrictions, especially under laws pertaining to Copyright, Industrial Property Rights and Data Privacy. As far as Data Privacy is concerned, the application of the General Data Protection Regulation (GDPR) is of vital importance; among other requirements, it mandates the adoption of privacy-by-design and advanced security techniques.

Within this framework, this paper focuses on the TDM design considerations and applied solutions employed by the National Library of Greece (NLG). NLG has deployed TDM as of February 2017 in consideration of the provision of Art. 4(4)(b) of Law 4452/2017, as well as of the provisions of Regulation 2016/679/EU (GDPR). Art. 4(4)(b) of Law 4452/2017 places TDM activity in Greece under the responsibility of NLG, appointed as the organization to undertake, allocate and coordinate the action of archiving the Hellenic web, i.e. as the organization responsible for text and data analysis at national level in Greece. The deployment of TDM by NLG presented in this paper provides for a framework of technical and legal considerations, so that the electronic service based on the TDM operation complies with the data protection requirements set by the new EU legislative framework. The paper further elaborates on the technical and legal aspects considered by NLG for achieving GDPR compliance.

The study falls under the “Compliance with General Data Protection Regulation” thematic area in the framework of 2019 IIPC WAC participation.

17:30 – 17:40

Q&A
