Daniel Gomes: Arquivo.pt – memorial and other goodies

Arquivo.pt – Fundação para a Ciência e a Tecnologia

Arquivo.pt is a research infrastructure that preserves information gathered from the web since 1996 and provides public online services to explore this historical data. Arquivo.pt contains over 4 billion pages collected from 14 million websites in several languages and provides user interfaces in English. In 2018, Arquivo.pt received over 170 000 users, 78% of whom came from outside Portugal. Despite focusing on the preservation of the Portuguese web, Arquivo.pt has the inherent mission of serving the scientific community, so it also preserves selected international websites. The search and access services over the archived data have been stable since 2016. This presentation will highlight the main innovations of the past 3 years, which aimed to develop new services and expand the user community.

Efforts focused on developing added-value features that extend the utility of Arquivo.pt to new usage scenarios, such as robustify.arquivo.pt, which is based on René Voorburg’s robustify.js project, and the Arquivo.pt Memorial.

Robustify.arquivo.pt is a mechanism that automatically fixes broken links in web pages. When a broken link is detected, it provides web users the option to access a previous version of the linked web page that was preserved by a web archive. Webmasters just need to add one single line of code to benefit from this feature on their websites and stop worrying about fixing the numerous broken links that arise among their older pages.
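The fallback behaviour described above can be sketched in a few lines of Python. This is only an illustration, not the robustify.js code: the function name `archive_fallback`, the rule that any 4xx or 5xx status counts as a broken link, and the replay URL pattern are all assumptions made for the example.

```python
from typing import Optional

# Assumed replay URL prefix; the real service may build archive links differently.
ARCHIVE_PREFIX = "https://arquivo.pt/wayback/"


def archive_fallback(url: str, status_code: int, timestamp: str) -> Optional[str]:
    """Return an archived-version URL for a broken link, or None if the link works.

    status_code is the HTTP status observed when checking the link.
    timestamp is the preferred capture date in YYYYMMDDhhmmss form (hypothetical input).
    """
    # Assumption: client and server errors (4xx/5xx) mean the link is broken.
    if status_code < 400:
        return None
    return f"{ARCHIVE_PREFIX}{timestamp}/{url}"
```

A page script built on this idea would check each outgoing link and, when the function returns a URL, offer it to the visitor as an alternative to the dead page.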

Some websites are no longer updated with new content but have to be kept online because they provide important information, such as websites that document finished projects. However, the cost of maintaining these stale websites increases over time as the technologies that support them become obsolete, which very often causes dangerous security vulnerabilities. The Arquivo.pt Memorial offers high-quality preservation of websites’ content with the possibility of maintaining their original domains. For example, UMIC – Knowledge Society Agency was a public institute that existed from January 2005 to February 2012, and its website was deactivated in 2017. Its official domain www.english.umic.pt nevertheless remains active and now references a version preserved in Arquivo.pt.

The strategy adopted to extend the user community focused on training power users so that they spread knowledge of Arquivo.pt within their own communities. A training program about web preservation was put in place (arquivo.pt/training) with the objective of raising awareness of the importance of preserving online digital heritage. In 2018, we launched our first training activity for an international audience with a tutorial named “Research the Past Web using Web archives” as part of the TPDL conference. At the same time, other dissemination activities were carried out, such as the production of training videos, the enlivening of social network channels by posting links to web-archived pages related to a calendar of national and international celebrations, the creation of collaborative collections (e.g. national elections) and a public exhibition of posters that highlight historical web pages.

Maria Praetzellis, Sharon McMeekin & Abbie Grotke: Building the IIPC training program

Maria Praetzellis, Internet Archive
Sharon McMeekin, Digital Preservation Coalition
Abbie Grotke, Library of Congress

This presentation will showcase the outcomes of IIPC’s Training Working Group (TWG), which is building a high-quality web archiving curriculum for IIPC members, web archivists, and technologists engaged in preserving web materials. Practitioners have varying approaches to archiving, reflecting different institutional mandates, legal contexts, technical infrastructure, etc., but share a need for expert training models and instructional methods. The TWG has been funded by the IIPC to fill this need by creating a series of openly accessible educational materials and training. This foundational work also aims to establish a framework for the creation of focused, topical training and educational resources going forward.

Together with the Digital Preservation Coalition, the TWG has produced the first set of training materials designed for the beginning practitioner covering technical, curatorial and policy related aspects of web archiving. Beyond the core purpose of providing this level of baseline training, the training materials can also be used by IIPC members and the larger digital preservation community for marketing and outreach, internal and external advocacy, and ongoing program and professional development.

This first delivery of training materials will be released to coincide with the IIPC WAC 2019 and includes 13 modules on topics ranging from web archives as primary sources to building a business case for ongoing program funding. During the presentation, program chairs from the IIPC and DPC will share the educational materials produced as part of this project, including slide decks, online videos, and teaching and workshop plans. The session will seek feedback from attendees on future areas of curriculum development as well as ideas for additional instructional approaches. As the TWG works towards developing intermediate and advanced level training materials, presenting at WAC 2019 will provide the opportunity for greater community involvement in the TWG’s work as the program advances.

Julie Fukuyama & Simon Tanner: Developing impact assessment indicators – making a proposal for the UK Web Archive

Julie Fukuyama, National Diet Library
Simon Tanner, King’s College London

This paper presents the results of a study to examine, determine and propose the optimal approach to develop impact assessment indicators for the UK Web Archive (UKWA). In the United Kingdom, legal deposit libraries collaboratively operate a nationwide web archiving project, the UKWA, which has collected over 500 TB of data and is growing by approximately 60–70 TB a year. At the same time, UK publicly funded organisations face reduced funding and the challenge of convincing funders to finance their archival function by undergoing evaluations of their services’ values.

Under such circumstances, a proper assessment of the values and impacts of web archiving is a point of discussion for cultural heritage organisations. To the best of the authors’ knowledge, no comprehensive assessment or evaluation of the UKWA has yet been conducted. Thus, this paper seeks to answer the research question: “What would the indicators of impact assessment for the UKWA be?” As a result, we propose a set of impact assessment indicators for the UKWA (and web archiving in general) with broad strategic perspectives including social, cultural, educational and economic impact.

This study examines and proposes the optimal approach to develop impact assessment indicators for the UKWA. The research began by analysing the literature of impact assessment frameworks for digital resources and the types of impact in related fields. Primarily drawing from Simon Tanner’s Balanced Value Impact Model (BVI Model), this research then proposes impact indicators for the UKWA and develops an impact assessment plan consisting of three stages: context setting, indicator development, and indicator evaluation.

This paper will present the method and results of the study. Firstly, it identified the UKWA’s foundational context, mission, principal values and key stakeholder groups. The research project prioritised focal areas for the archive that seem most advantageous for stakeholders and aligned them with Tanner’s Value Lenses. Secondly, we proposed the UKWA impact assessment indicators after scrutinising existing indicators and various evidence collection methods. In the third stage, the developed indicators’ functionality was checked against set quality criteria and then tested through semi-structured interviews and survey submissions with 8 UKWA staff members.

Finally, the paper presents the thirteen potential indicators for the UKWA. Based on the lessons learned, the presenters will also make recommendations for organisations that recognise the necessity of undertaking impact assessments of their web archives.

Ricardo Basilio & Daniel Bicho: Librarians as web curators in universities

Arquivo.pt – Fundação para a Ciência e a Tecnologia

Arquivo.pt aims to expand its community by widening the purposes for which web archives can be useful, such as research in the Humanities or the preservation of the institutional memory of Portuguese universities. However, there is not yet a group of web curation practitioners or a team of researchers familiar with handling preserved web content.

This presentation argues that librarians in universities have an important role in supporting the exploration of preserved web content. They can contribute as local experts on web preservation. By taking on the curation of institutional websites, librarians will acquire skills and knowledge about web archiving and will help researchers to use web-archived materials. To achieve that, a training course for librarians has been prepared in three parts. The project also aims to reach researchers in order to learn their real requirements, so that Arquivo.pt and librarians acting as web curators can respond properly.

The first part is an introduction to the fundamentals of web archiving, namely the technologies, projects and terminology required to give newcomers context and integrate them into this area.

The second part has the trainees experiment with how web preservation can be performed at a very small scale. Web preservation is presented as a three-step sequence: capture, store and replay. During this training session, webrecorder.io is used to capture a set of institutional websites, social pages, and embedded video and audio relevant to the trainees. The resulting WARC files are stored in a local folder. Finally, Webrecorder Player replays the WARC files offline in a local environment.
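For readers curious about what a WARC file holds, the store and replay steps can be imitated with a hand-built record. This is a toy sketch in Python and not how webrecorder.io or Webrecorder Player work internally: the function names are hypothetical, the payload is assumed to have been captured already, and only a single uncompressed `resource` record in WARC/1.0 layout is handled.

```python
import uuid
from datetime import datetime, timezone
from typing import Dict, Tuple


def build_warc_resource_record(target_uri: str, payload: bytes,
                               content_type: str = "text/html") -> bytes:
    """Store step: wrap captured bytes in one WARC/1.0 'resource' record."""
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    header_lines = [
        "WARC/1.0",
        "WARC-Type: resource",
        f"WARC-Record-ID: <urn:uuid:{uuid.uuid4()}>",
        f"WARC-Date: {now}",
        f"WARC-Target-URI: {target_uri}",
        f"Content-Type: {content_type}",
        f"Content-Length: {len(payload)}",
    ]
    # Headers end with a blank line; the record ends with two CRLFs.
    head = "\r\n".join(header_lines).encode("utf-8") + b"\r\n\r\n"
    return head + payload + b"\r\n\r\n"


def read_warc_record(record: bytes) -> Tuple[Dict[str, str], bytes]:
    """Replay step: recover the header fields and the original payload."""
    head, _, rest = record.partition(b"\r\n\r\n")
    lines = head.decode("utf-8").split("\r\n")
    # Skip the version line, then split each "Name: value" header.
    fields = dict(line.split(": ", 1) for line in lines[1:])
    length = int(fields["Content-Length"])
    return fields, rest[:length]
```

Writing the returned bytes to a file with a `.warc` extension in a local folder corresponds to the store step; real captures normally contain many records, including HTTP request and response records, and are often gzip-compressed.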

The third part of the training provides librarians with a set of practical suggestions for reusing web-archived content in their own institutions (e.g. lists, online exhibitions following the use case of www.memoriafcsh.wordpress.com, posts on social media).

As a result, librarians participating in this training are expected to acquire the basic terminology and a working knowledge of the technologies involved, to carry out small projects that capture, store and replay WARC files, and to exhibit and share preserved web content that documents the memory of institutional websites, regardless of the web archive where it is stored (e.g. Arquivo.pt, Internet Archive, Webrecorder collections).

BAD is the Portuguese association of information science professionals, with 175 registered libraries. As a curator at Arquivo.pt and also a librarian, the author of this presentation proposed including these topics in BAD’s training program for librarians, and the proposal has been accepted.

Since October 2018, webinars have taken place, along with presentations on World Digital Preservation Day (WDPD2018), and other workshops are scheduled until the end of the academic year. Participants have been mostly librarians.

Training & impact