11:00 – 11:20

Sumitra Duncan & Lori Donovan: Advancing art libraries: developing a collaborative national network for curated web archives

In mid-2018 the Internet Archive and the New York Art Resources Consortium (NYARC)—which consists of the Frick Art Reference Library of The Frick Collection and the research libraries of the Brooklyn Museum and The Museum of Modern Art—received a one-year National Forum grant from the Institute of Museum and Library Services (IMLS) in the Curating Collections project category: Advancing Art Libraries and Curated Web Archives: A National Forum. As part of this project, a National Forum and workshop will convene in February 2019, at the San Francisco Museum of Modern Art (SFMOMA), with librarians, archivists, and curators attending from diverse organizations, many of which are active members of the Art Libraries Society of North America (ARLIS/NA).

This project began with an initial round of outreach, research, and reporting that identified and summarized the challenges, opportunities, and potential areas for collaboration within the North American art and museum library community. Convening at the National Forum will allow this group of approximately 50 art librarians and archivists to coordinate current collection development practices, assess resource and program needs, and map out a national network for future collaborations and service models.

The Advancing Art Libraries and Curated Web Archives project builds upon the Internet Archive’s more than 20 years of experience in web archiving and community building around digital stewardship, as well as NYARC’s successful program of art-specific web archiving. It leverages this joint expertise in a plan of action to catalyze the art and museum library community and create a roadmap for a sustainable national program of art-specific web archives.

A coordinated effort on program development at a networked level will ensure that at-risk born-digital art documentation and information is collected, preserved, and kept accessible as a vital resource for current and future research. It is a central objective for NYARC and the Internet Archive not only to disseminate the research and resulting publications from this project, but also to share the resulting roadmap and collaborative model beyond the North American art library community and with those involved in web archiving efforts via the International Internet Preservation Consortium (IIPC). In this presentation, members of the project team, Sumitra Duncan, Head of the Web Archiving Program for NYARC, and Lori Donovan, Senior Program Manager, Web Archiving at the Internet Archive, will share key takeaways resulting from this initiative.

11:20 – 11:40

Sabine Schostag: Why, what, when and how: curatorial decisions and the importance of their documentation

The Royal Danish Library is obliged by the legal deposit law to collect and preserve the Danish web sphere. You will never be able to archive the entire web or one nation’s entire web sphere. You have to make choices and, in particular, keep track of your decisions. This is not only of prime importance for the curators’ work, but also benefits the users and researchers.

To master the task of archiving the Danish web sphere, the web curator team laid down a strategy: up to four broad crawls (snapshots) a year and a number of ongoing selective crawls. Our in-house developed curator tool, NetarchiveSuite, does not offer enough functionality and space for documenting all our decisions, particularly for the selective crawls.

Thus, we had to decide on a documentation tool. We wanted a tool that

  • Was easy to access for all involved persons
  • Was easy to edit, make changes, add content
  • Offered the options to document
    • selections and deselections with reasons for decisions
    • start and end of crawl periods
    • QA observations and follow-ups

We built an internal folder system in Windows File Explorer. The folders represented the different steps of the workflow for selective crawls predetermined by the curators: identification of a domain to be crawled selectively, initial examination, analysis, data entry, quality assurance, monitoring and follow-up. We created a Word template and filled in a copy for each selectively crawled domain. Then we moved the documents around in the folder system according to their stage in the workflow. However, opening, editing and moving the documents between the folders required great care, and the system soon became rather difficult to handle. We started by moving the content of all domain documents to wiki pages in MediaWiki (https://www.mediawiki.org/) and ended up migrating all our documentation to two Atlassian products, Jira (https://www.atlassian.com/software/jira) and the Confluence wiki (https://www.atlassian.com/software/confluence). An important factor in this choice was access management: we can grant individual access to every single page or deny it for pages with private content.

We converted the workflow for selectively crawled domains into a modified Jira space (issue tracker). Each selectively crawled domain became an issue, and the status of the issue represents its step within the workflow.

In this way, we now have a flexible documentation system, particularly with regard to the selective crawls. By using a range of components (such as “with paywalls”, “uses https protocol”, “uses advanced JavaScript”, etc.), which can be added to any issue (domain), we can easily group the selectively crawled domains according to different challenges and, for instance, forward problems to be solved by a developer to the developers’ Jira space within the system.

11:40 – 12:00

Jessica Cebra: Describing web archives: a standard with an identity crisis?

I’ve engaged with web archives for about one year in the role of metadata management librarian. At Stanford, our basic metadata requirements for archived websites are generally modeled after records for other digital resources in the Stanford Digital Repository, but with some tailored fields unique to captured web content. As I began to delve into this new world, with the perspective of a trained archivist, I was struck by the prevalence of bibliographic-oriented descriptive practice across institutions, and wondered: where is the archival description in web archives?

In the web archives community, recent publications of recommendations, best practices, and metadata application profiles promote consistency and tackle the challenges of describing web content for discovery (other efforts to update descriptive standards to encompass born-digital materials are also notable). While some of the recommended approaches claim to bridge and blend both bibliographic and archival description, they are primarily bibliographic in nature.

In light of these developments, along with recent literature highlighting user needs and what users deem missing from descriptive information, this paper examines existing descriptive records for a diverse sampling of web archives and their employment of bibliographic and/or archival description standards, and ultimately, what “useful” information is gained or lost in a comparison of these approaches. As expected, description is often about the website content itself, but there is a rising call for more transparency in how and why the content was captured, since the collector is involved in shaping the focus of a collection and configuring the crawls, ultimately intervening in the way a website plays back (though technical limitations often play a part in a memento’s “incompleteness”). Could this descriptive gap be filled with something akin to an archival ‘acquisition information’ or ‘processing information’ note that provides contextual and technical details, from seed selection criteria to crawling tools and techniques used in the processing of the material?

At Stanford, collectors of web content are librarians, archivists, and other academic staff, known as the Web Archivists Group. It is my hope that this paper will spark a more focused and informed conversation, not only within the group but in the broader community as well, about what descriptive information is useful, and to whom, and that we can apply those decisions to our descriptive practice as it evolves.

12:00 – 12:20

Lorenz Widmaier: Divide-and-conquer – artistic strategies to curate the web

Divide-and-conquer – artistic approaches to curate the web

Twenty-five thousand photographs are within the collection of MoMA New York. In contrast, 95 million photographs are uploaded to Instagram each day. Digitisation strategies of memory institutions are often merely about digitising physical objects, made accessible in databases like Europeana. If born-digital content is taken into consideration, it is often treated like physical objects, differing only in the techniques needed for access and storage.

These techniques are indeed needed. Nevertheless, we should shed light on the inherent character of born-digital content, and its genesis within an algorithmic, data, social media, or networked society. Aleida Assmann argued that a focus on storing is not enough, considering the inexorably growing amount of data, and pointed to the importance of forgetting. The selection process of what to remember and what to forget cannot entirely be outsourced to search engines. Instead, professional ‘gatekeepers’ are still needed.[1] However, for the GLAM sector and mass media Felix Stalder argues that “they can barely function as gatekeepers any more between those realms that, with their help, were once defined as ‘private’ and ‘public.’ Their decisions about what is or is not important matter less and less.”[2] Morten Søndergaard also describes changing “competences of traditional institutions and genres” and positions “the artist in a new role: mediator.”[3]

We will take a look at artistic strategies mediating the masses of volatile and dynamic born-digital content within the digital society. What can archivists as ‘gatekeepers’ learn from these approaches to curate and preserve an intelligible story of the web? Can archiving still strive for neutrality and be separated from curation/mediation?

We will look at …
… how to narrate Google Street View?
… how to travel the world with agoraphobia using big data?
… how to engage with satellite images?
… how to collect practical photography?
… how to visualise the flood of photographs?
… how to archive and illuminate the darknet?
… how to materialise Wikipedia?
… how to archive Interfaces?
… how to narrate personal life-logging data?
… how to visualise data through photography?
… how to document the world through webcam images?
… how to show images banned from Instagram?
… how to display boring data?

List of the presented art projects: www.lorenzwidmaier.com/dl/dac.pdf

[1] Assmann, Aleida (2018): Formen des Vergessens. Sonderausgabe. Bonn: bpb. p. 202f
[2] Stalder, Felix (2018): The Digital Condition. Newark: Polity Press. The great disorder, para. 1.
[3] Søndergaard, Morten: The digital archive experience. In: Mogens Jacobsen and Morten Søndergaard (eds.): RE_ACTION. The digital archive experience. p. 44

12:20 – 12:30


Curatorial strategies