WAC 2021: Workshop: SolrWayback 2

Web archive discovery systems and scaling

Thomas Egense & Toke Eskildsen

The Royal Danish Library


This workshop will

  1. Present the challenges of building and maintaining a discovery system at web archive scale
  2. Explain concrete strategies for running Solr at different scales (the same strategies should work for Elasticsearch)
  3. Provide a forum for sharing experiences and problems with the scale of web archives. Bring your own challenges and we will solve them together!


  • An interest in the scaling of web archive discovery systems


The Royal Danish Library has been providing full-text search and discovery for the Danish Netarchive for several years, lately using SolrWayback. The archive contains 33 billion records, which are all indexed and available online. Solr is used as the underlying search engine, and scaling has been both a design criterion and an ongoing challenge.

Indexing (using Web Archive Discovery, https://github.com/ukwa/webarchive-discovery) and searching (using Solr, https://solr.apache.org/) each have their own issues, which can easily compound into larger problems: a setup that works well at a certain size is no guarantee of a working system at 10× that size.



08 Jun 2021


2:00 PM - 3:30 PM

Local Time

  • Timezone: America/New_York
  • Date: 08 Jun 2021
  • Time: 10:00 AM - 11:30 AM
