IIPC TSS Webinar: Under the Hood of SolrWayback 4
The IIPC Technical Speaker Series (TSS) facilitates knowledge sharing and fosters conversations and collaborations among IIPC members around web archiving technical work. During this webinar, Thomas Egense, Toke Eskildsen, Anders Klindt Myrvoll, Jesper Lauridsen, and Jørn Thøgersen of the Royal Danish Library will walk you through Netarkivet’s SolrWayback (version 4.0 was released on 20 December). SolrWayback is a bundled software package for exploring web archives. As the name suggests, it’s a fusion of discovery (Solr) and playback (Wayback) functionality. Besides full-text search, Solr provides multiple ways of aggregating data, moving common net archive statistics tasks from slow batch processing to interactive requests. Based on input from researchers, the feature set is continuously expanding with aggregation, visualization and extraction of data. SolrWayback relies on real-time access to WARC files and a Solr index populated by the UKWA webarchive-discovery tool.
SolrWayback 4, which now runs faster, has a redesigned interface with easier navigation, including context-sensitive help, and more functionality (e.g. you can see the WARC header for a single post). The search field has been reworked to make large and complex queries much easier to manage. Other key features include searching with an uploaded file, an Ngram interface, a word cloud generator, and link graph exports.
Agenda:
- Anders Klindt Myrvoll: SolrWayback and Netarkivet
- Toke Eskildsen: webarchive-discovery & Solr
- Thomas Egense: Demo
- Jesper Lauridsen & Jørn Thøgersen: frontend development
The presentation will be followed by a Q&A session chaired by Lauren Ko, University of North Texas Libraries.
Anders Klindt Myrvoll is the Programme Manager at the national Danish web archive, Netarkivet, at the Royal Danish Library. Together with colleagues, he is collecting, preserving and providing access to the Danish web. Prior to web archiving, Anders worked for more than 13 years in management and production in the film and media industry, collaborating globally on everything from high-end localization to original content, and along the way also gaining extensive experience in digitization and preservation of cultural heritage. You can find him at https://www.linkedin.com/in/andersklindt/ or @andersklindt on Twitter.
Jesper Lauridsen is a frontend developer, focused mainly on usability and, to some degree, the user experience of the products of the Royal Danish Library. In this particular project, he’s worked with Jørn Thøgersen to create the frontend architecture for the underlying SolrWayback services. You can find him on Twitter as @justjspr, ranting about everything from football to coding bugs.
Jørn Thøgersen is lead frontend developer at the Royal Danish Library. During the past 14 years, he has worked on many major web applications centered around various sides of cultural heritage. In the SolrWayback project, he laid out the technical tracks for the frontend in close collaboration with Jesper Lauridsen. Aside from developing for the web, he has a great passion for DIY projects and power tools. You can find him at https://www.linkedin.com/in/j%C3%B8rn-th%C3%B8gersen-50b271/ or @jorntx on Twitter.
Thomas Egense is lead developer on SolrWayback. He works as a Java backend programmer on several projects for the Royal Danish Library, where he has worked for the last 9 years. SolrWayback and projects involving AI/NLP are his favorite projects. In his spare time, he makes mathematical art with his own software and has a few other GitHub projects going as well. If you have any questions on SolrWayback, you are always welcome to email him at thomas.egense@gmail.com.