Digital resources – the national project of webharvesting and webarchiving in Slovakia 12:00-12:30

Peter Hausleitner & Jana Matúšková

Tiakiwai

Andrej Bizík, Peter Hausleitner & Jana Matúšková, University Library in Bratislava

In April 2015 the University Library in Bratislava (ULB) was charged with the national project ‘Digital Resources – Web Harvesting and E-Born Content Archiving.’ The goals of the project were the acquisition, processing, trusted storage and use of original Slovak digital resources. Its ambition was to establish a comprehensive information system for the harvesting, identification, management and long-term preservation of web resources and e-born documents (a platform for controlled web harvesting and e-born archiving). The Digital Resources Information System consists of specialised, mainly open-source software modules in a modular architecture with a high level of resource virtualization. It is based on a server cluster, which consists of dedicated public and internal portal servers and a farm of “worker” servers that run the system processes. System management is optimized for parallel web harvesting, which enables the system to carry out a full-domain harvest with the required politeness in an acceptable time.
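How parallel harvesting and politeness fit together can be illustrated with a minimal Python sketch. It is not the ULB implementation: the function name `schedule`, the 2-second delay and the `.example` hostnames are assumptions made only for illustration. Requests to the same host are spaced apart, while requests to different hosts may start at the same time.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def schedule(urls, delay=2.0):
    """Assign each URL an earliest fetch time in seconds from the start.

    URLs on the same host are spaced `delay` seconds apart (politeness);
    URLs on different hosts may share a time slot (parallelism).
    The delay value is an assumed example, not a ULB setting.
    """
    next_free = defaultdict(float)  # host -> earliest time of its next request
    plan = []
    for url in urls:
        host = urlsplit(url).hostname
        start = next_free[host]
        plan.append((start, url))
        next_free[host] = start + delay
    # Order by start time, then by URL for a stable, readable plan.
    return sorted(plan)


# Hypothetical seed list: two pages on one host, one page on another.
plan = schedule([
    "http://one.example/a",
    "http://one.example/b",
    "http://two.example/a",
])
for start, url in plan:
    print(f"{start:4.1f}s  {url}")
```

In this sketch the total duration is governed by the host with the most pages rather than by the total number of pages, which is why spreading work across many hosts in parallel keeps a full-domain harvest within an acceptable time.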

At present, the ULB web archiving system has 800 TB of storage at its disposal. The application is supported by a powerful hardware infrastructure: a farm of 21 blade servers, which provides a virtual environment for multiple harvesting processes, and 3 standalone database servers. The hardware components are interconnected via high-speed channels. The system further includes support modules for communication, monitoring, backup and reporting. A very useful feature is a functionally identical parallel testing environment, which enables preventive harvests and problem analysis without interfering with the production processes.

A substantial part of the system is the catalogue of websites, which is regularly updated during the automated survey of the national domain .sk. Websites under other domains (e.g. .org, .net, .com, .eu) that match our policy criteria are added to the catalogue manually.
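The split between automated and manual catalogue entry can be sketched as follows. This is only an illustration under stated assumptions: the function `catalogue_route`, its return labels and the sample URLs are hypothetical, and the actual policy criteria are applied by curators rather than by code.

```python
from urllib.parse import urlsplit


def catalogue_route(url, national_tld="sk"):
    """Return how a website would enter the catalogue in this sketch.

    Hosts under the national top-level domain are picked up by the
    automated survey; everything else is queued for manual review
    against the policy criteria (hypothetical labels).
    """
    host = urlsplit(url).hostname or ""
    tld = host.rsplit(".", 1)[-1].lower()
    return "automatic" if tld == national_tld else "manual-review"


# Hypothetical sample URLs.
for url in ["http://example.sk/", "https://example.org/page"]:
    print(url, "->", catalogue_route(url))
```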

The Digital Resources Information System is operated, managed and developed by the Deposit of Digital Resources department of ULB, which has one head, three specialised digital curators and one part-time staff member for born-digital titles.

The project finished in the fall of 2015, and routine operation continues at present. Since 2015 ULB has performed three full-domain harvests (harvests of the national .sk domain) as well as multiple selective and thematic crawls.

Mon 12:00 am - 12:00 am
digital resources, e-born archiving, webharvesting