About archiving

Selection

Like any other material that libraries and archives collect, web archives are selected to complement existing collections and to serve different goals. National libraries often focus on collecting their national domains for cultural heritage or as part of copyright deposit regimes, and therefore perform broad, very large crawls. These domain crawls represent some of the largest collections of web archives. Universities may concentrate on collecting web archives that serve research or educational needs, so these collections tend to be focused and deep. Regional and corporate organizations collect web archives for legal or record-keeping purposes, targeting specific documents or sites on the web.

Harvest

Web sites are collected by software, known as web crawlers, that downloads the code, images, documents, and other files needed to reproduce the web site completely and faithfully as it existed at the time of capture. During the harvest, the crawlers also record metadata about the conditions of the harvest process.
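
To make the pairing of downloaded content and harvest metadata concrete, here is a minimal Python sketch. The function name `make_capture_record` and the field names are hypothetical, chosen for illustration only; they do not reflect the record format of any particular crawler or archive. The response values are passed in directly, so the sketch runs without network access.

```python
import hashlib
from datetime import datetime, timezone


def make_capture_record(url, status, headers, body, captured_at=None):
    """Bundle a fetched payload with metadata about how it was harvested.

    Hypothetical structure for illustration; real crawlers define
    their own record formats.
    """
    if captured_at is None:
        captured_at = datetime.now(timezone.utc)
    return {
        "url": url,                                   # what was requested
        "captured_at": captured_at.isoformat(),       # when it was captured
        "status": status,                             # HTTP status of the response
        "content_type": headers.get("Content-Type", "unknown"),
        "length": len(body),                          # payload size in bytes
        "sha256": hashlib.sha256(body).hexdigest(),   # digest of the payload
        "payload": body,                              # the bytes, stored unmodified
    }


# Example values only; no request is made.
record = make_capture_record(
    "http://example.org/",
    200,
    {"Content-Type": "text/html"},
    b"<html><body>Hello</body></html>",
    captured_at=datetime(2020, 1, 1, tzinfo=timezone.utc),
)
print(record["captured_at"], record["content_type"], record["length"])
```

The payload is kept byte for byte, while everything else in the record describes the circumstances of the capture.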

Preservation

The intent of web archiving is to preserve the harvested content in its original form, without modification. Achieving this goal requires tools, standards, policies, and best practices that ensure web archives can be managed over time.
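
One common way to check that stored content has not been modified is to compare a checksum taken at capture time with one recomputed later. The Python sketch below assumes SHA-256 as the digest; the helper names `sha256_of` and `is_unmodified` are hypothetical, and this is an illustration of the general idea rather than a description of any specific archive's procedure.

```python
import hashlib


def sha256_of(data: bytes) -> str:
    """Return the SHA-256 digest of the given bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()


def is_unmodified(stored: bytes, digest_at_capture: str) -> bool:
    """Compare the stored bytes against the digest recorded at capture."""
    return sha256_of(stored) == digest_at_capture


original = b"<html>archived page</html>"
digest = sha256_of(original)  # recorded once, at capture time

print(is_unmodified(original, digest))         # True: bytes are identical
print(is_unmodified(original + b" ", digest))  # False: one extra byte
```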

Access

Web archives are born-digital collections that require special software tools for their use. Researchers can view archived web sites page by page, or whole collections can be processed as data to reveal their broad characteristics. The organizations affiliated with the IIPC are committed to ensuring that their web archive collections are preserved and made accessible to future researchers, historians, and the public.
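
As a rough illustration of processing a collection as data, the Python sketch below tallies a small list of capture records by content type and by year. The records and their field names are invented sample data, and `summarize` is a hypothetical helper; a real collection would be read with the archive's own tools.

```python
from collections import Counter

# Invented sample records standing in for a collection index.
captures = [
    {"url": "http://example.org/", "year": 2019, "content_type": "text/html"},
    {"url": "http://example.org/logo.png", "year": 2019, "content_type": "image/png"},
    {"url": "http://example.org/about", "year": 2020, "content_type": "text/html"},
    {"url": "http://example.org/report.pdf", "year": 2020, "content_type": "application/pdf"},
]


def summarize(records, field):
    """Count how many records share each value of the given field."""
    return Counter(r[field] for r in records)


by_type = summarize(captures, "content_type")
by_year = summarize(captures, "year")

print(by_type.most_common())
print(sorted(by_year.items()))
```

Counts like these describe the collection as a whole, which is a different kind of access from viewing one archived page at a time.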