National Library of France
Bibliothèque nationale de France – Archives de l’Internet (Bibliothèque nationale de France Web Archives)
Start Date: 2002
Archive interface language(s): French
Access methods: URL Search, Keyword Search, Full-Text Search, Topical Collections
Harvesting methods: National Domain, Bulk, Selective, Event, Thematic
Since 2006, the BnF has shared responsibility with the INA (Institut national de l'audiovisuel) for the legal deposit of French online publications and web material. The BnF web archiving program started in 2002 with the first snapshots of election websites. It continued from 2004 with a five-year partnership with the Internet Archive, which covered annual broad crawls of the French domain and the acquisition of historical collections. Today, the BnF performs both domain and selective crawls in-house.
As of 2015, the BnF archive consisted of approximately 668 TB of data (26 billion files) dating from 1996 onward. The scope of the collection is the French web (the .fr domain and all material produced in France or by a publisher based in France), and it combines domain, thematic and event harvests. Special collections include a range of national, local and European election harvests, along with thematic collections such as online diaries, blogs and literary websites, and activist websites documenting the social history of the Web. Eighty-five curators contribute to the selection of seeds, forming collections in most areas of knowledge, in line with the BnF’s encyclopedic heritage. In addition, the BnF works with partner institutions that select seeds for certain thematic collections, including more than twenty regional libraries in France, as well as research laboratories, associations and professional organisations.
Due to legal restrictions, the BnF web archives can be searched and browsed only by researchers on the library premises in Paris.
Web Archive Preservation Activities:
• The Library keeps two long-term storage copies on tape, in two geographically separate locations, and one access copy on disk.
• The Library uses the WARC (Web ARChive) file format, having used the ARC file format until 2014. Both formats are supported by the Library's preservation system, SPAR.
• The historical collections are preserved separately from one another, grouped by the tool that produced them and/or by their producer.