Nation wide webs 13:30-14:00
Tiakiwai
Jefferson Bailey, Internet Archive
This talk will outline efforts to build new nation-specific web archive access portals with enhanced aggregation, discovery, and capture methods. Many national libraries have been conducting web harvests of their country-code top-level domain (ccTLD) for years. These collections are often composed solely of materials gathered through internally managed crawling, and their access endpoints are frequently restricted to reading-room-only viewing. These local-access portals largely adhere to the “known-URL” lookup-and-replay paradigm of traditional web archive access tools, in which a user must already know a page's URL in order to retrieve and replay its archived captures.
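To make the “known-URL” paradigm concrete, here is a minimal sketch of the lookup step, assuming a hypothetical in-memory index (`CAPTURES`) that maps each URL to its sorted 14-digit capture timestamps. It is an illustration only, not the Internet Archive's implementation; production tools typically query a CDX-style index instead. The point is that nothing can be found unless the exact URL is supplied up front.

```python
from bisect import bisect_left
from datetime import datetime
from typing import Optional

# Hypothetical in-memory capture index: URL -> sorted 14-digit capture timestamps.
# 14-digit timestamps sort lexicographically in chronological order.
CAPTURES = {
    "http://example.nz/": ["20100315120000", "20150601080000", "20180901093000"],
}

def _to_dt(ts: str) -> datetime:
    """Parse a 14-digit YYYYMMDDhhmmss timestamp."""
    return datetime.strptime(ts, "%Y%m%d%H%M%S")

def nearest_capture(url: str, wanted: str) -> Optional[str]:
    """Return the capture of `url` closest in time to `wanted`, or None if the URL is unknown."""
    stamps = CAPTURES.get(url)
    if not stamps:
        return None  # known-URL lookup: an unknown URL yields nothing at all
    i = bisect_left(stamps, wanted)
    # The closest capture is one of the (at most two) neighbours of the insertion point.
    candidates = stamps[max(i - 1, 0): i + 1]
    target = _to_dt(wanted)
    return min(candidates, key=lambda ts: abs(_to_dt(ts) - target))
```

For example, requesting `http://example.nz/` as of early 2016 selects the mid-2015 capture, while any URL absent from the index returns `None`, however relevant its content might have been.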
Working with partners, and as part of advancing R&D on improving access to web collections, the Internet Archive has been developing new portals to national web domains in concert with national libraries that have a mandate to archive their national web sphere. These collections are “sourced” from a variety of past and scheduled crawling activities: historical collections, specific domain harvests, relevant content from global crawling, in-scope donated and contributed web data, curatorial web collecting, user-submitted URL contributions, and other acquisition methods. In addition, these portals leverage new discovery tools, including full-text search, search of non-text items (images, audio, etc.), linkbacks from embedded resources, identification of relevant content through GeoIP matching or PageRank-style scoring, and categorization such as “highly visited” or “no longer on the live web.” While giving new life to the discovery and use of ccTLD-specific web archive access portals, the project is also exploring how new features, functionality, profiling, and enhanced discovery and reporting methods can advance how we think about access to web archives.
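As a rough illustration of two of the ideas above, the sketch below scopes captures pooled from mixed sources down to a single ccTLD by hostname suffix, then ranks the in-scope pages with a basic power-iteration PageRank over their links. Both are assumptions chosen for clarity: the abstract does not specify how scoping or “PageRank-style scoring” is actually implemented, and the function names and example URLs are hypothetical.

```python
from urllib.parse import urlsplit

def in_cctld(url: str, cctld: str) -> bool:
    """True if the URL's hostname is, or ends with, the given country-code TLD (e.g. 'nz')."""
    host = (urlsplit(url).hostname or "").rstrip(".").lower()
    return host == cctld or host.endswith("." + cctld)

def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             iters: int = 50) -> dict[str, float]:
    """Power-iteration PageRank over an outlink map; dangling pages spread rank evenly."""
    nodes = set(links) | {t for targets in links.values() for t in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iters):
        # Rank held by pages with no outlinks is redistributed uniformly.
        dangling = sum(rank[node] for node in nodes if not links.get(node))
        new = {node: (1 - damping) / n + damping * dangling / n for node in nodes}
        for src, targets in links.items():
            if targets:
                share = damping * rank[src] / len(targets)
                for t in targets:
                    new[t] += share
        rank = new
    return rank

# Hypothetical link graph pooled from several acquisition sources.
captured = {
    "http://library.example.nz/": ["http://example.nz/", "http://example.com/"],
    "http://example.nz/": ["http://library.example.nz/"],
    "http://blog.example.nz/": ["http://library.example.nz/", "http://example.nz/"],
    "http://example.com/": ["http://library.example.nz/"],
}
# Keep only .nz pages, and only links between .nz pages.
scoped = {src: [t for t in targets if in_cctld(t, "nz")]
          for src, targets in captured.items() if in_cctld(src, "nz")}
scores = pagerank(scoped)
```

Here `http://blog.example.nz/` receives no inbound links from in-scope pages and so ends up with the lowest score. A real portal would work over far larger graphs and would likely combine such scores with other signals rather than use them alone.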