Report, July 20, 2004

Web Harvesting Survey

The Metrics and Testbed Working Group of the IIPC conducted a survey that attempts to identify and classify the general conditions on Web sites that affect content harvesting and the quality of an archival crawl.

The survey provides a high-level overview of common Web crawling conditions, roughly prioritized by significance, as background for institutions beginning to engage in web harvesting. It also gives examples of each issue and identifies the phase of the harvesting process in which each problem can occur. A complementary document, Test Bed Taxonomy, describes web harvesting issues in greater detail, at a level intended to guide the implementation of an IIPC harvesting test bed.

Download the Web Harvesting Survey (PDF report, 124 KB)

© 2004-2010 IIPC