Harvesting Working Group

The Harvesting Working Group’s primary focus is the development of web harvesting technologies, particularly around the Internet Archive’s Heritrix web crawler. Its areas of focus include:

  • Development of a smart crawler and improving harvesting performance
  • Development and support of the WARC file format
  • Best practices and databases for sharing crawl information in bulk or selective harvesting
  • Feature requests for the crawler
  • Harvesting the deep web
  • Harvesting video and streaming media

 


© 2004-2010 IIPC