Harvesting Working Group
The Harvesting Working Group’s primary focus is the development of web harvesting technologies, particularly around the Internet Archive’s Heritrix web crawler. Areas of focus include:
- Development of a smart crawler and improving harvesting performance
- Development and support of the WARC file format
- Best practices and databases for sharing crawl information in bulk or selective harvesting
- Feature requests for the crawler
- Harvesting the deep web
- Harvesting video and streaming media