  • International

    A global network of experts archiving the Web for future generations.

    Learn more about IIPC

  • Internet

    The web is a unique and dynamic resource that is of high value to current and future researchers.

    Learn about the value of our work

  • Preservation

    IIPC members archive the web on a local, national, and global scale.

    Browse our members' archives

  • Consortium

    Our community comes together annually to share experiences and present solutions.

    Meet IIPC's member organizations

  • Harvesting Working Group

    The Harvesting Working Group’s primary focus is the development of web harvesting technologies, particularly around the Internet Archive’s Heritrix web crawler. One major area of work is a smart crawler. The group’s areas of focus include:

    • Supporting the open source Heritrix crawler
    • Development of a smart crawler and improving harvesting performance
    • Development and support of the WARC file format
    • Best practices and databases for sharing crawl information in bulk or selective harvesting
    • Feature requests for the crawler
    • Harvesting the deep web
    • Harvesting video and streaming media