For setting up a Web archiving chain, the following tools are recommended and used by members of the IIPC:
Acquisition
ArchiveFacebook, a Mozilla Firefox add-on for individuals to archive their Facebook accounts
Developed by: Mat Kelly, Carlton Northern, Hany SalahEldeen, Michael Nelson, and Frank McCown
Current version: 1.4
More information: https://addons.mozilla.org/en-US/firefox/addon/archivefacebook/
Heritrix, an open source, extensible, web-scale, archival quality web crawler
Developed by: Internet Archive with the Nordic national libraries
Current versions: Heritrix 3.1.1 (2012-05-02); Heritrix 1.14.4 (2010-05-10); Heritrix 2.0.2 (2008-11-08)
More information: https://webarchive.jira.com/wiki/display/Heritrix/Heritrix
Download (3.X): http://builds.archive.org:8080/maven2/org/archive/heritrix/heritrix/
Download (2.X, 1.X): http://sourceforge.net/projects/archive-crawler/
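Heritrix decides which discovered links to follow through configurable scope rules; one common approach compares each URI's SURT (Sort-friendly URI Reordering Transform) form against a list of SURT prefixes. The Python sketch below illustrates only that idea. It is not Heritrix code, the helper names `to_surt` and `in_scope` are hypothetical, and the conversion skips ports, user info and the canonicalization rules a real crawler applies.

```python
from typing import List
from urllib.parse import urlsplit


def to_surt(url: str) -> str:
    """Simplified SURT form: http://www.archive.org/about -> http://(org,archive,www,)/about."""
    parts = urlsplit(url)
    host = parts.hostname or ""  # urlsplit's hostname attribute is already lower-case
    # Reverse the host labels so that the domain hierarchy reads left to right
    reversed_host = ",".join(reversed(host.split("."))) + ","
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    return f"{parts.scheme}://({reversed_host}){path}"


def in_scope(url: str, surt_prefixes: List[str]) -> bool:
    """A URI is in scope when its SURT form starts with any configured prefix."""
    surt = to_surt(url)
    return any(surt.startswith(prefix) for prefix in surt_prefixes)


# Prefix covering archive.org and all of its subdomains
prefixes = ["http://(org,archive,"]
print(in_scope("http://web.archive.org/web/", prefixes))  # True
print(in_scope("http://example.org/", prefixes))          # False
```

The trailing comma after each reversed label is what keeps a prefix such as `http://(org,archive,` from also matching an unrelated host like `archiveteam.org`.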
HTTrack, an open source website copying utility
Developed by: Xavier Roche and other contributors
Current version: 3.46-1 (2012-06-23)
More information: http://www.httrack.com/
SiteStory, a transactional archive that selectively captures and stores transactions that take place between a web client (browser) and a web server
Developed by: Los Alamos National Laboratory
Current version: 1.0
More information: http://www.dlib.org/dlib/september12/09inbrief.html
Download: http://mementoweb.github.com/SiteStory/
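A transactional archive works on the server side and keeps a copy of what the server actually sent. The sketch below shows that capture pattern as Python WSGI middleware. It is not how SiteStory itself is implemented; the class `TransactionRecorder` and its `should_archive` predicate (a stand-in for selective capture) are hypothetical, and an in-memory list replaces a real archive store.

```python
from datetime import datetime, timezone
from typing import Callable, Iterable


class TransactionRecorder:
    """Hypothetical WSGI middleware that keeps a copy of each response it serves."""

    def __init__(self, app: Callable, should_archive: Callable[[dict], bool] = lambda environ: True):
        self.app = app
        self.should_archive = should_archive  # stand-in for a selective capture policy
        self.archive: list = []  # in-memory stand-in for a real archive store

    def __call__(self, environ: dict, start_response: Callable) -> Iterable[bytes]:
        captured = {}

        def recording_start_response(status, headers, exc_info=None):
            captured["status"] = status
            captured["headers"] = list(headers)
            return start_response(status, headers, exc_info)

        # Buffer the body so exactly the bytes sent to the client are archived
        body_iter = self.app(environ, recording_start_response)
        try:
            body = b"".join(body_iter)
        finally:
            if hasattr(body_iter, "close"):
                body_iter.close()

        if self.should_archive(environ):
            self.archive.append({
                "path": environ.get("PATH_INFO", "/"),
                "datetime": datetime.now(timezone.utc),
                "status": captured.get("status"),
                "headers": captured.get("headers"),
                "body": body,
            })
        return [body]


def hello_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello"]


recorder = TransactionRecorder(hello_app)
served = recorder({"PATH_INFO": "/index.html"}, lambda status, headers, exc_info=None: None)
print(served, recorder.archive[0]["status"])
```

Because the middleware buffers each body before returning it, it suits small pages; streaming responses would need the copy to be taken chunk by chunk.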
WARCreate, a Google Chrome extension for archiving an individual webpage or website to a WARC file
Developed by: Mat Kelly
Current version: unreleased
More information: http://matkelly.com/warcreate/
Warrick, an open source tool, available for download or as a web service, for reconstructing websites from web archives using Memento
Developed by: Frank McCown
Current version: 2.2.1 (2012-04)
More information: http://warrick.cs.odu.edu/
Download: http://code.google.com/p/warrick/downloads/list
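Memento exposes the captures an archive holds for a URL as a TimeMap in link format, where each memento entry carries a `datetime` attribute. Picking the capture nearest a target date, one step a reconstruction tool such as Warrick needs, can be sketched as below. The archive host `archive.example` and the helper names are hypothetical, and the regex-based parser assumes well-formed input.

```python
import re
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import List, Optional, Tuple

# One link-value: <URI> followed by ;name="value" parameters.
# Splitting on commas would fail because RFC 1123 dates contain commas.
LINK_RE = re.compile(r'<([^>]+)>((?:\s*;\s*[\w-]+="[^"]*")*)')
PARAM_RE = re.compile(r'([\w-]+)="([^"]*)"')


def parse_timemap(text: str) -> List[Tuple[datetime, str]]:
    """Return sorted (capture datetime, memento URI) pairs from a link-format TimeMap."""
    mementos = []
    for uri, params in LINK_RE.findall(text):
        attrs = dict(PARAM_RE.findall(params))
        # rel may hold several values, e.g. "first memento"
        if "memento" in attrs.get("rel", "").split() and "datetime" in attrs:
            mementos.append((parsedate_to_datetime(attrs["datetime"]), uri))
    return sorted(mementos)


def closest_memento(text: str, target: datetime) -> Optional[str]:
    """URI of the memento captured nearest to target (target must be timezone-aware)."""
    mementos = parse_timemap(text)
    if not mementos:
        return None
    return min(mementos, key=lambda m: abs(m[0] - target))[1]


SAMPLE_TIMEMAP = (
    '<http://example.com/>; rel="original",\n'
    '<http://archive.example/20100101000000/http://example.com/>; rel="first memento"; '
    'datetime="Fri, 01 Jan 2010 00:00:00 GMT",\n'
    '<http://archive.example/20120601000000/http://example.com/>; rel="last memento"; '
    'datetime="Fri, 01 Jun 2012 00:00:00 GMT"'
)
print(closest_memento(SAMPLE_TIMEMAP, datetime(2011, 1, 1, tzinfo=timezone.utc)))
```

A tool that queries several archives would merge the pairs from each TimeMap before choosing.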
Wget, an open source file retrieval utility that can write WARC files since version 1.14
Developed by: GNU Project
Current version: 1.14 (2012-08-05)
More information: http://www.gnu.org/software/wget/, http://www.archiveteam.org/index.php?title=Wget_with_WARC_output
Download: ftp://ftp.gnu.org/gnu/wget/
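Wget 1.14 introduced WARC output through the `--warc-file` option, which writes a compressed `NAME.warc.gz` alongside the normal download. A minimal Python sketch that assembles such a command line follows; the remaining flags are ordinary Wget recursion options, while the crawl name and depth are placeholder assumptions.

```python
import shlex
from typing import List


def wget_warc_command(url: str, warc_name: str, depth: int = 1) -> List[str]:
    """Assemble a wget command line that mirrors `url` and also records a WARC file."""
    return [
        "wget",
        f"--warc-file={warc_name}",  # wget 1.14+ writes <warc_name>.warc.gz
        "--recursive",
        f"--level={depth}",          # how many links deep to follow
        "--page-requisites",         # also fetch images, CSS and scripts a page needs
        url,
    ]


cmd = wget_warc_command("http://example.com/", "example-crawl")
print(shlex.join(cmd))
# wget --warc-file=example-crawl --recursive --level=1 --page-requisites http://example.com/
# To run it with wget installed: subprocess.run(cmd, check=True)
```

Keeping the command as a list rather than a single string avoids shell quoting problems when URLs contain special characters.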
Curator Tools
CINCH, an open source tool for batch retrieval of Internet-accessible documents and transfer to a preservation system
Developed by: State Library of North Carolina
Current version: 1.0 (2012)
More information: http://cinch.nclive.org/Cinch/
Download: http://slnc-dimp.github.com/Cinch/
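Transferring a batch of retrieved documents into a preservation system usually involves recording fixity information so that later copies can be checked. The sketch assumes SHA-256 checksums in a simple two-column manifest; it does not describe CINCH's actual output, and `build_manifest` and `verify_manifest` are hypothetical names.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import List


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash a file in chunks so large documents need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(batch_dir: Path) -> str:
    """One 'checksum  relative/path' line per file in the batch, sorted by path."""
    lines = [
        f"{sha256_of(path)}  {path.relative_to(batch_dir).as_posix()}"
        for path in sorted(p for p in batch_dir.rglob("*") if p.is_file())
    ]
    return "".join(line + "\n" for line in lines)


def verify_manifest(batch_dir: Path, manifest: str) -> List[str]:
    """Relative paths that are missing or whose checksum no longer matches."""
    mismatches = []
    for line in manifest.splitlines():
        expected, rel_path = line.split("  ", 1)
        target = batch_dir / rel_path
        if not target.is_file() or sha256_of(target) != expected:
            mismatches.append(rel_path)
    return mismatches


with tempfile.TemporaryDirectory() as tmp:
    batch = Path(tmp)
    (batch / "empty.txt").write_bytes(b"")
    manifest = build_manifest(batch)
    clean = verify_manifest(batch, manifest)    # []
    (batch / "empty.txt").write_bytes(b"altered after transfer")
    altered = verify_manifest(batch, manifest)  # ['empty.txt']
print(manifest, end="")
```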
NetarchiveSuite, a curator tool that lets librarians define and control harvests of web material. It scales from small selective harvests to harvests of entire national domains and can be distributed across any number of machines. It includes a secure storage module that maintains multiple copies of the harvested material, as well as a tool that automates quality assurance.
Developed by: the Royal Library and the State and University Library in the virtual organisation netarchive.dk
Current version: 3.19.0 (2012-03-04)
More information and download: https://sbforge.org/display/NAS/Releases+and+downloads
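Keeping multiple copies only protects material if the copies are compared from time to time. The sketch below, which is not NetarchiveSuite's API, takes hypothetical per-replica checksum listings and reports, for each file, the replicas that lack it or disagree with the most common checksum.

```python
from collections import Counter
from typing import Dict, List


def check_replicas(replicas: Dict[str, Dict[str, str]]) -> Dict[str, List[str]]:
    """Compare {replica: {filename: checksum}} listings.

    Returns {filename: [replicas needing repair]} for files that a replica lacks
    or whose checksum differs from the most common value (ties go to the value seen first).
    """
    all_files = set().union(*(listing.keys() for listing in replicas.values()))
    problems: Dict[str, List[str]] = {}
    for name in sorted(all_files):
        present = {rep: listing[name] for rep, listing in replicas.items() if name in listing}
        majority, _count = Counter(present.values()).most_common(1)[0]
        # A missing file yields None, which never equals the majority checksum
        bad = [rep for rep in replicas if present.get(rep) != majority]
        if bad:
            problems[name] = bad
    return problems


listings = {
    "replica_A": {"a.warc.gz": "11", "b.warc.gz": "22"},
    "replica_B": {"a.warc.gz": "11", "b.warc.gz": "22"},
    "replica_C": {"a.warc.gz": "99"},
}
print(check_replicas(listings))
# {'a.warc.gz': ['replica_C'], 'b.warc.gz': ['replica_C']}
```

With only two replicas a disagreement has no majority, which is one reason archives often keep at least three copies.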
Web Curator Tool (WCT), a tool for managing the selective Web harvesting process. It is designed for use in libraries and other collecting organisations and supports collection by non-technical users while still allowing complete control of the harvesting process. The WCT is available under the terms of the Apache License.
Developed by: the National Library of New Zealand and the British Library; initiated by the International Internet Preservation Consortium
Current version: WCT 1.5.2 (2011-08-22)
More information and download: http://webcurator.sourceforge.net/
Collection storage and maintenance
HTTrack2ARC, a tool for converting HTTrack output to the ARC format
Developed by: Portuguese Web Archive
Current version: 1.0 (2012-01)
More information and download: http://code.google.com/p/httrack2arc/
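An ARC (version 1) record consists of a one-line header (URL, IP address, 14-digit archive date, content type, content length), the content bytes, and a trailing newline. A minimal sketch of serializing one record from a mirrored file follows. It is not HTTrack2ARC's code: it omits the `filedesc://` header record that opens every ARC file, uses `0.0.0.0` as a placeholder IP address, and stores the file bytes as-is rather than a full HTTP response.

```python
from datetime import datetime, timezone


def arc_v1_record(url: str, content: bytes, content_type: str,
                  captured: datetime, ip: str = "0.0.0.0") -> bytes:
    """Serialize one ARC v1 record: a one-line header, the content, then a newline."""
    header = " ".join([
        url,  # assumed already percent-encoded (no spaces, ASCII only)
        ip,   # placeholder when the copying tool did not record the server IP
        captured.astimezone(timezone.utc).strftime("%Y%m%d%H%M%S"),  # naive values count as local time
        content_type,
        str(len(content)),  # Archive-length counts content bytes only
    ])
    return header.encode("ascii") + b"\n" + content + b"\n"


record = arc_v1_record(
    "http://example.com/index.html",
    b"<html>hi</html>",
    "text/html",
    datetime(2012, 1, 15, 10, 30, 0, tzinfo=timezone.utc),
)
print(record.decode())
```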
Java Web Archive Toolkit (JWAT), a tool for reading and validating ARC and WARC files
Developed by: Netarchive.dk
Current version: 1.0.0 (2013-02-11)
More information and download: https://sbforge.org/display/JWAT/JWAT
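A WARC record is a version line, named header fields, a blank line, a content block whose size is given by `Content-Length`, and a closing CRLF CRLF; WARC 1.0 makes `WARC-Type`, `WARC-Record-ID`, `WARC-Date` and `Content-Length` mandatory. The reader below sketches the kind of checks a validator performs. It is not JWAT's API, its list of checks is far from complete, and it handles uncompressed input only (for `.warc.gz` files, a stream from `gzip.open(path, "rb")` reads the per-record gzip members as one continuous stream).

```python
import io
from typing import BinaryIO, List, NamedTuple, Optional

MANDATORY = ("WARC-Type", "WARC-Record-ID", "WARC-Date", "Content-Length")


class WarcRecord(NamedTuple):
    version: str
    headers: dict
    block: bytes
    problems: List[str]


def read_warc_record(stream: BinaryIO) -> Optional[WarcRecord]:
    """Read one uncompressed WARC record and collect validation problems (None at end of stream)."""
    first = stream.readline()
    if not first:
        return None
    problems: List[str] = []
    version = first.rstrip(b"\r\n").decode("ascii", "replace")
    if not version.startswith("WARC/"):
        problems.append(f"bad version line: {version!r}")

    headers = {}
    while True:
        line = stream.readline()
        if line in (b"\r\n", b"\n", b""):
            break  # a blank line ends the header block
        name, sep, value = line.decode("utf-8", "replace").partition(":")
        if not sep:
            problems.append(f"malformed header line: {line!r}")
            continue
        headers[name.strip()] = value.strip()

    for field in MANDATORY:
        if field not in headers:
            problems.append(f"missing mandatory field {field}")

    raw_length = headers.get("Content-Length", "0")
    if not raw_length.isdecimal():
        problems.append(f"invalid Content-Length: {raw_length!r}")
        raw_length = "0"
    block = stream.read(int(raw_length))
    if len(block) != int(raw_length):
        problems.append("truncated content block")
    if stream.read(4) != b"\r\n\r\n":
        problems.append("record not terminated by CRLF CRLF")
    return WarcRecord(version, headers, block, problems)


SAMPLE = (
    b"WARC/1.0\r\n"
    b"WARC-Type: resource\r\n"
    b"WARC-Record-ID: <urn:uuid:00000000-0000-0000-0000-000000000001>\r\n"
    b"WARC-Date: 2013-02-11T00:00:00Z\r\n"
    b"WARC-Target-URI: http://example.com/hello.txt\r\n"
    b"Content-Type: text/plain\r\n"
    b"Content-Length: 5\r\n"
    b"\r\n"
    b"hello\r\n\r\n"
)
record = read_warc_record(io.BytesIO(SAMPLE))
print(record.headers["WARC-Type"], record.block, record.problems)
```

Because each call consumes exactly one record, calling `read_warc_record` repeatedly on the same stream walks a file record by record until it returns `None`.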
JHOVE2, an open source format characterization tool whose newer format modules cover the ARC, WARC, and GZIP formats
Developed by: California Digital Library, Portico, Stanford University Libraries, Bibliothèque nationale de France and NETARKIVET.DK
Current version: 2.1.0 (2013-03-18)
More information: https://bitbucket.org/jhove2/main/wiki/Home
JHOVE2 User's Guide: http://bitbucket.org/jhove2/main/wiki/documents/JHOVE2-Users-Guide_20110222.pdf
Download: https://bitbucket.org/jhove2/main/downloads
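Format identification usually starts from leading bytes: GZIP streams begin with `1f 8b`, WARC files with `WARC/`, and ARC files with their `filedesc://` header record. The function below sketches that first step and peeks inside the first gzip member to recognize compressed ARC and WARC files. It covers identification only, not the validation and property extraction JHOVE2 performs, and the returned labels are hypothetical.

```python
import gzip
import zlib
from typing import Optional


def _plain_format(head: bytes) -> Optional[str]:
    """Match the uncompressed signatures of WARC and ARC files."""
    if head.startswith(b"WARC/"):
        return "WARC"
    if head.startswith(b"filedesc://"):
        return "ARC"
    return None


def identify(data: bytes) -> str:
    """Guess a format label from the first bytes of a file."""
    kind = _plain_format(data)
    if kind:
        return kind
    if data[:2] == b"\x1f\x8b":  # GZIP magic number
        try:
            # A decompressobj accepts a truncated stream, so a 4 KiB prefix is enough
            inflater = zlib.decompressobj(16 + zlib.MAX_WBITS)  # 16 + MAX_WBITS selects gzip framing
            head = inflater.decompress(data[:4096], max_length=64)
        except zlib.error:
            return "GZIP"
        inner = _plain_format(head)
        return f"{inner} (gzip)" if inner else "GZIP"
    return "unknown"


for sample in (b"WARC/1.0\r\n",
               b"filedesc://crawl.arc 0.0.0.0 20120101000000 text/plain 76\n",
               gzip.compress(b"WARC/1.0\r\n"),
               gzip.compress(b"plain text"),
               b"%PDF-1.4"):
    print(identify(sample))
# WARC, ARC, WARC (gzip), GZIP, unknown (one per line)
```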
SiteStory, a transactional archive that selectively captures and stores transactions that take place between a web client (browser) and a web server
Developed by: Los Alamos National Laboratory
Current version: 1.0
More information: http://www.dlib.org/dlib/september12/09inbrief.html
Download: http://mementoweb.github.com/SiteStory/
Web Archive Transformation (WAT) Format, specification
Developed by: Internet Archive
Current version: (2011-05-31)
More information and download: https://webarchive.jira.com/wiki/display/Iresearch/Web+Archive+Transformation+%28WAT%29+Specification,+Utilities,+and+Usage+Overview
Web Archive Transformation (WAT) Utilities, a toolset for extracting select metadata from WARC files for the purpose of data analysis
Developed by: Internet Archive
Current version: (2011-05-31)
More information and download: https://webarchive.jira.com/wiki/display/Iresearch/Web+Archive+Transformation+%28WAT%29+Specification,+Utilities,+and+Usage+Overview
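WAT records carry metadata extracted from WARC records as JSON. The sketch below shows the general idea on a single HTTP response block: it pulls out the status code, content type, payload length and HTML links. The JSON field names are hypothetical and do not follow the layout of the WAT specification.

```python
import json
from html.parser import HTMLParser
from typing import List


class LinkCollector(HTMLParser):
    """Collect href and src attribute values from an HTML payload."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def summarize_response(target_uri: str, warc_date: str, http_block: bytes) -> str:
    """Build a JSON metadata summary for the HTTP response stored in a WARC response record."""
    head, _, body = http_block.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    summary = {
        "target_uri": target_uri,
        "warc_date": warc_date,
        "http_status": int(status_line.split()[1]),
        "content_type": headers.get("content-type", ""),
        "payload_length": len(body),
        "links": [],
    }
    # Only HTML payloads are scanned for links; the body is decoded as UTF-8
    # regardless of the declared charset, which is a simplification
    if summary["content_type"].startswith("text/html"):
        collector = LinkCollector()
        collector.feed(body.decode("utf-8", "replace"))
        collector.close()
        summary["links"] = collector.links
    return json.dumps(summary)


SAMPLE_BLOCK = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/html; charset=utf-8\r\n"
    b"\r\n"
    b'<html><a href="/about">About</a><img src="logo.png"></html>'
)
print(summarize_response("http://example.com/", "2011-05-31T00:00:00Z", SAMPLE_BLOCK))
```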
WarcManager, a tool for exploring the contents of WARC files
Developed by: University of Maryland
Current version: 2
More information: https://wiki.umiacs.umd.edu/adapt/index.php/WarcManager
Download: http://adaptci01.umiacs.umd.edu:8080/jenkins/job/Warc%20Manager%202/
WARC Tools, a toolset for reading and manipulating WARC files and converting ARC files to WARC
Developed by: Hanzo Archives and Internet Archive
Current version: 4.7
More information and download: http://code.hanzoarchives.com/warc-tools
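Converting ARC to WARC is largely a header mapping: the ARC URL, IP address and date become `WARC-Target-URI`, `WARC-IP-Address` and an ISO 8601 `WARC-Date`, and the content is copied as the block of a `response` record. The sketch below shows that mapping for one record. It is not code from WARC Tools; it assumes the ARC content is a complete HTTP response and that ARC dates are UTC, and it ignores the `filedesc://` record and non-HTTP records, which a full converter would also need to handle.

```python
import uuid
from datetime import datetime, timezone
from typing import Optional


def arc_header_to_warc(arc_line: str, record_id: Optional[str] = None) -> str:
    """Map an ARC v1 header line to WARC/1.0 response-record headers.

    The ARC content is assumed to be a full HTTP response and is copied unchanged
    as the WARC block, so Content-Length carries over as-is.
    """
    # ARC v1 header: URL IP-address Archive-date Content-type Archive-length
    url, ip, arc_date, _content_type, length = arc_line.rstrip("\n").split(" ")
    # Archive-date (YYYYMMDDhhmmss) is assumed to be UTC
    captured = datetime.strptime(arc_date, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)
    fields = [
        ("WARC-Type", "response"),
        ("WARC-Record-ID", f"<urn:uuid:{record_id or uuid.uuid4()}>"),
        ("WARC-Date", captured.strftime("%Y-%m-%dT%H:%M:%SZ")),
        ("WARC-Target-URI", url),
        ("WARC-IP-Address", ip),
        ("Content-Type", "application/http; msgtype=response"),
        ("Content-Length", length),
    ]
    # Named fields end with CRLF; an empty line separates them from the block
    return "WARC/1.0\r\n" + "".join(f"{name}: {value}\r\n" for name, value in fields) + "\r\n"


EXAMPLE = arc_header_to_warc(
    "http://example.com/ 192.0.2.1 20120115103000 text/html 1234\n",
    record_id="00000000-0000-0000-0000-000000000002",
)
print(EXAMPLE)
```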
Access and finding aids
MementoFox, a Mozilla Firefox add-on enabling temporal browsing of the web and discovery of resources in distributed web archives using Memento
Developed by: Rob Sanderson, Ahmed AlSum
Current version: 0.9.52 (2011-12-11)
More information: https://addons.mozilla.org/en-us/firefox/addon/mementofox/
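Memento's temporal browsing relies on datetime negotiation: the client sends an `Accept-Datetime` header carrying an RFC 1123 date to a TimeGate, and each memento reports its capture time in a `Memento-Datetime` header. The sketch builds such a request without sending it. The TimeGate URL is a hypothetical placeholder, and this is not MementoFox's own code.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from urllib.request import Request


def timegate_request(timegate_url: str, when: datetime) -> Request:
    """Build (without sending) a Memento datetime-negotiation request for a TimeGate."""
    # Accept-Datetime uses the RFC 1123 form in GMT; a naive `when` is treated as local time
    accept = format_datetime(when.astimezone(timezone.utc), usegmt=True)
    return Request(timegate_url, headers={"Accept-Datetime": accept}, method="HEAD")


def memento_datetime(response_headers: dict) -> datetime:
    """Parse the Memento-Datetime header that states when a memento was captured."""
    return parsedate_to_datetime(response_headers["Memento-Datetime"])


req = timegate_request("http://timegate.example/http://example.com/",
                       datetime(2012, 6, 1, tzinfo=timezone.utc))
# urllib stores header names via str.capitalize(), hence the "Accept-datetime" lookup key
print(req.get_method(), req.get_header("Accept-datetime"))
# HEAD Fri, 01 Jun 2012 00:00:00 GMT
```

Passing the request to `urllib.request.urlopen` would perform the negotiation against a live TimeGate.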
NutchWAX (Nutch with Web Archive eXtensions), a tool for indexing and searching Web archives, built on the Nutch search engine with extensions for Web archive content
Developed by: the Internet Archive and the Nordic national libraries
Current version: 0.13 (2010-03-19)
More information and download: http://archive-access.sourceforge.net/projects/nutch/
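Full-text search over a Web archive differs from ordinary web search in that the same URL appears once per capture. The toy index below treats each (URL, timestamp) pair as a separate document and returns the newest captures first. It only illustrates the idea; NutchWAX relies on Nutch's own indexing, and `CaptureIndex` is a hypothetical name.

```python
import re
from collections import defaultdict
from typing import DefaultDict, List, Set, Tuple

TOKEN_RE = re.compile(r"[a-z0-9]+")
Capture = Tuple[str, str]  # (URL, 14-digit timestamp)


class CaptureIndex:
    """Toy inverted index in which every capture of a URL is a separate document."""

    def __init__(self) -> None:
        self.postings: DefaultDict[str, Set[Capture]] = defaultdict(set)

    def add(self, url: str, timestamp: str, text: str) -> None:
        for token in TOKEN_RE.findall(text.lower()):
            self.postings[token].add((url, timestamp))

    def search(self, query: str) -> List[Capture]:
        """Captures containing every query term, newest first."""
        terms = TOKEN_RE.findall(query.lower())
        if not terms:
            return []
        # .get avoids creating empty postings for unknown terms
        hits = set.intersection(*(self.postings.get(term, set()) for term in terms))
        return sorted(hits, key=lambda c: (c[1], c[0]), reverse=True)


index = CaptureIndex()
index.add("http://example.com/", "20090101000000", "Welcome to the election results page")
index.add("http://example.com/", "20100101000000", "Election archive and results")
index.add("http://example.com/news", "20100101000000", "Daily news")
print(index.search("election results"))
```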
WERA (WEb aRchive Access), a Web archive search and navigation application. WERA is built on the NWA Toolset, provides Wayback Machine-like access to Web archives, and supports full-text search.
Developed by: Internet Archive and the National Library of Norway
Current version: 0.4.1 (2006-01-17)
More information and download: http://archive-access.sourceforge.net/projects/wera/
Wayback Machine, a replay tool for web archives stored in ARC or WARC file formats, allowing temporal navigation of archived web resources
Developed by: Internet Archive
Current version: 1.6.0 (2011-01-03)
More information: https://webarchive.jira.com/wiki/display/wayback/Home
Download: http://archive-access.sourceforge.net/projects/wayback/
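Wayback-style replay addresses a capture with a URL that embeds a 14-digit timestamp (YYYYMMDDhhmmss) before the original URL, and resolves a requested time to the nearest available capture. The sketch below shows both steps on an in-memory list of timestamps. The `WAYBACK_PREFIX` host is hypothetical, and a real deployment would look captures up in its CDX index rather than in a Python list.

```python
from bisect import bisect_left
from datetime import datetime
from typing import List

WAYBACK_PREFIX = "http://wayback.example/wayback"  # hypothetical local deployment


def _to_datetime(ts: str) -> datetime:
    return datetime.strptime(ts, "%Y%m%d%H%M%S")


def closest_capture(timestamps: List[str], wanted: str) -> str:
    """Nearest capture to `wanted` in a non-empty, sorted list of 14-digit timestamps."""
    # Fixed-width timestamps sort chronologically as plain strings
    i = bisect_left(timestamps, wanted)
    candidates = timestamps[max(i - 1, 0):i + 1]  # neighbours on either side
    return min(candidates, key=lambda ts: abs(_to_datetime(ts) - _to_datetime(wanted)))


def replay_url(original_url: str, timestamp: str) -> str:
    """Archival URL of the form <prefix>/<timestamp>/<original URL>."""
    return f"{WAYBACK_PREFIX}/{timestamp}/{original_url}"


captures = ["20090315120000", "20100101000000", "20110620083000"]
print(replay_url("http://example.com/", closest_capture(captures, "20100301000000")))
# http://wayback.example/wayback/20100101000000/http://example.com/
```

Timestamps are compared as datetimes rather than as numbers because the gap between two 14-digit strings is not a time difference.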
Xinq (XML INQuire), a search and browse tool for accessing an XML database
Developed by: National Library of Australia
Current version: 0.5 (2005-07-26)
Download: http://sourceforge.net/projects/xinq/