Projects

The IIPC funds technical and educational projects based on the goals outlined in an annual Request for Proposals. The consortium also collaborates on research and development projects by sharing data and testing tools. Task forces are formed to study and make recommendations on specific issues or problems. Working groups also sponsor their own projects and work packages.


Current Projects

COLLABORATIVE COLLECTIONS

 

MEMENTO

The goal of the project is to aggregate the metadata of the distributed archives of the IIPC, and

  • to provide Memento based access to the holdings of open archives
  • to provide knowledge of the holdings of restricted archives
  • to provide knowledge to IIPC members of the holdings of totally closed archives
  • initial demo for participants, then IIPC
  • no access provided to restricted archives.
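As a rough illustration of what Memento-based access involves, the sketch below builds the URL and headers for a TimeGate request as defined by the Memento protocol (RFC 7089), in which the client sends an `Accept-Datetime` header and the TimeGate redirects to the closest archived copy. The aggregator address `aggregator.example.org` and the function name `build_timegate_request` are assumptions made for the example, not part of the project; the code only prepares the request and does not contact any server.

```python
from datetime import datetime, timezone
from email.utils import format_datetime

# Hypothetical TimeGate base URL; not a real IIPC endpoint.
TIMEGATE = "https://aggregator.example.org/timegate/"


def build_timegate_request(target_uri, when):
    """Return (url, headers) for a Memento TimeGate request.

    `when` must be a timezone-aware datetime. It is converted to GMT and
    formatted as an HTTP-date, which is what Accept-Datetime expects.
    """
    http_date = format_datetime(when.astimezone(timezone.utc), usegmt=True)
    url = TIMEGATE + target_uri
    headers = {"Accept-Datetime": http_date}
    return url, headers


url, headers = build_timegate_request(
    "http://example.com/", datetime(2013, 1, 1, tzinfo=timezone.utc)
)
print(url)
print(headers)
```

A client would send these headers with an HTTP GET and follow the redirect to the selected memento; for restricted or closed archives, an aggregator of this kind would presumably report only that a holding exists.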

Past Projects

 

CROWDSOURCING WORKSHOP & USE CASES

The project aimed to investigate how crowdsourcing web archiving activities may begin to redress that balance and increase the amount of manpower available throughout all stages of the web archiving workflow in member institutions.

DOMAIN CRAWL REPORT

The IIPC Harvesting Practices Survey was developed to understand, analyze, and collate the Internet archiving processes and experiences of IIPC members. The objective was to encourage and support memory institutions everywhere in archiving and preserving web resources by providing a benchmark and an overview of current web archiving practices.

EVALUATING TWITTERVANE

The primary goal of the project was to evaluate Twittervane, a prototype application that analyzes Twitter feeds and determines which websites are shared most frequently around a given theme over a given time period.
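The kind of analysis described here can be sketched in a few lines. This is not Twittervane's actual implementation: the tweet structure (`time`, `text`, `urls`), the function name `most_shared_sites`, the substring match on the theme, and the sample data are all assumptions made for illustration.

```python
from collections import Counter
from datetime import datetime
from urllib.parse import urlparse


def most_shared_sites(tweets, theme, start, end, top_n=3):
    """Count the hosts linked from tweets that mention `theme` within [start, end]."""
    counts = Counter()
    for tweet in tweets:
        if not (start <= tweet["time"] <= end):
            continue  # outside the time period of interest
        if theme.lower() not in tweet["text"].lower():
            continue  # does not mention the theme
        for url in tweet["urls"]:
            host = urlparse(url).netloc.lower()
            if host:
                counts[host] += 1
    return counts.most_common(top_n)


# Made-up sample data, for illustration only.
tweets = [
    {"time": datetime(2012, 7, 27), "text": "Olympics opening ceremony",
     "urls": ["http://news.example.org/ceremony"]},
    {"time": datetime(2012, 7, 28), "text": "More olympics coverage",
     "urls": ["http://news.example.org/day2", "http://games.example.com/"]},
    {"time": datetime(2012, 9, 1), "text": "Olympics retrospective",
     "urls": ["http://news.example.org/retro"]},   # outside the period
    {"time": datetime(2012, 7, 29), "text": "Unrelated post",
     "urls": ["http://other.example.net/"]},       # off theme
]

print(most_shared_sites(tweets, "olympics",
                        datetime(2012, 7, 27), datetime(2012, 8, 12)))
```

The hosts at the top of such a ranking would be candidates for inclusion in a themed web archive collection.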

HOW TO FIT IN? INTEGRATE A WEB ARCHIVING PROGRAM IN YOUR ORGANIZATION

This IIPC-sponsored workshop was held at the Bibliothèque nationale de France (26-30 Nov. 2012). The aim was to investigate the challenges and methods involved in implementing web archiving in all mainstream activities of a heritage institution: general institution strategy, acquisition practices, IT operations, preservation, access. 

JHONAS

The overall goal of the project was to enhance existing tools in order to ease the adoption of WARC as the preferred archiving format for digital preservation. To accomplish this, two applications were chosen that together cover the entire digital preservation workflow: JHove2 and NetarchiveSuite.

LIVE ARCHIVING HTTP PROXY

The Live Archiving Proxy (LAP) project was a collaboration between Ina and Netarkivet.dk to build an HTTP proxy that would be able to capture the traffic that flows through it and delegate the handling of the captured data to a writer using a simple network protocol. The goal was to be able to write the captured traffic into any kind of archive format using any computer language.

PHD SPONSORSHIP

The University of North Texas College of Information sponsored a 3-year award to support doctoral studies in its Interdisciplinary Information Science Ph.D. Program.

STAFF EXCHANGE

The purpose of the project was to gather expert advice, assistance, and guidance on migrating from Heritrix 1 to Heritrix 3 and on setting up distributed crawls with Heritrix 3.

STATISTICS AND QUALITY INDICATORS FOR WEB ARCHIVING

In 2009, the ISO Technical Committee 46 (Information and Documentation) decided to set up a working group on “Statistics and Quality Indicators for Web Archiving”. The group delivered a Draft Technical Report (PDF) in 2013.

TWITTERVANE

A prototype and investigatory project by the British Library to use Twitter to build a web archive collection.

WARC TOOLS PROJECT

The main goal of the WARC Tools project was to facilitate and promote the adoption of the WARC file format for storing web archives by the mainstream web development community. To that end, it provided an open source software library, a set of command line tools, web server plug-ins, and technical documentation for manipulating and managing web archive (WARC) files.

WAYBACK, HERITRIX AND NUTCHWAX DOCUMENTATION

2009 project led by the Internet Archive that documented NutchWAX, Heritrix, and Wayback.

WEB ARCHIVE PROFILING VIA SAMPLING

Research project looking at how archives respond to queries for archived content and over time build up a profile of the top-level domains (TLDs), Uniform Resource Identifiers (URIs), content language, and temporal spread of the archive’s holdings.
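To make the idea of building a profile over time concrete, the sketch below folds each observed archive response into running counts of top-level domain, content language, and capture year. It is an illustration under stated assumptions, not the project's method: the names `new_profile` and `record_response`, the profile layout, and the sample responses are hypothetical, and the TLD is taken naively as the last label of the hostname.

```python
from collections import Counter
from urllib.parse import urlsplit


def new_profile():
    """Create an empty profile: one counter per dimension being tracked."""
    return {"tld": Counter(), "language": Counter(), "year": Counter()}


def record_response(profile, uri, language, year):
    """Fold one archived capture reported by the archive into the profile."""
    host = urlsplit(uri).hostname or ""
    tld = host.rsplit(".", 1)[-1]  # naive: last hostname label, e.g. "org"
    if tld:
        profile["tld"][tld] += 1
    profile["language"][language] += 1
    profile["year"][year] += 1


# Made-up sample responses, for illustration only.
profile = new_profile()
record_response(profile, "http://www.example.org/a", "en", 2009)
record_response(profile, "http://example.org/b", "en", 2010)
record_response(profile, "http://www.example.fr/", "fr", 2010)

print(profile["tld"].most_common())
print(sorted(profile["year"].items()))
```

As more sample queries are answered, counts like these give an increasingly detailed picture of what an archive holds, without requiring access to its full index.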