Evaluating Twittervane

Status: Past Project


Author: Helen Hockx-Yu


Project proposal

The British Library has received previous funding from the IIPC and developed a prototype application called Twittervane, which is capable of analyzing Twitter feeds and determining which websites are shared most frequently around a given theme over a given time period. These websites can then be presented to curators as potential titles for web archiving, saving time and effort required for manual selection.

The Twittervane system has two additional advantages:
  1. It allows archiving institutions to respond to sudden or sporadic events more quickly;
  2. It uses the popularity of websites as a selection criterion and exploits the wisdom of the crowd, adding a social aspect to web archiving.
The British Library presented Twittervane at the 2012 General Assembly and made the source code of Twittervane available to IIPC members. This is a short and focused follow-on project of up to 8 weeks to evaluate the Twittervane prototype in order to determine the effectiveness of both the approach and the implementation. Three other IIPC members (the Library of Congress, the Bibliothèque nationale de France, and the New Zealand National Library) have kindly agreed to help us by performing the evaluation.

Scope of the proposed project

The primary goal of the project is to evaluate the Twittervane prototype. To ensure the prototype is usable by those who will be performing the evaluation, we will address the following areas:
  1. Improved documentation covering installation and basic usage.
  2. Improved usability of the current simplistic user interface, making it more intuitive.
The other IIPC members will then be invited to help evaluate the prototype. During this process, we will maintain the prototype and support the users as they explore the system. This will culminate in the solicitation of formal feedback from the other IIPC users, recommending future directions for the methodology and the implementation.
Based on user feedback during the project, and as time and resources permit, we will also attempt to address the following known shortcomings of the Twittervane prototype:
  1. Enable the prototype to store collected tweets in plain JSON files, so large-scale analysis can be performed.
  2. Put in place data cleaning capabilities to automatically remove old, analyzed tweets.
  3. Automate background processes to analyze and process captured tweets.
  4. Put in place management capability for the Twitter streaming process so it can be monitored and the system can respond appropriately in case of any problems. This will likely involve splitting the monitor function from the current prototype and implementing it as a separate backend process.
Finally, we will also ensure that the Twittervane software is released as open source (Apache 2.0 license) and that its development is moved to GitHub.
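As an illustration of items 1 and 2 in the list of known shortcomings above, the following Python sketch stores tweets as plain JSON, one object per line, and removes tweets that have been analyzed and are older than a given age. It is a sketch under stated assumptions, not the prototype's design: the function names, the one-tweet-per-line file layout, and the `created_at` (ISO 8601 string) and `analyzed` fields are hypothetical.

```python
import json
from datetime import datetime, timedelta, timezone
from pathlib import Path


def append_tweets(path, tweets):
    """Append tweets to a plain-text file, one JSON object per line."""
    with Path(path).open("a", encoding="utf-8") as fh:
        for tweet in tweets:
            fh.write(json.dumps(tweet, ensure_ascii=False) + "\n")


def prune_analyzed(path, now, max_age):
    """Drop tweets that are both analyzed and older than `max_age`.

    Tweets that have not been analyzed yet are always kept, whatever
    their age. Returns the number of tweets removed.
    """
    path = Path(path)
    cutoff = now - max_age
    kept, removed = [], 0
    with path.open(encoding="utf-8") as fh:
        for line in fh:
            if not line.strip():
                continue  # skip blank lines
            tweet = json.loads(line)
            # Assumed field: ISO 8601 timestamp with a UTC offset.
            created = datetime.fromisoformat(tweet["created_at"])
            if tweet.get("analyzed") and created < cutoff:
                removed += 1
            else:
                kept.append(line if line.endswith("\n") else line + "\n")
    # Write to a temporary file first, then swap it in, so an
    # interrupted run does not leave a half-written tweet file behind.
    tmp = path.with_suffix(path.suffix + ".tmp")
    tmp.write_text("".join(kept), encoding="utf-8")
    tmp.replace(path)
    return removed
```

A scheduled background job could call `prune_analyzed("tweets.jsonl", datetime.now(timezone.utc), timedelta(days=7))` after each analysis run; the file name and the seven-day retention period here are placeholders, not project decisions.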


The key deliverables of the project are:
  1. The project final report (containing the evaluations from the IIPC members), and
  2. The initial release of the Twittervane application itself (including documentation covering installation and basic usage).