Text Mining

    Caption: Findings of temporal co-word analysis for the query "Assad"

     
    Large-scale corpora of captured websites make it possible to analyze textual patterns and trends. Research projects studying term frequency or performing sentiment analysis have used web archive collections to extract, visualize, and analyze the language of crawled websites.
     
    This type of analysis can uncover relationships between terms, such as how often they co-occur. Sentiment analysis can also be applied to large bodies of text to gauge the emotions expressed when specific topics are discussed. Just as digitized books can be mined for patterns of language use, websites can be mined for contemporary patterns of language.
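As a rough illustration of lexicon-based sentiment scoring, the Python sketch below counts positive and negative words in each captured page and averages the scores over pages that mention a topic. The word lists POSITIVE and NEGATIVE and the function names are hypothetical toy examples; the projects described here do not state their method, and practical work would use a validated sentiment lexicon or a trained model.

```python
import re
from typing import Iterable, Optional

# Hypothetical toy lexicon, for illustration only.
POSITIVE = {"good", "great", "hope", "peace", "support", "win"}
NEGATIVE = {"bad", "war", "fear", "crisis", "attack", "lose"}

TOKEN_RE = re.compile(r"[a-z']+")


def tokenize(text: str) -> list:
    """Lowercase the text and split it into alphabetic tokens."""
    return TOKEN_RE.findall(text.lower())


def sentiment_score(text: str) -> float:
    """Return (positive - negative) / matched words, in [-1, 1].

    Returns 0.0 when the text contains no lexicon words.
    """
    tokens = tokenize(text)
    pos = sum(1 for t in tokens if t in POSITIVE)
    neg = sum(1 for t in tokens if t in NEGATIVE)
    matched = pos + neg
    return (pos - neg) / matched if matched else 0.0


def mean_sentiment_for_topic(pages: Iterable[str], topic: str) -> Optional[float]:
    """Average sentiment over pages whose tokens include the topic term.

    Returns None when no page mentions the topic.
    """
    topic = topic.lower()
    scores = [sentiment_score(p) for p in pages if topic in tokenize(p)]
    return sum(scores) / len(scores) if scores else None
```

For example, mean_sentiment_for_topic(["Peace talks bring hope", "Talks collapse amid war and fear"], "talks") averages a fully positive page and a fully negative page, giving 0.0.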
     
    Examples
     
    “The Ngram search is a phrase-usage visualization tool which charts the monthly occurrence of user-defined search terms or phrases over time, as found in the UK Web Archive.”
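The internals of the UK Web Archive tool are not described here, but the basic idea of charting monthly phrase occurrence can be sketched in Python. The sketch assumes each capture is available as a pair of a 14-digit Wayback-style timestamp (YYYYMMDDhhmmss) and its extracted text; the function names are hypothetical. It returns raw counts per month, including months with captures but no matches.

```python
import re
from collections import Counter

TOKEN_RE = re.compile(r"[a-z0-9']+")


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return TOKEN_RE.findall(text.lower())


def count_phrase(tokens, phrase_tokens):
    """Count occurrences of a token sequence in a token list (overlaps allowed)."""
    n = len(phrase_tokens)
    return sum(1 for i in range(len(tokens) - n + 1) if tokens[i:i + n] == phrase_tokens)


def monthly_phrase_counts(captures, phrase):
    """Map 'YYYY-MM' to the total occurrences of `phrase` in that month's captures.

    `captures` is an iterable of (timestamp, text) pairs, where timestamp is an
    assumed 14-digit 'YYYYMMDDhhmmss' string.
    """
    phrase_tokens = tokenize(phrase)
    if not phrase_tokens:
        raise ValueError("phrase must contain at least one token")
    counts = Counter()
    for timestamp, text in captures:
        month = f"{timestamp[:4]}-{timestamp[4:6]}"
        # Adding 0 still records the month, so empty months chart as zero.
        counts[month] += count_phrase(tokenize(text), phrase_tokens)
    return dict(sorted(counts.items()))
```

Dividing each month's count by the number of tokens captured that month would give a relative frequency that is less sensitive to changes in crawl size.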
     
    Searching the (News) Archives, Web Archive Retrieval Tools (University of Amsterdam) (findings 1 and 2)
    Demonstrates the research possibilities of web archive search tools, using a collection from a Dutch news aggregation website. Shows word-frequency visualizations and an analysis of term co-occurrence over time in relation to major news events.
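A simple form of temporal co-word analysis can be sketched as follows: for each month, take the documents that contain a query term and count how many of them also contain each other term. This is an assumed, simplified method, not necessarily the one used by the University of Amsterdam tools; the stopword list, the (month, text) input format and the function names are hypothetical.

```python
import re
from collections import Counter, defaultdict

TOKEN_RE = re.compile(r"[a-z0-9']+")
# Hypothetical minimal stopword list; real analyses use a fuller one.
STOPWORDS = {"the", "a", "an", "and", "of", "in", "on", "to", "is", "for"}


def terms(text):
    """Return the set of distinct non-stopword tokens in a document."""
    return {t for t in TOKEN_RE.findall(text.lower()) if t not in STOPWORDS}


def top_terms(counter, k):
    """Top k (term, count) pairs, ties broken alphabetically for stable output."""
    return sorted(counter.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


def temporal_coword(docs, query, top_k=3):
    """For each month, rank terms by how many query-containing documents include them.

    `docs` is an iterable of (month, text) pairs, with month as e.g. '2011-03'.
    """
    query = query.lower()
    per_month = defaultdict(Counter)
    for month, text in docs:
        doc_terms = terms(text)
        if query in doc_terms:
            doc_terms.discard(query)  # the query term is not its own co-word
            per_month[month].update(doc_terms)
    return {month: top_terms(per_month[month], top_k) for month in sorted(per_month)}
```

Counting each term at most once per document (a document-level co-occurrence count) keeps a single long page from dominating a month.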
     
    Proposal for using web archive collections to analyze online discussion of 1960s Liverpool poets, in comparison to data from newspapers and other formally published reviews.
     
    Project using websites captured by Common Crawl to count mentions of the top 1,000 companies on Forbes' list of The World's Biggest Public Companies.
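Counting company mentions in crawled text can be approximated with a single case-insensitive regular expression built from the list of names. The sketch below assumes the page texts have already been extracted (for example from Common Crawl's plain-text WET files) and that names are supplied in canonical form; the function names are hypothetical. Plain name matching overcounts ambiguous names such as "Apple" and misses variants such as abbreviations, so a real study would need some disambiguation.

```python
import re
from collections import Counter


def build_pattern(names):
    """Compile one case-insensitive regex matching any name as a whole phrase.

    Longer names are tried first so that, for example, 'Bank of America' is
    preferred over a shorter name starting at the same position. The \\b
    anchors assume names start and end with a letter or digit.
    """
    if not names:
        raise ValueError("names must not be empty")
    alternatives = sorted(names, key=len, reverse=True)
    body = "|".join(re.escape(n) for n in alternatives)
    return re.compile(r"\b(" + body + r")\b", re.IGNORECASE)


def count_mentions(texts, names):
    """Return a Counter mapping each canonical name to its mentions across texts."""
    pattern = build_pattern(names)
    canonical = {n.lower(): n for n in names}
    counts = Counter()
    for text in texts:
        for match in pattern.finditer(text):
            counts[canonical[match.group(1).lower()]] += 1
    return counts
```

Because of the word-boundary anchors, "Pineapple" does not count as a mention of "Apple", while "apple" in lowercase does.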
     
    This case study is part of a Web Archiving Use Cases report written by Emily Reynolds for the Library of Congress.