
OPENING KEYNOTE

THE MÍMIR PROJECT – LIBRARIES, COPYRIGHT, AND LANGUAGE MODELS

WEDNESDAY, 9 APRIL 09:50-10:45

NATIONAL LIBRARY OF NORWAY, MÅLSTOVA & STORE AUDITORIUM (STREAM)

Javier de la Rosa will present the groundbreaking findings of the Mímir Project, a collaborative effort between the National Library of Norway, the University of Oslo, and the Norwegian University of Science and Technology. This initiative explores a pressing issue in AI development: the role of copyrighted materials in training large language models (LLMs).

In this keynote, De la Rosa will delve into how incorporating publisher-controlled copyrighted corpora—specifically, books and newspapers—affects the performance of Norwegian LLMs. By empirically testing various data mixtures, the Mímir Project provides critical insights into how copyrighted content improves model capabilities in tasks like sentiment analysis, reading comprehension, and translation. At the same time, the research raises profound ethical and legal questions, highlighting the ongoing tension between AI innovation and intellectual property rights.

Through this session, attendees will gain a deeper understanding of how copyright influences AI training, why certain datasets enhance (or hinder) model performance, and what this means for policy and fair compensation schemes for authors.

Javier de la Rosa is Head of Language Models at the National Library of Norway, where he previously worked as a Senior Research Scientist at the library's AI Lab. A former Postdoctoral Fellow at the UNED Digital Humanities Innovation Lab, he holds a PhD in Hispanic Studies with a specialization in Digital Humanities from the University of Western Ontario, and a Master's in Artificial Intelligence from the University of Seville.

Javier has previously worked as a Research Engineer at the Stanford University Center for Interdisciplinary Digital Research, and as the Technical Lead at the University of Western Ontario CulturePlex Lab for Cultural Complexity. He is interested in Natural Language Processing applied to historical and literary text with a special focus on large language models.

Javier de la Rosa

CLOSING KEYNOTE

QUANTIFYING COMPLEXITY:
USING WEB DATA TO DECODE ONLINE PUBLIC DEBATE

THURSDAY, 10 APRIL 16:10-17:05

NATIONAL LIBRARY OF NORWAY, MÅLSTOVA & STORE AUDITORIUM (STREAM)

Founded in 2014, Analysis & Numbers is a pioneering non-profit cooperative specialising in data-driven insights and analysis across various sectors. Based in Norway and Denmark, the Analysis & Numbers teams study web data from a social science perspective, investigating topics such as the spread of misinformation, hate speech, and media literacy, as well as tracking AI content on the web.

In their keynote, Håvard Lundberg and Ida Haugen-Poljac will share Analysis & Numbers’ experiences of working with the complexities of web data. They will reveal how advanced data gathering from social media platforms, coupled with the use of AI and tailored algorithms, helps them make sense of complex societal dynamics. This involves using web data to quantify hate and polarisation, uncover myths and stereotypical discourses about minority groups, and map the spread of misinformation. In essence, this talk will show how digital conversations can be turned into actionable insights, empowering our society to better understand and challenge the narratives shaping our world.

Photo credit: Knut Neerland

Ida Haugen-Poljac is a social scientist, working as a project manager and analyst at Analysis & Numbers. She holds a Master of Peace and Conflict Studies from the University of Oslo. She has 12 years of experience in the humanitarian and development sector, where she specialised in political and humanitarian analysis in contexts characterised by protracted crises and armed conflict. At Analysis & Numbers, she leads tech-focused research projects, combining qualitative and quantitative methods to deliver actionable insights.

Photo credit: Knut Neerland

Håvard Lundberg is a computer scientist, working as a developer and analyst at Analysis & Numbers. He holds a bachelor’s degree in computer science and interaction design from the University of Oslo and a graduate degree from the Interaction Design Programme at the Copenhagen Institute of Interaction Design. He combines his background as a computer scientist and interaction designer to find new ways of collecting, analysing and communicating insights hidden in complex data sets. In recent years, he has specialised in mapping and uncovering how mis- and disinformation spread online.