impresso team, Thu 18.09.2025

Over the past months we have been busy preparing this major release. It includes the Impresso Datalab for programmatic access to our corpus and models, new newspaper titles from the National Library of France and our Swiss partners, and a completely revised data access management system that paves the way for future corpus expansions.

The Impresso Datalab: Programmatic access to our data and models

One of our goals for Impresso2 is to make it easier to conduct data-driven research on historical media collections. Today we are excited to announce the release of the Impresso Datalab, a significant milestone in the Impresso2 project.

The Impresso Datalab complements the exploratory capacities of the Impresso Web App by allowing programmatic interactions with our data. The Datalab offers access to bibliographic metadata, semantic enrichments and full text via the Impresso API and a dedicated Python library.

Impresso Datalab

Key features

With this release, we offer:

Programmatic access to our data
Initialising an Impresso Client

The Impresso REST API and the Impresso Python library provide access to full text, bibliographic metadata, and semantic enrichments in compliance with the legal frameworks and institutional constraints of our partners.

Notebooks for data exploration
Notebook on Visualising Place Entities on Maps

Notebook on Exploring Entity Co-occurrence Networks

Notebook templates are designed to complement the exploratory capacities of the Web App. With this first release we offer geospatial mapping of the location entities contained in a query or collection, as well as relational perspectives on entity co-occurrences by means of network visualisations.
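
To give a sense of what an entity co-occurrence network is built from, here is a minimal sketch in plain Python. It is not taken from the Impresso notebooks: the sample data and the function name `cooccurrence_edges` are assumptions made up for illustration. The idea is simply to count how often two entities appear in the same content item, which yields the weighted edges of a network.

```python
from collections import Counter
from itertools import combinations

# Hypothetical input: one set of entity labels per content item.
articles = [
    {"Geneva", "League of Nations", "Aristide Briand"},
    {"Geneva", "League of Nations"},
    {"Paris", "Aristide Briand"},
]


def cooccurrence_edges(entity_sets):
    """Count how often each unordered pair of entities shares a content item."""
    edges = Counter()
    for entities in entity_sets:
        # Sort first so that (a, b) and (b, a) end up under the same edge key.
        for pair in combinations(sorted(entities), 2):
            edges[pair] += 1
    return edges


edges = cooccurrence_edges(articles)
for (source, target), weight in edges.most_common():
    print(f"{source} -- {target}: {weight}")
```

The resulting pairs and weights could then be passed to a graph library of your choice for layout and visualisation.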

Models and Annotation services to enrich your own data
Notebooks for enriching your own data

Example from notebook on Language Identification with impresso-pipelines Package

Researchers can semantically enrich their own data using Impresso’s specialised models (also available on Hugging Face) and ready-to-use pipelines optimised for the analysis of historical newspaper text. At this stage, we offer a BERT model for the recognition of European press agencies and pipelines for language identification, topic modelling, named entity recognition and OCR quality assessment.

Close Integration Web App & Datalab
Try in Datalab feature in Impresso Web App

Example of results linking back to Impresso Web App

We strive for seamless, question-driven workflows between both interfaces for scalable reading and versatile exploration. For instance, you can easily export your Impresso Web App query to a Datalab notebook for in-depth analysis, then return to the Web App for detailed examination of specific texts. For convenience, all notebooks can be run on Google Colab or locally, depending on your preference.

Getting started
  • Create a free Impresso account (if you do not have one already) and subscribe to one of the Impresso plans
  • Get an API key and familiarise yourself with the Impresso Python library to interact with our API to search and download data
  • Experiment with our first notebooks and generate network and spatial views on Impresso data
  • Enrich your own data using our pipelines and models for named entity recognition, press agency detection, language identification and OCR quality assessment
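
As a rough idea of what the second step above can look like, here is a hedged sketch. The names `connect`, `search.find`, `term` and `limit` reflect our reading of the Impresso Python library and should be treated as assumptions to check against its documentation. The helper functions `build_query` and `run_search` are purely illustrative and not part of the library.

```python
def build_query(term, limit=10):
    """Collect search parameters in one place so they are easy to inspect and reuse."""
    if not term.strip():
        raise ValueError("term must not be empty")
    return {"term": term, "limit": limit}


def run_search(term, limit=10):
    """Send the query to the Impresso API (needs network access and an API key)."""
    # Imported lazily so the rest of this sketch works without the package installed.
    # Assumption: the library is installed with `pip install impresso`.
    from impresso import connect  # assumed entry point

    client = connect()  # assumed to ask for the API key requested in the Datalab
    # Assumed method and parameter names; verify against the library documentation.
    return client.search.find(**build_query(term, limit))


query = build_query("Bundesblatt", limit=5)
print(query)
```

Only `build_query` runs here. Calling `run_search("Bundesblatt", limit=5)` would contact the API, so it requires an account, a plan and a valid key.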

Note that this is only the beginning - the Datalab will remain in constant development throughout the Impresso project. More notebooks to support teaching, critical data exploration and data annotation will follow!

To enter the Datalab, log in with your Impresso account or register for one, then accept our revised Terms of Use and request your API key.

We appreciate any feedback on its usage and welcome proposals for additional notebooks via info (at) impresso-project.ch.

Impresso Corpus Expansion

We are pleased to announce a first step towards Impresso’s goal of creating a corpus of Western European newspapers and radio sources: a first batch of newspapers from the National Library of France (BnF) has arrived (see below for the first titles we include). In addition, we have added long-awaited titles to our Swiss newspaper collection. These include “Schweizer Arbeitgeber” and “Schweizerische Handels-Zeitung” from the Swiss Economic Archives, a total of 43 titles from the regional collections of the Bibliothèque Cantonale Universitaire de Lausanne (BCUL), and the German and French editions of the Swiss Federal Gazette, also known as Bundesblatt or Feuille fédérale. Provided by the Swiss Federal Archives (SFA), the Gazette is a rich source on Swiss political and legislative decision-making.

In total, this release adds 53 new newspaper titles, more than 180,000 issues and almost 11 million new content items, such as articles or adverts.

Explore new additions to our corpus from the following partners:

Impresso partners

From France, this first batch includes the following titles:

Front page from Le Petit Parisien

New User Access Management System

Alongside the Datalab, we are introducing a new content access management system. Its new user plans reflect the legal frameworks within which our data-providing partners operate, as described in our Terms of Use. Behind the scenes, this system allows us to grant fine-grained access at the level of individual content items such as newspaper articles or radio broadcasts.

Impresso User plans

We distinguish between:

  • Guest users
  • Basic users
  • Student users
  • Academic users
  • Academic+ users (forthcoming)

To qualify for an Academic user account, we ask you to provide a link to your academic profile and to allow 2 working days for verification. With the forthcoming Academic+ user plan, we present an innovation for accessing protected data: users who wish to access data in this category will soon be able to send a request to the corresponding data provider and gain access once the provider has approved it.

Please refer to this overview of the currently available data and the mapping of permitted actions according to user plans.

How to request a change of plan

Note: Existing Impresso users are mapped to the Basic user account by default. If applicable, please upgrade to the Academic plan using the method described above.

Connect with us

The Impresso project has made the choice to retire from X, for obvious reasons. We are now active on Bluesky instead. Follow us to stay updated on the latest developments, events, and insights from the Impresso project. We also have a Discord server where you can report issues you encounter with the Web App or Datalab, or discuss other Impresso-related topics.

Major Release: Introducing the Impresso Datalab, Corpus Expansion and New Data Access Management, Blog post, impresso, 2025 <https://impresso-project.ch/news/2025/09/18/major-release.html>.