The impresso project adheres to a set of guiding principles intended to foster productive collaboration across academic disciplines and to keep its outputs relevant within and beyond them.

Co-design

We work according to the principles of co-design, meaning that all team members make an active and creative contribution to the project’s overall objectives. Co-design is based on the belief that complex problems are best solved when - in our case - computational linguists, designers, developers and historians work together in a continuous push-pull interaction.

Within impresso, NLP and design open up opportunities for research and exploration, while historians help guide the process of data enrichment based on specific needs, assess the suitability of the data for historical research, and articulate quality assurance requirements. In addition, impresso project partners contribute their knowledge of the collections and their needs for source enrichment. In our experience, this close cooperation helps to meet quality standards, to identify new opportunities, and to detect and solve problems early. Finally, the emphasis on co-design also ensures that our user interfaces have a manageable learning curve for end users.

Transparency

The complexity of datafication processes and the wealth of theoretically relevant data and knowledge pose a significant challenge. Merely displaying all information overwhelms and confuses researchers rather than empowering them. We thus turn the question around and ask: given specific tasks and data, what do we actually need to consider in order to draw valid conclusions? How can we empower researchers to obtain and consider relevant information? Within the digital humanities (DH), adequate critique of data, software, interfaces, methods and outputs from the standpoint of established insights and best practices remains an unresolved problem. We subsume these requirements, regarded as essential skills for historians in the digital age, under the term “transparency”.

Interoperability, scalability and sustainability

Starting from digitised sources originating from diverse cultural institutions, we produce semantic enrichments and integrate them into a shared historical information and vector space with which users can interact. This requires the development and deployment of various interdependent components. To this end, we consider which information needs to flow where and how, and how best to encode it for different needs, e.g. offline and online processing steps, storage, distributed computing and web services.

We strive for a high level of interoperability in terms of data formats and representation models, allowing the seamless handling of sources and semantic enrichments by the various project components and their smooth reintegration into libraries and media archives. We also ensure the necessary scalability for the processing of large-scale media collections.

Finally, we are working towards the sustainability of the developed tools, services and interfaces: in close collaboration with our partners, we are establishing a roadmap towards the consolidation, preservation and maintenance of impresso as a historical media platform for institutions, researchers and contributing developers. This includes questions of governance, distributed maintenance and growth, and the provision of digitised media and derivative results as research data that can be persistently queried, cited and retrieved.

Open science

We adopt open science practices based on the FAIR principles and ensure open access to project results through publication of software, datasets and articles, as well as the reproducibility of research results through publication of notebooks and code on versioned repositories.