The impresso project seeks to put the natural language processing (NLP) tools and annotations it creates to good use for its target audience of historians. This requires a novel interface for searching historical newspapers, one which facilitates exploration and meets the requirements of scholarly research. Our team of historians, designers, developers and computational linguists is working together to achieve this goal by applying co-design principles.
A generous interface
The aim of the impresso interface is to overcome some of the limitations of other contemporary interfaces. One such limitation is the restrictive nature of basic keyword search, which Mitchell Whitelaw called the “small corridor” situation (DHQ, 2015, 9.1). Whitelaw used this term to describe an interface that responds very narrowly to the user's query, thereby hiding relevant related information. At the opposite end are “generous” interfaces, which go beyond known-item searches. They offer a high-level view of content and give users suggestions to broaden their search experience.
Digitised newspapers are rich and valuable sources for the reconstruction of historical belief systems and knowledge horizons. impresso aims to exploit the strengths of digital media to the full by integrating rich data visualisations into its interface. Data visualisations offer a variety of novel entry points to collections and can point users to resources they may otherwise miss. Such techniques will prove their worth only when combined with contextual information about the materials and access to facsimile scans of the original newspapers. impresso also seeks to provide for the horizontal exploration of newspaper collections by facilitating cross-media and cross-lingual search, for example by linking disambiguated named entities such as persons, locations and institutions across corpora.
While historians are open to the idea of applying new methods for the analysis of large-scale document collections which go beyond human capacity for close reading, they are rightfully concerned by the presence of a number of inherent biases caused by OCR errors, the loss of original context, copyright restrictions and digitisation policies. An awareness of such biases and gaps in the temporal, spatial or political coverage of the newspaper collections can go a long way to empowering users to conduct more reflective research. impresso explores the extent to which data visualisation can help create such transparency and better equip its users to assess the significance of their findings.
Searching and finding
One of the core components of historical research practices is the unplanned discovery of relevant information as opposed to targeted searching. While this phenomenon is commonplace in the traditional study of printed newspapers, contemporary interfaces can still be improved in this regard by enabling users to explore corpora. Exploration, or browsing with the help of faceted search tools or other tools offered by a generous interface, is a key issue in the co-design process of the impresso interface.
From the beginning of the project, historians and developers have been working together to learn from each other, experiment and evaluate. The goal of this approach is for both sides to gain a deeper understanding of the capabilities of tools and the needs of scholars. Co-design brings together historians’ understanding of the complexities of the data they study with the technical and aesthetic skills of designers and developers.
With this in mind, we are creating an interface that combines well-established functionalities such as keyword search with novel computational tools for the exploration of corpora and the discovery of relevant content.