Sunny and promising kick-off meeting

Sat, 28.10.2017

On 24-25 October 2017, the impresso consortium held its first workshop and kick-off meeting, hosted by DHLAB on the beautiful EPFL campus.

The objectives of this first workshop were the following: getting to know each other and recalling the project’s main objectives and milestones; better understanding the needs, expectations and contributions of each stakeholder (data providers, historians, journalists, computational linguists, designers); and working together for the first time on concrete aspects, namely corpus acquisition and the co-design of the exploration interface.

After a kick-off session dedicated to the presentation of partners, the workshop proceeded with two round tables where each group of stakeholders presented its needs, expectations and contributions, in the spirit of co-design that fuels the conception and the practical organisation of impresso.

First, the floor was given to ‘data providers’, that is to say the National Library of Luxembourg, the National Library of Switzerland, the State Archives and the Media Library of Wallis, the Swiss Economic Archives, as well as the Swiss newspapers Le Temps and Neue Zürcher Zeitung. Libraries and archives shared their digitisation strategies and practices for making their collections available to the public and the scientific communities. Overall, it appeared that, for them, the main expected outcomes of impresso are, on the one hand, a better understanding of how digitised newspaper collections are used and, on the other, OCR-corrected and semantically enriched collections.

This first discussion round was further enriched by a special guest from the Berlin State Library, Clemens Neudecker, who shared his experience from the Europeana Newspapers project (2012-2015). Clemens gave insights into the practical experience of this project with regard to pre-processing, the difficulty of managing a large set of huge corpora of different formats and quality, and the different performance levels attained in terms of newspaper refinement (ONR and OLR). Since impresso tackles the next research step regarding historical newspapers (i.e. the focus is on text mining and exploration, not on digitisation and OCR/OLR), such sharing of experience is extremely valuable.

Next, historians shared their practices of using historical newspapers in research and teaching. This discussion round revealed that the major sources of concern for historians are 1) the representativity of the available corpus, 2) its contextualisation (i.e. the presence of rich metadata), and 3) transparency, to be provided by the foreseen impresso interface, regarding which data is available and which tools were applied with which level of performance.

Finally, the impresso team, composed of computational linguists, digital humanists and historians, and designers, presented and commented on a selection of state-of-the-art text mining and visualisation tools, in order to illustrate technical possibilities and foster discussion and imagination in this regard.

Phillip Ströbel presents a topic modelling tool (source)

These roundtable discussions and presentations gave the participants a better understanding of each other’s needs, as well as food for thought regarding technical possibilities. Based on these insights, participants then gathered in several small working groups and examined six historical research scenarios. The objective of this guided brainstorming session was to understand better how historians would concretely proceed given a corpus, a potential exploration interface and a specific research question.

The main finding of this workshop is a high demand for transparency: transparency regarding corpus coverage (which sources are there and which are not), and regarding tools and their performance. Another significant added value of digitised newspapers is the possibility to interconnect collections, and thereby to reframe the way they are browsed. If newspaper articles are enriched with persons, topics or events, and profiled with lexical information, it also becomes possible to connect them, within and across collections and languages.

Their exploration becomes potentially more flexible, but how can a query best be contextualised? How can automatic recommendations trigger further insights on a particular topic? How can a browsable overview of corpora best be designed to enable exploration and immersion in a corpus? These are some of the substantial needs expressed during the workshop that will fuel our co-design discussions for the coming years. In addition to efficient search capacities and more diverse access to textual material, the interface should be conceived in a more “generous” mindset, to broaden current research practices and harvest the range of advantages that the digitisation of historical materials offers.

Concluding a lively group discussion (source)

These first discussions on scenarios will help us to formulate more specific user scenarios, on which we will base our first prototype, planned for July 2018. Before that milestone, we plan to meet again with our partners at the beginning of next year, for a more informed discussion on the conception of the research tools and their relevance for the partners’ scientific and teaching activities. Their participation throughout the project, at regular intervals, should provide the project with a valuable applied reflection on the epistemological issues raised by such digital collections and their tools.

In a nutshell, the first workshop proved to be very promising in terms of communication across specialities and between all the stakeholders, thanks to the balance we were able to strike between roundtables, discussion and direct exchange in smaller groups. We thank all the participants for their input and their energy, which led to this fruitful exchange!

EPFL hosted the workshop on its beautiful campus (source)