Digital Humanities - a Challenging Research Field

Julien Nguyễn Đăng, Fri, 24.08.2018

Julien Nguyễn Đăng, an intern on the impresso project in summer 2018, reflects on his experience and on the challenges that digital humanities poses for research and teaching.

The benefits of interdisciplinarity are particularly clear when we look at how sociological perspectives can enrich history through the window opened by sociohistorical studies (1), or when we consider the effective use of quantitative approaches. To a certain extent, this reflection on the alliance of disciplines has deep historiographical roots: as Marc Bloch pointed out in his Apologie pour l’histoire, “auxiliary sciences” such as archaeology and palaeography may be seen as a necessary part of the analytical and investigative process of historians (2). From this perspective, historians have a duty not only to collaborate with specialists in other fields and to set out their needs clearly, but also to learn how to use, interpret and explain the sometimes “magical” and “generous” tools and data that they receive.

The “digital turn” that began in the 1990s certainly constitutes the most recent, and still ongoing, revolution in the humanities, making “new realms of connection visible, new kinds of questions answerable” (3), as Lara Putnam wrote in 2016. Among the many contributions of computer science that are useful to historians, beyond mere progress in word processing, emailing and blogging, it is helpful to distinguish between “mainstream” scholarly digital tools, such as online archives, digital cameras and library-supported databases, and more advanced, constantly evolving applications and techniques, such as image recognition, network analysis and optical character recognition (OCR). Digital humanities therefore encompasses a huge variety of practices, and this diversity underlines the epistemological complexity of the field. In a study of the habits of 1,266 US historians (4), Robert B. Townsend observed that more than 80% of them use the first category of tools. Far less use, however, is made of more advanced programmes. Why is this? Two reasons may be suggested. First, such programmes may only be of interest for particular angles or problems requiring deeper investigation. Second, there may be inequalities in the resources available at the institutions where historians are based, such as a lack of applications or training opportunities. Quite surprisingly, however, Townsend’s findings suggest that age does not come into play for this category of applications. It may simply be that there is a widespread reluctance to embrace new technologies, even more so in 2015 than in 2010, and this reality needs to be taken into account if we are to broaden the reach of digital humanities in a pragmatic fashion.

Although the digital turn in historiography follows a long tradition of revolutions and links between disciplines, as noted earlier, and although digital humanities should not be seen as fundamentally separate from the traditional humanities (5), it would be remiss not to acknowledge its major breakthroughs. The habits of historians have certainly changed, and this has undoubtedly altered the nature of scholarly production, particularly where the countless digitised press corpora are concerned. As Paul Gooding points out, researchers may be less likely to read newspapers page by page to find information: Google-style keyword search may have influenced research methods. “We navigate newspapers at scale, filtering, searching, refining results and relying increasingly on our computers as creators of meaning and of sense, in a wealth of information beyond the abilities of humans to process effectively” (6). There is no doubt that this new way of examining newspapers across centuries and millions of articles has extended the possibilities of finding unsuspected material, which will have a positive impact on research. But what about the benefits of serendipity, of the tedious but fruitful reading of original newspapers without the filter of an online platform that produces automatic lists of results for us based on OCR quality, named entity processing or other semi-automated and even fully automated processing techniques? And what about the filtering of material that occurs during digitisation, which yields results of heterogeneous quality?

On a broader scale, the rise of online databases may result in fewer visits to physical archives, suggesting a decline in the use of original sources, which nonetheless remain a core part of historians’ work (7); some sources may also be overrepresented in theses because they have been digitised and are easier to access, which leads to bias (8). Yet the digital revolution has also provided cheaper access to larger volumes of data, sources and people in different countries, making transnational research projects, for instance, easier and more realistic, as Lara Putnam asserts. “But nothing guarantees that the growth of knowledge brought by fallen barriers, broader vision, and multi-scalar research will not be canceled out by increased superficiality and new blind spots.” These conclusions should nevertheless be qualified for, as Robert B. Townsend’s survey revealed, scholars are still far from deserting archive centres and originals. But for how long?

Generosity and transparency appear to be remedies that will enable historians to make use of these revolutionary and fruitful DH tools while avoiding their potential biases. Transparency involves pointing out any inherent bias in corpus composition and any variance within a given corpus (e.g. in optical layout recognition, OLR), clarifying content categorisation and OCR quality, and documenting any processing carried out by the project (e.g. topic modelling). Generosity implies features such as a recommender system, an overview of corpora that broadens search interests, and search suggestions within a fluid, browsable interface, so that the tool becomes a daily companion for researchers (9).

This notion of generosity, brought by computer scientists and designers, enables users, whatever the purpose of their interest in DH tools, to avoid the “Google effect”: those “magical” keyword-based lists of results, which are even more limited when the chosen keywords are polysemous or affected by OCR errors, leading to approximations. A little research on extra-terrestrials for the Laurel workshop, for example, brought up Woody Allen and German words ending in “-allen” in my “alien” keyword searches across several newspaper research tools. In order “to escape the search-box” (10), impresso, like other cutting-edge websites, intends to follow two guidelines at once: transparency on the one hand, and generosity on the other.

Transparency may indeed be the leitmotiv of the tools and techniques developed by computer scientists. It is “the perceived quality of intentionally shared information” (11). In other words, to give historians all the tools they need to draft systematic analyses of material, and to make this material manageable by raising awareness, it is important, if not essential, to provide relevant information on the archives that appear on screen, information that will effectively guide their research practices. How was the search algorithm designed? How have these large quantities of data, e.g. thousands of newspaper articles, been collected, scanned and assembled into digitised corpora (12)? Which archive centre do they come from? Which scanning technique was used? Are some data missing, e.g. newspaper issues or pages? What is the OCR quality? Such questions remain essential if we are to maximise the scientific quality of historical work and avoid a research process guided by an almighty search engine (13). Understanding the nature of their tools is the duty of humanities scholars.

Furthermore, it is important to implement personalisation functions within these websites: generosity also means offering users several possible choices. The Laurel and Lavender impresso user workshops, which gave researchers the opportunity to test some of the engine’s functions, both illustrated this point well: historians asked for several citation options, one-click functions (particularly for downloading multiple items) and personalised workspaces. These observations will undoubtedly help designers enhance the experience of future users. As digital humanities finds itself at the crossroads of computer science and the humanities, it remains an “objet-frontière” (“boundary object”, 14) located in a “trading zone” (15), where specialists do not speak the same language and therefore have to find ways of understanding one another. “Co-design”, a collaborative effort in which designers, computer scientists and users/researchers work together to create a relevant design, is certainly an effective way of combining all available skills with researchers’ requests. It is a guideline the impresso team intends to follow, as emphasised at the aforementioned workshops.

Co-design centres on the development of tools. However, since these tools are used by all historians, not only those who may have participated in their design, generosity and transparency alone may not be sufficient. To my mind, learning is central. As I myself have experienced, although positive developments have been observed in the past few years (16), teaching in digital humanities still needs to be included more systematically in curricula (17) and clearly acknowledged as a requirement, not just an “option” that is sometimes ignored. DH is often absent from, or barely mentioned in, curricula. How can we claim to be up to date and to train a new generation of historians if we do not focus on what has been at the heart of historiography for more than two decades? Digital humanities is not just a trend brought to life by a few geeky historians and by computer scientists interested in the human sciences: it is about tools, techniques and new ways of searching, archiving and so on, which have already reshaped researchers’ habits, at least to some extent. Computers must, once and for all, stop being associated with feelings of “intimidation”, as Robert B. Townsend points out, a phenomenon far from restricted to the community of historians. However, as Martin Grandjean highlights, there can be no skilled teachers without skilled researchers (18). The current limitations of historical research in this area hinder the widespread use of such tools and prevent them from being actively taken up in university environments.

For now, what can we do other than attempt to explain code or database architectures, full of technical notions, to non-specialists? One solution might be to provide user support through online or live tutorials, or information and help sections, in order to reach a large audience of potential users: in other words, to make the tools accessible and usable. Another solution is transparency itself: we need to inform all users of what our tools can and cannot do, pending further progress in computer science. However, we should not regard transparency as a magical solution: we need to be aware of our own “black box”, since transparency does not always imply understanding, nor is it necessarily a realistic enterprise (19). But we should also open the black box of historians by encouraging transparency and embracing the idea of “openness” and collaboration that is so dear to developers (20), in order to build bridges between the two sides of this dual identity of digital humanities.


