Initiated during the first impresso project, HIPE (Identifying Historical People, Places and other Entities) is a series of evaluation campaigns, or shared tasks, on named entity recognition and linking in multilingual historical documents.
The objectives of the HIPE shared tasks are to 1) assess and advance the development of robust, adaptable and transferable named entity (NE) processing systems over challenging historical material, 2) contribute solid benchmark frameworks and enable performance comparison of NE processing on historical texts, and 3) foster efficient semantic indexing of historical documents in order to support scholarship on digitised cultural heritage collections.
As the first evaluation campaign of its kind on multilingual historical newspaper material, the CLEF-HIPE-2020 edition proposed the tasks of NE recognition and classification (NERC) and entity linking (EL) in ca. 200 years of historical newspapers written in English, French and German. HIPE-2020 brought together 13 teams who submitted a total of 75 runs for 5 different task bundles. The main conclusion of this edition was that neural-based approaches can achieve good performance on historical NERC when provided with enough training data, but that progress is still needed to further improve performance, adequately handle OCR noise and small-data settings, and better address entity linking.
The second edition, HIPE-2022, broadened the scope and confronted participants with the challenges of dealing with more languages, learning domain-specific entities, and adapting to diverse annotation schemas. Its objectives were to contribute new insights on how best to ensure the transferability of NE processing approaches across languages, time periods, and document and annotation types in a cultural heritage context. The HIPE-2022 data is an unprecedented asset that researchers can draw on to address the challenges posed by domain and document type changes in historical NE processing.
The HIPE-eval initiative will be continued during the second impresso project and will gradually evolve to include further information extraction tasks on historical documents.
For more information, please visit the HIPE-2020 and HIPE-2022 websites and the HIPE-eval GitHub and Zenodo organisations.