Glossary

ALTO

The Analyzed Layout and Text Object (ALTO) is an open XML standard to represent information of OCR recognized texts.

The Analyzed Layout and Text Object (ALTO) is an open XML standard originally developed by the EU project METAe (2004), and maintained by the Library of Congress since 2010. The goal is to represent information of OCR recognized texts, i.e. to describe text (strings) and layout information (coordinates of columns, lines and words on a page).

Links: Library of Congress, Veridian



You may also be interested in the following terms :

ALTO

The Analyzed Layout and Text Object (ALTO) is an open XML standard to represent information of OCR recognized texts.

IIIF

The International Image Interoperability Framework (IIIF) defines several application programming interfaces that provide a standardised method of describing and delivering images over the web, as well as “presentation based metadata” (that is, structural metadata) about structured sequences of images.

OCR

OCR (optical character recognition) is the recognition of printed or written text characters by a computer. This involves photoscanning of the text character-by-character, analysis of the scanned-in image, and then translation of the character image into character codes, such as ASCII, commonly used in data processing.

OLR

In computer vision, document layout analysis is the process of identifying and categorizing the regions of interest in the scanned image of a text document.

Topic model

In machine learning and natural language processing, a topic model is a type of statistical model for discovering the abstract “topics” that occur in a collection of documents. Topic modelling is a frequently used text-mining tool for discovery of hidden semantic structures in a text body.