S-VED (Sacrobosco Visual Elements Dataset)

Introduced by Büttner et al. in CorDeep and the Sacrobosco Dataset: Detection of Visual Elements in Historical Documents

The Sacrobosco Visual Elements Dataset (S-VED) is derived from 359 Sphaera editions, centered on the Tractatus de sphaera by Johannes de Sacrobosco (d. 1256) and printed between 1472 and 1650. The Sphaera editions were primarily used to teach geocentric astronomy to university students across Europe. Their visual elements therefore played an essential role in visualizing the ideas, messages, and concepts that the texts transmitted. As a precondition for studying the relation between text and visual elements, a time-consuming image labelling process was conducted as part of “The Sphere” project (https://sphaera.mpiwg-berlin.mpg.de) to extract and label the visual elements from the 76,000 pages of the corpus. This process resulted in the Extended Sacrobosco Visual Elements Dataset (S-VED𝑋), of which S-VED is a subset. For copyright reasons, only S-VED is publicly available. S-VED consists of 4000 pages, of which 2040 contain a total of 2927 visual elements. The visual elements are defined by bounding boxes and labels in a CSV file. For more information on the Sphaera corpus, see the project’s database at http://db.sphaera.mpiwg-berlin.mpg.de/.
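Since the annotations are distributed as bounding boxes and labels in a CSV file, a minimal sketch of how such a file might be read and grouped by page is given here. The column names (page, label, x_min, y_min, x_max, y_max), the file names, the label values, and the sample rows are all assumptions made for illustration; they are not taken from the dataset, whose actual CSV layout may differ.

```python
import csv
import io
from collections import defaultdict

# Synthetic rows in a HYPOTHETICAL column layout; the real S-VED CSV
# may use different column names, label values, and coordinate conventions.
SAMPLE_CSV = """page,label,x_min,y_min,x_max,y_max
page_0001.jpg,label_a,120,340,560,720
page_0001.jpg,label_b,80,100,140,160
page_0002.jpg,label_a,200,150,640,590
"""


def load_boxes(csv_file):
    """Group (label, box) pairs by page from a CSV with the assumed columns."""
    boxes_by_page = defaultdict(list)
    for row in csv.DictReader(csv_file):
        # Each box is stored as (x_min, y_min, x_max, y_max) in integer pixels.
        box = tuple(int(row[key]) for key in ("x_min", "y_min", "x_max", "y_max"))
        boxes_by_page[row["page"]].append((row["label"], box))
    return dict(boxes_by_page)


boxes = load_boxes(io.StringIO(SAMPLE_CSV))

# Pages that contain at least one visual element, and the total element count.
pages_with_elements = len(boxes)
total_elements = sum(len(items) for items in boxes.values())
print(pages_with_elements, total_elements)
```

To use the sketch on a real file, pass an opened file object (for example `open(path, newline="")`) to `load_boxes` and adjust the column names to match the CSV header. Pages without visual elements would not appear in the result if the CSV lists only annotated boxes.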


