The Digitization Process
From paper to pixels: digitizing a late-medieval manuscript
Over the course of five centuries, the Gebetsbuch of the convent sisters of Medlingen traveled on winding paths from Medlingen in Swabia to the library of the Tiroler Landesmuseum Ferdinandeum in Innsbruck, where it is kept today. In October 2024, the prayer book went on its next big journey: from the real world to the digital space. In a collaborative project between the Ferdinandeum, the University of Innsbruck, the University of North Texas and the Universitäts- und Landesbibliothek Tirol, the late-medieval manuscript was digitized.
But how does five-hundred-year-old paper turn into pixels? In a first step, the Gebetsbuch was brought from the Ferdinandeum to the university library in an acid-free cardboard box. Once the university librarians had received it and signed off on the official paperwork (the transfer of a late-medieval manuscript is no small affair, after all), they handed it over to their digitization technicians. The technicians' main job is to digitize historical documents from the university library, so if there is one thing that these people are good at, it’s scanning pages.
The scanning process: a study in patience
You might be familiar with the free-to-use scanners at a university library, and maybe you have already scanned an article or a book there. In the process, you probably discovered that while they produce decent results, they are not exactly perfect. That is not the quality we were looking for in the Gebetsbuch project, so for our little manuscript, heavier machinery was needed. Think again of the scanners at the uni library and then imagine them twice as large and three times as complex. [1] The digitization technicians work with two state-of-the-art scanners that generate high-quality images and can handle a variety of formats, from heavy tomes and large-format newspapers to small manuscripts like the Gebetsbuch.
The scanning process itself is ideally a two-person job: one person positions the book, turns the pages, and holds them in place, while the other initiates the scan and checks the results on the screen. As you can probably imagine, this is a job that requires a lot of precision and patience. Because the Gebetsbuch is tiny (11.5 x 7.5 cm, or roughly the size of your fist) and has many tightly bound pages, it cannot be opened fully without breaking its spine (and believe me, you do not want to be the person who damages the spine of a five-hundred-year-old manuscript). So every folio, which is the technical term for a leaf of a manuscript, had to be scanned one side at a time: first its front (called “recto” and often abbreviated as “r”) and then its back (called “verso”, abbreviated “v”). When the digitization technicians finished scanning the Gebetsbuch, they had produced a whopping 496 images!
The transcription process: AI to the rescue
Now what do you do with almost five hundred pictures of a five-hundred-year-old manuscript? The next step in our project was to transcribe a part of the Gebetsbuch, specifically folios 82r to 102v. According to the descriptive catalogue of the Ferdinandeum, these folios contain “An Itinerary from Medingen to Jerusalem and the Burial Site of St. Katherine in Sinai in 165 Days”. [2] This is essentially a description of a journey to the Holy Land, which the convent sisters, who rarely left their convent, might have used as a meditational guide for an imagined pilgrimage to Jerusalem. For the transcription, we used the tool Transkribus, an AI-powered platform for text recognition, layout recognition, and image analysis that can be used on a wide range of historical documents, from manuscripts to modern records.
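To get a feel for how much material the range 82r to 102v covers, the folio sides can be listed with a few lines of Python. This is only an illustrative sketch: the function name `folio_sides` is made up for this post, and it assumes the simple numbering scheme described above, in which every folio has exactly one recto and one verso.

```python
def folio_sides(start: str, end: str) -> list[str]:
    """List every folio side from `start` to `end` inclusive, e.g. '82r' to '102v'."""

    def parse(label: str) -> tuple[int, str]:
        # '82r' -> (82, 'r'); the last character is the side, the rest the folio number
        return int(label[:-1]), label[-1]

    first_num, first_side = parse(start)
    last_num, last_side = parse(end)

    sides = []
    for number in range(first_num, last_num + 1):
        for side in ("r", "v"):
            # Skip the recto of the first folio if the range starts on a verso
            if number == first_num and first_side == "v" and side == "r":
                continue
            # Skip the verso of the last folio if the range ends on a recto
            if number == last_num and last_side == "r" and side == "v":
                continue
            sides.append(f"{number}{side}")
    return sides


itinerary = folio_sides("82r", "102v")
print(len(itinerary))   # 42
print(itinerary[:3])    # ['82r', '82v', '83r']
```

Under that assumption, the Itinerary section spans 21 folios, or 42 sides to transcribe.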
So what did we do with Transkribus? First, we split the work so that each of us, including our professor, was responsible for transcribing the same number of pages. Each of us created a Transkribus account and uploaded the images of the folios we needed to transcribe. Then the fun part started: we let Transkribus work its magic with its automatic transcription based on text models.
A text model is an AI algorithm that has been trained on a specific set of data, including images and transcriptions. Its purpose is to accurately determine the most likely sequence of characters for each section of handwritten text. [3]
Since Transkribus does not have a universal model that applies to all types of handwriting, we had to choose a text model that would work for the script, language, and time period of the Gebetsbuch. We settled on a model that had been trained on manuscripts containing religious texts and travelogues from the 15th and 16th centuries that were mainly written in Gothic cursive and Bastarda, and it produced fairly decent results. Still, the automatic transcription was not entirely accurate, and it was up to us to check the pages and manually correct any mistakes. This was an important step of the process, because at a later stage, our transcription of the Itinerary will serve as the so-called “Ground Truth” used to train an AI text model in Transkribus that should be able to transcribe the whole Gebetsbuch more easily. “Ground Truth” is accurate, verified data that is used to train machine-learning models. [4] The better the Ground Truth data, the better the model.
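One common way to put a number on "not entirely accurate" is the character error rate (CER): the number of single-character insertions, deletions, and substitutions needed to turn the automatic transcription into the corrected one, divided by the length of the corrected text. The sketch below shows the idea in plain Python. It is not Transkribus's own code, the function names are invented for this post, and the two sample strings are made up rather than taken from the Gebetsbuch.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character edits needed to turn `a` into `b`."""
    # previous[j] holds the distance between the processed prefix of `a` and b[:j]
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or keep if equal)
            ))
        previous = current
    return previous[-1]


def character_error_rate(ground_truth: str, automatic: str) -> float:
    """Edit distance divided by the length of the verified (ground-truth) text."""
    if not ground_truth:
        raise ValueError("ground truth must not be empty")
    return levenshtein(automatic, ground_truth) / len(ground_truth)


# Invented example strings, for illustration only
corrected = "the road to jerusalem"
automatic = "the roud to jerusalen"
print(f"{character_error_rate(corrected, automatic):.3f}")  # 0.095
```

Here two wrong characters in a 21-character line give a CER of about 9.5 percent. The lower the value, the less manual correction a page needs.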
However, that is a whole new project that goes beyond the scope of our class on Digital History. For this project, we watched the digitization process happen and we transcribed the Itinerary section of the book. Over the course of the semester, we accompanied a late-medieval manuscript on its journey from the library to our laptop screens.
Author: Elisa Wasserer
Citations
1. See, for example, the overhead scanner OS Q1, “A1 Scanner Zeutschel OS Q1,” Zeutschel, accessed 20.01.2025, https://www.zeutschel.de/produkte/os-q1/.
2. Bernhard and Hans Peter Sandbichler, Handschriftenkatalog des Museum Ferdinandeum: Die Codices des Tiroler Landesmuseums Ferdinandeum bis 1600 (Innsbruck, 1999), pp. 126-128; here p. 127.
3. “1. Automatically transcribing your documents,” Transkribus Help Center, Transkribus, accessed 20.01.2025, https://help.transkribus.org/automatically-transcribing-your-documents.
4. “What is Ground Truth? – READ-COOP,” blog entry, READ-COOP, accessed 10.05.2023, https://readcoop.eu/what-is-ground-truth/.