Matching Medieval Manuscripts – Finding palaeographic patterns across IIIF collections

Hannah Busch - Huygens ING (Netherlands), Rutger van Koert - KNAW Humanities Cluster (Netherlands), Mariken Teeuwen - Huygens ING (Netherlands)

Presentation type: Presentation

Abstract:

New and more affordable imaging techniques have allowed the focus of medieval manuscript digitization to shift from outstanding individual copies to entire collections and libraries. The virtual reconstruction of dispersed medieval libraries, such as the Bibliotheca Laureshamensis, is one result of these efforts. Still, in many cases the basic data about the manuscripts in these collections are recorded only approximately. Dates are often as vague as an entire century, and places of origin are missing or hypothesized. Traditionally, palaeography, the study of ancient handwriting, is the discipline that was developed to address the challenge of dating and localizing scripts and describing their historical development.

Because of the high level of expertise required, there are, and have been, only a few authorities in the field. Parallel to the increasing amount of digitized material, computational methods to assist the manual work have been developed, turning palaeography into digital palaeography, computer-aided palaeography, or artificial palaeography. The decisive reason for applying computational approaches is not distrust in the opinion of these experts, but the fact that the growing amount of available data makes the application of image analysis possible, and perhaps even inevitable.

Therefore, the project “Digital Forensics for Historical Documents” (KNAW 2018-2021) attempts to create a digital tool, based on a deep learning system, that matches the unique characteristics of a given script sample with similar script samples, making use of the digitized manuscript collections available on the World Wide Web and their IIIF APIs.
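As a rough illustration of how IIIF APIs make page images addressable for such a tool, the following Python sketch collects the Image API base URIs from a manifest and builds request URLs for downscaled page images. It assumes a Presentation API 2.x manifest layout; the manifest, its example.org URLs and the function names are hypothetical and do not describe the project's actual code.

```python
from typing import Iterator


def image_service_ids(manifest: dict) -> Iterator[str]:
    """Yield the IIIF Image API base URI of each page image in a
    Presentation API 2.x manifest (sequences -> canvases -> images)."""
    for sequence in manifest.get("sequences", []):
        for canvas in sequence.get("canvases", []):
            for image in canvas.get("images", []):
                service = image.get("resource", {}).get("service", {})
                # "service" may be a single object or a list of objects.
                if isinstance(service, list):
                    service = service[0] if service else {}
                service_id = service.get("@id")
                if service_id:
                    yield service_id.rstrip("/")


def image_url(service_id: str, region: str = "full", width: int = 512) -> str:
    """Build an Image API request following the pattern
    {base}/{region}/{size}/{rotation}/{quality}.{format}."""
    return f"{service_id}/{region}/{width},/0/default.jpg"


# Hypothetical two-page manifest, reduced to the fields used above.
example_manifest = {
    "@id": "https://example.org/iiif/ms-1/manifest",
    "sequences": [{
        "canvases": [
            {"images": [{"resource": {"service": {
                "@id": "https://example.org/iiif/image/ms-1_f001r"}}}]},
            {"images": [{"resource": {"service": {
                "@id": "https://example.org/iiif/image/ms-1_f001v/"}}}]},
        ]
    }],
}

for base in image_service_ids(example_manifest):
    print(image_url(base))
```

Requesting a fixed width rather than the full-resolution image keeps downloads small, which matters when many collections are harvested for training or matching; the width of 512 pixels is an arbitrary placeholder.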

While feature-extraction methods based on segmenting and comparing single letters or groups of letters are time-consuming for larger amounts of image data, deep learning is the technique of choice for processing large numbers of historical written documents. The large-scale digitization projects of the past twenty years, and the possibility of exploiting their results through IIIF, have contributed substantially to reaching the critical mass that allows deep learning to be applied to the study of medieval scripts.

In a close collaboration between research software engineering, digital humanities and expertise in medieval script, we are trying to develop a system that helps scholars find a date and place of origin for medieval manuscript material by searching available online collections for script with similar features. The researcher can then consult the metadata attached to those matches to build a hypothesis about their own handwritten document.
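The matching step described above could be sketched roughly as follows. The sketch assumes that each script sample has already been reduced to a numeric feature vector by some trained network and ranks the indexed samples by cosine similarity to the query. The vectors, metadata values, URLs and names are invented for illustration, and cosine similarity is only one possible measure, not necessarily the one the project uses.

```python
import math
from typing import List, NamedTuple, Sequence, Tuple


class ScriptSample(NamedTuple):
    manifest_url: str          # hypothetical IIIF manifest the sample came from
    date: str                  # dating as given in the collection metadata
    origin: str                # place of origin as given in the metadata
    features: Sequence[float]  # feature vector from some trained network


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_matches(query: Sequence[float],
                 samples: Sequence[ScriptSample],
                 top_k: int = 3) -> List[Tuple[float, ScriptSample]]:
    """Return the top_k samples most similar to the query, best first."""
    scored = [(cosine_similarity(query, s.features), s) for s in samples]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Toy index with invented metadata and three-dimensional vectors.
index = [
    ScriptSample("https://example.org/iiif/ms-a/manifest",
                 "date A", "Scriptorium A", [0.9, 0.1, 0.0]),
    ScriptSample("https://example.org/iiif/ms-b/manifest",
                 "date B", "Scriptorium B", [0.1, 0.9, 0.2]),
    ScriptSample("https://example.org/iiif/ms-c/manifest",
                 "date C", "Scriptorium C", [0.8, 0.2, 0.1]),
]

query_features = [1.0, 0.0, 0.0]  # features of the undated sample
for score, sample in rank_matches(query_features, index, top_k=2):
    print(f"{score:.3f}  {sample.origin}  {sample.date}  {sample.manifest_url}")
```

The date and origin of the best-ranked samples are suggestions for the researcher to weigh, not a verdict: the ranking is only as reliable as the metadata attached to the matched manuscripts.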

The project is still in an experimental phase of trial and error. In our talk we would like to present the workflow of the project and its latest outcomes. How does our project benefit from the global IIIF network, and how is IIIF implemented in our own developments? What are the challenges of using IIIF resources for our purposes? Furthermore, we would like to discuss how our research can contribute to the IIIF community in the future, and get in touch with peers.

Topics:

  • Using IIIF material for Machine Learning and AI,
  • Discovering IIIF resources,
  • IIIF communities (3D, archives, museums, manuscripts, newspapers, etc.)

Keywords:

  • Deep Learning,
  • Neural Networks,
  • Palaeography,
  • Medieval Studies,
  • Pattern Recognition,
  • Manuscripts,
  • Machine Learning