Dictionary of Old Occitan medico-botanical terminology

Project description

The DiTMAO project is funded by the DFG (Deutsche Forschungsgemeinschaft) and aims to build an ontology-based information system for Old Occitan medico-botanical terminology. It is a joint project of Georg-August-Universität Göttingen, Universität zu Köln, and the Istituto di linguistica computazionale "Antonio Zampolli" C.N.R.

Old Occitan is the medieval stage of Occitan, the autochthonous Romance language spoken in Southern France, which is today a regional minority language with several dialects. During the Middle Ages, the region and its language played a significant role in medical science, owing to the medical schools of Toulouse and Montpellier and the strong presence of Jewish physicians and scholars. For this reason, Old Occitan medico-botanical terminology is documented both in Latin and in Hebrew characters. The DiTMAO project aims to make this terminology accessible to several scientific communities, such as Romance studies, Semitic studies, and the history of medicine.

The corpus of DiTMAO

The textual basis of the lexicon consists of previously edited medico-botanical texts in Latin and in Hebrew script, and of manuscripts transcribed and edited over the course of the project (a detailed list is available at /link/). Among the sources in Hebrew script, the most prominent text type is the so-called synonym list. Such lists contain a large number of Old Occitan medical and botanical terms in Hebrew characters, with equivalents or explanations in other languages (also spelled in Hebrew characters): mostly in (Judaeo-)Arabic, but also in Hebrew, Latin, or other Romance languages, and sometimes in Greek, Aramaic or Persian. These lists can be described as ancient multilingual dictionaries, and they are of particular importance for Old Occitan lexicography for two main reasons:

  1. The synonym lists of the Jewish tradition include vernacular (Old Occitan) terms from the 13th century onwards; these lists therefore contain very early attestations of Old Occitan technical terms.

  2. The corresponding terms in other ancient languages help to determine the meaning of otherwise opaque Old Occitan terms.

A particular difficulty of medieval texts in vernacular languages is that most terms are documented in a large number of variants (reflecting different spellings, dialects, or historical stages of the languages at issue). The dictionary will therefore include all variants of Old Occitan terms, together with the corresponding terms in at least six other ancient languages. Whenever possible, a translation into modern French and English will also be provided. The dictionary contains about 5800 Old Occitan forms in Latin script and 3200 forms in Hebrew script. In addition, there are about 3050 corresponding terms in the other ancient languages.

A sample entry

The ontological conception of DiTMAO

Linguistic and lexical resources are increasingly published in the context of the Semantic Web. Sharing lexica in accordance with linked data principles has become a practical necessity: a resource (linguistic or otherwise) that cannot be accessed, shared and reused as a dataset is effectively unreachable and therefore of little use from a Semantic Web perspective. The lemon model has been developed as a standard for publishing lexica as RDF data. More precisely, lemon is an ontology-lexicon model for the Multilingual Semantic Web, and its nature and purpose match our need to represent the DiTMAO lexicon and the related ontologies. DiTMAO consists of three main domains:

  1. The lexicographic domain, including the lemmatized forms (lemma, variants and corresponding terms in other ancient languages) and their linguistic and lexicographic description.

  2. The conceptual domain, describing the meaning of each term by means of subontologies for the fields of botany, zoology, mineralogy, human anatomy, diseases and therapy (medication, medical instruments). Where possible, we aim to complement the onomasiological description with a modern scientific classification, at least for most of the plant names, and with a medieval classification of plants and other simple drugs. The medieval classification follows the Galenic system of four basic bodily humors (blood, yellow bile, black bile and phlegm). Each humor is associated with a combination of primary qualities obtained by cross-combining the pairs HOT/COLD and DRY/WET. The simple drugs are classified by these quality pairs together with a degree of intensity, which ranges from one to four (cf. [13]). To ensure that the categorization conforms to the classification used in medieval Southern France, we will introduce only the classifications provided in the texts of our corpus.

  3. The documentation domain, giving the source for each form of a term and its meaning. The documentation is indispensable for a historical (diachronic) dictionary.
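The three domains above can be pictured as a small data model. The following Python sketch is purely illustrative and is not the project's actual schema: all class names, field names and the sample values (such as "lemma_a" or "SourceX") are hypothetical placeholders, not attested forms or real sources. The humor table uses the traditional Galenic pairing of humors with the cross-combined qualities.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Thermal(Enum):
    HOT = "hot"
    COLD = "cold"


class Moisture(Enum):
    DRY = "dry"
    WET = "wet"


# Traditional Galenic pairing of the four humors with the quality pairs.
HUMOR_BY_QUALITIES = {
    (Thermal.HOT, Moisture.WET): "blood",
    (Thermal.HOT, Moisture.DRY): "yellow bile",
    (Thermal.COLD, Moisture.DRY): "black bile",
    (Thermal.COLD, Moisture.WET): "phlegm",
}


@dataclass(frozen=True)
class GalenicProfile:
    """Quality pair plus a degree of intensity (1 to 4) for each quality."""
    thermal: Thermal
    thermal_degree: int
    moisture: Moisture
    moisture_degree: int

    def __post_init__(self):
        for degree in (self.thermal_degree, self.moisture_degree):
            if not 1 <= degree <= 4:
                raise ValueError("degree of intensity must be between 1 and 4")

    def humor(self) -> str:
        return HUMOR_BY_QUALITIES[(self.thermal, self.moisture)]


@dataclass
class Attestation:
    """Documentation domain: where a form is attested (hypothetical fields)."""
    source: str
    locus: str


@dataclass
class Form:
    """Lexicographic domain: one written form in a given script."""
    written: str
    script: str  # "Latn" or "Hebr"
    language: str = "Old Occitan"
    attestations: List[Attestation] = field(default_factory=list)


@dataclass
class Concept:
    """Conceptual domain: the meaning, placed in one subontology."""
    label: str
    subontology: str
    galenic: Optional[GalenicProfile] = None


@dataclass
class Entry:
    lemma: Form
    variants: List[Form] = field(default_factory=list)
    corresponding_terms: List[Form] = field(default_factory=list)
    concept: Optional[Concept] = None

    def all_forms(self) -> List[Form]:
        return [self.lemma, *self.variants, *self.corresponding_terms]

    def forms_in_script(self, script: str) -> List[Form]:
        return [f for f in self.all_forms() if f.script == script]


# Placeholder data only; none of these strings is a real attested term.
entry = Entry(
    lemma=Form("lemma_a", "Latn", attestations=[Attestation("SourceX", "fol. 1r")]),
    variants=[Form("variant_a1", "Latn"), Form("variant_a2", "Hebr")],
    corresponding_terms=[Form("equivalent_b", "Hebr", language="Arabic")],
    concept=Concept("plant_x", "botany",
                    GalenicProfile(Thermal.HOT, 2, Moisture.DRY, 1)),
)
print(entry.concept.galenic.humor())
print(len(entry.forms_in_script("Hebr")))
```

Keeping the attestations on each form, rather than on the entry as a whole, mirrors the requirement that every variant be traceable to its source.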

Over the course of the project, the lemon model will be extended with a documentation domain and with the new vocabulary needed for the lemmatization of a historical, multilingual and multi-alphabetical dictionary. The full extension of the lemon model, together with all data that are free of copyright restrictions, will be published on the project web site.
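To give a rough idea of what a lemon-based entry with source documentation might look like as RDF, the sketch below builds a small Turtle fragment using only the Python standard library. It rests on several assumptions: the `ex:` namespace, the `ex:attestedIn` property and all identifiers are hypothetical stand-ins for the project's planned extension, which is not specified here; the written forms are placeholders, not attested terms; and the language tags `pro-Latn` and `pro-Hebr` (Old Occitan in Latin and Hebrew script) are one possible tagging choice. Only the `lemon:` terms (`LexicalEntry`, `canonicalForm`, `otherForm`, `writtenRep`, `sense`, `reference`) come from the lemon vocabulary itself.

```python
LEMON = "http://lemon-model.net/lemon#"
EX = "http://example.org/ditmao#"  # hypothetical namespace, for illustration only


def turtle_literal(text, lang):
    """Return a language-tagged Turtle string literal with basic escaping."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'


def form_node(written, lang, source_id):
    """Blank node for one form; ex:attestedIn is an assumed property name."""
    return (f"[ lemon:writtenRep {turtle_literal(written, lang)} ; "
            f"ex:attestedIn ex:{source_id} ]")


def entry_to_turtle(entry_id, lemma, variants, concept_id):
    """Serialize one entry; lemma and variants are (written, lang, source) tuples."""
    lines = [
        f"@prefix lemon: <{LEMON}> .",
        f"@prefix ex: <{EX}> .",
        "",
        f"ex:{entry_id} a lemon:LexicalEntry ;",
        f"    lemon:canonicalForm {form_node(*lemma)} ;",
    ]
    for variant in variants:
        lines.append(f"    lemon:otherForm {form_node(*variant)} ;")
    # The sense links the entry to a concept in the conceptual domain.
    lines.append(f"    lemon:sense [ lemon:reference ex:{concept_id} ] .")
    return "\n".join(lines)


# Placeholder forms and identifiers; none is a real attested term or source.
ttl = entry_to_turtle(
    "entry_a",
    ("lemma_a", "pro-Latn", "source_x"),
    [("variant_a1", "pro-Latn", "source_x"), ("אבג", "pro-Hebr", "source_y")],
    "concept_plant_x",
)
print(ttl)
```

In practice an RDF library would be preferable to string building; plain strings are used here only to keep the sketch dependency-free.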