Texts

The Latin subcorpus (summary) consists of sources from the Merovingian period, dating from the 6th to the middle of the 8th century. It comprises saints’ legends, chronicles, charters, law codes, letters and formularies.

The French part of the PaLaFra corpus includes 42 Old French texts (1,054,000 words, summary). It was designed as a continuation of the Latin part, and the criteria for text selection and description are the same for both corpora. In selecting texts for the French corpus, priority was given to the oldest texts (dating from before the 13th century), non-literary works and prose texts. The corpus is entirely tagged for parts of speech and morpho-syntactic features using the PaLaFra tagset as well as the Cattex 2009 tagset, which was designed specifically for medieval French. It is also entirely lemmatized using lemmata from the Dictionnaire du Moyen Français. The corpus is compiled from texts provided by the Base de Français Médiéval and is freely accessible under the terms of the CC BY-NC-SA 3.0 FR license.

The PaLaFraPar parallel corpus supports precise analysis of Latin-to-French transpositions. It contains a small number of medieval French translations of Latin sources. The versions of the texts in the two languages are very close and are aligned at the sentence level. The texts are morpho-syntactically tagged and lemmatized in both languages.
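
To make the idea of a sentence-aligned, tagged and lemmatized parallel text more concrete, the following Python sketch shows one possible in-memory representation. It is only an illustration under stated assumptions: the class names `Token` and `AlignedPair`, the helper `lemma_correspondences`, the part-of-speech labels and the toy sentence pair are all hypothetical and do not reflect the actual PaLaFraPar file format, its tagsets or its contents.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Token:
    form: str   # word as it appears in the text
    lemma: str  # dictionary headword
    pos: str    # part-of-speech label (placeholder, not the PaLaFra tagset)


@dataclass(frozen=True)
class AlignedPair:
    latin: Tuple[Token, ...]   # one Latin sentence
    french: Tuple[Token, ...]  # the French sentence aligned with it


def lemma_correspondences(pairs: Sequence[AlignedPair], latin_lemma: str) -> List[str]:
    """Return the French lemmas found in sentences aligned with any
    Latin sentence that contains `latin_lemma`."""
    found = set()
    for pair in pairs:
        if any(tok.lemma == latin_lemma for tok in pair.latin):
            found.update(tok.lemma for tok in pair.french)
    return sorted(found)


# Toy pair made up for illustration; it is not taken from the corpus.
pairs = [
    AlignedPair(
        latin=(
            Token("rex", "rex", "NOUN"),
            Token("venit", "venire", "VERB"),
        ),
        french=(
            Token("li", "le", "DET"),
            Token("rois", "roi", "NOUN"),
            Token("vint", "venir", "VERB"),
        ),
    )
]

print(lemma_correspondences(pairs, "rex"))  # ['le', 'roi', 'venir']
```

Because alignment is at the sentence level, a lookup of this kind returns only candidate French lemmas that co-occur with the Latin lemma; identifying the exact word-level equivalent would still require manual inspection or a separate word-alignment step.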