The Latin subcorpus (summary) consists of 187 texts from the Merovingian period (398,149 tokens), dating from the 6th to the middle of the 8th century. It comprises saints’ legends, chronicles, charters, law codes, letters and formularies. The subcorpus PaLaFraLat is accessible under the terms of the CC BY-NC-SA 4.0 license.
Documentation:
- Metadata guidelines: The Metadata Guidelines give more information on the structure of the corpus and the recorded metadata. All metadata are displayed in detail in the TXM-Portal (for access see below) and can be grouped to create subcorpora.
- Lapos-Tagset: The corpus is entirely tagged for parts of speech and morpho-syntactic features using the PaLaFra tagset as well as the lapos tagset, which was designed specifically for medieval Latin. More information on lapos can be found in the tagset documentation.
- UD-pos-Tagset: More information on the structure of the common PaLaFra tagset and on how it differs from lapos can be found in the tagset documentation.
- Annotation guidelines: See the annotation guidelines.
The French part of the PaLaFra corpus includes 42 Old French texts (1,054,000 words, summary). It was designed as a continuation of the Latin part, and the criteria for text selection and description are the same for both corpora. In selecting texts for the French corpus, priority was given to the oldest texts (dating from before the 13th century), non-literary works and prose texts. The corpus is compiled on the basis of texts provided by the Base de Français Médiéval and is accessible under the terms of the CC BY-NC-SA 3.0 FR license.
Documentation:
- Metadata guidelines: The Manuel de descriptions des textes pour la BFM (Fr.) gives more information on the structure of the corpus and the recorded metadata. All metadata are displayed in detail in the TXM-Portal (for access see below) and can be grouped to create subcorpora.
- Tagset: The corpus is entirely tagged for parts of speech and morpho-syntactic features using the PaLaFra tagset as well as the CATTEX2009 tagset, which was designed specifically for medieval French. For more information on the tagset, see the document Cattex2009-Tagset (Fr.). The corpus is also entirely lemmatized using lemmata from the Dictionnaire du moyen français.
- Annotation Guidelines: See the French manual, the principles of annotation (Fr.) as well as the Manuel du corpus PaLaFraFro-V2-1 (Fr.).
The PaLaFraPar parallel corpus allows a very precise analysis of Latin-to-French transpositions. So far it contains four medieval French translations of Latin sources. The versions of the texts in the two languages are very close and are aligned at section level. The texts will be morpho-syntactically tagged and lemmatized in both languages. The French texts are also part of the BFM2016 corpus and are available under the terms of the CC BY-NC-SA 3.0 FR license.
Access to the texts (via TXM)
The open-source software TXM provides access to the subcorpora PaLaFraFro-V2-1 and PaLaFraLat-V-2 and supports both qualitative analysis and quantitative exploitation of the corpus. Based on Unicode and XML, this cross-platform software offers a graphical interface for Windows, Linux and Mac OS X and a wide range of tools (such as the query language CQL and the statistical language R).
>>> The corpus is available online at the French TXM-Portal of the Base de Français Médiéval.