
Part-of-speech tagging

Wikipedia defines PoS tagging as follows: "In corpus linguistics, part-of-speech tagging (POS tagging or POST), also called grammatical tagging or word-category disambiguation, is the process of marking up a word in a text (corpus) as corresponding to a particular part of speech, based on both its definition, as well as its context, i.e. relationship with adjacent and related words in a phrase, sentence, or paragraph."

In this corpus, we applied PoS tagging to the German, French and Italian parts using Helmut Schmid's TreeTagger. For both varieties of German (i.e. dialectal and non-dialectal), a sub-corpus annotated with the RFTagger is also available. For Romansh, unfortunately, there is no TreeTagger parameter file, and in fact no other PoS tagging tools are available for this language either.
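To make the TreeTagger annotation more concrete, the following minimal Python sketch reads tagger output and counts the tags. It assumes one token per line with three tab-separated columns (word form, PoS tag, lemma), which is the layout printed by TreeTagger's wrapper scripts; the exact export format of this corpus may differ. The helper name `parse_treetagger_output` is hypothetical, and the German sample lines with their STTS tags are purely illustrative, not taken from the corpus.

```python
from collections import Counter
from typing import NamedTuple


class TaggedToken(NamedTuple):
    """One line of tagger output: word form, PoS tag and lemma."""
    word: str
    pos: str
    lemma: str


def parse_treetagger_output(text: str) -> list[TaggedToken]:
    """Parse lines of the (assumed) form 'word<TAB>tag<TAB>lemma'.

    Empty lines are skipped; any other line without exactly three
    tab-separated fields raises a ValueError.
    """
    tokens = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore empty lines, e.g. between sentences
        fields = line.split("\t")
        if len(fields) != 3:
            raise ValueError(
                f"line {line_no}: expected 3 tab-separated fields, got {len(fields)}"
            )
        tokens.append(TaggedToken(*fields))
    return tokens


# Illustrative German sample with STTS tags (hypothetical, not corpus data).
sample = "Die\tART\tdie\nKatze\tNN\tKatze\nschläft\tVVFIN\tschlafen\n.\t$.\t.\n"

tagged = parse_treetagger_output(sample)
for tok in tagged:
    print(f"{tok.word:10} {tok.pos:6} {tok.lemma}")

# Frequency of each PoS tag in the sample.
tag_counts = Counter(tok.pos for tok in tagged)
print(tag_counts)
```

Keeping the lemma alongside the tag mirrors TreeTagger's behaviour of lemmatising while it tags; if only word and tag columns are exported, the `TaggedToken` type and the field check would need to be adjusted.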

German (both dialectal and non-dialectal)

Our evaluation yielded the following precision for the respective sub-corpora: