03_processing:06_pos
Differences
This shows you the differences between two versions of the page.
03_processing:06_pos [2022/01/04 13:55] – created Simone Ueberwasser | 03_processing:06_pos [2022/06/27 09:21] (current) – external edit 127.0.0.1 | ||
---|---|---|---|
Line 1: | Line 1: | ||
====== Part of speech tagging ====== | ====== Part of speech tagging ====== | ||
- | [[https:// | + | [[https:// |
- | In our corpus, the PoS annotation is applied to the layer pos in Annis. | + | |
- | German (both dialectal and non-dialectal) | + | |
- | PoS tagging was applied to the normalised level of each SMS, and each SMS was tagged as one unit. | + | ===== German (both dialectal |
- | TreeTagger was used, applying a tailor-made | + | ==== TreeTagger ==== |
- | The STTS tagset was used. | + | |
- | The tagger' | + | |
- | The tag PTKINF was added for the infinitive particle (go, goge etc.) in the German dialect. | + | |
- | French | + | |
- | PoS tagging was applied to the normalised level of each SMS, and each SMS was tagged as one unit. | + | * PoS tagging was applied to the normalised level of each SMS, and each SMS was tagged as one unit. |
- | TreeTagger was used out of the box. | + | * TreeTagger was used, applying a tailor-made German parameter file (courtesy of Helmut Schmid) |
- | Achim Stein' | + | * The [[http:// |
- | The tags DET:DEM and DET:IND were added. | + | * The tagger' |
- | Italian | + | * The tag PTKINF was added for the infinitive particle (go, goge etc.) in the German dialect. |
+ | * The resulting sub-corpora are: deu-tagged and gsw-tagged | ||
+ | |||
+ | ==== RFTagger ==== | ||
+ | * The same varieties of German were also tagged with the RFTagger, resulting in the sub-corpora deu-rftagged and gsw-rftagged | ||
+ | |||
+ | ===== French ===== | ||
+ | |||
+ | |||
+ | * PoS tagging was applied to the normalised level of each SMS, and each SMS was tagged as one unit. | ||
+ | * TreeTagger was used out of the box. | ||
+ | | ||
+ | | ||
+ | ===== Italian | ||
+ | |||
+ | |||
+ | * PoS tagging was applied to the normalised level of each SMS, and each SMS was tagged as one unit. | ||
+ | * TreeTagger was used out of the box. | ||
+ | * Achim Stein' | ||
+ | * The tag ADJ:poss was added. | ||
+ | ===== Precision ===== | ||
- | PoS tagging was applied to the normalised level of each SMS, and each SMS was tagged as one unit. | ||
- | TreeTagger was used out of the box. | ||
- | Achim Stein' | ||
- | The tag ADJ:poss was added. | ||
- | Precision | ||
Our test gave the following precision for the respective sub-corpora: | Our test gave the following precision for the respective sub-corpora: | ||
- | gsw: 2'734 tokens checked: 96.3% correct | + | * gsw: 2'734 tokens checked: 96.3% correct |
- | deu: 2'922 tokens checked: 93.1% correct | + | |
- | fra: 3'133 tokens checked: 94.6% correct | + | |
- | ita: 2'527 tokens checked: 90.5% correct | + | |
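The per-sub-corpus figures above can be combined into a rough overall estimate. The following is a minimal sketch, not part of the original evaluation: it weights each reported percentage by the number of tokens checked. The names `REPORTED` and `pooled_precision` are hypothetical, and because the published percentages are rounded to one decimal, the pooled value is only approximate.

```python
# Figures copied from the precision list above: (tokens checked, percent correct).
# The pooled value computed here is NOT reported on the page; it is derived
# from rounded percentages and should be read as an approximation.
REPORTED = {
    "gsw": (2734, 96.3),
    "deu": (2922, 93.1),
    "fra": (3133, 94.6),
    "ita": (2527, 90.5),
}


def pooled_precision(reported):
    """Return (total tokens checked, token-weighted precision as a fraction)."""
    total = sum(tokens for tokens, _ in reported.values())
    # Estimated number of correct tokens per sub-corpus, summed over all of them.
    correct = sum(tokens * pct / 100 for tokens, pct in reported.values())
    return total, correct / total


total_tokens, pooled = pooled_precision(REPORTED)
print(f"{total_tokens} tokens checked, pooled precision about {pooled:.1%}")
```

Weighting by tokens checked, rather than averaging the four percentages directly, keeps the larger samples from being under-represented in the estimate.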
03_processing/06_pos.1641300953.txt.gz · Last modified: 2022/06/27 09:21 (external edit)