Differences
This shows you the differences between two versions of the page.
| Next revision | Previous revision | ||
| 03_processing:06_pos [2022/01/04 12:55] – created simone.ueberwasser.ds.uzh.ch | 03_processing:06_pos [2022/06/27 07:21] (current) – external edit 127.0.0.1 | ||
|---|---|---|---|
| Line 1: | Line 1: | ||
| ====== Part of speech tagging ====== | ====== Part of speech tagging ====== | ||
| - | [[https:// | + | [[https:// |
| - | In our corpus, the PoS annotation is applied to the layer pos in Annis. | + | |
| - | German (both dialectal and non-dialectal) | + | |
| - | PoS tagging was applied to the normalised level of each SMS, and each SMS was tagged as one unit. | + | ===== German (both dialectal and non-dialectal) ===== |
| - | TreeTagger was used, applying a tailor-made | + | ==== TreeTagger ==== |
| - | The STTS tagset was used. | + | |
| - | The tagger' | + | |
| - | The tag PTKINF was added for infinitive particle (go, goge etc.) for the german dialect. | + | |
| - | French | + | |
| - | PoS tagging was applied to the normalised level of each SMS, and SMS was tagged as one unit. | + | * PoS tagging was applied to the normalised level of each SMS, and each SMS was tagged as one unit. |
| - | TreeTagger was used out of the box. | + | * TreeTagger was used, applying a tailor-made German parameter file (courtesy of Helmut Schmid) |
| - | Achim Stein' | + | * The [[http:// |
| - | The tags DET:DEM and DET:IND were added. | + | * The tagger' |
| - | Italian | + | * The tag PTKINF was added for the infinitive particle (go, goge etc.) in the German dialect. |
| + | * The resulting sub-corpora are: deu-tagged and gsw-tagged | ||
| + | |||
| + | ==== RFTagger ==== | ||
| + | * The same varieties of German were also tagged with the RFTagger, resulting in the sub-corpora deu-rftagged and gsw-rftagged | ||
| + | |||
| + | ===== French ===== | ||
| + | |||
| + | |||
| + | * PoS tagging was applied to the normalised level of each SMS, and each SMS was tagged as one unit. | ||
| + | * TreeTagger was used out of the box. | ||
| + | | ||
| + | | ||
| + | ===== Italian ===== | ||
| + | |||
| + | |||
| + | * PoS tagging was applied to the normalised level of each SMS, and each SMS was tagged as one unit. | ||
| + | * TreeTagger was used out of the box. | ||
| + | * Achim Stein' | ||
| + | * The tag ADJ:poss was added. | ||
| + | ===== Precision ===== | ||
| - | PoS tagging was applied to the normalised level of each SMS, and SMS was tagged as one unit. | ||
| - | TreeTagger was used out of the box. | ||
| - | Achim Stein' | ||
| - | The tag ADJ:poss was added. | ||
| - | Precision | ||
| Our test gave the following precision for the respective sub-corpora: | Our test gave the following precision for the respective sub-corpora: | ||
| - | gsw: 2'734 tokens checked: 96.3% correct | + | * gsw: 2'734 tokens checked: 96.3% correct |
| - | deu: 2'922 tokens checked: 93.1% correct | + | |
| - | fra: 3'133 tokens checked: 94.6% correct | + | |
| - | ita: 2527 tokens checked: 90.5% correct | + | |
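The precision figures above are the share of manually checked tokens whose automatic tag was judged correct. The page does not describe how the check was scripted, so the following is only a minimal sketch of that calculation. The function name `precision` and the five-token sample are hypothetical, not corpus data.

```python
def precision(predicted_tags, checked_tags):
    """Return the share of tokens whose predicted tag equals the checked tag."""
    if len(predicted_tags) != len(checked_tags):
        raise ValueError("both tag sequences must cover the same tokens")
    if not checked_tags:
        raise ValueError("at least one checked token is required")
    correct = sum(1 for p, c in zip(predicted_tags, checked_tags) if p == c)
    return correct / len(checked_tags)


# Hypothetical sample: tagger output vs. manually corrected tags (STTS-style).
predicted = ["PPER", "VVFIN", "ART", "NN", "PTKINF"]
checked = ["PPER", "VVFIN", "ART", "NN", "PTKZU"]

print(f"{precision(predicted, checked):.1%} correct")  # 4 of 5 tags match
```

Applied to a sub-corpus, `predicted_tags` would hold the tagger output for the checked tokens and `checked_tags` the manually verified tags for the same tokens.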
