03_processing:05_normalization
Differences
This shows you the differences between two versions of the page.
03_processing:04_normalization [2022/01/04 13:30] – Simone Ueberwasser | 03_processing:05_normalization [2022/06/27 09:21] (current) – external edit 127.0.0.1 | ||
---|---|---|---|
Line 56: | Line 56: | ||
In German, nouns are written with an initial capital letter. Regardless of the capitalization in the SMS layer, nouns are capitalized in the normalized layer to help a PoS tagger recognize them. | In German, nouns are written with an initial capital letter. Regardless of the capitalization in the SMS layer, nouns are capitalized in the normalized layer to help a PoS tagger recognize them. | ||
- | //Spelling// | + | ====Spelling==== |
Assuming that spelling is unorthodox throughout SMS, we decided to adjust spelling to the dictionary form of a lemma in the corresponding syntactic position. In German, for example, there is a definite neuter article das and the conjunction dass (e.g. //er sagte, dass er komme// ('he said **that** he would come' | Assuming that spelling is unorthodox throughout SMS, we decided to adjust spelling to the dictionary form of a lemma in the corresponding syntactic position. In German, for example, there is a definite neuter article das and the conjunction dass (e.g. //er sagte, dass er komme// ('he said **that** he would come' | ||
Line 70: | Line 70: | ||
Digits were not modified, i.e. //3// remained //3// and //three// remained //three//. There is, however, one exception to this rule. Where digits were combined with letters, they were written out in the normalization, | Digits were not modified, i.e. //3// remained //3// and //three// remained //three//. There is, however, one exception to this rule. Where digits were combined with letters, they were written out in the normalization, | ||
- | ==== Special rules for Swiss German dialect ==== | + | ===== Special rules for Swiss German dialect ===== |
- | === Helvetisms === | + | ==== Helvetisms ==== |
Helvetisms, i.e. lemmas that belong to Standard German in Switzerland according to the [[https:// | Helvetisms, i.e. lemmas that belong to Standard German in Switzerland according to the [[https:// | ||
- | ===No equivalent in standard German=== | + | ====No equivalent in standard German==== |
Some words in Swiss German dialect do not have equivalents in Standard German, e.g. //luege// ('to look') or //gumpe// ('to jump' | Some words in Swiss German dialect do not have equivalents in Standard German, e.g. //luege// ('to look') or //gumpe// ('to jump' | ||
A special case in this context is a verbal particle that can be realized as //go, ga, goge// and similar forms. This particle is syntactically compulsory in the dialect but has no equivalent in Standard German and is semantically empty. We decided to normalize this particle to //go// and to carry it over into the normalized layer in this form. | A special case in this context is a verbal particle that can be realized as //go, ga, goge// and similar forms. This particle is syntactically compulsory in the dialect but has no equivalent in Standard German and is semantically empty. We decided to normalize this particle to //go// and to carry it over into the normalized layer in this form. | ||
- | === Prepositions === | + | ==== Prepositions ==== |
Quite often, the Swiss German dialect does not use the same prepositions as Standard German. In such cases, we kept the preposition of the SMS layer in the normalized layer (adjusting its spelling where needed). E.g. //i gane uf Bärn// ('I go to Bern' | Quite often, the Swiss German dialect does not use the same prepositions as Standard German. In such cases, we kept the preposition of the SMS layer in the normalized layer (adjusting its spelling where needed). E.g. //i gane uf Bärn// ('I go to Bern' | ||
- | === Diminutives === | + | ==== Diminutives ==== |
In Standard German, a diminutive is normally realized as //-chen//, while the dialect only knows a diminutive in //-li//. For some lemmas and in some (older) variants of German, a //-lein// diminutive exist(ed). Accordingly, | In Standard German, a diminutive is normally realized as //-chen//, while the dialect only knows a diminutive in //-li//. For some lemmas and in some (older) variants of German, a //-lein// diminutive exist(ed). Accordingly, | ||
- | === Imperatives === | + | ==== Imperatives ==== |
In Standard German, the verb of an imperative can take a short or a long form: //schlaf gut// vs. //schlafe gut//. This is not the case in the dialect, which has only a short form. Accordingly, | In Standard German, the verb of an imperative can take a short or a long form: //schlaf gut// vs. //schlafe gut//. This is not the case in the dialect, which has only a short form. Accordingly, | ||
- | ==== Special rules for other languages ==== | + | ===== Special rules for other languages ===== |
You can find more information on languages other than German in the documentation written in the original language for {{ : | You can find more information on languages other than German in the documentation written in the original language for {{ : |
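
As an illustration only, the rule for the verbal particle described above (realizations such as //go, ga, goge// are all normalized to //go//) could be sketched in Python as follows. This is not the procedure used in the project: the function name and the variant set are hypothetical, the set contains only the three forms named in the text, and the sketch assumes that an annotator has already identified the token as the particle, since the same surface form may also occur with other functions.

```python
# Hypothetical variant set: only the three realizations named in the text.
# The text mentions "similar forms", which are deliberately not guessed here.
GO_PARTICLE_VARIANTS = {"go", "ga", "goge"}


def normalize_go_particle(form: str) -> str:
    """Return 'go' for a known realization of the verbal particle.

    Assumption: the caller has already decided that this token is the
    particle. Forms that are not in the variant set are returned unchanged.
    """
    if form.lower() in GO_PARTICLE_VARIANTS:
        return "go"
    return form


if __name__ == "__main__":
    for form in ["go", "ga", "goge", "luege"]:
        print(form, "->", normalize_go_particle(form))
```

A plain lookup keeps the rule transparent: a further realization would be covered by adding it to the set, and deciding whether a token is the particle at all stays outside the function.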
03_processing/05_normalization.txt · Last modified: 2022/06/27 09:21 by 127.0.0.1