03_processing:03_languages
Differences
This shows you the differences between two versions of the page.
03_processing:03_languages [2022/01/04 10:27] – created Simone Ueberwasser | 03_processing:03_languages [2022/01/04 10:48] – Simone Ueberwasser | ||
---|---|---|---|
Line 27: | Line 27: | ||
This original SMS was tagged as follows: | This original SMS was tagged as follows: | ||
* Main language: German Standard (because most words are in German Standard) | * Main language: German Standard (because most words are in German Standard) | ||
- | * Borrowings: French (because Restaurant is originally French but can be found in the Duden | + | * Borrowings: French (because |
* Nonce Borrowings: | * Nonce Borrowings: | ||
- | * Spanish (neither travajo nor olla can be found in the Duden). In spite of the unorthodox spelling, we consider both words to be Spanish, because no other language used in this SMS provides similar phonological variants. | + | |
- | Italian (the following cannot be found in the Duden: fratello, come, stai, allora, amore, buona, giornata) | + | |
- | English (peace cannot be found in the Duden) | + | |
- | Swiss German dialect (luegsch, uf, di, gäll cannot be found in the Duden) | + | |
As can be seen from this example, Standard German and Swiss German Dialect are tagged independently, | As can be seen from this example, Standard German and Swiss German Dialect are tagged independently, | ||
- | Special tagging problems | + | =====Special tagging problems===== |
- | Swiss German dialect vs. Standard German | + | ====Swiss German dialect vs. Standard German==== |
Since there is neither a fixed spelling system nor a codex for the Swiss German dialect, many words are homographs in Swiss German and in Standard German. The following rules have thus been fixed to assign a word to one or the other variety: | Since there is neither a fixed spelling system nor a codex for the Swiss German dialect, many words are homographs in Swiss German and in Standard German. The following rules have thus been fixed to assign a word to one or the other variety: | ||
- | In case of homography, a word is considered to be the same as the main language. ich in a dialectal SMS is thus considered to be dialect, while ich in a non-dialectal SMS is considered to be Standard. | + | * In case of homography, a word is considered to be the same as the main language. |
- | If a seemingly dialectal word appears in a Standard SMS, the Variantenwörterbuch was consulted to differentiate between Standard Swiss German (' | + | |
- | If a word that sounds like Standard German appears in a Dialect SMS, the person tagging the SMS (all native speakers of Swiss German) asked herself | + | ||
- | In some cases, it was not the individual word but rather the word order that seemed to be Standard in an otherwise dialectal SMS. In such cases we were very reluctant to register the Standard. It was only considered if it was absolutely impossible to use the given word order in Standard Swiss German. | + | ||
- | Dialectal words in any other SMS (i.e. SMS in Standard German and in languages other than German) are considered to be nonce borrowings rather than borrowings, because a Dialect word never occurs in a codex. | + | |
- | If a foreign word appears in a dialectal SMS, the Duden is used to find out, whether this word is a borrowing or a nonce borrowing. | + | |
- | In some cases, a word could not be found in the Duden, but we did not consider it to be Swiss German dialect either. These tokens were marked as Other German, meaning they are dialectal forms from Germany or Austria. The same goes for words that are to be found in the Duden but are marked as dialectal forms from Germany or Austria, e.g. the word Bussi (' | + | |
- | Internationalisms | + | ====Internationalisms==== |
- | Some words occur in different languages (e.g. OK or restaurant). Many of them derive from Greek or Latin; these two languages were thus ignored as donors. They were ignored even if the word was only coined in the last two hundred years or so. Photograph, for example, was coined by the inventors of the camera and could thus be considered German or French, depending on the source. However, since we care about the word itself and not about its coining, that type of word creation was ignored altogether; instead, the word is considered to be Greek and thus ignored. | + | Some words occur in different languages (e.g. OK or //restaurant//). Many of them derive from Greek or Latin; these two languages were thus ignored as donors. They were ignored even if the word was only coined in the last two hundred years or so. Photograph, for example, was coined by the inventors of the camera and could thus be considered German or French, depending on the source. However, since we care about the word itself and not about its coining, that type of word creation was ignored altogether; instead, the word is considered to be Greek and thus ignored. | ||
In all other cases, i.e. if the word is not a borrowing from Greek or Latin, the following rules were applied to define a word as a borrowing: | In all other cases, i.e. if the word is not a borrowing from Greek or Latin, the following rules were applied to define a word as a borrowing: | ||
- | Codices: If an internationalism is marked as e.g. being borrowed from English in the Duden, it was considered a borrowing. | + | * Codices: If an internationalism is marked as e.g. being borrowed from English in the Duden, it was considered a borrowing. |
- | Pronunciation: | + | |
- | Pseudo borrowings | + | ====Pseudo borrowings==== |
Some words in a language clearly sound as if they were borrowings but are not known in the apparent donor language. Examples are handy, the German word for 'cell phone', and alles paletti (~ ' | Some words in a language clearly sound as if they were borrowings but are not known in the apparent donor language. Examples are handy, the German word for 'cell phone', and alles paletti (~ ' | ||
- | Abbreviations | + | ====Abbreviations==== |
- | Abbreviations like tgif for 'thank god it's friday' | + | Abbreviations like //tgif// for 'thank god it's friday' |
- | On the other hand, abbreviations that could not be resolved such as iLSi were ignored. | + | On the other hand, abbreviations that could not be resolved such as //iLSi// were ignored. |
The following abbreviations were considered: | The following abbreviations were considered: | ||
- | Token Interpretation Language tagging based on: | + | ^Token^Interpretation^Language tagging based on:^ |
- | amt Amo te nonce borrowing: Portuguese | + | |amt|Amo te|nonce borrowing: Portuguese| |
- | bb bébé only considered in French SMS | + | |bb|bébé|only considered in French SMS| |
- | bjs beijinhos nonce borrowing: Portuguese | + | |bjs|beijinhos|nonce borrowing: Portuguese| |
- | btw by the way nonce borrowing English | + | |btw|by the way|nonce borrowing English| |
- | bmmw bist mir mega wichtig | + | |bmmw| |bist mir mega wichtig| |
- | bx bisous nonce borrowing French | + | |bx|bisous|nonce borrowing French| |
- | cu see you nonce borrowing English | + | |cu|see you|nonce borrowing English| |
- | fb Facebook ignored, | + | |fb|Facebook|ignored, since it's a proper name| |
- | Ga li gr Ganz liebe Grüsse / Ganz liebi Grüessli | + | |Ga li gr| |Ganz liebe Grüsse / Ganz liebi Grüessli| |
- | glg, GlG Ganz liebe Grüsse / Ganz liebi Grüessli | + | |glg, GlG| |Ganz liebe Grüsse / Ganz liebi Grüessli| |
- | Grz Greetings (> greetz) nonce borrowing English | + | |Grz|Greetings (> greetz)|nonce borrowing English| |
- | hdl Hab' dich lieb / Ha di lieb | + | |hdl| |Hab' dich lieb / Ha di lieb| |
- | IldvgHunvvvm buöIugsnz vdbddadddkukdggf, | + | |IldvgHunvvvm buöIugsnz vdbddadddkukdggf, |
- | ka keine Ahnung / Kei Ahnig | + | |ka| |keine Ahnung / Kei Ahnig| |
- | Kikoo coucou nonce borrowing French | + | |Kikoo|coucou|nonce borrowing French| |
- | ld liebe dich / lieb di | + | |ld| |liebe dich / lieb di| |
- | ldsmf4iue liebe dich so mega fest 4 immer und ewig nonce borrowing English | + | |ldsmf4iue|liebe dich so mega fest 4 immer und ewig|nonce borrowing English| |
- | lymtwcs love you more than words can say nonce borrowing English | + | |lymtwcs|love you more than words can say|nonce borrowing English| |
- | lysm Love you so much nonce borrowing English | + | |lysm|Love you so much|nonce borrowing English| |
- | mdr mort de rire nonce borrowing French | + | |mdr|mort de rire|nonce borrowing French| |
- | piz Peace nonce borrowing English | + | |piz|Peace|nonce borrowing English| |
- | t.b.c. to be continued / confirmed nonce borrowing English | + | |t.b.c.|to be continued / confirmed|nonce borrowing English| |
- | tgif thank god it's friday nonce borrowing English | + | |tgif|thank god it's friday|nonce borrowing English| |
- | tk(...) tausend Küsse (dein/e XY) / tuusig Küss | + | |tk(...)| |tausend Küsse (dein/e XY) / tuusig Küss| |
- | tqm Te quiero mucho nonce borrowing Spanish | + | |tqm|Te quiero mucho|nonce borrowing Spanish| |
- | tvtb ti voglio tanto bene nonce borrowing Italian | + | |tvtb|ti voglio tanto bene|nonce borrowing Italian| |
- | wdnv will dich nicht verlieren / will di nöd verlüre | + | |wdnv| |will dich nicht verlieren / will di nöd verlüre| |
- | we Wochenende / week-end No tagging, since it can be either language. | + | |we|Wochenende / week-end|No tagging, since it can be either language. |
Information used: | Information used: | ||
http:// | http:// |
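The tagging rules recorded in this revision can be illustrated with a small sketch. It is not the project's actual tooling. The names `ABBREVIATIONS`, `tag_abbreviation` and `classify_foreign_word` are hypothetical, the dictionary holds only a few rows of the abbreviation table, the set passed as `codex` stands in for a Duden lookup, and the case-insensitive matching is an assumption.

```python
from typing import Optional, Set

# Hypothetical excerpt of the abbreviation table: token -> (interpretation, language).
# A language of None means the token receives no language tag.
ABBREVIATIONS = {
    "amt": ("Amo te", "Portuguese"),
    "bjs": ("beijinhos", "Portuguese"),
    "btw": ("by the way", "English"),
    "bx": ("bisous", "French"),
    "cu": ("see you", "English"),
    "fb": ("Facebook", None),               # ignored, since it is a proper name
    "tqm": ("Te quiero mucho", "Spanish"),
    "tvtb": ("ti voglio tanto bene", "Italian"),
    "we": ("Wochenende / week-end", None),  # no tagging: can be either language
}


def tag_abbreviation(token: str) -> Optional[str]:
    """Return the language tag for a resolvable abbreviation, else None."""
    # Assumption: lookup is case-insensitive.
    entry = ABBREVIATIONS.get(token.lower())
    if entry is None:
        return None  # unresolved abbreviations (e.g. iLSi) are ignored
    _interpretation, language = entry
    if language is None:
        return None
    return f"nonce borrowing: {language}"


def classify_foreign_word(word: str, donor_language: str, codex: Set[str]) -> str:
    """A foreign word listed in the codex is a borrowing, otherwise a nonce borrowing."""
    kind = "borrowing" if word.lower() in codex else "nonce borrowing"
    return f"{kind}: {donor_language}"


# Toy stand-in for the Duden; a real lookup would query the dictionary itself.
toy_codex = {"restaurant"}
print(tag_abbreviation("btw"))
print(classify_foreign_word("Restaurant", "French", toy_codex))
print(classify_foreign_word("olla", "Spanish", toy_codex))
```

The sketch leaves out the homography rules for Swiss German dialect versus Standard German, since those depend on the judgement of native-speaker annotators and on the Variantenwörterbuch.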
03_processing/03_languages.txt · Last modified: 2022/06/27 09:21 by 127.0.0.1