05_facts_and_figures:05_languages
Revisions: created 2022/01/04 16:22 by Simone Ueberwasser; current revision 2022/06/27 09:21 (external edit).
====== Languages in the Corpus ======
The following languages and varieties were annotated in the SMS. Please check our [[03_processing:
^ Variety/ ^  ^  ^
^ German ^^^
| Standard German | deu | 7' |
| Swiss German | gsw | 10' |
| Other German | gda | 9 |
^ French ^^^
| Standard French | fra | 4' |
| French Patois | fsw | 30 |
^ Italian ^^^
| Standard Italian | ita | 1' |
| Italian Dialect | isw | 48 |
^ Romansh ^^^
| Sursilvan | roh-sr | 425 |
| Sutsilvan | roh-st | 9 |
| Surmiran | roh-sm | 110 |
| Puter | roh-pt | 181 |
| Vallader | roh-vl | 337 |
| Grischun | roh-gr | 59 |
^ Other languages ^^^
| English | eng | 535 |
| Dutch | nld | 5 |
| North Germanic | gmn | 3 |
| Slavic | sla | 42 |
| Spanish | spa | 43 |
| Portuguese | por | 5 |
| Modern Greek | gre | 3 |
| Arabic | ara | 1 |
| Other | oth | 106 |
Please keep in mind that one SMS can have more than one main language, so if you add these figures together, the total will exceed 100% of the SMS in the corpus. As you can see, some languages were grouped. An SMS labeled as North Germanic may be written in Danish, Norwegian or Swedish. Because the individual SMS are so short, they often contain words that are pronounced similarly in more than one of these languages, and because of the unorthodox spelling in the SMS we cannot rely on spelling to tell the languages apart either. We therefore decided to group these languages together. The same goes for Slavic languages.
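The effect of multiple main languages on the totals can be illustrated with a small sketch. The data below is a made-up toy sample, not taken from the corpus, and the function name `count_languages` is hypothetical; only the language codes follow the table above.

```python
from collections import Counter

# Hypothetical toy sample: each SMS carries one or more main-language codes.
sms_languages = [
    {"gsw"},
    {"gsw", "eng"},
    {"deu"},
    {"fra", "eng"},
]


def count_languages(tagged):
    """Count how many SMS carry each language code."""
    counts = Counter()
    for codes in tagged:
        counts.update(codes)
    return counts


counts = count_languages(sms_languages)
total_sms = len(sms_languages)     # number of SMS in the sample
total_tags = sum(counts.values())  # sum of the per-language figures

# With multi-language SMS, the per-language figures add up to more than
# the number of SMS, so the share is above 100%.
share = 100 * total_tags / total_sms
print(counts, total_sms, total_tags, share)
```

In this toy sample two of the four SMS have two main languages each, so the per-language figures sum to more than the number of SMS.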
05_facts_and_figures/05_languages.1641309747.txt.gz · Last modified: 2022/06/27 09:21 (external edit)