Facts and Figures: Languages in the Corpus

The following languages and varieties were annotated in the SMS. Please check our methodology for annotating languages to fully understand these figures.

Variety/language      Abbreviation   SMS

German
  Standard German     deu            7'287
  Swiss German        gsw            10'706
  Other German        gda            9
French
  Standard French     fra            4'619
  French Patois       fsw            30
Italian
  Standard Italian    ita            1'471
  Italian Dialect     isw            48
Romansh
  Sursilvan           roh-sr         425
  Sutsilvan           roh-st         9
  Surmiran            roh-sm         110
  Puter               roh-pt         181
  Vallader            roh-vl         337
  Grischun            roh-gr         59
Other languages
  English             eng            535
  Dutch               nld            5
  North Germanic      gmn            3
  Slavic              sla            42
  Spanish             spa            43
  Portuguese          por            5
  Modern Greek        gre            3
  Arabic              ara            1
  Other               oth            106

Please keep in mind that one SMS can have more than one main language, so the figures above add up to more than the total number of SMS, that is, to more than 100%. As you can see, some languages were grouped together. An SMS annotated as North Germanic can be Danish, Norwegian or Swedish. Because the individual SMS are so short, they often contain words that are pronounced in a similar way in more than one of these languages, and because of the unorthodox spelling in the SMS we cannot rely on spelling to tell the languages apart either. We therefore decided to group these languages together. The same goes for the Slavic languages.
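The effect of counting one SMS under several main languages can be illustrated with a small sketch. The counts below are copied from the table above; the toy corpus and the helper `per_language_counts` are hypothetical and serve only to show why per-language figures can sum to more than the number of messages. This is an illustration, not the procedure used to produce the figures.

```python
# Counts copied from the table above (abbreviation -> number of SMS).
counts = {
    "deu": 7287, "gsw": 10706, "gda": 9,
    "fra": 4619, "fsw": 30,
    "ita": 1471, "isw": 48,
    "roh-sr": 425, "roh-st": 9, "roh-sm": 110,
    "roh-pt": 181, "roh-vl": 337, "roh-gr": 59,
    "eng": 535, "nld": 5, "gmn": 3, "sla": 42, "spa": 43,
    "por": 5, "gre": 3, "ara": 1, "oth": 106,
}

# Sum of all per-language figures, i.e. the number of main-language annotations.
total_annotations = sum(counts.values())

# Hypothetical toy corpus: each SMS is the set of its main-language codes.
toy_corpus = [{"gsw"}, {"gsw", "eng"}, {"fra"}, {"deu", "ita"}]


def per_language_counts(corpus):
    """Count, for each language code, how many SMS list it as a main language."""
    tally = {}
    for languages in corpus:
        for code in languages:
            tally[code] = tally.get(code, 0) + 1
    return tally


toy_counts = per_language_counts(toy_corpus)
toy_sum = sum(toy_counts.values())

print("annotations in the table:", total_annotations)
print("toy corpus:", len(toy_corpus), "SMS,", toy_sum, "annotations")
```

In the toy corpus, four SMS yield six main-language annotations because two of them have two main languages each. In the same way, the sum of the table figures (26'034) counts annotations, not individual SMS.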

05_facts_and_figures/05_languages.1641309747.txt.gz · Last modified: 2022/06/27 09:21 (external edit)
