
Facts and figures: Languages in the SMS

The following languages and varieties were tagged in the SMS:

Variety/language        Abbreviation    Number of SMS
German
  Standard German       deu             7'287
  Swiss German          gsw             10'706
  Other German          gda             9
French
  Standard French       fra             4'619
  French Patois         fsw             30
Italian
  Standard Italian      ita             1'471
  Italian Dialect       isw             48
Romansh
  Sursilvan             roh-sr          425
  Sutsilvan             roh-st          9
  Surmiran              roh-sm          110
  Puter                 roh-pt          181
  Vallader              roh-vl          337
  Grischun              roh-gr          59
Other languages
  English               eng             535
  Dutch                 nld             5
  North Germanic        gmn             3
  Slavic                sla             42
  Spanish               spa             43
  Portuguese            por             5
  Modern Greek          gre             3
  Arabic                ara             1
  Other                 oth             106

Please keep in mind that one SMS can have more than one main language, so if you add these figures together, you will get more than the total number of SMS in the corpus (i.e. more than 100%). As you can see, some languages were grouped together. If we say that an SMS was written in North Germanic, it can be Danish, Norwegian or Swedish. Because the individual SMS are so short, they often contain words that are pronounced similarly in more than one of these languages, and because the spelling in the SMS is so unorthodox, we cannot rely on spelling to identify the language either. We therefore decided to group these languages together. The same applies to the Slavic languages.
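To make the overlap concrete, here is a minimal Python sketch of how per-language counts behave when one SMS can carry several language tags. It assumes, purely for illustration, that each SMS is represented as a list of language tags; this is a hypothetical representation, not the corpus's actual file format. The `GROUPS` mapping from `dan`, `nor` and `swe` to `gmn` is likewise an assumption that mirrors the North Germanic grouping described above; the members of the Slavic group are not listed on this page, so they are left out.

```python
from collections import Counter

# Per-tag SMS counts copied from the table above (thousands separators removed).
TAG_COUNTS = {
    "deu": 7287, "gsw": 10706, "gda": 9,
    "fra": 4619, "fsw": 30,
    "ita": 1471, "isw": 48,
    "roh-sr": 425, "roh-st": 9, "roh-sm": 110,
    "roh-pt": 181, "roh-vl": 337, "roh-gr": 59,
    "eng": 535, "nld": 5, "gmn": 3, "sla": 42, "spa": 43,
    "por": 5, "gre": 3, "ara": 1, "oth": 106,
}

# Hypothetical mapping: individual North Germanic languages collapsed into one tag.
GROUPS = {"dan": "gmn", "nor": "gmn", "swe": "gmn"}


def count_tags(messages):
    """Count how many messages carry each tag.

    A message with several tags is counted once under each of them,
    which is why the per-tag totals can exceed the number of messages.
    """
    counts = Counter()
    for tags in messages:
        # Map grouped languages to their group tag, then deduplicate
        # so that one SMS adds at most 1 to any given tag.
        counts.update({GROUPS.get(t, t) for t in tags})
    return counts


# Hypothetical toy sample: 3 messages, one mixing Swiss German and English.
sample = [["gsw"], ["gsw", "eng"], ["swe"]]
counts = count_tags(sample)
print(counts["gsw"], counts["eng"], counts["gmn"])  # 2 1 1
print(sum(counts.values()), len(sample))            # 4 3
print(sum(TAG_COUNTS.values()))                     # 26034
```

Because a mixed-language SMS is counted once under each of its tags, the tag totals in the table (26'034 in all) add up to more than the number of distinct messages, just as the toy sample yields 4 tags for 3 messages.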

Please don't forget to cite the corpus in your work.
Topic revision: r1 - 02 May 2015, SimoneUeberwasser
This site is powered by Foswiki. Copyright © by the contributing authors. All material on this collaboration platform is the property of the contributing authors.


The corpora and documentation are licensed under the Creative Commons license Attribution + Noncommercial:
- Licensees may copy, distribute, display, and perform the work and make derivative works based on it only for noncommercial purposes.
- Licensees may copy, distribute, display and publish the work and make derivative works based on it only if they credit the authors or licensor as follows:
Stark, Elisabeth; Ueberwasser, Simone; Ruef, Beni (2009-2014). Swiss SMS Corpus. University of Zurich. https://sms.linguistik.uzh.ch
