The Swiss SMS corpus is one of the results of a project funded by the Swiss National Science Foundation (SNSF) between 2011 and 2014 and directed by Prof. Elisabeth Stark. Other results of the project include six dissertations as well as numerous student papers and publications. An overview of the whole project can also be found in the SNSF research database P3.
The Swiss SMS corpus consists of 25'947 SMS (~650'000 tokens), which were sent in by the Swiss public in 2009/2010. Of all SMS, 41% are in Swiss German (dialect), 28% in non-dialectal German, 18% in French, 6% in Italian, and 4% in Romansh. More information about the corpus can be found in the documentation.
These data are freely available for bona fide academic research (CC BY-NC), but not for commercial use. If you use the corpus, you agree to our conditions, i.e. to cite the corpus and its documentation as follows:
Stark, Elisabeth; Ueberwasser, Simone; Ruef, Beni (2009-2015): Swiss SMS Corpus. University of Zurich. https://sms.linguistik.uzh.ch
Ueberwasser, Simone (2015): The Swiss SMS Corpus. Documentation, facts and figures. https://sms.linguistik.uzh.ch
More resources that document the creation of the corpus
Ruef, Beni; Ueberwasser, Simone (2013): The Taming of a Dialect: Interlinear Glossing of Swiss German Text Messages. In: Non-standard Data Sources in Corpus-based Research (ZSM-Studien 5). Aachen: Shaker, 61-68.
The full publication list is available in the SNSF research database P3.
This corpus was originally created for use in the project sms4science.ch, which was funded by the Swiss National Science Foundation.
This corpus would not be here without the following people/institutions, to whom we express our gratitude: