
Facts and figures: The SMS

There were two collections of SMS. The first took place between the end of October 2009 and February 2010 and produced 23'988 SMS. The second was advertised only in the Italian- and Romansh-speaking parts of Switzerland in order to gather more SMS in those two languages. It produced another 1'959 SMS, mainly in the intended languages but also in others, and took place between the end of April 2011 and July 2011.

The total number of SMS to be found in our corpus thus comes to 25'947.

Some statistics

The average participant sent us approx. 15 SMS. The most prolific participant, a 39-year-old man, sent us 358 SMS. A total of 15 participants sent us more than 100 SMS each.

Length of the SMS

Some figures about the length of the SMS:

  • Shortest SMS: 1 character
  • Longest SMS: 2'374 characters
  • Average length of the SMS (mean): 115 characters
  • Standard deviation: 84.61 characters
  • Median: 104 characters
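
As an illustration of how figures like these can be computed, here is a minimal Python sketch. It rests on several assumptions that the page does not confirm: that "character" means one Unicode code point as counted by `len()`, and that the standard deviation is the population form (`statistics.pstdev`) rather than the sample form. The function name `length_stats` and the sample messages are hypothetical and are not taken from the corpus.

```python
import statistics


def length_stats(messages):
    """Return summary statistics of message lengths, in characters."""
    # Assumption: one character = one Unicode code point, as counted by len().
    lengths = [len(m) for m in messages]
    return {
        "shortest": min(lengths),
        "longest": max(lengths),
        "mean": statistics.mean(lengths),
        # Assumption: population standard deviation; the page does not say
        # whether the population or the sample form was used.
        "stdev": statistics.pstdev(lengths),
        "median": statistics.median(lengths),
    }


# Hypothetical toy messages, not corpus data.
sample = ["ok", "Bin grad im Zug", "A plus tard!", "Ciao"]
stats = length_stats(sample)
```

With real data, a mean above the median, as reported above (115 vs. 104 characters), points to a right-skewed distribution with a few very long messages.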

Distribution:

[Figure: distribution of SMS length (length.png)]

Tokens per SMS

The average number of tokens per SMS, by language:
  • Romansh: 68 tokens / SMS
  • Non-dialectal Italian: 26 tokens / SMS
  • Italian dialect: 22 tokens / SMS
  • Non-dialectal German: 24 tokens / SMS
  • German dialect: 26 tokens / SMS
  • Non-dialectal French: 26 tokens / SMS
  • French dialect: 33 tokens / SMS

Please consider our warnings about counting tokens in SMS before working with these figures.
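
To show why token counts depend on the counting method, here is a minimal Python sketch that averages tokens per SMS for each language label. It assumes that a token is any whitespace-separated string, which is a simplification; the tokenization actually used for the corpus may differ, so results from this sketch need not match the figures above. The function name `mean_tokens_per_language` and the sample records are hypothetical.

```python
from collections import defaultdict


def mean_tokens_per_language(records):
    """Average token count per message for each language label.

    records: iterable of (language_label, text) pairs.
    """
    totals = defaultdict(int)  # language -> total number of tokens
    counts = defaultdict(int)  # language -> number of messages
    for language, text in records:
        # Assumption: tokens are whitespace-separated strings. Punctuation,
        # emoticons, and clitics are therefore not split off.
        totals[language] += len(text.split())
        counts[language] += 1
    return {lang: totals[lang] / counts[lang] for lang in totals}


# Hypothetical toy records, not corpus data.
records = [
    ("German dialect", "Bin grad im Zug"),
    ("German dialect", "Chunnsch au"),
    ("Non-dialectal French", "A plus tard"),
]
averages = mean_tokens_per_language(records)
```

A different tokenizer, for example one that separates punctuation or emoticons, would give different averages for the same messages.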

Please don't forget to cite the corpus in your work.
Topic revision: r1 - 17 May 2015, SimoneUeberwasser
This site is powered by Foswiki. Copyright © by the contributing authors. All material on this collaboration platform is the property of the contributing authors.


The corpora and documentation are licensed under the Creative Commons license: Attribution + Noncommercial:
- Licensees may copy, distribute, display, and perform the work and make derivative works based on it only for noncommercial purposes.
- Licensees may copy, distribute, display and publish the work and make derivative works based on it only if they give the author or licensor the credits as follows:
Stark, Elisabeth; Ueberwasser, Simone; Ruef, Beni (2009-2014). Swiss SMS Corpus. University of Zurich. https://sms.linguistik.uzh.ch
