Differences

This shows you the differences between two versions of the page.

--- start [2022/01/03 17:04] – ↷ Links adapted because of a move operation Simone Ueberwasser
+++ start [2022/06/27 09:21] – external edit 127.0.0.1
@@ Line 4: / Line 4: @@
 ===== The corpus =====
-The Swiss SMS corpus consists of 25'947 SMS (~650'000 tokens), which were sent in by the Swiss public in 2009/2010. Of all SMS, 41% are in Swiss German (dialect), 28% in non-dialectal German, 18% in French, 6% in Italian, and 4% in Romansh. More information about the corpus can be found in the [[01_corpus:start|documentation]].
+The Swiss SMS corpus consists of 25'947 SMS (~650'000 tokens), which were sent in by the Swiss public in 2009/2010. Of all SMS, 41% are in Swiss German (dialect), 28% in non-dialectal German, 18% in French, 6% in Italian, and 4% in Romansh. More information about the corpus can be found in the section [[05_facts_and_figures|facts and figures]].
 ===== Using the corpus =====
@@ Line 11: / Line 11: @@
   * Quote the source of the data as "Swiss SMS corpus" with the source as shown in the footer of this document and with a link to https://sms.linguistik.uzh.ch
+If you need help browsing the corpus, please check the chapter [[02_browsing|Browsing]].
+Since the corpus is hosted on the same platform as the data from the sister project [[https://www.whatsup-switzerland.ch/index.php/en/|What's up, Switzerland?]], please keep in mind that only the following sub-corpora contain SMS data; all other sub-corpora consist of WhatsApp messages.
+  * deu-rftagged: non-dialectal German data tagged with RF-Tagger
+  * deu-tagged: non-dialectal German data tagged with TreeTagger
+  * fra-tagged: French data tagged with TreeTagger
+  * gsw-rftagged: Swiss German data whose normalized forms were tagged with RF-Tagger
+  * gsw-tagged: Swiss German data whose normalized forms were tagged with TreeTagger
+  * ita-tagged: Italian data tagged with TreeTagger
+  * roh: Romansh data
+For more information about the WhatsApp corpus, please consult the [[https://corpus.whatsup-switzerland.ch/index.php/en/|corresponding documentation]].
+=====How to quote=====
+====Quoting the corpus====
+Stark, Elisabeth; Ueberwasser, Simone; Ruef, Beni (2009-2015): Swiss SMS Corpus. University of Zurich. www.sms4science.ch
+====Quoting the corpus documentation====
+Ueberwasser, Simone (2015/2022): The Swiss SMS Corpus. Documentation, facts and figures. www.sms4science.ch
+More resources that document the creation of the corpus:
+Ruef, Beni; Ueberwasser, Simone (2013): [[https://ueberwasser.eu/UeFiles/uni/Tagungen/2012Koeln/RuefUeberwasser.pdf|The Taming of a Dialect]]: Interlinear Glossing of Swiss German Text Messages. In: Non-standard Data Sources in Corpus-based Research (ZSM-Studien 5). Aachen: Shaker, 61-68.
+====Publications that are based on the corpus====
+The full list of publications is available in the [[https://p3.snf.ch/project-136230|SNSF research database P3]].
+=====Acknowledgement=====
 This corpus would not be here without the following people/institutions, to whom we express our gratitude:
   * The Swiss National Science Foundation financed the project and the dissertations over four years.