Differences

This shows you the differences between two versions of the page.

--- start [2022/01/04 07:25] – Simone Ueberwasser
+++ start [2022/01/05 12:34] – ↷ Links adapted because of a move operation Simone Ueberwasser
@@ Line 4: / Line 4: @@
 ===== The corpus =====
-The Swiss SMS corpus consists of 25'947 SMS (~650'000 tokens), which were sent in by the Swiss public in 2009/2010. Of all SMS, 41% are in Swiss German (dialect), 28% in non-dialectal German, 18% in French, 6% in Italian, and 4% in Romansh. More information about the corpus can be found in the [[01_collection|documentation]].
+The Swiss SMS corpus consists of 25'947 SMS (~650'000 tokens), which were sent in by the Swiss public in 2009/2010. Of all SMS, 41% are in Swiss German (dialect), 28% in non-dialectal German, 18% in French, 6% in Italian, and 4% in Romansh. More information about the corpus can be found in the section [[05_facts_and_figures|facts and figures]].
 ===== Using the corpus =====
@@ Line 10: / Line 10: @@
   * Not use the data commercially, i.e. use it only for bona fide research
   * Quote the source of the data as "Swiss SMS corpus" with the source as shown in the footer of this document and with a link to https://sms.linguistik.uzh.ch
+Since the corpus is available on the same platform as the data from the sister project [[https://www.whatsup-switzerland.ch/index.php/en/|What's up, Switzerland?]], please keep in mind that only the following sub-corpora contain SMS data; the other sub-corpora consist of WhatsApp messages.
+  * deu-rftagged: non-dialectal German data tagged with RF-Tagger
+  * deu-tagged: non-dialectal German data tagged with TreeTagger
+  * fra-tagged: French data tagged with TreeTagger
+  * gsw-rftagged: Swiss German data whose normalized version was tagged with RF-Tagger
+  * gsw-tagged: Swiss German data whose normalized version was tagged with TreeTagger
+  * ita-tagged: Italian data tagged with TreeTagger
+  * roh: Romansh data
+For more information about the WhatsApp corpus, please consult the [[https://corpus.whatsup-switzerland.ch/index.php/en/|corresponding documentation]].
 =====How to quote=====
@@ Line 18: / Line 30: @@
 ====Quoting the corpus documentation====
-Ueberwasser, Simone (2015): The Swiss SMS Corpus. Documentation, facts and figures. https://sms.linguistik.uzh.ch
+Ueberwasser, Simone (2015/2022): The Swiss SMS Corpus. Documentation, facts and figures. https://sms.linguistik.uzh.ch
 More resources that document the creation of the corpus:
-Ruef, Beni/Ueberwasser, Simone (2013): The Taming of a Dialect: Interlinear Glossing of Swiss German Text Messages . In: Non-standard Data Sources in Corpus-based Research (ZSM-Studien 5). Aachen: Shaker, 61-68.
+Ruef, Beni/Ueberwasser, Simone (2013): [[https://ueberwasser.eu/UeFiles/uni/Tagungen/2012Koeln/RuefUeberwasser.pdf|The Taming of a Dialect]]: Interlinear Glossing of Swiss German Text Messages. In: Non-standard Data Sources in Corpus-based Research (ZSM-Studien 5). Aachen: Shaker, 61-68.
 ====Publications that are based on the corpus====