Differences

This shows you the differences between two versions of the page.

start [2022/01/04 07:24] Simone Ueberwasser
start [2022/01/05 16:19] – [Using the corpus] Simone Ueberwasser
Line 4: Line 4:
  
 ===== The corpus =====
-The Swiss SMS corpus consists of 25'947 SMS (~650'000 tokens), which were sent in by the Swiss public in 2009/2010. Of all SMS, 41% are in Swiss German (dialect), 28% in non-dialectal German, 18% in French, 6% in Italian, and 4% in Romansh. More information about the corpus can be found in the [[01_collection|documentation]].
+The Swiss SMS corpus consists of 25'947 SMS (~650'000 tokens), which were sent in by the Swiss public in 2009/2010. Of all SMS, 41% are in Swiss German (dialect), 28% in non-dialectal German, 18% in French, 6% in Italian, and 4% in Romansh. More information about the corpus can be found in the section [[05_facts_and_figures|facts and figures]].
 
 ===== Using the corpus =====
Line 10: Line 10:
   * Not use the data for commercial purposes, i.e. use it only for bona fide research
   * Quote the source of the data as "Swiss SMS corpus" with the source as shown in the footer of this document and with a link to https://sms.linguistik.uzh.ch
 +
 +If you need help browsing the corpus, please check the chapter [[02_browsing|Browsing]].
 +
 +The corpus is available on the same platform as the data from the sister project [[https://www.whatsup-switzerland.ch/index.php/en/|What's up, Switzerland?]]. Please keep in mind that only the following sub-corpora contain SMS data; the other sub-corpora consist of WhatsApp messages.
 +  * deu-rftagged: non-dialectal German data tagged with RF-Tagger
 +  * deu-tagged: non-dialectal German data tagged with TreeTagger
 +  * fra-tagged: French data tagged with TreeTagger
 +  * gsw-rftagged: Swiss German data where the normalized data was tagged with RF-Tagger
 +  * gsw-tagged: Swiss German data where the normalized data was tagged with TreeTagger
 +  * ita-tagged: Italian data tagged with TreeTagger
 +  * roh: Romansh data
 +
 +
 +For more information about the WhatsApp corpus, please consult the [[https://corpus.whatsup-switzerland.ch/index.php/en/|corresponding documentation]].
  
 =====How to quote====
Line 18: Line 32:
 ====Quoting the corpus documentation====
 
-Ueberwasser, Simone (2015): The Swiss SMS Corpus. Documentation, facts and figures. https://sms.linguistik.uzh.ch
+Ueberwasser, Simone (2015/2022): The Swiss SMS Corpus. Documentation, facts and figures. https://sms.linguistik.uzh.ch
 
-More resources that document the creation of the corpus
+More resources that document the creation of the corpus:
 
-Ruef, Beni/Ueberwasser, Simone (2013): The Taming of a Dialect: Interlinear Glossing of Swiss German Text Messages. In: Non-standard Data Sources in Corpus-based Research (ZSM-Studien 5). Aachen: Shaker, 61-68.
+Ruef, Beni/Ueberwasser, Simone (2013): [[https://ueberwasser.eu/UeFiles/uni/Tagungen/2012Koeln/RuefUeberwasser.pdf|The Taming of a Dialect]]: Interlinear Glossing of Swiss German Text Messages. In: Non-standard Data Sources in Corpus-based Research (ZSM-Studien 5). Aachen: Shaker, 61-68.
 
 ====Publications that are based on the corpus====
Line 28: Line 42:
 We have the full publication list on the [[https://p3.snf.ch/project-136230|SNSF research database P3]].
 
-====The original project====
 
-This corpus was originally created to be used in the project sms4science.ch funded by the Swiss National Science Foundation.
 
 =====Acknowledgement=====
start.txt · Last modified: 2022/09/12 19:18 by Stefan Bircher
