start
Differences
This shows you the differences between two versions of the page.
start [2022/01/04 12:47] – [The corpus] Simone Ueberwasser | start [2022/09/12 19:18] (current) – Stefan Bircher | ||
---|---|---|---|
Line 4: | Line 4: | ||
===== The corpus ===== | ===== The corpus ===== | ||
- | The Swiss SMS corpus consists of 25'947 SMS (~650' | + | The Swiss SMS corpus consists of 25'947 SMS (~650' |
===== Using the corpus ===== | ===== Using the corpus ===== | ||
Line 10: | Line 10: | ||
* Not use the data commercially, i.e. use it only for bona fide research | * Not use the data commercially, i.e. use it only for bona fide research | ||
* Quote the source of the data as "Swiss SMS corpus" | * Quote the source of the data as "Swiss SMS corpus" | ||
+ | |||
+ | If you need help browsing the corpus, please check the chapter [[02_browsing|Browsing]]. | ||
+ | |||
+ | Since the corpus is available on the same platform as the data from the sister-project [[https:// | ||
+ | * deu-rftagged: | ||
+ | * deu-tagged: non-dialectal German data tagged with TreeTagger | ||
+ | * fra-tagged: French data tagged with TreeTagger | ||
+ | * gsw-rftagged: | ||
+ | * gsw-tagged: Swiss German data where the normalized data was tagged with TreeTagger | ||
+ | * ita-tagged: Italian data tagged with TreeTagger | ||
+ | * roh: Romansh data | ||
+ | |||
+ | |||
+ | For more information about the WhatsApp corpus, please consult the [[https:// | ||
===== How to quote ===== | ===== How to quote ===== | ||
====Quoting the corpus==== | ====Quoting the corpus==== | ||
- | Stark, Elisabeth; Ueberwasser, | + | Stark, Elisabeth; Ueberwasser, |
====Quoting the corpus documentation==== | ====Quoting the corpus documentation==== | ||
- | Ueberwasser, | + | Ueberwasser, |
More resources that document the creation of the corpus: | More resources that document the creation of the corpus: |
start.1641296865.txt.gz · Last modified: 2022/06/27 09:21 (external edit)