start
Differences
This shows you the differences between two versions of the page.
start [2022/01/04 07:25] – Simone Ueberwasser | start [2022/01/05 12:34] – ↷ Links adapted because of a move operation Simone Ueberwasser | ||
---|---|---|---|
Line 4: | Line 4: | ||
===== The corpus ===== | ===== The corpus ===== | ||
- | The Swiss SMS corpus consists of 25'947 SMS (~650' | + | The Swiss SMS corpus consists of 25'947 SMS (~650' |
===== Using the corpus ===== | ===== Using the corpus ===== | ||
Line 10: | Line 10: | ||
* Not use the data for commercial purposes, i.e. only for bona fide research | * Not use the data for commercial purposes, i.e. only for bona fide research | ||
* Quote the source of the data as "Swiss SMS corpus" | * Quote the source of the data as "Swiss SMS corpus" | ||
+ | |||
+ | Since the corpus is available on the same platform as the data from the sister-project [[https:// | ||
+ | * deu-rftagged: | ||
+ | * deu-tagged: non-dialectal German data tagged with TreeTagger | ||
+ | * fra-tagged: French data tagged with TreeTagger | ||
+ | * gsw-rftagged: | ||
+ | * gsw-tagged: Swiss German data whose normalized version was tagged with TreeTagger | ||
+ | * ita-tagged: Italian data tagged with TreeTagger | ||
+ | * roh: Romansh data | ||
+ | |||
+ | |||
+ | For more information about the WhatsApp corpus, please consult the [[https:// | ||
===== How to quote ===== | ===== How to quote ===== | ||
Line 18: | Line 30: | ||
====Quoting the corpus documentation==== | ====Quoting the corpus documentation==== | ||
- | Ueberwasser, | + | Ueberwasser, |
More resources that document the creation of the corpus: | More resources that document the creation of the corpus: | ||
- | Ruef, Beni/ | + | Ruef, Beni/ |
====Publications that are based on the corpus==== | ====Publications that are based on the corpus==== |
start.txt · Last modified: 2022/09/12 19:18 by Stefan Bircher