start
Differences
This shows you the differences between two versions of the page.
start [2022/01/04 12:47] – [The corpus] Simone Ueberwasser | start [2022/01/05 16:19] – [Using the corpus] Simone Ueberwasser | ||
---|---|---|---|
Line 4: | Line 4: | ||
===== The corpus ===== | ===== The corpus ===== | ||
- | The Swiss SMS corpus consists of 25'947 SMS (~650' | + | The Swiss SMS corpus consists of 25'947 SMS (~650' |
===== Using the corpus ===== | ===== Using the corpus ===== | ||
Line 10: | Line 10: | ||
* Not use the data for commercial purposes, i.e. only for bona fide research | * Not use the data for commercial purposes, i.e. only for bona fide research | ||
* Quote the source of the data as "Swiss SMS corpus" | * Quote the source of the data as "Swiss SMS corpus" | ||
+ | |||
+ | If you need help browsing the corpus, please check the chapter [[02_browsing|Browsing]] | ||
+ | |||
+ | Since the corpus is available on the same platform as the data from the sister-project [[https:// | ||
+ | * deu-rftagged: | ||
+ | * deu-tagged: non-dialectal German data tagged with TreeTagger | ||
+ | * fra-tagged: French data tagged with TreeTagger | ||
+ | * gsw-rftagged: | ||
+ | * gsw-tagged: Swiss German data in which the normalized text was tagged with TreeTagger | ||
+ | * ita-tagged: Italian data tagged with TreeTagger | ||
+ | * roh: Romansh data | ||
+ | |||
+ | |||
+ | For more information about the WhatsApp corpus, please consult the [[https:// | ||
===== How to quote ===== | ===== How to quote ===== |
start.txt · Last modified: 2022/09/12 19:18 by Stefan Bircher