--- 05_facts_and_figures:01_corpus [2022/01/04 15:33] – [Tokens per language] Simone Ueberwasser
+++ 05_facts_and_figures:01_corpus [2022/06/27 09:21] (current) – external edit 127.0.0.1
@@ Line 1: / Line 1: @@
-====== Facts and Figures: the Corpus ======
+====== The Corpus ======
 The corpus consists of roughly 500'000 tokens. However, counting tokens precisely is nearly impossible in a corpus that contains so many emoticons and other special characters and whose spelling often deviates greatly from the norm. For example, one participant does not use any spaces in his SMS, so each of his messages is counted as a single token. The figure should therefore be seen as an approximation.
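To show why a participant who omits spaces distorts the count, here is a minimal Python sketch. It assumes that tokens are counted by splitting on whitespace. The page does not say which tokenizer was actually used. The function name `count_whitespace_tokens` and the two messages are hypothetical and are not taken from the corpus.

```python
# A minimal sketch, ASSUMING tokens are counted by splitting on whitespace.
# The corpus documentation does not state which tokenizer was actually used.

def count_whitespace_tokens(message: str) -> int:
    """Count tokens by splitting on runs of whitespace (hypothetical helper)."""
    return len(message.split())

# Hypothetical example messages, not taken from the corpus.
with_spaces = "see you at 8 :-) bring the keys"
without_spaces = "seeyouat8:-)bringthekeys"

print(count_whitespace_tokens(with_spaces))     # 8
print(count_whitespace_tokens(without_spaces))  # 1
```

The same content yields eight tokens when it is written with spaces and only one when it is written without them. This is why the overall token figure can only be an approximation.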
 ===== Number of characters =====