02_browsing:01_sub_corpora

Differences

This shows you the differences between two versions of the page.

02_browsing:01_sub_corpora [2022/01/05 17:30]
Simone Ueberwasser created
02_browsing:01_sub_corpora [2022/06/27 09:21] (current)
Line 1: Line 1:
 ====== Sub-corpora ======
-The following sub-corpora are available:
+The corpus all-tagged contains all SMS in all languages. Data for all languages except Romansh are tagged with TreeTagger.
+
+In addition, the following sub-corpora are available per language:
   * deu-rftagged: non-dialectal German data tagged with RF-Tagger
   * deu-tagged: non-dialectal German data tagged with TreeTagger
Line 25: Line 27:
   * If you need specific information about an individual SMS, you can select that SMS instead of the sub-corpus in the top left to see details such as the languages it contains, demographic information, etc. This is also an easy way to see which SMS are included in this sub-corpus.
  
-{{ :02_browsing:annotations.png?400 |}} 
-Figure 1: Information about a (sub-)corpus 
  
 On the right-hand side of the information window, you see which annotations are available to be queried for the selected sub-corpus.
02_browsing/01_sub_corpora.1641400210.txt.gz · Last modified: 2022/06/27 09:21 (external edit)