Page history for 05_facts_and_figures:05_languages: created 2022/01/04 16:22 by Simone Ueberwasser; current revision 2022/06/27 09:21 (external edit).
====== Languages in the Corpus ======

The following languages and varieties were annotated in the SMS. Please check our [[03_processing:03_languages|methodology]] for annotating languages to fully understand these figures.

| **Variety/language**  | **Abbreviation**  | **Number of SMS**  |
^ German ^^^
| Standard German   | deu           | 7'287          |
| Swiss German      | gsw           | 10'706         |
| Other German      | gda           | 9              |
^ French ^^^
| Standard French   | fra           | 4'619          |
| French Patois     | fsw           | 30             |
^ Italian ^^^
| Standard Italian  | ita           | 1'471          |
| Italian Dialect   | isw           | 48             |
^ Romansh ^^^
| Sursilvan         | roh-sr        | 425            |
| Sutsilvan         | roh-st        | 9              |
| Surmiran          | roh-sm        | 110            |
| Puter             | roh-pt        | 181            |
| Vallader          | roh-vl        | 337            |
| Grischun          | roh-gr        | 59             |
^ Other languages ^^^
| English           | eng           | 535            |
| Dutch             | nld           | 5              |
| North Germanic    | gmn           | 3              |
| Slavic            | sla           | 42             |
| Spanish           | spa           | 43             |
| Portuguese        | por           | 5              |
| Modern Greek      | gre           | 3              |
| Arabic            | ara           | 1              |
| Other             | oth           | 106            |
  
Please keep in mind that one SMS can have more than one main language, so if you add these figures together, the total exceeds the number of SMS in the corpus (that is, you get more than 100%). As you can see, some languages were grouped together. If we say that an SMS was written in North Germanic, it can be Danish, Norwegian or Swedish. Because the individual SMS are so short, they often contain words that are pronounced similarly in more than one of those languages, and because of the unorthodox spelling in the SMS, we cannot rely on spelling either when identifying languages. We therefore decided to pool these languages. The same goes for Slavic languages.
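The caveat about adding up the figures can be illustrated with a minimal Python sketch. It only sums the counts listed in the table. The dictionary layout and the variable names (`counts`, `group_totals`, `total_assignments`) are illustrative assumptions and not part of any corpus tooling.

```python
# Counts copied from the table on this page; the thousands separator (') is dropped.
# The grouping and the names used here are illustrative assumptions.
counts = {
    "German": {"deu": 7287, "gsw": 10706, "gda": 9},
    "French": {"fra": 4619, "fsw": 30},
    "Italian": {"ita": 1471, "isw": 48},
    "Romansh": {
        "roh-sr": 425, "roh-st": 9, "roh-sm": 110,
        "roh-pt": 181, "roh-vl": 337, "roh-gr": 59,
    },
    "Other languages": {
        "eng": 535, "nld": 5, "gmn": 3, "sla": 42, "spa": 43,
        "por": 5, "gre": 3, "ara": 1, "oth": 106,
    },
}

# Sum per language group, then over all groups.
group_totals = {group: sum(by_code.values()) for group, by_code in counts.items()}
total_assignments = sum(group_totals.values())

for group, total in group_totals.items():
    print(f"{group}: {total}")
print(f"All language assignments: {total_assignments}")
```

The final figure, 26'034, counts language assignments rather than distinct SMS: an SMS with two main languages contributes to two rows of the table.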
  
05_facts_and_figures/05_languages.1641309747.txt.gz · Last modified: 2022/06/27 09:21 (external edit)
