03_processing:05_normalization

Differences

This shows you the differences between two versions of the page.

03_processing:04_normalization [2022/01/04 13:30] Simone Ueberwasser
03_processing:04_normalization [2022/01/04 13:32] Simone Ueberwasser
Line 56:
  
 In German, nouns are spelled with an initial upper-case letter. Regardless of the capitalization in the SMS layer, nouns are capitalized in the normalized layer to help a PoS tagger recognize them.
-//Spelling//
+====Spelling====
  
 Assuming that spelling in SMS is unorthodox throughout, we decided to adjust spelling to the dictionary form of a lemma in the corresponding syntactic position. In German, for example, there is the definite neuter article das and the conjunction dass (e.g. //er sagte, dass er komme// ('he said **that** he would come')). Irrespective of the spelling used in the SMS, we applied das for the article and dass for the conjunction. The same rule was applied to other homophonous words.
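The das/dass rule can be illustrated with a minimal Python sketch. The page does not say how the rule was applied in practice, so everything here is an assumption for illustration: the function name ''normalize_das'', the role labels ''ART'' and ''CONJ'', and the list of variants are hypothetical, and the syntactic role of each token is taken as already known.

```python
# Hypothetical role labels; the page does not name the tagset that was used.
ARTICLE = "ART"
CONJUNCTION = "CONJ"

# Homophonous surface forms that may appear in an SMS (illustrative, not exhaustive).
DAS_VARIANTS = {"das", "dass", "daß"}


def normalize_das(token: str, role: str) -> str:
    """Return the dictionary spelling of a das/dass homophone for its syntactic role."""
    if token.lower() not in DAS_VARIANTS:
        return token  # not one of the homophones: leave unchanged
    if role == ARTICLE:
        normalized = "das"
    elif role == CONJUNCTION:
        normalized = "dass"
    else:
        return token  # role not covered by this sketch: leave unchanged
    # Keep an initial capital if the SMS token had one (e.g. sentence-initial).
    return normalized.capitalize() if token[:1].isupper() else normalized


# SMS layer: the conjunction is spelled "das"; the normalized layer uses "dass".
sms = [("er", "PRON"), ("sagte", "VERB"), ("das", CONJUNCTION), ("er", "PRON"), ("komme", "VERB")]
normalized_layer = [normalize_das(token, role) for token, role in sms]
```

The same lookup-by-role pattern would extend to other homophonous words by adding further variant sets, which is why the role is passed in rather than guessed from the spelling.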
03_processing/05_normalization.txt · Last modified: 2022/06/27 09:21 by 127.0.0.1
