The startpage is the entry point for every query. It consists of four parts:
Types of queries
You can use the following methods to search for patterns in the corpus:
Entry fields
Type the token you are looking for into the entry field. How it is interpreted depends on the selected query type. The search does not work the way a Google search does. A Google search for "you too" will find "you too" but also "you said that too". The SMS Navigator will only find occurrences of "you too" and ignore "you said that too". This is because the Navigator does not treat "you" and "too" as separate words. Instead, the letter sequence y-o-u-SPACE-t-o-o is matched as a single unit. To see all SMS in the corpus, leave the field blank and press Start Query. For more sophisticated searches, such as "you" in the same sentence as "too", use a regex search.
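The difference between the Navigator's literal matching and a word-based web search can be sketched in a few lines. This is not the Navigator's code. It assumes a plain case-insensitive substring match for the entry field and uses Python's `re` module as a stand-in for the regex search; the Navigator's own regex flavour may differ. All function names and messages are made up for illustration.

```python
import re

def literal_match(query: str, sms: str) -> bool:
    # Sketch of the behaviour described above (assumed plain substring match):
    # the whole query, space included, must occur as one contiguous string.
    # Both sides are lowercased to mimic the case-insensitive default.
    return query.lower() in sms.lower()

def all_words_match(query: str, sms: str) -> bool:
    # For contrast, a web-search-style match: each query word must appear
    # somewhere in the message, in any order or position.
    words = re.findall(r"\w+", sms.lower())
    return all(w in words for w in query.lower().split())

def same_sentence(sms: str) -> bool:
    # Regex approximating "you in the same sentence as too": the two words
    # may not be separated by sentence-final punctuation (. ! ?).
    return re.search(r"\byou\b[^.!?]*\btoo\b", sms, re.IGNORECASE) is not None

# Made-up messages, not taken from the corpus
for sms in ["love you too", "you said that too", "see you. me too"]:
    print(f"{sms!r}: literal={literal_match('you too', sms)}, "
          f"all_words={all_words_match('you too', sms)}, "
          f"same_sentence={same_sentence(sms)}")
```

Under these assumptions, only "love you too" passes the literal match. All three messages pass the word-based match. The regex accepts "you said that too" but rejects "see you. me too", because a full stop separates the two words.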
Select Corpus & Annotations
The first two selections, Select Corpus and Annotations, exist only for compatibility reasons. No data can be selected here.
Corpus/Subcorpus
Two options are available here. You can select either all SMS or only those for which demographic data is available.
Case Sensitive
If this option is set to no, which is the default for simple query and word query, the search ignores the case of individual letters. Thus "you", "You" and "YOU" are treated as the same and return the same results. For regex query, the default is yes. In that case "you", "You" and "YOU" are three distinct expressions, so a search for "you" will not find any occurrences of "YOU". Case sensitivity can be set manually for each individual search in every query type.
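As an illustration of the two settings, the sketch below uses Python's `re.IGNORECASE` flag to switch case sensitivity on and off. The dictionary keys `simple`, `word` and `regex` are hypothetical labels for the three query types, and the function is not part of the Navigator.

```python
import re

# Defaults described above: only regex query is case sensitive by default
# (the dictionary keys are hypothetical labels for the query types)
DEFAULT_CASE_SENSITIVE = {"simple": False, "word": False, "regex": True}

def find_all(pattern: str, text: str, case_sensitive: bool) -> list[str]:
    # Without case sensitivity, re.IGNORECASE lets "you" match "You" and "YOU"
    flags = 0 if case_sensitive else re.IGNORECASE
    return re.findall(pattern, text, flags)

sms = "you? You! YOU."  # made-up example message
print(find_all(r"\byou\b", sms, DEFAULT_CASE_SENSITIVE["simple"]))  # ['you', 'You', 'YOU']
print(find_all(r"\byou\b", sms, DEFAULT_CASE_SENSITIVE["regex"]))   # ['you']
```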
Page Size
The page size parameter defines how many SMS are displayed on each screen. A small value keeps each page easy to survey. A large value makes it easier to search within the result view, because more SMS are shown on a single page.
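The trade-off is simple arithmetic: the page size determines how many screens a result set is split into. Below is a minimal sketch of that pagination, assuming 1-based page numbers. The functions and the 23 hits are hypothetical.

```python
def page_count(total_hits: int, page_size: int) -> int:
    # Ceiling division: a partly filled last page still counts as a page
    return -(-total_hits // page_size)

def page(results: list, page_size: int, page_number: int) -> list:
    # page_number is 1-based (an assumption about the results screen)
    start = (page_number - 1) * page_size
    return results[start:start + page_size]

hits = [f"SMS {i}" for i in range(1, 24)]  # 23 hypothetical hits
print(page_count(len(hits), 10))  # 3
print(page(hits, 10, 3))          # ['SMS 21', 'SMS 22', 'SMS 23']
```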
Language
Every SMS has been annotated with three different language tags: main language, borrowing language and nonce borrowing language. All selections offered here are combined as AND conditions. For example, if you select Swiss German as the main language, English as the borrowing language and Romansh as the nonce borrowing language, you will only find SMS that fulfill all three conditions. In this example, you will most likely get no results. The language selections are applied in addition to what you entered in the entry field. The system therefore looks for SMS that match your language selection and also contain the search string you entered. For the main language, you can also select multilingual SMS, i.e. SMS with more than one main language. These are very often rather short SMS, such as "yes, gut.". The other two language tags, borrowings and nonce borrowings, each offer two additional options: SMS with any language annotation of that type, or SMS with none.
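The AND combination of the entry-field string and the three language selections can be sketched as a filter. This is a hypothetical model, not the Navigator's data format. Each tag is stored as a set of language names, `None` stands for "no restriction", and selecting a single main language is assumed to match any SMS whose main languages include it. The messages and tags are invented.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class SMS:
    # Hypothetical record layout: the three language tags are modelled as sets
    text: str
    main: set[str]
    borrowing: set[str] = field(default_factory=set)
    nonce: set[str] = field(default_factory=set)

def main_ok(langs: set[str], choice: Optional[str]) -> bool:
    if choice is None:                 # no restriction selected
        return True
    if choice == "multilingual":       # more than one main language
        return len(langs) > 1
    return choice in langs             # assumption: membership, not exact match

def tag_ok(tags: set[str], choice: Optional[str]) -> bool:
    if choice is None:
        return True
    if choice == "any":                # any annotation of this kind
        return bool(tags)
    if choice == "none":               # no annotation of this kind
        return not tags
    return choice in tags

def matches(sms: SMS, query: str = "", main: Optional[str] = None,
            borrowing: Optional[str] = None, nonce: Optional[str] = None) -> bool:
    # All selections are combined with AND, together with the entry-field string
    return (query.lower() in sms.text.lower()
            and main_ok(sms.main, main)
            and tag_ok(sms.borrowing, borrowing)
            and tag_ok(sms.nonce, nonce))

# Made-up messages and tags, for illustration only
corpus = [
    SMS("sali, bisch no im office?", {"Swiss German"}, borrowing={"English"}),
    SMS("yes, gut.", {"English", "German"}),
    SMS("merci vielmal", {"Swiss German"}, borrowing={"French"}),
]

def search(**criteria) -> list[str]:
    return [s.text for s in corpus if matches(s, **criteria)]

print(search(main="Swiss German", borrowing="English"))  # ['sali, bisch no im office?']
print(search(main="multilingual"))                       # ['yes, gut.']
print(search(main="Swiss German", borrowing="English", nonce="Romansh"))  # []
print(search(query="merci", borrowing="any"))            # ['merci vielmal']
```

The third search reproduces the situation described above: adding a nonce borrowing condition to an already narrow selection leaves no SMS that satisfies every condition at once.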
Version History
This corpus is continuously being developed. The data within the corpus can therefore change from time to time, especially the tagging of the individual languages. When writing papers about the corpus, always cite the corpus version date shown on the startpage, so that it is clear which version of the corpus your study is based on.
Please don't forget to cite the corpus in your work.