As shown above, you can query for individual tokens. But what if you need more detail, perhaps combining information from different layers? Let us consider the following examples:
Before looking at how to build those queries, let us describe their structure:
The queries for these examples are the following:
tok="io" & meta::sex="M". This reads as: a token with the content io, where the metadata field sex has the value M.
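To make the reading of this query concrete, here is a minimal Python sketch of the same filtering logic on toy data. It does not talk to ANNIS at all. The document list, the function name find_tokens and the sample tokens are invented for illustration, and the sketch assumes that the meta:: condition applies to the document a token belongs to.

```python
# Toy stand-in for a corpus: each document has metadata and a token list.
# All data here is invented for illustration.
documents = [
    {"meta": {"sex": "M"}, "tokens": ["io", "sono", "qui"]},
    {"meta": {"sex": "F"}, "tokens": ["io", "no"]},
    {"meta": {"sex": "M"}, "tokens": ["tu", "e", "io"]},
]


def find_tokens(documents, text, **meta):
    """Return (document index, token index) pairs for tokens equal to `text`
    in documents whose metadata matches every keyword in `meta`."""
    hits = []
    for doc_index, doc in enumerate(documents):
        # Skip documents whose metadata does not satisfy the constraint.
        if any(doc["meta"].get(key) != value for key, value in meta.items()):
            continue
        for tok_index, tok in enumerate(doc["tokens"]):
            if tok == text:
                hits.append((doc_index, tok_index))
    return hits


# Rough analogue of: tok="io" & meta::sex="M"
hits = find_tokens(documents, "io", sex="M")
print(hits)
```

The second document also contains io, but it is skipped because its sex value is F.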
tok="was" & pos="PRELS" & #1 _=_ #2. We can translate that as: the token has to be was, the PoS annotation has to be PRELS, and both annotations have to be found on the same token (#1 _=_ #2). Please keep in mind that this is the syntax for subcorpora tagged with TreeTagger. The RFTagger uses a more fine-grained annotation for relative pronouns, e.g. PRO.Rel.Subst.Nom.Sg.Neut. The query would thus look like:
tok="was" & pos=/PRO.Rel.*/ & #1 _=_ #2.
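The combination of a token condition, a regular expression on the PoS annotation and the _=_ operator can be imitated on toy data as follows. This is only a sketch of the logic, not of how ANNIS evaluates queries. The Token class, the function same_token_matches and all tags except PRO.Rel.Subst.Nom.Sg.Neut are hypothetical, and the sketch assumes that the regular expression has to match the whole annotation value.

```python
import re
from dataclasses import dataclass


@dataclass
class Token:
    text: str
    pos: str


def same_token_matches(tokens, text, pos_pattern):
    """Return indices of tokens whose text equals `text` and whose PoS tag
    is matched completely by `pos_pattern` (both conditions on one token)."""
    regex = re.compile(pos_pattern)
    return [
        index
        for index, token in enumerate(tokens)
        if token.text == text and regex.fullmatch(token.pos)
    ]


# Invented sample; only the relative pronoun tag is taken from the text above.
tokens = [
    Token("alles", "PRO.Indef.Subst.Nom.Sg.Neut"),  # hypothetical tag
    Token("was", "PRO.Rel.Subst.Nom.Sg.Neut"),
    Token("was", "PRO.Inter.Subst.Acc.Sg.Neut"),    # hypothetical tag
    Token("das", "PRO.Rel.Subst.Nom.Sg.Neut"),
]

# Rough analogue of: tok="was" & pos=/PRO.Rel.*/ & #1 _=_ #2
matches = same_token_matches(tokens, "was", r"PRO.Rel.*")
print(matches)
```

Only the token that satisfies both conditions at once is returned: the second was has the right form but the wrong tag, and das has the right tag but the wrong form.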
We could use mftb_lem (the tagger used for French), or we could use the token. This choice depends on what we want to find. If we are after the spelling est-ce que used by the informant, we query for tok=/…/. If, on the other hand, we want to include unconventional spellings like sq, we have to use mftb_lem=/…/. Let us use the first option, which gives us the following query:
tok="est-ce" & tok="que" & #1 . #2, which we can read as: a first token est-ce and a second token que. The expression #1 . #2 means that the first token has to directly precede the second one.
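The meaning of direct precedence can be illustrated with a few lines of Python on toy data. The function direct_precedence and both sample utterances are invented for this sketch, which only mirrors the idea behind #1 . #2 and makes no claim about how ANNIS implements it.

```python
def direct_precedence(tokens, first, second):
    """Return the positions i where tokens[i] == first and the very next
    token, tokens[i + 1], equals second."""
    return [
        i
        for i in range(len(tokens) - 1)
        if tokens[i] == first and tokens[i + 1] == second
    ]


# Invented utterances for illustration.
adjacent = ["est-ce", "que", "tu", "viens", "?"]
separated = ["est-ce", "vraiment", "que", "tu", "viens", "?"]

# Rough analogue of: tok="est-ce" & tok="que" & #1 . #2
adjacent_hits = direct_precedence(adjacent, "est-ce", "que")
separated_hits = direct_precedence(separated, "est-ce", "que")
print(adjacent_hits, separated_hits)
```

In the second utterance both tokens occur, but another token stands between them, so the direct precedence condition is not met.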
So much for the examples. But how can you remember all of these options? You do not have to, since ANNIS offers you a lot of support in creating queries.