As shown above, you can query for individual tokens. But what if you need more detail, perhaps combining information from different annotation layers? Let us consider the following examples:
Before looking at how to build these queries, let us describe their structure.
The queries for these examples are the following:
tok="io" & meta::sex="M"
That reads as: a token with the content io in a document whose metadata attribute sex has the value M, i.e. io as produced by a male informant.
tok="was" & pos="PRELS" & #1 _=_ #2
We can translate that as: the token has to be was, the part-of-speech (PoS) annotation has to be PRELS, and both have to be found on the same token, which is what #1 _=_ #2 (identical coverage) expresses. Please keep in mind that this is the syntax for subcorpora tagged with TreeTagger. The RFTagger uses a more fine-grained annotation for relative pronouns, e.g. PRO.Rel.Subst.Nom.Sg.Neut, so instead of a fixed value we use a regular expression between slashes that matches tags of this form. The query would thus look like this:
tok="was" & pos=/PRO.Rel.*/ & #1 _=_ #2
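To make the meaning of these queries more concrete, here is a small Python sketch over an invented toy corpus. It is not how ANNIS stores data or evaluates AQL; the Node class, the functions find, same_coverage and tok_with_pos, and all data are hypothetical. The sketch assumes that every token and every annotation is a node covering a range of tokens, that "..." values compare exactly, and that /.../ regular expressions have to match the whole annotation value.

```python
import re
from dataclasses import dataclass

# Hypothetical toy model, NOT ANNIS's real data model: every token and every
# annotation is a node covering tokens start..end (inclusive) of one document.
@dataclass(frozen=True)
class Node:
    doc: str
    start: int
    end: int
    name: str   # "tok" for the token text, otherwise the annotation name, e.g. "pos"
    value: str

def matches(node, name, pattern):
    """name="value" compares exactly; name=/regex/ must match the whole value."""
    if node.name != name:
        return False
    if isinstance(pattern, re.Pattern):
        return pattern.fullmatch(node.value) is not None
    return node.value == pattern

def find(nodes, name, pattern):
    return [n for n in nodes if matches(n, name, pattern)]

def same_coverage(a, b):
    """#1 _=_ #2: both nodes cover exactly the same tokens of the same document."""
    return (a.doc, a.start, a.end) == (b.doc, b.start, b.end)

# Invented example data and document metadata.
meta = {"it1": {"sex": "M"}, "it2": {"sex": "F"}}
nodes = [
    Node("it1", 0, 0, "tok", "io"),
    Node("it2", 0, 0, "tok", "io"),
    Node("de1", 0, 0, "tok", "alles"),
    Node("de1", 1, 1, "tok", "was"),
    Node("de1", 1, 1, "pos", "PRELS"),   # TreeTagger-style tag
    Node("de2", 0, 0, "tok", "was"),
    Node("de2", 0, 0, "pos", "PWS"),     # interrogative, not relative
    Node("rf1", 0, 0, "tok", "was"),
    Node("rf1", 0, 0, "pos", "PRO.Rel.Subst.Nom.Sg.Neut"),  # RFTagger-style tag
]

# tok="io" & meta::sex="M"
q1 = [n for n in find(nodes, "tok", "io") if meta.get(n.doc, {}).get("sex") == "M"]

def tok_with_pos(text, pos):
    """tok=<text> & pos=<pos> & #1 _=_ #2"""
    return [(t, p) for t in find(nodes, "tok", text)
            for p in find(nodes, "pos", pos) if same_coverage(t, p)]

q2 = tok_with_pos("was", "PRELS")                      # TreeTagger subcorpus
q2_rf = tok_with_pos("was", re.compile(r"PRO.Rel.*"))  # RFTagger subcorpus

print([n.doc for n in q1])        # ['it1']
print([t.doc for t, _ in q2])     # ['de1']
print([t.doc for t, _ in q2_rf])  # ['rf1']
```

Note that the regular expression does not match PRELS, which is why the TreeTagger and RFTagger versions of the query are not interchangeable across subcorpora.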
For the next query, we can use either the annotation layer gloss or the token itself. The choice depends on what we want to find. If we are after the spelling est-ce que used by the informant, we query for tok=/…/. If, on the other hand, we want to include unconventional spellings like sno, we have to use gloss=/…/. Let us use the first option, which gives us the following query:
pos="PRO:pers" & tok="sono" & #1 . #2
This reads as: the first token has to be a personal pronoun and the second one has to be sono. The expression #1 . #2 means that the first token has to directly precede the second one.
So much for the examples. But how can you remember all of these options? You do not have to, since ANNIS offers a lot of support in creating queries.
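The precedence operator from the last example, and the choice between tok and gloss, can be illustrated in the same way. Again, this is a hypothetical Python sketch over an invented toy corpus, not ANNIS's implementation: Node, find, directly_precedes and pronoun_then are illustrative names only, and the sketch assumes that tok holds the informant's own spelling while the gloss layer stores a standard form (sono for sno).

```python
from dataclasses import dataclass

# Hypothetical toy model, NOT ANNIS's real data model: every token and every
# annotation is a node covering tokens start..end (inclusive) of one document.
@dataclass(frozen=True)
class Node:
    doc: str
    start: int
    end: int
    name: str   # "tok", "pos", "gloss", ...
    value: str

def find(nodes, name, value):
    return [n for n in nodes if n.name == name and n.value == value]

def directly_precedes(a, b):
    """#1 . #2: the last token of a is immediately followed by the first token of b."""
    return a.doc == b.doc and a.end + 1 == b.start

# Invented data: in "it2" the informant wrote "sno"; we assume the gloss
# layer records the standard form "sono" for it.
nodes = [
    Node("it1", 0, 0, "tok", "io"),
    Node("it1", 0, 0, "pos", "PRO:pers"),
    Node("it1", 1, 1, "tok", "sono"),
    Node("it1", 1, 1, "gloss", "sono"),
    Node("it2", 0, 0, "tok", "io"),
    Node("it2", 0, 0, "pos", "PRO:pers"),
    Node("it2", 1, 1, "tok", "sno"),
    Node("it2", 1, 1, "gloss", "sono"),
    Node("it3", 0, 0, "tok", "sono"),  # no pronoun directly before it
]

def pronoun_then(layer, word):
    """pos="PRO:pers" & <layer>="<word>" & #1 . #2"""
    return [(p, w) for p in find(nodes, "pos", "PRO:pers")
            for w in find(nodes, layer, word) if directly_precedes(p, w)]

by_tok = pronoun_then("tok", "sono")      # the informant's own spelling only
by_gloss = pronoun_then("gloss", "sono")  # also finds the spelling "sno"
print([w.doc for _, w in by_tok])    # ['it1']
print([w.doc for _, w in by_gloss])  # ['it1', 'it2']
```

Querying tok returns only the conventional spelling, while querying gloss also returns the pair with sno, mirroring the trade-off between the two options described above.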