The query module#
Query the tree using the query property of Context and Token.
Normally you do not need to import this module in order to use it. Using the query property of any Token or (in most cases) Context, you can query the token tree to find tokens and contexts, based on lexicons and/or actions and text contents. You can chain calls in an XPath-like fashion.
This module supplements the various find_xxx methods of every Context object. A query is a generator: it starts at the query property of a Context or Token object and initially yields just that object.
You can navigate using children, all, first, last, [n], [n:n], [n:n:n], next, previous, right, left, right_siblings, left_siblings, map(), parent and ancestors. Use uniq to remove double occurrences of nodes, which can e.g. happen when navigating to the parent of all nodes.
You can narrow down the search using tokens, contexts, remove_ancestors, remove_descendants, slice() and filter().
You can search for tokens using ('text') or (lexicon), startingwith(), endingwith(), containing(), matching(), action() or in_action(). The special prefix is_not inverts the query, so query.is_not.containing("bla") yields Tokens that do not contain the text "bla".
Examples:
Find all tokens that are the first child of a Context with bla lexicon:
root.query.all(MyLang.bla)[0]
Find (in Xml) all attributes with name ‘name’ that are in a <bla> tag:
root.query.all.action(Name.Tag)("bla").next('name')
Find all tags containing “hi” in their text nodes:
root.query.all.action(Name.Tag).next.next.action(Text).containing('hi')
Find all comments that have TODO in them:
root.query.all.action(Comment).containing('TODO')
Find all “\version” tokens in the root context, that have a “2” in the version string after it:
(t for t in root.query.children('\\version')
    if t.query.next.next.containing('2'))
Which could also be written as:
root.query.children('\\version').filter(
    lambda t: t.query.next.next.containing('2'))
A query is a generator, so you can iterate over the results:
for attrs in q.all.action(Name.Tag)('origin').right:
    for atr in attrs.query.action(Name.Attribute):
        print(atr)
For debugging purposes, there are also the list() construct, and the pick(), count() and dump() methods:
root.query.all.action(Name.Tag)("img").count() # number of "img" tags
list(root.query.all.action(Name.Tag)("img")) # list of all "img" tag name tokens
Note that a (partial) query can be reused; it simply restarts the iteration over the results. The above could also be written as:
q = root.query.all.action(Name.Tag)("img")
q.count() # number of "img" tags
list(q) # list of all "img" tag name tokens
A query evaluates to False if it does not yield a single result:
if token.query.ancestors(LilyPond.header):
    do_something() # the token is a descendant of a LilyPond.header context
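The reuse and truthiness behaviour described above can be pictured with a small toy model. This is only a sketch under stated assumptions: `MiniQuery` is a hypothetical class, not the library's Query, and it assumes a query wraps a callable that produces a fresh iterator each time it is iterated.

```python
class MiniQuery:
    """Hypothetical stand-in for a query; not the real Query class."""

    def __init__(self, gen):
        # gen is a zero-argument callable returning a fresh iterable
        self._gen = gen

    def __iter__(self):
        # Calling gen again is what makes the query reusable
        return iter(self._gen())

    def __bool__(self):
        # True as soon as one result is found; stops after the first hit
        for _ in self:
            return True
        return False

    def count(self):
        return sum(1 for _ in self)


words = ["img", "a", "img", "p"]
q = MiniQuery(lambda: (w for w in words if w == "img"))
empty = MiniQuery(lambda: (w for w in words if w == "table"))

print(q.count())    # 2
print(list(q))      # ['img', 'img'], the iteration simply restarted
print(bool(empty))  # False
```

Because every iteration calls the wrapped callable again, counting the results does not consume them for a later list() call.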
You can also directly instantiate a Query object for a list of nodes, if you want to query those in one go:
q = Query.from_nodes(nodes)
Summary of the query methods:#
Endpoint methods (some are mainly for debugging): ls, count(), dump(), pick(), pick_last(), range() and delete().
Selecting (filtering) nodes:#
These methods filter out current nodes without adding new nodes to the selection: tokens, contexts, uniq, remove_ancestors, remove_descendants, slice() and filter().
The special is_not operator inverts the meaning of the next query, e.g.:
n.query.all.is_not.startingwith("text")
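One way to picture the inversion is a filter that takes an invert flag and flips its test. The function below is a hypothetical illustration working on plain strings; it is not how the library implements is_not.

```python
def startingwith(texts, prefix, invert=False):
    """Yield texts that start with prefix, or those that do not if invert is set."""
    for text in texts:
        # Comparing the test result with the flag flips the selection
        if text.startswith(prefix) != invert:
            yield text


texts = ["text node", "other", "textual"]
normal = list(startingwith(texts, "text"))
inverted = list(startingwith(texts, "text", invert=True))
print(normal)    # ['text node', 'textual']
print(inverted)  # ['other']
```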
The following query methods can be inverted by prepending is_not: len(), in_range(), (lexicon), (lexicon, lexicon2, ...), ("text"), ("text", "text2", ...), startingwith(), endingwith(), containing(), matching(), action() and in_action().
There is a subtle difference between action and in_action: with the first, the action should exactly match, with the latter the tokens are selected when the action exactly matches, or is a descendant of the given action.
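The difference can be illustrated with a toy model in which an action is a tuple of names, so that ("Name", "Tag") stands for Name.Tag and counts as a descendant of ("Name",). The tuple representation and both helper functions are assumptions made for this sketch only; they are not the library's action objects.

```python
def action_matches(token_action, wanted):
    """Exact match only, as described for action()."""
    return token_action == wanted


def in_action_matches(token_action, wanted):
    """Exact match, or token_action is a descendant of wanted (in_action())."""
    return token_action[:len(wanted)] == wanted


NAME = ("Name",)             # stands for Name
NAME_TAG = ("Name", "Tag")   # stands for Name.Tag

print(action_matches(NAME_TAG, NAME))     # False: not exactly the same action
print(in_action_matches(NAME_TAG, NAME))  # True: Name.Tag descends from Name
print(in_action_matches(NAME, NAME_TAG))  # False: Name is not below Name.Tag
```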
- class Query(gen, invert=False)[source]#
Bases: object
A Query navigates and filters a node tree.
A Query is instantiated either by calling Token.query or Context.query, or by calling Query.from_nodes() on a list of nodes (tokens and/or contexts).
- property ls#
List current selection of this Query, for debugging purposes.
- range()[source]#
Return the text range as a tuple (pos, end).
The pos is the lowest pos of the nodes in the current set, and end is the highest end of the nodes. If the result set is empty, (-1, -1) is returned.
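The rule above can be sketched with plain objects that carry pos and end attributes. `Node` and `text_range` are hypothetical names used only for this illustration.

```python
from collections import namedtuple

# Hypothetical node type with just the two attributes the rule needs
Node = namedtuple("Node", "pos end")


def text_range(nodes):
    """Return (lowest pos, highest end), or (-1, -1) for an empty set."""
    pos = end = -1
    for n in nodes:
        pos = n.pos if pos == -1 else min(pos, n.pos)
        end = max(end, n.end)
    return pos, end


print(text_range([Node(5, 9), Node(2, 4), Node(10, 12)]))  # (2, 12)
print(text_range([]))                                      # (-1, -1)
```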
- delete()[source]#
Delete all selected nodes from their parents.
Internally calls uniq and remove_descendants, so that no unnecessary deletes are done. If a context would become empty, that context itself is deleted instead of all its children (except for the root of course). Returns the number of nodes that were deleted.
Note
If you delete tokens from a tree which belong to a group, the tree cannot reliably be used by a treebuilder for a partial rebuild.
- __getitem__(key)[source]#
Get the specified item or items of every context node.
Note that the result nodes always form a flat iterable. No IndexError will be raised if an index would be out of range for any node.
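A rough sketch of that behaviour, using plain lists in place of contexts: out-of-range indices are skipped silently and slices are flattened into the result. `get_items` is a hypothetical helper written for this illustration, not the library's code.

```python
def get_items(contexts, key):
    """Yield ctx[key] for every context, skipping indices that are out of range."""
    for ctx in contexts:
        if isinstance(key, slice):
            # A slice never raises; its items join the flat result
            yield from ctx[key]
        elif -len(ctx) <= key < len(ctx):
            yield ctx[key]


contexts = [["a", "b", "c"], ["d"], []]
print(list(get_items(contexts, 0)))            # ['a', 'd']
print(list(get_items(contexts, 2)))            # ['c']
print(list(get_items(contexts, slice(0, 2))))  # ['a', 'b', 'd']
```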
- property children#
All direct children of the current nodes.
- property all#
All descendants, contexts and their nodes.
- property alltokens#
Shortcut for all.tokens.
- property allcontexts#
Shortcut for all.contexts.
- property parent#
Yield the parent of every node.
This can lead to many double occurrences of the same node in the result set; use uniq to fix that.
- property ancestors#
Yield the ancestor contexts of every node.
- property first#
Yield the first node of every context node, same as [0].
- property last#
Yield the last node of every context node, same as [-1].
- property next#
Yield the next token, if any.
- property previous#
Yield the previous token, if any.
- property forward#
Yield Tokens in forward direction.
- property backward#
Yield Tokens in backward direction.
- property right#
Yield the right sibling, if any.
- property left#
Yield the left sibling, if any.
- property right_siblings#
Yield the right siblings, if any.
- property left_siblings#
Yield the left siblings, if any.
- map(function)[source]#
Call the function on every node and yield its results, which should be zero or more nodes as well.
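In other words, map() behaves like a flat map: the function returns an iterable for each node, and the results are chained into one stream. A minimal sketch on plain lists, where `flat_map` is a hypothetical name:

```python
def flat_map(function, nodes):
    """Call function on every node and yield each item it returns."""
    for node in nodes:
        yield from function(node)


# Nested lists stand in for contexts; the function returns a node's children
tree = [["a", "b"], [], ["c"]]
result = list(flat_map(lambda ctx: ctx, tree))
print(result)  # ['a', 'b', 'c']
```

A function that returns an empty iterable for a node simply contributes nothing to the result.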
- filter(predicate)[source]#
Yield nodes for which the predicate returns a value that evaluates to True.
- property tokens#
Get only the tokens.
- property contexts#
Get only the contexts.
- property uniq#
Remove double occurrences of the same node from the result set.
This can happen e.g. when you find the parent of multiple nodes.
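A minimal sketch of such de-duplication, assuming that "the same node" means the same object rather than an equal one. The `uniq` function here is an illustration only, not the library's implementation.

```python
def uniq(nodes):
    """Yield every node once, comparing by identity and keeping first-seen order."""
    seen = set()
    for node in nodes:
        if id(node) not in seen:
            seen.add(id(node))
            yield node


# Three tokens share parent_a, so asking for their parents repeats it
parent_a, parent_b = ["a"], ["b"]
parents = [parent_a, parent_a, parent_b, parent_a]
result = list(uniq(parents))
print(len(result))  # 2
```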
- slice(*args)[source]#
Slice the full result set, using itertools.islice(). This can help narrow down the result set. For example:
root.query.all("blaat").slice(1).right_siblings.slice(3) ...
will continue the query with only the first occurrence of a token “blaat”, and then look for at most three right siblings. If the slice(1) were not there, all the right siblings would become one large result set because you wouldn’t know how many tokens “blaat” were matched.
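Since slice() is described as using itertools.islice(), the effect can be tried out on ordinary iterables. The strings below are made-up stand-ins for tokens.

```python
from itertools import islice

# Stand-ins for three matches of "blaat"
matches = iter(["blaat#1", "blaat#2", "blaat#3"])
first_only = list(islice(matches, 1))

# Stand-ins for the right siblings of that first match
siblings = ["s1", "s2", "s3", "s4", "s5"]
at_most_three = list(islice(siblings, 3))

# islice also accepts start, stop and step
every_other = list(islice(siblings, 0, None, 2))

print(first_only)     # ['blaat#1']
print(at_most_three)  # ['s1', 's2', 's3']
print(every_other)    # ['s1', 's3', 's5']
```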
- property remove_descendants#
Remove nodes that have ancestors in the current node list.
- property remove_ancestors#
Remove nodes that have descendants in the current node list.
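The two filters remove_descendants and remove_ancestors mirror each other, which a toy tree with parent links can show. The `Node` class and both functions are hypothetical and exist only for this sketch.

```python
class Node:
    """Hypothetical tree node with a parent link."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent

    def ancestors(self):
        node = self.parent
        while node is not None:
            yield node
            node = node.parent


def remove_descendants(nodes):
    """Keep only nodes that have no ancestor in the selection."""
    nodes = list(nodes)
    selected = {id(n) for n in nodes}
    return [n for n in nodes
            if not any(id(a) in selected for a in n.ancestors())]


def remove_ancestors(nodes):
    """Keep only nodes that have no descendant in the selection."""
    nodes = list(nodes)
    covered = {id(a) for n in nodes for a in n.ancestors()}
    return [n for n in nodes if id(n) not in covered]


root = Node("root")
ctx = Node("ctx", root)
tok = Node("tok", ctx)
other = Node("other", root)
selection = [ctx, tok, other]

print([n.name for n in remove_descendants(selection)])  # ['ctx', 'other']
print([n.name for n in remove_ancestors(selection)])    # ['tok', 'other']
```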
- property is_not#
Invert the next query.
- len(min_length, max_length=None)[source]#
Only yield contexts, with min_length, or with length between min and max.
- in_range(start=0, end=None)[source]#
Yield a restricted set: tokens and/or contexts must fall in the range start→end.
- __call__(*what)[source]#
Yield token if token has that text, or context if context has that lexicon.
You can even mix the types if you need to:
for n in tree.query.all("%", Lang.comment):
    pass # do something
yields tokens that are a percent sign and contexts that have the Lang.comment lexicon.
- matching(pattern, flags=0)[source]#
Yield tokens matching the regular expression. re.search() is used, so the expression can match anywhere unless you use ^ or $ characters.
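The effect of re.search() can be checked on plain strings; the sample texts are made up for this illustration.

```python
import re

texts = ["TODO: fix", "a TODO here", "done"]

# Unanchored: the pattern may match anywhere in the text
anywhere = [t for t in texts if re.search(r"TODO", t)]

# ^ and $ pin the match to the start or end of the text
at_start = [t for t in texts if re.search(r"^TODO", t)]
at_end = [t for t in texts if re.search(r"here$", t)]

print(anywhere)  # ['TODO: fix', 'a TODO here']
print(at_start)  # ['TODO: fix']
print(at_end)    # ['a TODO here']
```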