The query module#

Query the tree using the query property of Context and Token.

Normally you need not import this module to use it. Through the query property of any Token or (in most cases) Context, you can query the token tree to find tokens and contexts, based on lexicons and/or actions and text contents. Calls can be chained in an XPath-like fashion.

This module supplements the various find_xxx methods of every Context object. A query is a generator: it starts at the query property of a Context or Token object and initially yields just that object.

You can navigate using children, all, first, last, [n], [n:n], [n:n:n], next, previous, right, left, right_siblings, left_siblings, map(), parent and ancestors. Use uniq to remove double occurrences of nodes, which can happen, for example, when navigating to the parent of all nodes.

You can narrow down the search using tokens, contexts, remove_ancestors, remove_descendants, slice() and filter().

You can search for tokens using ('text') or (lexicon), startingwith(), endingwith(), containing(), matching(), action() or in_action(). The special prefix is_not inverts the query, so query.is_not.containing("bla") yields tokens that do not contain the text "bla".

Examples:

Find all tokens that are the first child of a Context with bla lexicon:

root.query.all(MyLang.bla)[0]

Find (in Xml) all attributes with name ‘name’ that are in a <bla> tag:

root.query.all.action(Name.Tag)("bla").next('name')

Find all tags containing “hi” in their text nodes:

root.query.all.action(Name.Tag).next.next.action(Text).containing('hi')

Find all comments that have TODO in them:

root.query.all.action(Comment).containing('TODO')

Find all "\version" tokens in the root context that have a "2" in the version string that follows:

(t for t in root.query.children('\\version')
    if t.query.next.next.containing('2'))

Which could also be written as:

root.query.children('\\version').filter(
    lambda t: t.query.next.next.containing('2'))

A query is a generator; you can iterate over the results:

for attrs in q.all.action(Name.Tag)('origin').right:
    for atr in attrs.query.action(Name.Attribute):
        print(atr)

For debugging purposes, there are also the list() construct and the pick(), count() and dump() methods:

root.query.all.action(Name.Tag)("img").count() # number of "img" tags
list(root.query.all.action(Name.Tag)("img"))   # list of all "img" tag name tokens

Note that a (partial) query can be reused; it simply restarts the iteration over the results. The above could also be written as:

q = root.query.all.action(Name.Tag)("img")
q.count()   # number of "img" tags
list(q)     # list of all "img" tag name tokens

A query evaluates to False if there are no results:

if token.query.ancestors(LilyPond.header):
    do_something() # the token is a descendant of a LilyPond.header context

You can also directly instantiate a Query object for a list of nodes, if you want to query those in one go:

q = Query.from_nodes(nodes)

Summary of the query methods:#

Endpoint methods (some are mainly for debugging):

ls, count(), dump(), pick(), pick_last(), range() and delete().

Selecting (filtering) nodes:#

These methods filter out current nodes without adding new nodes to the selection:

tokens, contexts, uniq, remove_ancestors, remove_descendants, slice() and filter().

The special is_not operator inverts the meaning of the next query, e.g.:

n.query.all.is_not.startingwith("text")

The following query methods can be inverted by prepending is_not:

len(), in_range(), (lexicon), (lexicon, lexicon2, ...), ("text"), ("text", "text2", ...), startingwith(), endingwith(), containing(), matching(), action() and in_action().
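
The inversion behaves like a flag that applies to the next filter step. A minimal sketch (not parce's implementation; the containing() filter below is a hypothetical stand-in) of how such a flag can flip a predicate:

```python
# Sketch of an invertible filter step: the invert flag flips the
# predicate, so the same filter can keep or drop matching tokens.

def containing(nodes, text, invert=False):
    """Yield nodes whose text contains (or, inverted, does not contain) text."""
    for node in nodes:
        if (text in node) != invert:
            yield node

tokens = ["hello", "world", "help"]
assert list(containing(tokens, "hel")) == ["hello", "help"]
assert list(containing(tokens, "hel", invert=True)) == ["world"]
```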

There is a subtle difference between action and in_action: with the former, the action must match exactly; with the latter, tokens are selected when their action matches exactly or is a descendant of the given action.
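
To illustrate the distinction, here is a sketch only (not parce's standardaction implementation): a tiny action hierarchy with an inherits_from() test standing in for the match-or-descendant check that in_action() performs.

```python
# A minimal action hierarchy: Tag is a child of Name, so an exact-match
# test (like action()) distinguishes them, while a match-or-descendant
# test (like in_action()) accepts both.

class Action:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent

    def inherits_from(self, other):
        """True if this action is other, or a descendant of other."""
        action = self
        while action is not None:
            if action is other:
                return True
            action = action.parent
        return False

Name = Action("Name")
Tag = Action("Name.Tag", parent=Name)

assert Tag is not Name             # action(Name) would not select a Tag token
assert Tag.inherits_from(Name)     # in_action(Name) would select it
assert Name.inherits_from(Name)    # both select an exact match
```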

class Query(gen, invert=False)[source]#

Bases: object

A Query navigates and filters a node tree.

A Query is instantiated either by calling Token.query or Context.query, or by calling Query.from_nodes() on a list of nodes (tokens and/or contexts).

classmethod from_nodes(nodes)[source]#

Create a Query object querying a list of nodes in one go.

__bool__()[source]#

Return True if there is at least one result.
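
Because the result set is lazy, the truth test need not consume the whole stream. A sketch, assuming a generator of results (this is illustrative, not parce's code):

```python
# Sketch of the boolean test: a query is True if its lazy result stream
# yields at least one node; a sentinel avoids consuming more than one.

_missing = object()

def query_bool(results):
    """Return True if the iterable yields at least one item."""
    return next(iter(results), _missing) is not _missing

assert query_bool(["token"]) is True
assert query_bool([]) is False
```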

property ls#

List current selection of this Query, for debugging purposes.

count()[source]#

Compute the length of the iterable.

dump(file=None, style=None)[source]#

Dump all selected nodes to the console (or to file).

See also

tree.Node.dump()

pick(default=None)[source]#

Pick the first value, or return the default.

pick_last(default=None)[source]#

Pick the last value, or return the default.

range()[source]#

Return the text range as a tuple (pos, end).

The pos is the lowest pos of the nodes in the current set, and end is the highest end of the nodes. If the result set is empty, (-1, -1) is returned.
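
A sketch of these semantics, assuming every node carries pos and end values (modeled here as dict keys; real parce nodes expose them as attributes):

```python
# Compute (lowest pos, highest end) over a node set, with the (-1, -1)
# fallback for an empty result set.

def text_range(nodes):
    """Return (lowest pos, highest end) of the nodes, or (-1, -1)."""
    spans = [(n["pos"], n["end"]) for n in nodes]
    if not spans:
        return (-1, -1)
    return (min(p for p, e in spans), max(e for p, e in spans))

assert text_range([{"pos": 5, "end": 9}, {"pos": 2, "end": 7}]) == (2, 9)
assert text_range([]) == (-1, -1)
```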

delete()[source]#

Delete all selected nodes from their parents.

Internally calls uniq and remove_descendants, so that no unnecessary deletes are done. If a context would become empty, that context itself is deleted instead of all its children (except for the root of course). Returns the number of nodes that were deleted.

Note

If you delete tokens from a tree which belong to a group, the tree cannot reliably be used by a treebuilder for a partial rebuild.

__getitem__(key)[source]#

Get the specified item or items of every context node.

Note that the result nodes always form a flat iterable. No IndexError will be raised if an index would be out of range for any node.
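
A sketch of this behavior (assumed semantics, not parce internals), with contexts modeled as plain lists of children:

```python
# Indexing every context yields one flat stream; out-of-range indices
# are silently skipped instead of raising IndexError.

def get_item(contexts, key):
    for children in contexts:
        if isinstance(key, slice):
            yield from children[key]                  # slices never raise
        elif -len(children) <= key < len(children):
            yield children[key]                       # index exists

contexts = [["a", "b"], ["c"], []]
assert list(get_item(contexts, 0)) == ["a", "c"]      # the empty context is skipped
assert list(get_item(contexts, slice(0, 2))) == ["a", "b", "c"]
```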

property children#

All direct children of the current nodes.

property all#

All descendants, contexts and their nodes.

property alltokens#

Shortcut for all.tokens.

property allcontexts#

Shortcut for all.contexts.

property parent#

Yield the parent of every node.

This can lead to many double occurrences of the same node in the result set; use uniq to fix that.

property ancestors#

Yield the ancestor contexts of every node.

property first#

Yield the first node of every context node, same as [0].

property last#

Yield the last node of every context node, same as [-1].

property next#

Yield the next token, if any.

property previous#

Yield the previous token, if any.

property forward#

Yield Tokens in forward direction.

property backward#

Yield Tokens in backward direction.

property right#

Yield the right sibling, if any.

property left#

Yield the left sibling, if any.

property right_siblings#

Yield the right siblings, if any.

property left_siblings#

Yield the left siblings, if any.

map(function)[source]#

Call the function on every node and yield its results, which should be zero or more nodes as well.
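
This is a flat-map over the result set. A self-contained sketch (the tree dict below is a hypothetical stand-in for a function returning child nodes):

```python
# The function is called on every node and each of its results is
# yielded, so one node can expand to zero or more nodes.

def query_map(nodes, function):
    for node in nodes:
        yield from function(node)

tree = {"a": ["b", "c"], "b": [], "c": ["d"]}
assert list(query_map(["a", "b", "c"], tree.get)) == ["b", "c", "d"]
```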

filter(predicate)[source]#

Yield nodes for which the predicate returns a value that evaluates to True.

property tokens#

Get only the tokens.

property contexts#

Get only the contexts.

property uniq#

Remove double occurrences of the same node from the result set.

This can happen e.g. when you find the parent of multiple nodes.

slice(*args)[source]#

Slice the full result set, using itertools.islice().

This can help narrowing down the result set. For example:

root.query.all("blaat").slice(1).right_siblings.slice(3) ...

will continue the query with only the first occurrence of a token "blaat", and then look for at most three right siblings. Without the slice(1), all the right siblings would end up in one large result set, because you would not know how many "blaat" tokens were matched.
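
Because itertools.islice() takes the same start/stop/step arguments as ordinary slicing but consumes a lazy stream, slice() can simply forward its arguments. A sketch (query_slice is a hypothetical stand-in for the method):

```python
from itertools import islice

def query_slice(nodes, *args):
    """Forward slice arguments to itertools.islice(), like Query.slice()."""
    return islice(nodes, *args)

results = ["blaat", "x", "blaat", "y", "blaat"]
assert list(query_slice(iter(results), 1)) == ["blaat"]               # slice(1)
assert list(query_slice(iter(results), 1, 4)) == ["x", "blaat", "y"]  # slice(1, 4)
```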

property remove_descendants#

Remove nodes that have ancestors in the current node list.

property remove_ancestors#

Remove nodes that have descendants in the current node list.

property is_not#

Invert the next query.

len(min_length, max_length=None)[source]#

Yield only contexts, either with length min_length, or with a length between min_length and max_length.

in_range(start=0, end=None)[source]#

Yield a restricted set: tokens and/or contexts must fall within the start→end range.

__call__(*what)[source]#

Yield a token if it has one of the given texts, or a context if it has one of the given lexicons.

You can even mix the types if you’d need to:

for n in tree.query.all("%", Lang.comment):
    # do something

yields tokens that are a percent sign and contexts that have the Lang.comment lexicon.

startingwith(text)[source]#

Yield tokens that start with text.

endingwith(text)[source]#

Yield tokens that end with text.

containing(text)[source]#

Yield tokens that contain the specified text.

matching(pattern, flags=0)[source]#

Yield tokens matching the regular expression.

re.search() is used, so the expression can match anywhere in the token's text unless you use ^ or $ anchors.

action(*actions)[source]#

Yield those tokens whose action is one of the given actions.

in_action(*actions)[source]#

Yield those tokens whose action is or inherits from one of the given actions.