The query module

Query the tree using the query property.

Using this module you can query the token tree to find tokens and contexts, based on lexicons and/or actions and text contents. You can chain calls in an XPath-like fashion.

This module supplements the various find_xxx methods of every Context object. A query starts at the query property of a Context or Token object, and initially yields just that object.

You can navigate using children, all, first, last, [n], [n:n], [n:n:n], next, previous, right, left, right_siblings, left_siblings, map(), parent and ancestors. Use uniq to remove duplicate occurrences of nodes, which can happen, for example, when navigating to the parent of all nodes.

You can narrow down the search using tokens, contexts, remove_ancestors, remove_descendants, slice() and filter().

You can search for tokens using ('text') or (lexicon), startingwith(), endingwith(), containing(), matching(), action() or in_action(). The special prefix is_not inverts the query, so query.is_not.containing("bla") yields Tokens that do not contain the text "bla".

Examples:

Find all tokens that are the first child of a Context with bla lexicon:

root.query.all(MyLang.bla)[0]

Find (in Xml) all attributes with name ‘name’ that are in a <bla> tag:

root.query.all.action(Name.Tag)("bla").next('name')

Find all tags containing “hi” in their text nodes:

root.query.all.action(Name.Tag).next.next.action(Text).containing('hi')

Find all comments that have TODO in them:

root.query.all.action(Comment).containing('TODO')

Find all “\version” tokens in the root context, that have a “2” in the version string after it:

(t for t in root.query.children('\\version')
    if any(t.query.next.target.children.containing('2')))

Which could also be written as:

root.query.children('\\version').filter(
    lambda t: any(t.query.next.target.children.containing('2')))

A query is a generator, so you can iterate over the results. For debugging purposes, there are also the list(), pick(), count() and dump() methods:

for attrs in q.all.action(Name.Tag)('origin').right:
    for atr in attrs.query.action(Name.Attribute):
        print(atr)
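To make the generator behaviour concrete, here is a minimal, self-contained sketch of a chainable query. It is not the module's implementation: Node and MiniQuery are hypothetical stand-ins, and only children, all, list(), count() and pick() are modelled.

```python
class Node:
    """Hypothetical stand-in for a token or context (not the real classes)."""
    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)


class MiniQuery:
    """Toy query: wraps a callable that returns a fresh generator of nodes."""
    def __init__(self, gen):
        self._gen = gen

    def __iter__(self):
        # Every iteration restarts the underlying generator.
        return self._gen()

    @property
    def children(self):
        def gen():
            for node in self:
                yield from node.children
        return MiniQuery(gen)

    @property
    def all(self):
        def walk(node):
            for child in node.children:
                yield child
                yield from walk(child)

        def gen():
            for node in self:
                yield from walk(node)
        return MiniQuery(gen)

    def list(self):
        return list(self)

    def count(self):
        return sum(1 for _ in self)

    def pick(self, default=None):
        for node in self:
            return node
        return default


root = Node("root", [Node("a", [Node("a1"), Node("a2")]), Node("b")])
q = MiniQuery(lambda: iter([root]))

names_children = [n.name for n in q.children]
names_all = [n.name for n in q.all]
```

Each property returns a new query object, so nothing is computed until the result is iterated or an endpoint method such as list() is called.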

A query resolves to False if it yields no results at all:

if token.query.ancestors(LilyPond.header):
    do_something() # the token is a descendant of a LilyPond.header context
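One way such a truth test could work is sketched below, assuming the query wraps a callable that returns a fresh generator. MiniQuery is a hypothetical stand-in, not the Query class itself. The test stops at the first result instead of exhausting the generator.

```python
class MiniQuery:
    """Toy query that is true when it yields at least one result."""
    def __init__(self, gen):
        self._gen = gen

    def __iter__(self):
        return self._gen()

    def __bool__(self):
        # Return as soon as the first result appears.
        for _ in self:
            return True
        return False


calls = []

def numbers():
    for n in (1, 2, 3):
        calls.append(n)  # record how far the generator was advanced
        yield n


found = bool(MiniQuery(numbers))
empty = bool(MiniQuery(lambda: iter(())))
```

After this runs, calls holds only the first value, which shows that the remaining results were never computed.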

You can also directly instantiate a Query object for a list of nodes, if you want to query those in one go:

q = Query.from_nodes(nodes)

Summary of the query methods:

Endpoint methods (some are mainly for debugging):

Selecting (filtering) nodes:

These methods filter out current nodes without adding new nodes to the selection:

The special is_not operator inverts the meaning of the next query, e.g.:

n.query.all.is_not.startingwith("text")

The following query methods can be inverted by prepending is_not:

There is a subtle difference between action and in_action: action selects only tokens whose action exactly matches the given action, while in_action also selects tokens whose action is a descendant of the given action.
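The difference can be illustrated with a toy model in which an action is a tuple path such as ("Name", "Tag"). This representation, the Tok type and both helper functions are assumptions made for the example only; the real action objects are not tuples.

```python
from collections import namedtuple

# Hypothetical token type: text plus an action modelled as a tuple path.
Tok = namedtuple("Tok", "text action")

NAME = ("Name",)
NAME_TAG = ("Name", "Tag")
NAME_ATTRIBUTE = ("Name", "Attribute")
TEXT = ("Text",)


def by_action(tokens, *actions):
    """Keep tokens whose action is exactly one of the given actions."""
    return [t for t in tokens if t.action in actions]


def by_in_action(tokens, *actions):
    """Keep tokens whose action equals or descends from a given action."""
    return [t for t in tokens
            if any(t.action[:len(a)] == a for a in actions)]


tokens = [
    Tok("div", NAME_TAG),
    Tok("id", NAME_ATTRIBUTE),
    Tok("x", NAME),
    Tok("hello", TEXT),
]

exact = [t.text for t in by_action(tokens, NAME)]
inherited = [t.text for t in by_in_action(tokens, NAME)]
```

Here exact contains only the token with the plain Name action, while inherited also contains the Name.Tag and Name.Attribute tokens.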

class Query(gen, invert=False)

Bases: object

A Query navigates and filters a node tree.

A Query is instantiated either by calling Token.query or Context.query, or by calling Query.from_nodes() on a list of nodes (tokens and/or contexts).

classmethod from_nodes(nodes)

Create a Query object querying a list of nodes in one go.

__bool__()

Return True if there is at least one result.

count()

Compute the length of the iterable.

dump(file=None)

Dump the current selection to the console (or to file).

list()

Return the current selection as a list. Mainly for debugging.

pick(default=None)

Pick the first value, or return the default.

pick_last(default=None)

Pick the last value, or return the default.

range()

Return the text range as a tuple (pos, end).

The pos is the lowest pos of the nodes in the current set, and end is the highest end of the nodes. If the result set is empty, (-1, -1) is returned.
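The rule described above can be sketched as follows. Span and text_range are hypothetical names introduced for this example; the only assumption is that each node exposes pos and end attributes.

```python
from collections import namedtuple

# Hypothetical node stand-in with only the two attributes that matter here.
Span = namedtuple("Span", "pos end")


def text_range(nodes):
    """Return (lowest pos, highest end), or (-1, -1) for an empty set."""
    pos = end = -1
    for n in nodes:
        if pos == -1 or n.pos < pos:
            pos = n.pos
        if n.end > end:
            end = n.end
    return pos, end


covered = text_range([Span(5, 8), Span(2, 4), Span(6, 12)])
nothing = text_range([])
```

The nodes do not need to be sorted or adjacent; the result simply spans from the earliest start to the latest end.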

delete()

Delete all selected nodes from their parents.

Internally calls uniq and remove_descendants, so that no unnecessary deletes are done. If a context would become empty, that context itself is deleted instead of all its children (except for the root of course). Returns the number of nodes that were deleted.

__getitem__(key)

Get the specified item or items of every context node.

Note that the result nodes always form a flat iterable. No IndexError will be raised if an index would be out of range for any node.
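A rough model of this behaviour, using plain lists as stand-ins for context nodes (get_items is a hypothetical helper, not part of the module): contexts that are too short for the requested index are skipped, and slices are flattened into one result stream.

```python
def get_items(contexts, key):
    """Yield ctx[key] for every context, skipping out-of-range indices."""
    for ctx in contexts:
        if isinstance(key, slice):
            # A slice never raises; its items are flattened into the result.
            yield from ctx[key]
        elif -len(ctx) <= key < len(ctx):
            yield ctx[key]


contexts = [["a", "b", "c"], ["d"], []]

second = list(get_items(contexts, 1))
first = list(get_items(contexts, 0))
last = list(get_items(contexts, -1))
first_two = list(get_items(contexts, slice(0, 2)))
```

Only the first list has an item at index 1, so second holds a single value and no IndexError is raised for the shorter lists.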

property children

All direct children of the current nodes.

property all

All descendants, contexts and their nodes.

property alltokens

Shortcut for all.tokens.

property allcontexts

Shortcut for all.contexts.

property parent

Yield the parent of every node.

This can lead to many duplicate occurrences of the same node in the result set; use uniq to fix that.

property ancestors

Yield the ancestor contexts of every node.

property first

Yield the first node of every context node, same as [0].

property last

Yield the last node of every context node, same as [-1].

property next

Yield the next token, if any.

property previous

Yield the previous token, if any.

property forward

Yield Tokens in forward direction.

property backward

Yield Tokens in backward direction.

property right

Yield the right sibling, if any.

property left

Yield the left sibling, if any.

property right_siblings

Yield the right siblings, if any.

property left_siblings

Yield the left siblings, if any.

property target

Yield the target Context for every token, if available.

See Token.target().

property source

Yield the source Token for every context, if available.

See Context.source().

map(function)

Call the function on every node and yield its results, which should be zero or more nodes as well.

filter(predicate)

Yield nodes for which the predicate returns a value that evaluates to True.

property tokens

Get only the tokens.

property contexts

Get only the contexts.

property uniq

Remove duplicate occurrences of the same node from the result set.

This can happen e.g. when you find the parent of multiple nodes.
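A minimal sketch of order-preserving de-duplication is shown below. It assumes that "the same node" means the same object, so it compares by identity rather than equality. The uniq function here is a standalone illustration, not the property's actual code.

```python
def uniq(nodes):
    """Yield each node object once, keeping the first occurrence."""
    seen = set()
    for node in nodes:
        if id(node) not in seen:
            seen.add(id(node))
            yield node


parent_a = ["parent a"]
parent_b = ["parent b"]
lookalike = ["parent a"]  # equal to parent_a, but a different object

# Three children share parent_a, one has parent_b, one has the lookalike.
parents = [parent_a, parent_a, parent_b, parent_a, lookalike]
unique_parents = list(uniq(parents))
```

The lookalike list survives because it is a distinct object, even though it compares equal to parent_a.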

slice(*args)

Slice the full result set, using itertools.islice().

This can help narrowing down the result set. For example:

root.query.all("blaat").slice(1).right_siblings.slice(3) ...

will continue the query with only the first occurrence of a token “blaat”, and then look for at most three right siblings. If the slice(1) were not there, all the right siblings would become one large result set because you wouldn’t know how many tokens “blaat” were matched.
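The effect can be reproduced with itertools.islice() on plain generators. In this sketch the flat list row and the index-based sibling lookup are assumptions standing in for a real token tree.

```python
from itertools import islice

row = ["blaat", "a", "b", "c", "d", "blaat", "e", "f"]


def matches():
    """Yield the index of every "blaat" in the row."""
    return (i for i, text in enumerate(row) if text == "blaat")


def right_siblings(indices):
    """Yield everything to the right of each given index, flattened."""
    return (s for i in indices for s in row[i + 1:])


# With slice(1): only the first match is kept, then at most three siblings.
limited = list(islice(right_siblings(islice(matches(), 1)), 3))

# Without slice(1): the siblings of both matches merge into one stream.
unlimited = list(right_siblings(matches()))
```

In the unlimited case the result mixes the siblings of both matches, which is why a later slice(3) would no longer mean "three siblings of one token".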

property remove_descendants

Remove nodes that have ancestors in the current node list.

property remove_ancestors

Remove nodes that have descendants in the current node list.
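The two filters are mirror images of each other, as the following sketch shows. Node and both functions are hypothetical stand-ins; the example assumes every node has a parent attribute that is None for the root.

```python
class Node:
    """Hypothetical node with a link to its parent (None for the root)."""
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent


def ancestors(node):
    """Yield the parent, grandparent and so on, up to the root."""
    node = node.parent
    while node is not None:
        yield node
        node = node.parent


def remove_descendants(nodes):
    """Drop nodes that have an ancestor in the same selection."""
    nodes = list(nodes)
    selected = {id(n) for n in nodes}
    return [n for n in nodes
            if not any(id(a) in selected for a in ancestors(n))]


def remove_ancestors(nodes):
    """Drop nodes that have a descendant in the same selection."""
    nodes = list(nodes)
    covering = {id(a) for n in nodes for a in ancestors(n)}
    return [n for n in nodes if id(n) not in covering]


root = Node("root")
ctx = Node("ctx", root)
tok = Node("tok", ctx)
other = Node("other", root)

selection = [ctx, tok, other]
outermost = [n.name for n in remove_descendants(selection)]
innermost = [n.name for n in remove_ancestors(selection)]
```

remove_descendants keeps the outermost nodes of the selection, while remove_ancestors keeps the innermost ones.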

property is_not

Invert the next query.
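Conceptually, inversion flips the outcome of the test that follows. The sketch below models that with an explicit invert flag on a hypothetical containing() helper that works on plain strings; how the flag is actually passed between query objects is not shown here.

```python
def containing(tokens, text, invert=False):
    """Keep tokens that contain text, or that do not when inverted."""
    return [t for t in tokens if (text in t) != invert]


tokens = ["blabla", "hello", "bla"]

with_bla = containing(tokens, "bla")
without_bla = containing(tokens, "bla", invert=True)
```

Every token ends up in exactly one of the two results, so the inverted query is the complement of the normal one.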

len(min_length, max_length=None)

Yield only contexts with length min_length, or with a length between min_length and max_length.

in_range(start=0, end=None)

Yield a restricted set: tokens and/or contexts must fall within the range start→end.

__call__(*what)

Yield token if token has that text, or context if context has that lexicon.

You can even mix the types if you’d need to:

for n in tree.query.all("%", Lang.comment):
    print(n)  # do something with each node

yields tokens that are a percent sign and contexts that have the Lang.comment lexicon.

startingwith(text)

Yield tokens that start with text.

endingwith(text)

Yield tokens that end with text.

containing(text)

Yield tokens that contain the specified text.

matching(pattern, flags=0)

Yield tokens matching the regular expression.

re.search() is used, so the expression can match anywhere in the text unless you anchor it with the ^ or $ characters.
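The following standalone snippet shows what this means in practice, using re.search() directly on a few sample strings instead of on tokens:

```python
import re

texts = ["version", "conversion", "verse"]

# Unanchored: the pattern may match anywhere in the text.
anywhere = [t for t in texts if re.search(r"vers", t)]

# Anchored with ^: the text must start with the pattern.
at_start = [t for t in texts if re.search(r"^vers", t)]

# Anchored with $: the text must end with the pattern.
at_end = [t for t in texts if re.search(r"sion$", t)]
```

All three strings match the unanchored pattern, but "conversion" drops out once the pattern is anchored at the start.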

action(*actions)

Yield those tokens whose action is one of the given actions.

in_action(*actions)

Yield those tokens whose action is or inherits from one of the given actions.