The lexer module

The Lexer is responsible for parsing text using Lexicons.

The lexer generates Event tuples, each containing a target (or None) and one or more lexemes. The target, if not None, specifies a state change: leave the current lexicon(s) and/or descend into the specified lexicons. (See the target module.)

The lexemes field is a tuple of one or more lexeme tuples; each lexeme is a (pos, text, action) tuple. An Event always contains at least one lexeme, and a lexeme’s text is never empty. (A rule’s pattern might match the empty string; no event is generated in that case, but the target is still followed.)
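To make the shape of an event concrete, here is a minimal sketch that does not import parce. The `Event` namedtuple is only a stand-in for the real class, the actions are plain strings, and the sample event (with two lexemes) is made up for illustration.

```python
from collections import namedtuple

# Stand-in for parce.lexer.Event: a tuple with a target and a lexemes field.
Event = namedtuple("Event", "target lexemes")

# Hypothetical event: no state change (target is None) and two lexemes.
# Real events carry action objects; plain strings are used here instead.
event = Event(
    target=None,
    lexemes=((0, "h1", "Name.Tag"), (3, "{", "Delimiter.Bracket")),
)


def describe(event):
    """Return one (start, end, text, action) tuple per lexeme."""
    return [
        (pos, pos + len(text), text, action)
        for pos, text, action in event.lexemes
    ]


spans = describe(event)
for start, end, text, action in spans:
    print(f"{start}:{end} {text!r} {action}")
```

Because a lexeme's text is never empty, `end` is always greater than `start` in this sketch.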

The Lexer can handle circular default targets: if a target is pushed again in the same context at the same text position (and another target pops back), this is detected and the text position pointer is advanced by one. (Runaway pushed targets are not detected by the Lexer; those are detected by the validate module.)
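The idea behind the circular-target check can be sketched with a toy state machine. This is not parce's implementation: the lexicon names `outer` and `inner`, the single-character "rule", and the `run` helper are all assumptions chosen to keep the example small. Here `outer` pushes `inner` by default, and `inner` pops by default when it cannot match, which would loop forever without the check.

```python
def run(text, max_steps=100):
    """Return the positions that were skipped because a push/pop cycle was seen."""
    stack = ["outer"]
    pos = 0
    seen = set()      # default pushes already tried at the current position
    skipped = []
    steps = 0
    while pos < len(text):
        steps += 1
        if steps > max_steps:
            raise RuntimeError("toy lexer did not terminate")
        if stack[-1] == "inner":
            if text[pos] == "a":
                pos += 1          # a real match consumes text
                seen.clear()      # new position, so forget earlier pushes
            else:
                stack.pop()       # default target: pop back to "outer"
        else:
            key = (tuple(stack), "inner", pos)
            if key in seen:
                # Same push, same context, same position: a cycle.
                skipped.append(pos)
                pos += 1          # advance by one to break the cycle
                seen.clear()
            else:
                seen.add(key)
                stack.append("inner")   # default target: push "inner"
    return skipped


print(run("aab"))
```

In this sketch only the character that neither lexicon can consume is skipped; text that matches a rule is processed normally.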

The TreeBuilder (treebuilder) uses a Lexer internally to parse text and create the tree structure.

Example:

>>> import parce.lexer
>>> import parce.lang.css
>>> for e in parce.lexer.Lexer([parce.lang.css.Css.root]).events("h1 { color: red; }"):
...     print(e)
...
Event(target=Target(pop=0, push=(Css.prelude, Css.selector, Css.element_selector)), lexemes=((0, 'h1', Name.Tag),))
Event(target=Target(pop=-2, push=()), lexemes=((3, '{', Delimiter.Bracket),))
Event(target=Target(pop=-1, push=(Css.rule, Css.declaration, Css.property)), lexemes=((5, 'color', Name.Property.Definition),))
Event(target=Target(pop=-1, push=()), lexemes=((10, ':', Delimiter),))
Event(target=Target(pop=0, push=(Css.identifier,)), lexemes=((12, 'red', Literal.Color),))
Event(target=Target(pop=-1, push=()), lexemes=((15, ';', Delimiter),))
Event(target=Target(pop=-1, push=()), lexemes=((17, '}', Delimiter.Bracket),))

There is a convenience function in the parce module namespace that calls Lexer for you:

>>> import parce
>>> from parce.lang.css import Css
>>> for e in parce.events(Css.root, "h1 { color: red; }"):
...     print(e)

And here’s how the same text would translate to a tree structure:

>>> parce.root(parce.lang.css.Css.root, "h1 { color: red; }").dump()
<Context Css.root at 0-18 (2 children)>
 ├╴<Context Css.prelude at 0-4 (2 children)>
 │  ├╴<Context Css.selector at 0-2 (1 children)>
 │  │  ╰╴<Context Css.element_selector at 0-2 (1 children)>
 │  │     ╰╴<Token 'h1' at 0:2 (Name.Tag)>
 │  ╰╴<Token '{' at 3:4 (Delimiter.Bracket)>
 ╰╴<Context Css.rule at 5-18 (2 children)>
    ├╴<Context Css.declaration at 5-16 (4 children)>
    │  ├╴<Context Css.property at 5-10 (1 children)>
    │  │  ╰╴<Token 'color' at 5:10 (Name.Property.Definition)>
    │  ├╴<Token ':' at 10:11 (Delimiter)>
    │  ├╴<Context Css.identifier at 12-15 (1 children)>
    │  │  ╰╴<Token 'red' at 12:15 (Literal.Color)>
    │  ╰╴<Token ';' at 15:16 (Delimiter)>
    ╰╴<Token '}' at 17:18 (Delimiter.Bracket)>
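The event list and the tree can be related by replaying each target on a stack of lexicons. The sketch below is inferred from the example output above rather than taken from parce's source: `Target` and `Event` are stand-in namedtuples, lexicon names and actions are plain strings, and `apply_target` is a hypothetical helper. It assumes that the target is applied before the event's lexemes are placed, which is what the tree dump suggests.

```python
from collections import namedtuple

# Stand-ins for the real classes; names and actions are plain strings here.
Target = namedtuple("Target", "pop push")
Event = namedtuple("Event", "target lexemes")


def apply_target(stack, target):
    """Apply pop (zero or negative), then push, to a list of lexicon names."""
    if target is None:
        return
    if target.pop:
        # Index 0 is the root lexicon, which is never popped off.
        del stack[max(1, len(stack) + target.pop):]
    stack.extend(target.push)


# The events from the example output, with action names omitted.
events = [
    Event(Target(0, ("Css.prelude", "Css.selector", "Css.element_selector")),
          ((0, "h1"),)),
    Event(Target(-2, ()), ((3, "{"),)),
    Event(Target(-1, ("Css.rule", "Css.declaration", "Css.property")),
          ((5, "color"),)),
    Event(Target(-1, ()), ((10, ":"),)),
    Event(Target(0, ("Css.identifier",)), ((12, "red"),)),
    Event(Target(-1, ()), ((15, ";"),)),
    Event(Target(-1, ()), ((17, "}"),)),
]

stack = ["Css.root"]
trace = []
for event in events:
    apply_target(stack, event.target)
    for pos, text in event.lexemes:
        trace.append((text, stack[-1]))   # lexeme lands in the current lexicon

for text, lexicon in trace:
    print(f"{text!r:8} -> {lexicon}")
```

Each lexeme ends up in the same context that the tree dump shows for the corresponding token, and the stack is left at `Css.root`, `Css.rule` once the text is exhausted.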

class Event(target, lexemes)

Bases: tuple

lexemes: One or more (pos, text, action) tuples.

target: A Target or None.

class Lexer(lexicons)

Bases: object

A Lexer is responsible for parsing text using Lexicons.

lexicons is a list of one or more lexicon instances, the first one being the root lexicon. Lexicons can add lexicons to this list and pop lexicons off while parsing text. The first lexicon is never popped off.

While parsing text using the events() method, the lexicons attribute reflects the current state: the current lexicon is at the end.

events(text, pos=0): Get the events produced by parsing text, starting at the specified position.

filter_actions(action, pos, text, match): Handle filtering via DynamicAction instances.