The lexer module
The Lexer is responsible for parsing text using Lexicons.
The lexer generates Event tuples, which contain a target (or None) and one or
more lexemes. The target, if not None, specifies a state change, i.e. leave
the current lexicon(s) and/or descend into specified lexicons. (See the
target module.)
The lexemes value is a tuple of one or more lexeme tuples. A lexeme is a
(pos, text, action) tuple. Note that an Event always contains at least one
lexeme tuple, and that a lexeme’s text is always non-empty. (A rule’s pattern
might match the empty string, but no event is generated in that case, although
the target is followed.)
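As a minimal sketch of this structure, the snippet below models Event and Target with plain namedtuples and flattens a stream of events into individual lexemes. The namedtuples are stand-ins for illustration only and are not imported from parce; `iter_lexemes` is a hypothetical helper, actions are written as plain strings, and the last event is made up to show a None target carrying two lexemes.

```python
from collections import namedtuple

# Stand-ins for illustration; parce defines its own Event and Target types.
Target = namedtuple("Target", "pop push")
Event = namedtuple("Event", "target lexemes")


def iter_lexemes(events):
    """Yield every (pos, text, action) lexeme from a stream of events."""
    for event in events:
        # event.lexemes is a non-empty tuple of (pos, text, action) tuples
        for pos, text, action in event.lexemes:
            yield pos, text, action


events = [
    # target with a state change, one lexeme
    Event(Target(0, ["Css.prelude"]), ((0, "h1", "Name.Tag"),)),
    Event(Target(-1, []), ((3, "{", "Delimiter"),)),
    # made-up event: no state change (target is None), two lexemes
    Event(None, ((5, "a", "Text"), (6, "b", "Text"))),
]

texts = [text for pos, text, action in iter_lexemes(events)]
positions = [pos for pos, text, action in iter_lexemes(events)]
```

Because every event carries at least one lexeme and every lexeme text is non-empty, a consumer like this does not need to guard against empty tuples or empty strings.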
The Lexer is capable of handling circular default targets: if a target is
pushed again in the same context at the same text position (and another
target pops back), this is detected and the text position pointer is advanced
by one. (Run-away pushed targets are not detected by the Lexer; those are
detected by the validate module.)
The TreeBuilder (treebuilder) uses a Lexer internally to parse
text and create the tree structure.
Example:
>>> import parce.lexer
>>> import parce.lang.css
>>> for e in parce.lexer.Lexer([parce.lang.css.Css.root]).events("h1 { color: red; }"):
...     print(e)
...
Event(target=Target(pop=0, push=[Css.prelude, Css.selector, Css.element_selector]), lexemes=((0, 'h1', Name.Tag),))
Event(target=Target(pop=-2, push=[]), lexemes=((3, '{', Delimiter),))
Event(target=Target(pop=-1, push=[Css.rule, Css.declaration, Css.property]), lexemes=((5, 'color', Name.Property),))
Event(target=Target(pop=-1, push=[]), lexemes=((10, ':', Delimiter),))
Event(target=Target(pop=0, push=[Css.identifier]), lexemes=((12, 'red', Literal.Color),))
Event(target=Target(pop=-1, push=[]), lexemes=((15, ';', Delimiter),))
Event(target=Target(pop=-1, push=[]), lexemes=((17, '}', Delimiter),))
There is a convenience function in the parce module namespace that calls Lexer for you:
>>> import parce
>>> from parce.lang.css import Css
>>> for e in parce.events(Css.root, "h1 { color: red; }"):
...     print(e)
And here’s how the same text would translate to a tree structure:
>>> parce.root(parce.lang.css.Css.root, "h1 { color: red; }").dump()
<Context Css.root at 0-18 (2 children)>
 ├╴<Context Css.prelude at 0-4 (2 children)>
 │  ├╴<Context Css.selector at 0-2 (1 children)>
 │  │  ╰╴<Context Css.element_selector at 0-2 (1 children)>
 │  │     ╰╴<Token 'h1' at 0:2 (Name.Tag)>
 │  ╰╴<Token '{' at 3:4 (Delimiter)>
 ╰╴<Context Css.rule at 5-18 (2 children)>
    ├╴<Context Css.declaration at 5-16 (4 children)>
    │  ├╴<Context Css.property at 5-10 (1 children)>
    │  │  ╰╴<Token 'color' at 5:10 (Name.Property)>
    │  ├╴<Token ':' at 10:11 (Delimiter)>
    │  ├╴<Context Css.identifier at 12-15 (1 children)>
    │  │  ╰╴<Token 'red' at 12:15 (Literal.Color)>
    │  ╰╴<Token ';' at 15:16 (Delimiter)>
    ╰╴<Token '}' at 17:18 (Delimiter)>
class Lexer(lexicons)
Bases: object
A Lexer is responsible for parsing text using Lexicons.
lexicons is a list of one or more lexicon instances, the first one being the root lexicon. Lexicons can add lexicons to this list and pop lexicons off while parsing text. The first lexicon is never popped off.
While parsing text using the events() method, the lexicons attribute reflects the current state: the current lexicon is at the end.
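The way targets reshape this list can be sketched by replaying the events from the CSS example on a plain Python list. This is not the Lexer implementation: Target and Event are namedtuple stand-ins, lexicons are written as strings, and `replay` is a hypothetical helper. It assumes that an event's target is applied before its lexemes are attributed to the current lexicon, and that pop is zero or negative, as in the output shown in the example.

```python
from collections import namedtuple

# Stand-ins for illustration; not imported from parce.
Target = namedtuple("Target", "pop push")
Event = namedtuple("Event", "target lexemes")


def replay(root, events):
    """Yield (text, current_lexicon) pairs while applying each target."""
    stack = [root]
    for target, lexemes in events:
        if target is not None:
            if target.pop:
                # pop is negative; index 0 (the root lexicon) is never removed
                del stack[max(1, len(stack) + target.pop):]
            stack.extend(target.push)
        for pos, text, action in lexemes:
            # the current lexicon is the one at the end of the list
            yield text, stack[-1]


events = [
    Event(Target(0, ["Css.prelude", "Css.selector", "Css.element_selector"]),
          ((0, "h1", "Name.Tag"),)),
    Event(Target(-2, []), ((3, "{", "Delimiter"),)),
    Event(Target(-1, ["Css.rule", "Css.declaration", "Css.property"]),
          ((5, "color", "Name.Property"),)),
    Event(Target(-1, []), ((10, ":", "Delimiter"),)),
    Event(Target(0, ["Css.identifier"]), ((12, "red", "Literal.Color"),)),
    Event(Target(-1, []), ((15, ";", "Delimiter"),)),
    Event(Target(-1, []), ((17, "}", "Delimiter"),)),
]

contexts = list(replay("Css.root", events))
```

Under these assumptions, each text ends up in the same context as its token in the tree dump of the example: 'h1' in Css.element_selector, '{' in Css.prelude, 'color' in Css.property, and so on.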