The lexicon module

A Lexicon groups rules to match.

A Lexicon is created by decorating a method that yields rules with the @lexicon decorator. Strictly speaking, the decorator creates a LexiconDescriptor. When a LexiconDescriptor is accessed for the first time via a Language subclass, a Lexicon for that class is created and cached, and that same Lexicon is returned each time the attribute is accessed.

The Lexicon can parse text according to the rules. When its parse() method is called for the first time, the rules function is run with the Language class as argument, and the rules it yields are cached.

This makes it possible to inherit from a Language class and re-implement only some lexicons; the others keep working as in the base class.

Example:

>>> from parce import Language, lexicon
>>>
>>> class MyLang(Language):
...     @lexicon
...     def numbers(cls):
...         yield r'\d+', "A number"
...         yield r'\w+', "A word"
...
>>> MyLang.numbers
MyLang.numbers
>>> type(MyLang.numbers)
<class 'parce.lexicon.Lexicon'>
>>> for i in MyLang.numbers.parse("1 a2 d3 4 p 5", 0):
...     print(i)
...
(0, '1', <re.Match object; span=(0, 1), match='1'>, 'A number', None)
(2, 'a2', <re.Match object; span=(2, 4), match='a2'>, 'A word', None)
(5, 'd3', <re.Match object; span=(5, 7), match='d3'>, 'A word', None)
(8, '4', <re.Match object; span=(8, 9), match='4'>, 'A number', None)
(10, 'p', <re.Match object; span=(10, 11), match='p'>, 'A word', None)
(12, '5', <re.Match object; span=(12, 13), match='5'>, 'A number', None)

Parsing (better: lexing) is done by a Lexer instance, which switches Lexicon when a target is encountered.

class LexiconDescriptor(rules_func, re_flags=0, consume=False)

Bases: object

The LexiconDescriptor creates a Lexicon when accessed via a class.

rules_func = None

The function yielding the rules.
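The per-class creation and caching described above can be pictured with a plain Python descriptor. This is only a minimal sketch, not the actual implementation: SketchDescriptor, SketchLexicon, Base, Derived and the word attribute are hypothetical names invented for illustration.

```python
class SketchLexicon:
    """Hypothetical stand-in for a Lexicon: tied to one descriptor and one class."""

    def __init__(self, descriptor, language):
        self.descriptor = descriptor
        self.language = language

    def rules(self):
        # Run the rules function with the owning class as its argument.
        return tuple(self.descriptor.rules_func(self.language))


class SketchDescriptor:
    """Hypothetical stand-in for a LexiconDescriptor."""

    def __init__(self, rules_func):
        self.rules_func = rules_func
        self._cache = {}  # one SketchLexicon per class it is accessed through

    def __get__(self, instance, owner):
        # Create the lexicon on first access via a class, then reuse it.
        try:
            return self._cache[owner]
        except KeyError:
            lexicon = self._cache[owner] = SketchLexicon(self, owner)
            return lexicon


class Base:
    word = r"\w+"

    @SketchDescriptor
    def root(cls):
        yield cls.word, "A word"


class Derived(Base):
    word = r"[a-z]+"  # only this attribute changes; root is inherited


print(Base.root is Base.root)      # True: cached per class
print(Derived.root is Base.root)   # False: each class gets its own lexicon
print(Derived.root.rules())        # the inherited rules, run with Derived
```

Because the rules function receives the class it was accessed through, the inherited root in Derived picks up the overridden word attribute without being redefined.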

class Lexicon(descriptor, language, arg=None)

Bases: object

A Lexicon parses text according to rules.

A Lexicon is tied to a particular class, which makes it possible to inherit from a Language class and change only some Lexicons.

parse(text, pos)

Start parsing text from the specified position. Yields five-tuples (pos, text, matchobj, action, target).

Here pos is the position where the match starts, text is the matched text, matchobj is the match object (which can be None for default actions), action is the action that was specified in the matching rule, and target is either None or a Target object.
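To make the shape of these five-tuples concrete, here is a rough sketch of such a generator built directly on the re module. It is an illustration under simplifying assumptions, not the library's code: parse_sketch is a hypothetical name, rules are plain (pattern, action) pairs, the target is always None, and default actions are not handled, so unmatched text is simply skipped.

```python
import re


def parse_sketch(rules, text, pos=0):
    """Yield (pos, text, matchobj, action, target) for every match in text."""
    # Join all patterns into one alternation of named groups, so that the
    # name of the group that matched identifies the rule.
    combined = re.compile("|".join(
        "(?P<g_{}>{})".format(i, pattern)
        for i, (pattern, _action) in enumerate(rules)))
    actions = {"g_{}".format(i): action
               for i, (_pattern, action) in enumerate(rules)}
    for m in combined.finditer(text, pos):
        yield m.start(), m.group(), m, actions[m.lastgroup], None


rules = [(r"\d+", "A number"), (r"\w+", "A word")]
for start, matched, m, action, target in parse_sketch(rules, "1 a2 d3 4 p 5"):
    print(start, repr(matched), action, target)
```

Earlier rules win when several patterns could match at the same position, which is why '4' is reported as a number and not as a word.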

descriptor = None

The LexiconDescriptor this Lexicon was created by.

language = None

The Language class the lexicon belongs to.

re_flags = None

The re_flags that were set on instantiation.

consume = None

Whether this lexicon wants the token(s) that switched to it.

arg = None

The argument the lexicon was called with (creating a derived Lexicon). None for a normal lexicon.

name = None

The short name (the name of the method this Lexicon was defined with).

fullname = None

The short name with the Language name prepended, like 'Language.lexicon'.

qualname = None

The full name with the Language’s module prepended, like 'parce.lang.xml.Xml.root'.
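The relation between the three names can be sketched as follows. The helper lexicon_names and the example class are hypothetical, and deriving the names from __name__ and __module__ is an assumption about how such names could be built, not a statement about the actual code.

```python
def lexicon_names(language, rules_func):
    """Return (name, fullname, qualname) for a rules function on a class."""
    name = rules_func.__name__                                # e.g. 'root'
    fullname = "{}.{}".format(language.__name__, name)        # e.g. 'Xml.root'
    qualname = "{}.{}".format(language.__module__, fullname)  # module prepended
    return name, fullname, qualname


class Xml:
    def root(cls):
        yield r"\w+", "A word"


name, fullname, qualname = lexicon_names(Xml, Xml.root)
print(name)
print(fullname)
print(qualname)  # depends on the module the class is defined in
```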

equals(other)

Return True if other is the same lexicon as this one, or is derived from the same lexicon.
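One plausible way to read this comparison: two lexicons are "the same or derived from the same" when they share both the descriptor and the Language class, regardless of the argument. The sketch below works under that assumption; SketchLexicon, Lang and OtherLang are hypothetical names, and the real method may be implemented differently.

```python
class SketchLexicon:
    """Hypothetical stand-in carrying only the attributes equals() looks at."""

    def __init__(self, descriptor, language, arg=None):
        self.descriptor = descriptor
        self.language = language
        self.arg = arg

    def equals(self, other):
        # Same descriptor and same language class: either the very same
        # lexicon, or lexicons derived from it with different arguments.
        return (self.descriptor is other.descriptor
                and self.language is other.language)


class Lang:
    pass


class OtherLang(Lang):
    pass


descriptor = object()  # placeholder for a LexiconDescriptor
plain = SketchLexicon(descriptor, Lang)
derived = SketchLexicon(descriptor, Lang, arg="'")
inherited = SketchLexicon(descriptor, OtherLang)

print(plain.equals(derived))    # True: same origin, different argument
print(plain.equals(inherited))  # False: belongs to another Language class
```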

property rules

Return all rules in a tuple.

Rule items that depend on the lexicon argument are already evaluated.

__iter__()

Yield the rules.

Patterns are created when this method is called for the first time. If this is a derived lexicon, dynamic rule items that depend on the argument are already evaluated.
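The lazy behaviour described here, where nothing is compiled until the rules are first iterated, can be sketched like this. LazyRules and its runs counter are hypothetical and exist only to make the caching visible; evaluation of argument-dependent rule items is left out.

```python
import re


class LazyRules:
    """Hypothetical holder that compiles its patterns on first iteration."""

    def __init__(self, rules_func, language, re_flags=0):
        self.rules_func = rules_func
        self.language = language
        self.re_flags = re_flags
        self._compiled = None  # filled in on first use
        self.runs = 0          # how often the rules function was run

    def __iter__(self):
        if self._compiled is None:
            self.runs += 1
            self._compiled = tuple(
                (re.compile(pattern, self.re_flags), action)
                for pattern, action in self.rules_func(self.language))
        yield from self._compiled


class MyLang:
    pass


def numbers(cls):
    yield r"\d+", "A number"
    yield r"\w+", "A word"


lazy = LazyRules(numbers, MyLang)
first = list(lazy)   # runs the rules function and compiles the patterns
second = list(lazy)  # served from the cache
print(lazy.runs)     # 1
```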

lexicon(rules_func=None, **kwargs)

Lexicon factory decorator.

Use this decorator to turn a function in a Language class definition into a LexiconDescriptor object. The LexiconDescriptor is a descriptor: when it is accessed via the Language class attribute, a Lexicon is created, cached and returned.

You can specify keyword arguments that will be passed on to the Lexicon object as soon as it is created.

The following keyword arguments are supported:

re_flags (0):

The flags that are passed to the regular expression compiler.

consume (False):

When set to True, tokens originating from a rule that pushed this lexicon are added to the target Context instead of the current.

The code body of the function should return (yield) the rules of the lexicon, and is run with the Language class as first argument, as soon as the lexicon is used for the first time.

You can also call the Lexicon object just like an ordinary classmethod to get the rules, e.g. for inclusion in a different lexicon.
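The signature lexicon(rules_func=None, **kwargs) suggests the common pattern of a decorator that works both bare and with keyword arguments. A minimal sketch of that pattern follows; lexicon_sketch and SketchDescriptor are hypothetical stand-ins that only store their arguments, and the actual decorator may differ in detail.

```python
import re


class SketchDescriptor:
    """Hypothetical stand-in for LexiconDescriptor; it only stores its arguments."""

    def __init__(self, rules_func, re_flags=0, consume=False):
        self.rules_func = rules_func
        self.re_flags = re_flags
        self.consume = consume


def lexicon_sketch(rules_func=None, **kwargs):
    """Usable as @lexicon_sketch and as @lexicon_sketch(re_flags=...)."""
    if rules_func is None:
        # Called with keyword arguments only: return the actual decorator.
        def decorator(func):
            return SketchDescriptor(func, **kwargs)
        return decorator
    # Used directly as a decorator on the rules function.
    return SketchDescriptor(rules_func, **kwargs)


class MyLang:
    @lexicon_sketch
    def root(cls):
        yield r"\d+", "A number"

    @lexicon_sketch(re_flags=re.IGNORECASE, consume=True)
    def word(cls):
        yield r"[a-z]+", "A word"


print(MyLang.root.re_flags, MyLang.root.consume)
print(MyLang.word.re_flags == re.IGNORECASE, MyLang.word.consume)
```

In both forms the keyword arguments end up on the descriptor, from where they can be handed to the lexicon once it is created.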