The lexicon module

A Lexicon groups rules to match.

A Lexicon is created by decorating a method that yields rules with the @lexicon decorator. (Strictly speaking, the decorator creates a LexiconDescriptor. When a LexiconDescriptor is first accessed via a Language subclass, it creates a Lexicon for that class, caches it, and returns that same Lexicon each time the attribute is accessed.)

This makes it possible to inherit from a Language class and re-implement only some lexicons; the others keep working as in the base class.
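A minimal sketch of this per-class caching in plain Python. DescriptorSketch, LexiconSketch and lexicon_sketch are hypothetical stand-ins, not parce's actual classes. The sketch only shows how a descriptor's __get__ can create and cache one object per Language class, so a subclass that re-implements root gets its own lexicon while the inherited comment lexicon still works, bound to the subclass.

```python
class LexiconSketch:
    """Hypothetical stand-in for a Lexicon bound to one Language class."""

    def __init__(self, descriptor, language):
        self.descriptor = descriptor
        self.language = language

    def __repr__(self):
        return f"{self.language.__name__}.{self.descriptor.rules_func.__name__}"


class DescriptorSketch:
    """Hypothetical stand-in for LexiconDescriptor."""

    def __init__(self, rules_func):
        self.rules_func = rules_func
        self._cache = {}  # one lexicon per Language class

    def __get__(self, instance, owner):
        # On class access, owner is the class the attribute was looked up on,
        # even when the descriptor itself is defined in a base class.
        try:
            return self._cache[owner]
        except KeyError:
            lex = self._cache[owner] = LexiconSketch(self, owner)
            return lex


def lexicon_sketch(rules_func):
    """Hypothetical decorator: wrap a rules function in a descriptor."""
    return DescriptorSketch(rules_func)


class Base:
    @lexicon_sketch
    def root(cls):
        yield r"\d+", "number"

    @lexicon_sketch
    def comment(cls):
        yield r"#.*", "comment"


class Derived(Base):
    @lexicon_sketch
    def root(cls):  # only this lexicon is re-implemented
        yield r"\w+", "word"


print(Base.root, Derived.root)   # Base.root Derived.root
print(Derived.comment)           # Derived.comment
print(Base.root is Base.root)    # True: the lexicon is cached
```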

A Lexicon parses text according to its rules. When its parse() method is called for the first time, the rules function is run with the Language class as argument, and the rules it yields are cached.
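The "run once, then cache" behaviour can be sketched with functools.cached_property. LazyLexiconSketch is a hypothetical stand-in, not parce's Lexicon; it only shows that the rules function is called with the language class on first use and never again.

```python
from functools import cached_property


class LazyLexiconSketch:
    """Hypothetical stand-in: runs the rules function on first use only."""

    def __init__(self, rules_func, language):
        self.rules_func = rules_func
        self.language = language

    @cached_property
    def rules(self):
        # Evaluated on first access; the tuple is then stored on the instance.
        return tuple(self.rules_func(self.language))


calls = []


class MyLang:
    """Stand-in for a Language class."""


def numbers(cls):
    calls.append(cls)  # record every run of the rules function
    yield r"\d+", "A number"
    yield r"\w+", "A word"


lex = LazyLexiconSketch(numbers, MyLang)
print(len(calls))  # 0: nothing has run yet
lex.rules
lex.rules
print(len(calls))  # 1: the rules function ran exactly once
```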

The Lexicon then combines the patterns of its rules into a single regular expression that is used to parse the text, applying some optimizations along the way. (For example, when a lexicon has only one pattern rule and that pattern turns out to be an unambiguous plain string, str.find() is used instead of re.search().)
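Both strategies can be sketched in a few lines. make_parse() is a hypothetical helper, not parce's code. It assumes rules are (pattern, action) pairs whose patterns contain no named groups or numbered backreferences, and it treats a pattern as an unambiguous string only when re.escape() leaves it unchanged. In the str.find() path no match object exists, so None is yielded in its place.

```python
import re


def make_parse(rules, flags=0):
    """Build a parse(text, pos) generator for (pattern, action) rules.

    Hypothetical sketch of the two strategies described above.
    """
    patterns = [p for p, _ in rules]
    actions = [a for _, a in rules]

    # Fast path: one non-empty pattern without regex metacharacters is a
    # plain string, so str.find() can locate it without the regex engine.
    if len(rules) == 1 and not flags and patterns[0] and re.escape(patterns[0]) == patterns[0]:
        needle, action = patterns[0], actions[0]

        def parse(text, pos):
            while (i := text.find(needle, pos)) != -1:
                yield i, needle, None, action, None  # no match object here
                pos = i + len(needle)

        return parse

    # General path: one alternation with one named group per rule;
    # m.lastgroup ("g0", "g1", ...) tells which rule matched.
    rx = re.compile("|".join(f"(?P<g{i}>{p})" for i, p in enumerate(patterns)), flags)

    def parse(text, pos):
        for m in rx.finditer(text, pos):
            yield m.start(), m.group(), m, actions[int(m.lastgroup[1:])], None

    return parse


numbers = make_parse([(r"\d+", "A number"), (r"\w+", "A word")])
for pos, text, m, action, target in numbers("1 a2 d3 4 p 5", 0):
    print(pos, text, action)  # 0 1 A number, 2 a2 A word, ...

keyword = make_parse([("end", "Keyword")])  # takes the str.find() path
print(list(keyword("the end and end", 0)))
```

Run on the same input, the general path yields the same positions, texts and actions as the MyLang example.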

Example:

>>> from parce import Language, lexicon
>>>
>>> class MyLang(Language):
...     @lexicon
...     def numbers(cls):
...         yield r'\d+', "A number"
...         yield r'\w+', "A word"
...
>>> MyLang.numbers
MyLang.numbers
>>> type(MyLang.numbers)
<class 'parce.lexicon.Lexicon'>
>>> for i in MyLang.numbers.parse("1 a2 d3 4 p 5", 0):
...  print(i)
...
(0, '1', <re.Match object; span=(0, 1), match='1'>, 'A number', None)
(2, 'a2', <re.Match object; span=(2, 4), match='a2'>, 'A word', None)
(5, 'd3', <re.Match object; span=(5, 7), match='d3'>, 'A word', None)
(8, '4', <re.Match object; span=(8, 9), match='4'>, 'A number', None)
(10, 'p', <re.Match object; span=(10, 11), match='p'>, 'A word', None)
(12, '5', <re.Match object; span=(12, 13), match='5'>, 'A number', None)

Parsing (or more precisely, lexing) is done by a Lexer instance, which switches to another Lexicon when a target is encountered.
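As a rough, hypothetical illustration of lexicon switching, a lexer can keep a stack of active lexicons and push or pop when a matching rule carries a target. In this sketch a target is simply a lexicon name to push, or -1 to pop; parce's real Lexer and Target objects are not modeled, and the rule tables are invented for the example.

```python
import re

# Hypothetical rule tables: name -> list of (pattern, action, target).
# target: None = stay, a lexicon name = push it, -1 = pop back.
LEXICONS = {
    "root": [(r'"', "String.Start", "string"), (r"\w+", "Name", None)],
    "string": [(r'"', "String.End", -1), (r'[^"]+', "String", None)],
}

# Compile each lexicon once into a single alternation of named groups.
COMPILED = {
    name: re.compile("|".join(f"(?P<g{i}>{p})" for i, (p, _, _) in enumerate(rules)))
    for name, rules in LEXICONS.items()
}


def lex(text, start="root"):
    """Yield (pos, text, action, lexicon_name), switching lexicons on targets."""
    stack = [start]
    pos = 0
    while True:
        m = COMPILED[stack[-1]].search(text, pos)
        if m is None:
            return
        _, action, target = LEXICONS[stack[-1]][int(m.lastgroup[1:])]
        yield m.start(), m.group(), action, stack[-1]
        pos = m.end()
        if target == -1:
            if len(stack) > 1:  # never pop the start lexicon
                stack.pop()
        elif target is not None:
            stack.append(target)


for token in lex('say "hi there" now'):
    print(token)
```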

class Lexicon(descriptor, language, arg=None)

Bases: object

A Lexicon parses text according to rules.

A Lexicon is tied to a particular class, which makes it possible to inherit from a Language class and change only some Lexicons.

parse(text, pos)

Start parsing text from the specified position. Yields five-tuples (pos, text, matchobj, action, target).

pos is the position where the match starts, text is the matched text, matchobj is the match object (which can be None for default actions), action is the action specified in the matching rule, and target is either None or a Target object.
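A sketch of consuming these five-tuples. parse_with_default() is a hypothetical generator, not parce's implementation: it assumes that text matched by no rule is yielded with a default action and a matchobj of None, which is the case the description above mentions.

```python
import re


def parse_with_default(rules, default_action, text, pos=0):
    """Hypothetical parser yielding (pos, text, matchobj, action, target)."""
    rx = re.compile("|".join(f"(?P<g{i}>{p})" for i, (p, _) in enumerate(rules)))
    for m in rx.finditer(text, pos):
        if m.start() > pos:
            # Unmatched text between tokens: there is no match object for it.
            yield pos, text[pos:m.start()], None, default_action, None
        yield m.start(), m.group(), m, rules[int(m.lastgroup[1:])][1], None
        pos = m.end()
    if pos < len(text):
        yield pos, text[pos:], None, default_action, None


for pos, txt, matchobj, action, target in parse_with_default(
        [(r"\d+", "Number")], "Text", "ab 12 cd"):
    print(pos, repr(txt), matchobj is None, action, target)
```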

descriptor

The LexiconDescriptor this Lexicon was created by.

language

The Language class the lexicon belongs to.

re_flags

The re_flags that were set on instantiation.

consume

Whether this lexicon wants the token(s) that switched to it.

name

The short name (the name of the method this Lexicon was defined with).

fullname

The short name with the Language name prepended, like 'Language.lexicon'.

qualname

The full name with the Language's module prepended, like 'parce.lang.xml.Xml.root'.
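The three name forms can be derived from the Language class and the method name. lexicon_names() below is a hypothetical helper, not part of parce, and it assumes a top-level Language class (a nested class would need __qualname__ instead of __name__).

```python
def lexicon_names(language, method_name):
    """Hypothetical helper: derive name, fullname and qualname."""
    name = method_name
    fullname = f"{language.__name__}.{name}"
    qualname = f"{language.__module__}.{fullname}"
    return name, fullname, qualname


class Xml:
    """Stand-in Language class."""


Xml.__module__ = "parce.lang.xml"  # pretend the class lives in this module
print(lexicon_names(Xml, "root"))
# ('root', 'Xml.root', 'parce.lang.xml.Xml.root')
```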

property arg

The argument the lexicon was called with to create a derived Lexicon, or None for a normal lexicon.

__call__(arg=None)

Create a derived Lexicon with argument arg.

The argument should be a simple, hashable singleton object, such as a string, an integer or a standard action. The created Lexicon is cached. The argument is accessible through special pattern and rule item types, so a derived Lexicon can parse text based on rules that are defined at parse time. This is useful for things like here documents, where the end token is only known after the start token has been found.

When Lexicons are compared with ==, a derived lexicon compares equal to the Lexicon that created it, although they exist as separate objects. Use the is operator to compare by identity.

When the rules of a derived lexicon are yielded, the dynamic rule items that depend on the Lexicon argument are already evaluated. When the rules of a vanilla lexicon are yielded, they are not evaluated, so they adapt to the lexicon they are included in (which then evaluates them).

If arg is None, self is returned.
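A minimal sketch of the caching and comparison rules of __call__. ArgLexiconSketch is a hypothetical stand-in, not parce's Lexicon: the vanilla lexicon keeps a dict of derived lexicons keyed by argument, and equality and hashing are based on the vanilla lexicon, so a derived lexicon compares equal to its creator while remaining a distinct object. Calling a derived lexicon is assumed here to derive from the original vanilla lexicon.

```python
class ArgLexiconSketch:
    """Hypothetical stand-in showing how derived lexicons could be cached."""

    def __init__(self, name, arg=None, base=None):
        self.name = name
        self.arg = arg
        self._base = base if base is not None else self  # the vanilla lexicon
        self._derived = {}  # arg -> derived lexicon (used on the vanilla one)

    def __call__(self, arg=None):
        if arg is None:
            return self
        base = self._base  # assumption: always derive from the vanilla lexicon
        try:
            return base._derived[arg]  # arg must be hashable
        except KeyError:
            lex = base._derived[arg] = ArgLexiconSketch(self.name, arg, base)
            return lex

    def __eq__(self, other):
        # Derived lexicons compare equal to the lexicon that created them.
        if not isinstance(other, ArgLexiconSketch):
            return NotImplemented
        return self._base is other._base

    def __hash__(self):
        return id(self._base)  # consistent with __eq__


heredoc = ArgLexiconSketch("heredoc")
end_eof = heredoc("EOF")
print(heredoc(None) is heredoc)    # True: None returns self
print(heredoc("EOF") is end_eof)   # True: derived lexicons are cached
print(end_eof == heredoc, end_eof is heredoc)  # True False
```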

property rules

Return all rules in a tuple.

Rule items that depend on the lexicon argument are already evaluated.

__iter__()

Yield the rules.

Patterns are created when this method is called for the first time. If this is a derived lexicon, dynamic rule items that depend on the argument are already evaluated.

class LexiconDescriptor(rules_func, re_flags=0, consume=False)

Bases: object

The LexiconDescriptor creates a Lexicon when it is accessed via a class.

rules_func

The function yielding the rules.