The main parce module

The parce Python module.

The main module provides the classes and functions listed below, which are enough to build a basic language definition or to use the bundled language definitions.

The standard actions that are used by the bundled language definitions to specify the type of parsed text fragments are in the action module. The helper functions for dynamic rule items are in the rule module.

It is recommended to import parce like this:

import parce

although in a language definition it can be easier to do this:

from parce import Language, lexicon, skip, default_action, default_target
from parce.rule import words, bygroup   # whichever you need
import parce.action as a

Then you get the Language class and lexicon decorator from parce, and all standard actions can be accessed via the a prefix, like a.Text.

version

The version as a three-tuple (major, minor, patch). See pkginfo.

version_string

The version as a string.

lexicon(rules_func=None, **kwargs)[source]

Lexicon factory decorator.

Use this decorator to make a function in a Language class definition a LexiconDescriptor object. The LexiconDescriptor is a descriptor; when it is accessed via a Language class attribute, a Lexicon is created, cached and returned.

You can specify keyword arguments that will be passed on to the Lexicon object as soon as it is created.

The following keyword arguments are supported:

re_flags (0):

The flags that are passed to the regular expression compiler.

consume (False):

When set to True, tokens originating from a rule that pushed this lexicon are added to the target Context instead of the current one.

The code body of the function should return (yield) the rules of the lexicon, and is run with the Language class as first argument, as soon as the lexicon is used for the first time.

You can also call the Lexicon object just as an ordinary classmethod, to get the rules, e.g. for inclusion in a different lexicon.

class Language[source]

Bases: object

A Language represents a set of Lexicons comprising a specific language.

A Language is never instantiated. The class itself serves as a namespace and can be inherited from.

classmethod comment_common()[source]

Provides subtle highlighting within comments.

The default implementation highlights words like TODO, XXX, TEMP, etc. using Comment.Alert, and highlights URLs and email addresses with the Comment.Url and Comment.Email action respectively. Most bundled languages use this method for their comment lexicons.

class Cursor(document, start=0, end=-1)[source]

Bases: object

Describes a certain range (selection) in a Document.

You may change the start and end attributes yourself. Both must be integers; end may also be None, denoting the end of the document.

As long as you keep a reference to the Cursor, its positions are updated when the document changes. When text is inserted at the start position, the position remains the same. But when text is inserted at the end of a cursor, the end position moves along with the new text. E.g.:

d = Document(None, 'hi there, folks!')
c = Cursor(d, 8, 8)
with d:
    d[8:8] = 'new text'
c.start, c.end --> (8, 16)

You can also use a Cursor as key while editing a document:

c = Cursor(d, 8, 8)
with d:
    d[c] = 'new text'

You cannot alter the document via the Cursor.

start
end
document()[source]
text()[source]

Return the selected text, if any.

select(start, end=-1)[source]

Change start and end in one go. End defaults to start.

select_all()[source]

Set start to 0 and end to None, selecting all text.

select_none()[source]

Set end to start.

has_selection()[source]

Return True if text is selected.

lstrip(chars=None)[source]

Move start to the right if the specified characters can be skipped.

By default whitespace is skipped, like Python’s lstrip() string method.

rstrip(chars=None)[source]

Move end to the left if the specified characters can be skipped.

By default whitespace is skipped, like Python’s rstrip() string method.

strip(chars=None)[source]

Adjust both start and end, like Python’s strip() string method.

class Document(root_lexicon=None, text='', builder=None)[source]

Bases: parce.treedocument.TreeDocumentMixin, parce.document.Document

A Document that automatically keeps its contents tokenized.

You can specify your own TreeBuilder. By default, a BackgroundTreeBuilder is used.

find(name=None, *, filename=None, mimetype=None, contents=None)[source]

Find a root lexicon, either by language name, or by filename, mimetype and/or contents.

If you specify a name, tries to find the language with that name, ignoring the other arguments.

If you don’t specify a name, but instead one or more of the other (keyword) arguments, tries to find the language based on filename, mimetype or contents.

If a language is found, returns the root lexicon. If no language could be found, None is returned (which can also be used as root lexicon, resulting in an empty token tree).

Examples:

>>> import parce
>>> parce.find("xml")
Xml.root
>>> parce.find(contents='{"key": 123;}')
Json.root
>>> parce.find(filename="style.css")
Css.root

This function uses the registry module and by default it finds all bundled languages. See the module’s documentation to find out how to add your own languages to a registry.

root(root_lexicon, text)[source]

Return the root context of the tree structure of all tokens from text.

tokens(root_lexicon, text)[source]

Convenience function that yields all the tokens from the text.

events(root_lexicon, text)[source]

Convenience function that yields all the events from the text.

theme_by_name(name='default')[source]

Return a Theme from the default themes in the themes/ directory.

theme_from_file(filename)[source]

Return a Theme loaded from the specified CSS filename.

default_action = default_action

Denotes a default action for unmatched text.

default_target = default_target

Denotes a default target when no text matches.

skip = SkipAction()

A dynamic action that yields no tokens, thereby ignoring the matched text.