The main parce module

The main module provides the classes and functions listed below, which are enough to build a basic language definition or to use the bundled language definitions.

The standard actions that are used by the bundled language definitions to specify the type of parsed text fragments are in the action module. The helper functions for dynamic rule items are in the rule module.

It is recommended to import parce like this:

import parce

although in a language definition it can be easier to do this:

from parce import Language, lexicon, skip, default_action, default_target
from parce.rule import words, bygroup   # whichever you need
import parce.action as a

Then you get the Language class and lexicon decorator from parce, and all standard actions can be accessed via the a prefix, like a.Text.

version

The version as a three-tuple (major, minor, patch). See pkginfo.

version_string

The version as a string.

class Document(root_lexicon=None, text='', url=None, encoding=None, worker=None, transformer=None)[source]

Bases: parce.DocumentInterface, parce.document.Document

A Document that automatically keeps its contents tokenized.

A Document holds an editable text string and keeps the tokenized tree and (if a Transformer is used) the transformed result up to date on every text change. Arguments:

root_lexicon:

The root lexicon to use (default None)

text:

The initial text (default the empty string)

url:

The url or file name to be stored in the url attribute.

encoding:

The encoding to be stored in the encoding attribute.

worker:

Use the specified Worker. By default, a BackgroundWorker is used.

transformer:

Use the specified Transformer. By default, no Transformer is installed. As a convenience, you can specify True, in which case a default Transformer is installed.

In addition to the events mentioned in the document.Document base class, the following events are emitted:

"tree_updated" (start, end):

emitted when the tokenized tree has been updated; the handler is called with two arguments, start and end, which denote the updated text range

"tree_finished":

emitted when tokenizing the full tree has finished; the handler is called without arguments

"transform_finished":

emitted when a transform rebuild has finished; the handler is called without arguments

Using the connect() method you can connect to these events.

With the get_root() method you get the parsed tree. An example:

>>> d = parce.Document(parce.find('xml'), '<xml>Hi!</xml>')
>>> d.get_root(True).dump()
<Context Xml.root at 0-14 (4 children)>
 ├╴<Token '<' at 0:1 (Delimiter)>
 ├╴<Token 'xml' at 1:4 (Name.Tag)>
 ├╴<Token '>' at 4:5 (Delimiter)>
 ╰╴<Context Xml.tag at 5-14 (4 children)>
    ├╴<Token 'Hi!' at 5:8 (Text)>
    ├╴<Token '</' at 8:10 (Delimiter)>
    ├╴<Token 'xml' at 10:13 (Name.Tag)>
    ╰╴<Token '>' at 13:14 (Delimiter)>
>>> d[5:8] = "hello there!"             # replace the text "Hi!"
>>> d.get_root(True).dump()
<Context Xml.root at 0-23 (4 children)>
 ├╴<Token '<' at 0:1 (Delimiter)>
 ├╴<Token 'xml' at 1:4 (Name.Tag)>
 ├╴<Token '>' at 4:5 (Delimiter)>
 ╰╴<Context Xml.tag at 5-23 (4 children)>
    ├╴<Token 'hello there!' at 5:17 (Text)>
    ├╴<Token '</' at 17:19 (Delimiter)>
    ├╴<Token 'xml' at 19:22 (Name.Tag)>
    ╰╴<Token '>' at 22:23 (Delimiter)>

If you use a Transformer, the transformed result is also kept up to date. The get_transform() method gives you the transformed result. For example:

>>> import parce
>>> d = parce.Document(parce.find('json'), '{"key": [1, 2, 3, 4, 5, 6, 7, 8, 9]}', transformer=True)
>>> d.get_transform(True)
{'key': [1, 2, 3, 4, 5, 6, 7, 8, 9]}
class Cursor(document, pos=0, end=-1)[source]

Bases: parce.document.AbstractTextRange

Describes a certain range (selection) in a Document.

You may change the pos and end attributes yourself. Both must be integers; end may also be None, denoting the end of the document.

As long as you keep a reference to the Cursor, its positions are updated when the document changes. When text is inserted at pos, the position remains the same. But when text is inserted at the end of a cursor, the end position (if not None) moves along with the new text. E.g.:

>>> from parce.document import Document, Cursor
>>> d = Document('hi there, folks!')
>>> c = Cursor(d, 8, 8)
>>> with d:
...     d[8:8] = 'new text'
...
>>> c.pos, c.end
(8, 16)

You can also use a Cursor as key while editing a document:

>>> c = Cursor(d, 8, 8)
>>> with d:
...     d[c] = 'new text'

You cannot alter the document via the Cursor. All move and select methods return the cursor again, so they can be chained:

>>> c = Cursor(d).select_all()
>>> c.pos, c.end
(0, None)
block()[source]

Return the Block our pos is in.

blocks()[source]

Yield the Blocks from pos to end.

move_start_of_block()[source]

Move pos and end to the start of the current block. Returns self.

move_end_of_block()[source]

Move pos and end to the end of the current block. Returns self.

select(pos, end=-1)[source]

Change pos and end in one go. End defaults to pos. Returns self.

select_all()[source]

Set pos to 0 and end to None; selecting all text. Returns self.

select_none()[source]

Set end to pos. Returns self.

selection()[source]

Return the two-tuple (pos, end) denoting the selected range.

The end value is never None; if the end attribute is None, it is set to the length of the document.

has_selection()[source]

Return True if text is selected.

select_start_of_block()[source]

Moves the selection pos to the beginning of the current line.

Returns self.

select_end_of_block()[source]

Moves the selection end (if not None) to the end of its line.

Returns self.

lstrip(chars=None)[source]

Move pos to the right if the specified characters can be skipped.

By default whitespace is skipped, like Python’s lstrip() string method. Returns self.

rstrip(chars=None)[source]

Move end to the left if the specified characters can be skipped.

By default whitespace is skipped, like Python’s rstrip() string method. Returns self.

strip(chars=None)[source]

Adjust pos and end, like Python’s strip() method. Returns self.

events(root_lexicon, text)[source]

Convenience function that yields all the events from the text.

find(name=None, *, filename=None, mimetype=None, contents=None)[source]

Find a root lexicon, either by language name, or by filename, mimetype and/or contents.

If you specify a name, this function tries to find the language with that name, ignoring the other arguments.

If you don’t specify a name but do give one or more of the other keyword arguments, it tries to find the language based on the filename, mimetype or contents.

If a language is found, the root lexicon is returned. If no language could be found, None is returned (None can also be used as a root lexicon, resulting in an empty token tree).

Examples:

>>> import parce
>>> parce.find("xml")
Xml.root
>>> parce.find(contents='{"key": 123;}')
Json.root
>>> parce.find(filename="style.css")
Css.root

This function uses the registry module and by default it finds all bundled languages. See the module’s documentation to find out how to add your own languages to a registry.

root(root_lexicon, text)[source]

Return the root context of the tree structure of all tokens from text.

theme_by_name(name='default')[source]

Return a Theme from the default themes in the themes/ directory.

theme_from_file(filename)[source]

Return a Theme loaded from the specified CSS filename.

class Language[source]

Bases: object

A Language represents a set of Lexicons comprising a specific language.

A Language is never instantiated. The class itself serves as a namespace and can be inherited from.

classmethod comment_common()[source]

Provides subtle highlighting within comments.

The default implementation highlights words like TODO, XXX, TEMP, etc. using Comment.Alert, and highlights URLs and email addresses with the Comment.Url and Comment.Email action respectively. Most bundled languages use this method for their comment lexicons.

default_action = default_action

denotes a default action for unmatched text

default_target = default_target

denotes a default target when no text matches

lexicon(rules_func=None, **kwargs)[source]

Lexicon factory decorator.

Use this decorator to make a function in a Language class definition a LexiconDescriptor object. The LexiconDescriptor is a descriptor, and when calling it via the Language class attribute, a Lexicon is created, cached and returned.

You can specify keyword arguments that will be passed on to the Lexicon object as soon as it is created.

The following keyword arguments are supported:

re_flags (0):

The flags that are passed to the regular expression compiler

consume (False):

When set to True, tokens originating from a rule that pushed this lexicon are added to the target Context instead of the current.

The code body of the function should yield (or return) the rules of the lexicon; it is run with the Language class as its first argument as soon as the lexicon is used for the first time.

You can also call the Lexicon object just as an ordinary classmethod, to get the rules, e.g. for inclusion in a different lexicon.

skip = SkipAction()

A dynamic action that yields no tokens, thereby ignoring the matched text.

class DocumentInterface(root_lexicon=None, text='', url=None, encoding=None, worker=None, transformer=None)[source]

Bases: parce.docio.DocumentIOMixin, parce.work.WorkerDocumentMixin, parce.document.AbstractDocument

This abstract class defines the full interface of a parce Document.

Inherit this to implement a parce document type that proxies e.g. a text document in a GUI editor. Also use this class to check if an object is a parce Document.