The main parce module
The parce Python module.
The main module provides the listed classes and functions, enough to build a basic language definition or to use the bundled language definitions.
The standard actions that are used by the bundled language definitions to specify the type of parsed text fragments are in the action module. The helper functions for dynamic rule items are in the rule module.
It is recommended to import parce like this:
import parce
although in a language definition it can be easier to do this:
from parce import Language, lexicon, skip, default_action, default_target
from parce.rule import words, bygroup # whichever you need
import parce.action as a
Then you get the Language class and lexicon decorator from parce, and all standard actions can be accessed via the a prefix, like a.Text.
- version_string
The version as a string.
- class Document(root_lexicon=None, text='', url=None, encoding=None, worker=None, transformer=None)
Bases: parce.DocumentInterface, parce.document.Document
A Document that automatically keeps its contents tokenized.
A Document holds an editable text string and keeps the tokenized tree and (if a Transformer is used) the transformed result up to date on every text change. Arguments:
root_lexicon: The root lexicon to use (default None).
text: The initial text (default the empty string).
url: The url or file name to be stored in the url attribute.
encoding: The encoding to be stored in the encoding attribute.
worker: Use the specified Worker. By default, a BackgroundWorker is used.
transformer: Use the specified Transformer. By default, no Transformer is installed. As a convenience, you can specify True, in which case a default Transformer is installed.
In addition to the events mentioned in the document.Document base class, the following events are emitted:
"tree_updated" (start, end): emitted when the tokenized tree has been updated; the handler is called with two arguments, start and end, that denote the updated text range.
"tree_finished": emitted when the tokenized tree has been updated; the handler is called without arguments.
"transform_finished": emitted when a transform rebuild has finished; the handler is called without arguments.
Using the connect() method you can connect to these events.
With the get_root() method you get the parsed tree. An example:
>>> d = parce.Document(parce.find('xml'), '<xml>Hi!</xml>')
>>> d.get_root(True).dump()
<Context Xml.root at 0-14 (4 children)>
 ├╴<Token '<' at 0:1 (Delimiter)>
 ├╴<Token 'xml' at 1:4 (Name.Tag)>
 ├╴<Token '>' at 4:5 (Delimiter)>
 ╰╴<Context Xml.tag at 5-14 (4 children)>
    ├╴<Token 'Hi!' at 5:8 (Text)>
    ├╴<Token '</' at 8:10 (Delimiter)>
    ├╴<Token 'xml' at 10:13 (Name.Tag)>
    ╰╴<Token '>' at 13:14 (Delimiter)>
>>> d[5:8] = "hello there!" # replace the text "Hi!"
>>> d.get_root(True).dump()
<Context Xml.root at 0-23 (4 children)>
 ├╴<Token '<' at 0:1 (Delimiter)>
 ├╴<Token 'xml' at 1:4 (Name.Tag)>
 ├╴<Token '>' at 4:5 (Delimiter)>
 ╰╴<Context Xml.tag at 5-23 (4 children)>
    ├╴<Token 'hello there!' at 5:17 (Text)>
    ├╴<Token '</' at 17:19 (Delimiter)>
    ├╴<Token 'xml' at 19:22 (Name.Tag)>
    ╰╴<Token '>' at 22:23 (Delimiter)>
If you use a Transformer, the transformed result is also kept up to date. The get_transform() method gives you the transformed result. For example:
>>> import parce
>>> d = parce.Document(parce.find('json'), '{"key": [1, 2, 3, 4, 5, 6, 7, 8, 9]}', transformer=True)
>>> d.get_transform(True)
{'key': [1, 2, 3, 4, 5, 6, 7, 8, 9]}
- class Cursor(document, pos=0, end=-1)
Bases: parce.document.AbstractTextRange
Describes a certain range (selection) in a Document.
You may change the pos and end attributes yourself. Both must be an integer; end may also be None, denoting the end of the document.
As long as you keep a reference to the Cursor, its positions are updated when the document changes. When text is inserted at pos, the position remains the same. But when text is inserted at the end of a cursor, the end position (if not None) moves along with the new text. E.g.:
>>> from parce.document import Document, Cursor
>>> d = Document('hi there, folks!')
>>> c = Cursor(d, 8, 8)
>>> with d:
...     d[8:8] = 'new text'
...
>>> c.pos, c.end
(8, 16)
You can also use a Cursor as key while editing a document:
>>> c = Cursor(d, 8, 8)
>>> with d:
...     d[c] = 'new text'
You cannot alter the document via the Cursor. All move and select methods return the cursor again, so they can be chained:
>>> c = Cursor(d).select_all()
>>> c.pos, c.end
(0, None)
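The position-tracking rule described above (an insertion at pos leaves pos alone, an insertion at end pushes end along) can be written down as a small pure function. This is a simplified model covering insertions only; the after_insert name and its parameters are hypothetical and do not correspond to any parce function.

```python
def after_insert(pos, end, insert_pos, length):
    """Return the (pos, end) of a range after ``length`` characters
    were inserted at ``insert_pos`` (model for insertions only)."""
    if pos > insert_pos:
        # Text was inserted before the start of the range: shift the start.
        pos += length
    if end is not None and end >= insert_pos:
        # Text inserted at or before the end moves the end along.
        end += length
    return pos, end


# The situation from the example above: an empty range at 8, text inserted at 8.
print(after_insert(8, 8, 8, len('new text')))

# An insertion in front of the range shifts both positions.
print(after_insert(8, 12, 0, 3))

# An end of None (end of document) stays None.
print(after_insert(8, None, 8, 5))
```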
- selection()
Return the two-tuple (pos, end) denoting the selected range.
The end value is never None; it is set to the length of the document if the end attribute is None.
- select_start_of_block()
Moves the selection pos to the beginning of the current line.
Returns self.
- select_end_of_block()
Moves the selection end (if not None) to the end of its line.
Returns self.
- lstrip(chars=None)
Move pos to the right, if specified characters can be skipped.
By default whitespace is skipped, like Python’s lstrip() string method. Returns self.
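To make the semantics of selection(), select_start_of_block(), select_end_of_block() and lstrip() concrete, here is a stand-alone model that works on a plain string. RangeModel is a hypothetical class written for this illustration; it assumes that a block is a line delimited by newline characters and it is not parce's Cursor.

```python
class RangeModel:
    """Stand-alone model of a text range; not parce's Cursor class."""

    def __init__(self, text, pos=0, end=None):
        self.text = text
        self.pos = pos
        self.end = end  # None denotes the end of the text

    def selection(self):
        # The returned end is never None: None maps to the text length.
        end = len(self.text) if self.end is None else self.end
        return self.pos, end

    def select_start_of_block(self):
        # rfind returns -1 when no earlier newline exists, giving position 0.
        self.pos = self.text.rfind("\n", 0, self.pos) + 1
        return self

    def select_end_of_block(self):
        if self.end is not None:
            newline = self.text.find("\n", self.end)
            self.end = len(self.text) if newline == -1 else newline
        return self

    def lstrip(self, chars=None):
        # Skip the given characters (whitespace by default) at the start.
        pos, end = self.selection()
        selected = self.text[pos:end]
        self.pos += len(selected) - len(selected.lstrip(chars))
        return self


text = "first line\n   second line\nthird"
r = RangeModel(text, 16, 20)

# Every method returns the range again, so the calls can be chained.
r.select_start_of_block().select_end_of_block().lstrip()
pos, end = r.selection()
print(repr(text[pos:end]))
```

The chained call first widens the range to the whole second line and then skips its leading whitespace.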
- find(name=None, *, filename=None, mimetype=None, contents=None)
Find a root lexicon, either by language name, or by filename, mimetype and/or contents.
If you specify a name, tries to find the language with that name, ignoring the other arguments.
If you don’t specify a name, but instead one or more of the other (keyword) arguments, tries to find the language based on filename, mimetype or contents.
If a language is found, returns the root lexicon. If no language could be found, None is returned (which can also be used as root lexicon, resulting in an empty token tree).
Examples:
>>> import parce
>>> parce.find("xml")
Xml.root
>>> parce.find(contents='{"key": 123;}')
Json.root
>>> parce.find(filename="style.css")
Css.root
This function uses the registry module and by default it finds all bundled languages. See the module’s documentation to find out how to add your own languages to a registry.
- root(root_lexicon, text)
Return the root context of the tree structure of all tokens from text.
- theme_by_name(name='default')
Return a Theme from the default themes in the themes/ directory.
- class Language
Bases: object
A Language represents a set of Lexicons comprising a specific language.
A Language is never instantiated. The class itself serves as a namespace and can be inherited from.
- classmethod comment_common()
Provides subtle highlighting within comments.
The default implementation highlights words like TODO, XXX, TEMP, etc. using Comment.Alert, and highlights URLs and email addresses with the Comment.Url and Comment.Email action respectively. Most bundled languages use this method for their comment lexicons.
- default_action = default_action
Denotes a default action for unmatched text.
- default_target = default_target
Denotes a default target when no text matches.
- lexicon(rules_func=None, **kwargs)
Lexicon factory decorator.
Use this decorator to make a function in a Language class definition a LexiconDescriptor object. The LexiconDescriptor is a descriptor, and when calling it via the Language class attribute, a Lexicon is created, cached and returned.
You can specify keyword arguments, that will be passed on to the Lexicon object as soon as it is created.
The following keyword arguments are supported:
re_flags (0): The flags that are passed to the regular expression compiler.
consume (False): When set to True, tokens originating from a rule that pushed this lexicon are added to the target Context instead of the current.
The code body of the function should return (yield) the rules of the lexicon, and is run with the Language class as first argument, as soon as the lexicon is used for the first time.
You can also call the Lexicon object just as an ordinary classmethod, to get the rules, e.g. for inclusion in a different lexicon.
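The decorator behaviour described above (usable with or without keyword arguments, creating and caching one lexicon per class on attribute access, and running the rules function lazily) follows the standard Python descriptor pattern. The sketch below models that pattern only; lexicon_model, LexiconDescriptorModel and LexiconModel are hypothetical names, plain strings stand in for actions, and the per-class cache is an assumption about how such a descriptor could work rather than a copy of parce's code.

```python
import re


class LexiconModel:
    """Stands in for a lexicon bound to one language class."""

    def __init__(self, rules_func, language, **kwargs):
        self.rules_func = rules_func
        self.language = language
        self.kwargs = kwargs

    def __call__(self):
        # Run the rules function with the language class as first argument.
        return self.rules_func(self.language)


class LexiconDescriptorModel:
    """Creates, caches and returns a LexiconModel on class attribute access."""

    def __init__(self, rules_func, **kwargs):
        self.rules_func = rules_func
        self.kwargs = kwargs
        self._cache = {}

    def __get__(self, instance, owner):
        try:
            return self._cache[owner]
        except KeyError:
            lex = LexiconModel(self.rules_func, owner, **self.kwargs)
            self._cache[owner] = lex
            return lex


def lexicon_model(rules_func=None, **kwargs):
    if rules_func is None:
        # Used as @lexicon_model(key=value): return the real decorator.
        return lambda func: LexiconDescriptorModel(func, **kwargs)
    # Used as plain @lexicon_model.
    return LexiconDescriptorModel(rules_func, **kwargs)


class Demo:
    @lexicon_model
    def root(cls):
        yield r"\d+", "Number"

    @lexicon_model(re_flags=re.IGNORECASE)
    def words(cls):
        yield r"[a-z]+", "Word"


class Sub(Demo):
    pass


print(Demo.root is Demo.root)   # the same cached object on every access
print(list(Demo.root()))        # calling it yields the rules
print(Sub.root.language)        # a subclass gets its own bound object
```

Caching per owner class means a subclass that overrides nothing still gets rules evaluated with itself as the first argument, which is what makes inheritance between language definitions workable in this model.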
- skip = SkipAction()
A dynamic action that yields no tokens, thereby ignoring the matched text.
- class DocumentInterface(root_lexicon=None, text='', url=None, encoding=None, worker=None, transformer=None)
Bases: parce.docio.DocumentIOMixin, parce.work.WorkerDocumentMixin, parce.document.AbstractDocument
This abstract class defines the full interface of a parce Document.
Inherit this to implement a parce document type that proxies e.g. a text document in a GUI editor. Also use this class to check if an object is a parce Document.