The work module

This module defines the Worker class.

A Worker is designed to run a TreeBuilder and a Transformer as soon as the source text is updated. It is possible to run those jobs in a background thread.

The whole process is divided into stages and is performed by fully exhausting the Worker.process() generator.
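The staged design can be pictured with a small self-contained sketch. It does not use parce; the class name `StagedWorker` and the stage labels are hypothetical. Each `yield` marks the end of a stage, and `run_process()` simply drives the generator to completion.

```python
class StagedWorker:
    """Hypothetical worker whose job is split into stages by a generator."""

    def __init__(self):
        self.log = []

    def process(self):
        # Each yield marks the end of a stage; a subclass could decide to
        # run some stages elsewhere (e.g. in a thread) between the yields.
        self.log.append("build")
        yield "build"
        self.log.append("transform")
        yield "transform"

    def run_process(self):
        # Exhaust the generator fully, running every stage in order.
        for _stage in self.process():
            pass


w = StagedWorker()
w.run_process()
print(w.log)  # ['build', 'transform']
```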

The Worker is intended to be used as the companion of the Document class, causing the TreeBuilder and (if set) the Transformer to do their jobs in a configurable and flexible manner.

It is possible to wait for the parse tree or the transform result, or to arrange for a callback to be called when the work is done. As Worker inherits Observable, you can connect to its events to be notified when the tree or the transform is updated.

Inherit from Worker to implement other features or another way of using a background thread for (parts of) the job.

class Worker(treebuilder, transformer=None)

Bases: parce.util.Observable

Runs the TreeBuilder and the Transformer.

Initialize with a TreeBuilder and optionally a Transformer. It is not possible to change the TreeBuilder later, but you can set another Transformer, or use no Transformer at all.

Call update() to re-run the TreeBuilder on changed or new text, or to use a new root lexicon. Call set_transformer() to set another Transformer; this triggers a re-run of the Transformer alone.
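A toy illustration of the difference between the two calls, assuming a hypothetical `ToyWorker` in which "building" just upper-cases the text and the transformer is any callable applied to the built tree. It only demonstrates that a new transformer re-runs the transform while the tree is left alone; it is not parce's implementation.

```python
class ToyWorker:
    """Hypothetical stand-in for a Worker (not parce's Worker)."""

    def __init__(self, transformer=None):
        self._transformer = transformer
        self.tree = None
        self.result = None
        self.builds = 0

    def update(self, text):
        # New text: rebuild the "tree", then re-run the transformer.
        self.builds += 1
        self.tree = text.upper()
        self._transform()

    def set_transformer(self, transformer):
        # New transformer: keep the existing tree, only re-run the transform.
        self._transformer = transformer
        self._transform()

    def _transform(self):
        if self._transformer is None or self.tree is None:
            self.result = None  # no transformer: no transform result
        else:
            self.result = self._transformer(self.tree)


w = ToyWorker(transformer=len)
w.update("abc")
w.set_transformer(lambda tree: tree[::-1])
print(w.tree, w.result, w.builds)  # ABC CBA 1
```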

You can connect() to the following signals:

"started":

emitted when a build process has started.

"tree_updated":

emitted when a tree (re)build has finished; the handler is called with two arguments, start and end, which denote the updated text range.

"tree_finished":

emitted after "tree_updated", when a (re)build has finished; the handler is called without arguments.

"transform_finished":

emitted when a transform rebuild has finished; the handler is called without arguments.
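To show the handler signatures, here is a minimal observable written from scratch. `MiniObservable` is a hypothetical stand-in, not `parce.util.Observable`; only the event names and their arguments come from the list above. The emission order (build events first, then "transform_finished") is an assumption based on the transform depending on the finished tree.

```python
from collections import defaultdict


class MiniObservable:
    """Hypothetical minimal observable (not parce.util.Observable)."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def connect(self, event, func):
        self._handlers[event].append(func)

    def emit(self, event, *args):
        # Call every handler connected to `event` with the given arguments.
        for func in self._handlers[event]:
            func(*args)


received = []
obs = MiniObservable()
obs.connect("started", lambda: received.append("started"))
obs.connect("tree_updated",
            lambda start, end: received.append(("tree_updated", start, end)))
obs.connect("tree_finished", lambda: received.append("tree_finished"))
obs.connect("transform_finished", lambda: received.append("transform_finished"))

# Assumed order for one update with a transformer set.
obs.emit("started")
obs.emit("tree_updated", 10, 25)
obs.emit("tree_finished")
obs.emit("transform_finished")
print(received)
```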

builder()

Return the TreeBuilder we were initialized with.

set_transformer(transformer)

Set the Transformer to use.

You may use one Transformer for multiple Workers. Use None to remove the current transformer.

Setting a new Transformer updates the transform result. This method should always be called from the main thread.

transformer()

Return the current Transformer, if set.

update(text, root_lexicon=False, start=0, removed=0, added=None)

Start a process to update the tree and the transform.

For the meaning of the arguments, see treebuilder.TreeBuilder.rebuild().

This method should always be called from the main thread.

start()

Start the update process.

Sets the initial state and then calls run_process(). This method should always be called from the main thread.

run_process()

Exhaust the process() generator.

Called by start(); performs the work after initial state has been set up.

This method should always be called from the main thread, but it may be reimplemented to do (parts of) the work in a background thread.

process()

Generator performing the actual process, exhausted by run_process().

wait_build()

Wait for the build job to be completed.

Immediately returns if there is no build job active.

wait_transform()

Wait for the transform job to be completed.

Immediately returns if there is no transform job active.
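One common way to get the "return immediately when idle, otherwise block" behavior is a `threading.Event` that is set whenever no job is running. The `JobTracker` class below is a hypothetical sketch of that idea, not necessarily how parce implements `wait_build()` or `wait_transform()`.

```python
import threading
import time


class JobTracker:
    """Hypothetical helper: an Event that is set while no job is active."""

    def __init__(self):
        self._idle = threading.Event()
        self._idle.set()  # initially there is no job

    def start_job(self):
        self._idle.clear()

    def finish_job(self):
        self._idle.set()

    def busy(self):
        return not self._idle.is_set()

    def wait(self):
        # Returns immediately when idle; otherwise blocks until finish_job().
        self._idle.wait()


tracker = JobTracker()
tracker.wait()  # no job active: returns at once


def background_job():
    time.sleep(0.05)  # simulate some work
    tracker.finish_job()


tracker.start_job()
threading.Thread(target=background_job).start()
tracker.wait()  # blocks until background_job() has called finish_job()
print(tracker.busy())  # False
```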

get_root(wait=False, callback=None)

Return the root element of the completed tree.

This is simply the builder’s root instance attribute, but this method only returns the tree when it is up-to-date.

If wait is True, this call blocks until tokenizing is done, and the full tree is returned. If wait is False, None is returned if the tree is still busy being built.

If a callback is given and tokenizing is still busy, that callback is called once when tokenizing is ready, with the Worker as the sole argument.

Note that, for the lifetime of a Worker and a TreeBuilder, the root element is always the same. The root element is also accessible in the builder’s root attribute. But using this method you can be sure that you are dealing with a complete and fully intact tree.
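The `wait` and `callback` contract of `get_root()` can be sketched as follows. `ResultHolder` and its `finish()` method are hypothetical; the sketch assumes the result is published by another party calling `finish()`, and that a callback registered while work is pending is called exactly once with the holder as its sole argument. It is single-use, whereas a real Worker would reset its state on every update.

```python
import threading


class ResultHolder:
    """Hypothetical sketch of the get_root() wait/callback contract."""

    def __init__(self):
        self._done = threading.Event()
        self._lock = threading.Lock()
        self._callbacks = []
        self.root = None

    def get_root(self, wait=False, callback=None):
        with self._lock:
            if self._done.is_set():
                return self.root  # up to date: return at once
            if callback is not None:
                self._callbacks.append(callback)  # called once, later
        if wait:
            self._done.wait()  # block until finish() is called
            return self.root
        return None  # still busy and not waiting

    def finish(self, root):
        # Would be called by whoever does the work (e.g. a background thread).
        with self._lock:
            self.root = root
            self._done.set()
            callbacks, self._callbacks = self._callbacks, []
        for callback in callbacks:
            callback(self)  # the holder is the sole argument


holder = ResultHolder()
seen = []
print(holder.get_root(callback=lambda h: seen.append(h.root)))  # None
holder.finish("root-node")
print(seen)               # ['root-node']
print(holder.get_root())  # root-node
```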

get_transform(wait=False, callback=None)

Return the transformed result.

If wait is True, the call blocks until (tokenizing and) transforming is finished. If wait is False, None is returned if the transform is not yet ready.

If a callback is given and transformation is not finished yet, that callback is called once when transforming is ready, with this Worker as the sole argument.

If no Transformer is set, None is always returned.

slot_invalidate(context)

Called when TreeBuilder emits ("invalidate", context).

Clears the node and its parents from the transform cache.
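Why the parents are cleared too: assuming the cached transform of a context is computed from its children, a change in a child makes every ancestor's cached result stale as well. A minimal sketch with a hypothetical `Node` class (parent pointers only) and a plain dict as the cache:

```python
class Node:
    """Hypothetical tree node with a parent pointer."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent


def invalidate(cache, node):
    # Remove the node and every ancestor up to the root from the cache,
    # because each ancestor's cached result depends on this node.
    while node is not None:
        cache.pop(node, None)
        node = node.parent


root = Node("root")
child = Node("child", root)
sibling = Node("sibling", root)
leaf = Node("leaf", child)

cache = {root: "R", child: "C", sibling: "S", leaf: "L"}
invalidate(cache, leaf)
print([n.name for n in cache])  # ['sibling']
```

Siblings keep their cached results, so only the path from the changed node to the root has to be transformed again.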

slot_replace()

Called when TreeBuilder emits "replace".

Interrupts the transformer.
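Interrupting typically means asking a running transform to stop at its next checkpoint, so that it can be restarted on the replaced tree. The sketch below shows such a cooperative interrupt flag; `InterruptibleTransform` and its doubling "transform" are hypothetical and unrelated to parce's Transformer.

```python
class InterruptibleTransform:
    """Hypothetical transform that checks an interrupt flag between steps."""

    def __init__(self):
        self.interrupted = False

    def interrupt(self):
        # What a "replace" handler might do: ask the running job to stop.
        self.interrupted = True

    def run(self, items, on_item=None):
        self.interrupted = False
        result = []
        for item in items:
            if self.interrupted:
                return None  # abandoned; a new run would start on the new tree
            result.append(item * 2)  # stand-in for real transform work
            if on_item is not None:
                on_item(item)
        return result


t = InterruptibleTransform()
print(t.run([1, 2, 3]))  # [2, 4, 6]


def replace_during_second(item):
    # Simulate a "replace" event arriving while item 2 is processed.
    if item == 2:
        t.interrupt()


print(t.run([1, 2, 3], on_item=replace_during_second))  # None
```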

start_build()

Called when the build process starts.

Emits the 'started' event.

finish_build()

Called when the treebuilder is done.

Emits 'tree_updated' with the start and end of the updated range, followed by 'tree_finished', once the tree has been updated.

finish_transform()

Called when the transform is finished.

Emits 'transform_finished' when the transform has been updated.

class BackgroundWorker(treebuilder, transformer=None)

Bases: parce.work.Worker

A Worker implementation that does the work in a background thread.

run_process()

Run the update process in a background thread.
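A minimal sketch of such an override, assuming a base class whose `process()` is a generator and whose `run_process()` exhausts it in the calling thread. `SketchWorker` and `SketchBackgroundWorker` are hypothetical names. This version runs the whole generator in the thread; a real implementation might keep some stages on the main thread.

```python
import threading


class SketchWorker:
    """Hypothetical base: process() is a generator, run_process() exhausts it."""

    def __init__(self):
        self.log = []
        self.done = threading.Event()

    def process(self):
        self.log.append(("build", threading.current_thread().name))
        yield
        self.log.append(("transform", threading.current_thread().name))
        yield
        self.done.set()

    def run_process(self):
        # Default: run every stage in the calling thread.
        for _ in self.process():
            pass


class SketchBackgroundWorker(SketchWorker):
    def run_process(self):
        # Same stages, but the generator is exhausted in a daemon thread,
        # so the caller returns immediately.
        threading.Thread(target=super().run_process,
                         name="worker-thread", daemon=True).start()


worker = SketchBackgroundWorker()
worker.run_process()
worker.done.wait(timeout=5)
print(worker.log)  # [('build', 'worker-thread'), ('transform', 'worker-thread')]
```

Reusing the parent's `run_process()` as the thread target keeps the stage logic in one place; only where the generator is consumed changes.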

class WorkerDocumentMixin(root_lexicon=None, text='', worker=None, transformer=None)

Bases: object

Adds a Worker to a Document to automatically update the tokenized tree and the transformed result.

Combine this class with a subclass of AbstractDocument (see the document module).

Every time the text is modified, only the modified part is retokenized. If that changes the lexicon in which the remaining text (after the modified part) starts, that part is also retokenized, until the state (the list of active lexicons) matches the state of the existing tokens.

The transformed result, if a Transformer is set, is also updated.
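The stop condition can be illustrated with a toy, line-based lexer. In this hypothetical sketch the state is a single string ("code" or "comment", for text inside `/* ... */`) instead of a list of active lexicons, but the rule is the same: after an edit, keep re-scanning following lines only while the state at the start of the next line differs from the one stored for the existing tokens.

```python
def end_state(line, state):
    """Hypothetical one-line lexer: scan `line` starting in `state` and
    return the state at its end ("code" or "comment")."""
    i = 0
    while i < len(line):
        if state == "code" and line.startswith("/*", i):
            state, i = "comment", i + 2
        elif state == "comment" and line.startswith("*/", i):
            state, i = "code", i + 2
        else:
            i += 1
    return state


def initial_states(lines):
    """states[k] is the state at the start of line k (len(lines) + 1 entries)."""
    states = ["code"]
    for line in lines:
        states.append(end_state(line, states[-1]))
    return states


def retokenize(lines, states, changed):
    """Re-scan from line `changed`; stop once the state at the start of the
    next line equals the stored one. Returns the half-open range of lines
    that were re-scanned."""
    k = changed
    while k < len(lines):
        new = end_state(lines[k], states[k])
        k += 1
        if new == states[k]:
            break  # matches the existing tokens: the rest is still valid
        states[k] = new
    return changed, k


lines = ["x = 1", "y = 2", "/* note */", "z = 3"]
states = initial_states(lines)
lines[1] = "y = 2 /*"  # the edit opens a comment
print(retokenize(lines, states, 1))  # (1, 3): line 3 is not re-scanned
```

Line 2 is re-scanned because it now starts inside a comment, but it ends in "code" again, which matches the stored state for line 3, so scanning stops there.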

worker()

Return the Worker we were instantiated with.

builder()

Return the worker’s TreeBuilder.

transformer()

Return the worker’s Transformer, if set.

set_transformer(transformer)

Set a new Transformer in the worker.

Specify None to remove the current transformer.

root_lexicon()

Return the currently set root lexicon.

set_root_lexicon(root_lexicon)

Set the root lexicon to use to tokenize the text.

Triggers an update of the tokenized tree.

get_root(wait=False, callback=None)

Get the root element of the completed tree.

If wait is True, this call blocks until tokenizing is done, and the full tree is returned. If wait is False, None is returned if the tree is still busy being built.

If a callback is given and tokenizing is still busy, that callback is called once when tokenizing is ready, with this Document as the sole argument.

open_lexicons()

Return the list of lexicons that were left open at the end of the text.

The root lexicon is not included; if parsing ended in the root lexicon, this list is empty, and the text can be considered “complete.”
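A toy model of this rule, assuming a hypothetical lexer where `(` enters a "paren" lexicon and a double quote toggles a "string" lexicon. The lexicon stack always starts with the root, which is dropped from the result, so an empty list means the text ended back in the root lexicon. Listing the open lexicons from outermost to innermost is an assumption of this sketch.

```python
def open_lexicons(text, root="root"):
    """Hypothetical: return the lexicons still open at the end of `text`."""
    stack = [root]
    for ch in text:
        if stack[-1] == "string":
            if ch == '"':
                stack.pop()  # closing quote leaves the string lexicon
        elif ch == '"':
            stack.append("string")
        elif ch == "(":
            stack.append("paren")
        elif ch == ")" and stack[-1] == "paren":
            stack.pop()
    return stack[1:]  # the root lexicon is not included


print(open_lexicons('f(")")'))  # [] : the text is "complete"
print(open_lexicons('f("abc'))  # ['paren', 'string']
```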

modified_range()

Return a two-tuple (start, end) describing the range that was retokenized.

get_transform(wait=False, callback=None)

Return the transformed result (if a Transformer is active in the Worker).

If wait is True, the call blocks until (tokenizing and) transforming is finished. If wait is False, None is returned if the transform is not yet ready.

If a callback is given and transformation is not finished yet, that callback is called once when transforming is ready, with this Document as the sole argument.

text_changed(start, removed, added)

Called after modification of the text.

Retokenizes the modified part and updates the transformation.

token(pos)

Return the token at the specified position, in an intuitive way.

If a token starts at the position, it is returned. Otherwise, if a token ends at the position, it is returned. A token in a different block is never returned. Returns None if there are no tokens in the block.
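One reading of the lookup rule, written as a small function over the tokens of a single block. `Token` and `token_at()` are hypothetical; the sketch assumes that a token containing the position is also returned, and that a position touching no token yields None. A real implementation would likely use `bisect` instead of a linear scan.

```python
from typing import NamedTuple


class Token(NamedTuple):
    """Hypothetical token: start position and text."""
    pos: int
    text: str

    @property
    def end(self) -> int:
        return self.pos + len(self.text)


def token_at(tokens, pos):
    """`tokens`: sorted, non-overlapping tokens of ONE block (assumption)."""
    # Prefer a token that starts at (or contains) pos...
    for t in tokens:
        if t.pos <= pos < t.end:
            return t
    # ...otherwise fall back to a token that ends exactly at pos.
    for t in tokens:
        if t.end == pos:
            return t
    return None  # no tokens in the block, or nothing touches pos


tokens = [Token(0, "foo"), Token(4, "bar")]
print(token_at(tokens, 4).text)  # bar (starts at 4)
print(token_at(tokens, 3).text)  # foo (ends at 3)
```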