The document module#

Document and Cursor form the basis of handling of documents in the parce package.

A Document contains a text string that is mutable via item and slice methods.

If you make modifications while inside a context (using the Python context manager protocol), the modifications are only applied when the context exits for the last time.
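
The deferred-update behaviour can be pictured with a small stand-alone model. This is only a sketch, not parce's implementation: the class BatchedText and its replace_range() method are hypothetical names, and the sketch assumes that edits recorded inside a context refer to positions in the original, still unchanged text.

```python
class BatchedText:
    """Toy model of a text whose edits are applied when the outermost context exits."""

    def __init__(self, text=""):
        self._text = text
        self._depth = 0      # current nesting level of "with" blocks
        self._pending = []   # recorded (start, end, new_text) edits

    def text(self):
        return self._text

    def __enter__(self):
        self._depth += 1
        return self

    def __exit__(self, exc_type, exc, tb):
        self._depth -= 1
        if self._depth == 0:
            edits, self._pending = self._pending, []
            if exc_type is None:
                # Apply from the back so that earlier positions stay valid.
                for start, end, new in sorted(edits, key=lambda e: e[0], reverse=True):
                    self._text = self._text[:start] + new + self._text[end:]
        return False

    def replace_range(self, start, end, new):
        # Outside any context, the edit simply forms a batch of its own.
        with self:
            self._pending.append((start, end, new))


doc = BatchedText("hello world")
with doc:
    with doc:                       # nested context: nothing is applied on its exit
        doc.replace_range(0, 5, "goodbye")
    unchanged = doc.text()          # still "hello world"
    doc.replace_range(6, 11, "moon")
result = doc.text()                 # "goodbye moon"
```

Counting the nesting depth is what makes "exits for the last time" well defined: only the exit that brings the depth back to zero applies the recorded edits.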

For tokenized documents, see parce.Document, which inherits from this base class (see also the work module).

You can use a Cursor to keep track of positions in a document. The position (and selection) of a Cursor is adjusted when the text in the document is changed.

You can use the various find_block() and blocks() methods to iterate over a Document on a line-by-line basis.

class AbstractDocument(text='', url=None, encoding=None)[source]#

Bases: AbstractMutableString

Base class for a Document.

A Document is like a mutable string, but understands Cursor and Block.

modified = False#

Whether this document is modified

block_separator = '\n'#

separator to use for block boundaries (newline)

url = None#

can be set to the url this document is loaded from

encoding = None#

can be set to the encoding used to read/write this document

revision()[source]#

Return the revision number.

This number is incremented by one on every document change.

find_start_of_block(position)[source]#

Find the start of the block the position is in.

find_end_of_block(position)[source]#

Find the end of the block the position is in.

find_block(position)[source]#

Return a Block representing the text line (block) at position.

A position larger than the document’s length returns the last block. (A document always has at least one block.)
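
How a position maps to a block can be sketched in plain Python. The helper find_block_bounds() below is a hypothetical name and works on an ordinary string; it only illustrates the behaviour described above (including the clamping of positions past the end) and is not parce's own code.

```python
def find_block_bounds(text, position, sep="\n"):
    """Return (start, end) of the block containing position.

    The end offset excludes the separator itself.
    """
    # A position beyond the text is clamped, so it falls in the last block.
    position = max(0, min(position, len(text)))
    i = text.rfind(sep, 0, position)
    start = 0 if i == -1 else i + len(sep)
    j = text.find(sep, position)
    end = len(text) if j == -1 else j
    return start, end


text = "one\ntwo\nthree"
middle = find_block_bounds(text, 5)     # (4, 7), the block "two"
beyond = find_block_bounds(text, 100)   # (8, 13), the last block
```

Because only the nearest separators on either side of the position are searched for, a lookup like this does not need to scan the whole text.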

find_block_by_number(number)[source]#

Return the Block for text line number.

The first block has number 0. Returns None when the document has fewer blocks than the specified number. Negative numbers count backwards from the end.

Avoid this method and block_count() where you can; they are potentially expensive for large documents. Prefer find_block() and Block.next_block() or Block.previous_block() for iteration.

block_count()[source]#

Return the number of blocks (lines) in this document.

This counts the number of occurrences of block_separator in the full text, plus 1. A document always has at least one block.

Avoid this method and find_block_by_number() where you can; they are potentially expensive for large documents. Prefer find_block() and Block.next_block() or Block.previous_block() for iteration.
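
The counting rule, and the reason both methods can be expensive, can be seen in a plain-string sketch. The functions count_blocks() and block_text_by_number() are hypothetical helpers written for this illustration; they return text rather than Block objects and do not reflect how parce implements the methods.

```python
def count_blocks(text, sep="\n"):
    """Number of blocks: occurrences of the separator, plus one."""
    return text.count(sep) + 1


def block_text_by_number(text, number, sep="\n"):
    """Return the text of block `number`, or None if there is no such block.

    Negative numbers count backwards from the end.
    """
    blocks = text.split(sep)          # scans the full text
    if -len(blocks) <= number < len(blocks):
        return blocks[number]
    return None


empty_count = count_blocks("")                    # 1
three = count_blocks("a\nb\n")                    # 3
last = block_text_by_number("a\nb", -1)           # "b"
missing = block_text_by_number("a\nb", 2)         # None
```

Both helpers walk over the entire text, which is why lookups by position are preferable for large documents.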

blocks(start=0, end=None)[source]#

Yield Blocks, starting at position start, ending at end.

Start defaults to 0, end to None, which means iterate to the last block.

replace(old, new, start=0, end=None, count=0)[source]#

Replace occurrences of old with new in region start->end.

If count > 0, it specifies the maximum number of occurrences to be replaced.

re_sub(pattern, replacement, start=0, end=None, count=0, re_flags=0)[source]#

Replace regular expression matches of pattern with replacement.

The pattern may be a string or a compiled regexp pattern object; a string is compiled to a regular expression object using the specified re_flags. Backreferences are allowed. The region can be set with start and end. If count > 0, it specifies the maximum number of occurrences to be replaced.

The replacement argument can also be a function, which is then called with the match object and should return the replacement string.
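
The interplay of region, count and a callable replacement can be illustrated with the standard re module alone. The function re_sub_region() below is a hypothetical stand-in that operates on a plain string and returns a new one; it mirrors the parameters documented above but is not parce's implementation.

```python
import re


def re_sub_region(text, pattern, replacement, start=0, end=None, count=0, re_flags=0):
    """Replace matches of pattern between start and end; return the new string."""
    if isinstance(pattern, str):
        pattern = re.compile(pattern, re_flags)
    if end is None:
        end = len(text)
    pieces, last, replaced = [], 0, 0
    for m in pattern.finditer(text, start, end):
        pieces.append(text[last:m.start()])
        if callable(replacement):
            pieces.append(replacement(m))          # called with the match object
        else:
            pieces.append(m.expand(replacement))   # expands backreferences
        last = m.end()
        replaced += 1
        if count and replaced >= count:
            break
    pieces.append(text[last:])
    return "".join(pieces)


doubled = re_sub_region("a1 b2 c3", r"\d", lambda m: str(int(m.group()) * 2), start=3)
swapped = re_sub_region("key=value", r"(\w+)=(\w+)", r"\2=\1")
first_only = re_sub_region("aaa", "a", "b", count=1)
```

Here doubled leaves the digit before position 3 untouched, swapped uses backreferences in a string replacement, and first_only stops after one replacement.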

trim(start=0, end=None)[source]#

Remove trailing whitespace in the specified region.
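
As an illustration of what trimming a region means, here is a minimal plain-string sketch. The helper trim_region() is a hypothetical name, and treating only spaces and tabs as whitespace is an assumption of this sketch, not a statement about parce.

```python
import re

# One or more spaces or tabs directly before a line end or the end of the text.
TRAILING = re.compile(r"[ \t]+$", re.MULTILINE)


def trim_region(text, start=0, end=None):
    """Return text with trailing spaces and tabs removed between start and end."""
    if end is None:
        end = len(text)
    pieces, last = [], 0
    for m in TRAILING.finditer(text, start):
        if m.end() > end:
            break                     # this run of whitespace lies outside the region
        pieces.append(text[last:m.start()])
        last = m.end()
    pieces.append(text[last:])
    return "".join(pieces)


sample = "a  \nb\t\nc"
everything = trim_region(sample)           # "a\nb\nc"
first_line = trim_region(sample, 0, 3)     # "a\nb\t\nc"
```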

translate(mapping, start=0, end=None, count=0, whole_words=False)[source]#

Replace every occurrence of a key in mapping with its value.

If whole_words is True, only match the keys at word boundaries.
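
The effect of whole_words can be sketched with a single regular expression built from the mapping keys. The function translate_text() is a hypothetical helper working on a plain string; it leaves out the start, end and count parameters, and building one alternation pattern is an assumption of this sketch rather than a description of parce's code.

```python
import re


def translate_text(text, mapping, whole_words=False):
    """Replace every key of mapping found in text with its value, in one pass."""
    if not mapping:
        return text
    # Try longer keys first so that "foobar" wins over "foo".
    keys = sorted(mapping, key=len, reverse=True)
    expr = "|".join(re.escape(k) for k in keys)
    if whole_words:
        expr = r"\b(?:" + expr + r")\b"
    return re.sub(expr, lambda m: mapping[m.group()], text)


anywhere = translate_text("cat concat", {"cat": "dog"})
words_only = translate_text("cat concat", {"cat": "dog"}, whole_words=True)
swapped = translate_text("ab", {"a": "b", "b": "a"})
```

Doing all replacements in one pass means a replaced value is never matched again, so swapping two keys, as in the last line, works as expected.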

text_changed(position, removed, added)[source]#

Called after _update_text().

The default implementation does nothing.

class Document(text='', url=None, encoding=None)[source]#

Bases: AbstractDocument, MutableString, Observable

A basic Document with undo and modified status.

This Document implements AbstractDocument by holding the text in a hidden _text attribute. It adds support for undo/redo and has a modified state.

It also inherits from Observable and emits the following events:

"text_change" (position, removed, added):

emitted with position, removed, added arguments whenever the text changes

"text_changed":

emitted directly after the previous event, but without arguments

"modification_changed" (bool):

emitted when the modified state changes; True means the document was modified

"undo_available" (bool):

emitted when the availability of undo() changes

"redo_available" (bool):

emitted when the availability of redo() changes.

undo_redo_enabled = True#

property modified#

Read or set whether the text is modified. Normally this is updated automatically.

undo()[source]#

Undo the last modification.

redo()[source]#

Redo the last undone modification.

clear_undo_redo()[source]#

Clear the undo/redo stack.

can_undo()[source]#

Return True if undo is possible.

can_redo()[source]#

Return True if redo is possible.

text_changed(position, removed, added)[source]#

Called after _update_text() has been called.

The default implementation emits the "text_change" and "text_changed" events.

class AbstractTextRange(document, pos, end)[source]#

Bases: object

Base class for Cursor and Block.

The text range is denoted by the pos and end attributes.

Provides the comparison operators ==, !=, >, <, >=, <=, based on the pos attribute. The ranges must refer to the same Document.

pos#

the (start) position.

end#

the end position (for Cursor, this may be None).

document()[source]#

Return our document.

text()[source]#

Return text in this range.

token()[source]#

Convenience method returning the Token at our pos.

The Document must have the WorkerDocumentMixin class mixed in (i.e. have the token() method).

tokens()[source]#

Convenience method yielding all Tokens that are in or overlap this text range.

The Document must have the WorkerDocumentMixin class mixed in (i.e. have the get_root() method).

class Cursor(document, pos=0, end=-1)[source]#

Bases: AbstractTextRange

Describes a certain range (selection) in a Document.

You may change the pos and end attributes yourself. Both must be integers; end may also be None, denoting the end of the document.

As long as you keep a reference to the Cursor, its positions are updated when the document changes. When text is inserted at pos, the position remains the same. But when text is inserted at the end of a cursor, the end position (if not None) moves along with the new text. E.g.:

>>> from parce.document import Document, Cursor
>>> d = Document('hi there, folks!')
>>> c = Cursor(d, 8, 8)
>>> with d:
...     d[8:8] = 'new text'
...
>>> c.pos, c.end
(8, 16)
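
The adjustment rule behind this example can be written down as a small pure function. The name adjust_cursor() is hypothetical, and the handling of positions that fall inside removed text is an assumption of this sketch; parce's actual bookkeeping may differ in such edge cases.

```python
def adjust_cursor(pos, end, position, removed, added):
    """Return the new (pos, end) after `removed` characters at `position`
    were replaced with `added` characters. An end of None is kept as is."""
    change_end = position + removed
    delta = added - removed
    if pos > position:
        # Behind the change: shift along. Inside removed text: clamp (assumption).
        pos = pos + delta if pos >= change_end else position
    if end is not None:
        if end >= change_end:
            end += delta              # includes insertion exactly at end
        elif end > position:
            end = position + added    # end was inside removed text (assumption)
    return pos, end


inserted_at_cursor = adjust_cursor(8, 8, 8, 0, 8)       # (8, 16), as in the example
inserted_before = adjust_cursor(10, None, 0, 0, 3)      # (13, None)
inserted_after = adjust_cursor(2, 5, 8, 0, 4)           # (2, 5)
```

The asymmetry is deliberate: an insertion exactly at pos leaves pos in place, while an insertion exactly at end extends the selection.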

You can also use a Cursor as key while editing a document:

>>> c = Cursor(d, 8, 8)
>>> with d:
...     d[c] = 'new text'

You cannot alter the document via the Cursor. All move and select methods return the cursor again, so they can be chained:

>>> c = Cursor(d).select_all()
>>> c.pos, c.end
(0, None)

block()[source]#

Return the Block our pos is in.

blocks()[source]#

Yield the Blocks from pos to end.

move_start_of_block()[source]#

Move pos and end to the start of the current block. Returns self.

move_end_of_block()[source]#

Move pos and end to the end of the current block. Returns self.

select(pos, end=-1)[source]#

Change pos and end in one go. End defaults to pos. Returns self.

select_all()[source]#

Set pos to 0 and end to None; selecting all text. Returns self.

select_none()[source]#

Set end to pos. Returns self.

selection()[source]#

Return the two-tuple (pos, end) denoting the selected range.

The end value is never None; it is set to the length of the document if the end attribute is None.

has_selection()[source]#

Return True if text is selected.

select_start_of_block()[source]#

Move the selection pos to the beginning of the current line.

Returns self.

select_end_of_block()[source]#

Move the selection end (if not None) to the end of its line.

Returns self.

lstrip(chars=None)[source]#

Move pos to the right, if specified characters can be skipped.

By default whitespace is skipped, like Python’s lstrip() string method. Returns self.

rstrip(chars=None)[source]#

Move end to the left, if specified characters can be skipped.

By default whitespace is skipped, like Python’s rstrip() string method. Returns self.

strip(chars=None)[source]#

Adjust pos and end, like Python’s strip() method. Returns self.
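
What the three strip methods do to a range can be sketched on a plain string. The function strip_range() is a hypothetical helper that returns the adjusted (pos, end) pair instead of modifying a Cursor; it relies only on Python's own str.lstrip() and str.rstrip().

```python
def strip_range(text, pos, end=None, chars=None):
    """Return (pos, end) narrowed so the range neither starts nor ends with chars.

    With chars=None, whitespace is skipped, as in str.strip().
    """
    if end is None:
        end = len(text)
    selected = text[pos:end]
    pos += len(selected) - len(selected.lstrip(chars))      # move pos to the right
    remaining = text[pos:end]
    end -= len(remaining) - len(remaining.rstrip(chars))    # move end to the left
    return pos, end


whitespace = strip_range("  hello  ", 0)          # (2, 7)
custom = strip_range("xxhixx", 0, 6, "x")         # (2, 4)
all_blank = strip_range("   ", 0, 3)              # (3, 3)
```

Recomputing the selected text after moving pos keeps end from crossing pos when the whole range consists of skippable characters.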

class Block(document, pos, end)[source]#

Bases: AbstractTextRange

Represents a single line (block) of text in the Document.

Block objects are separated by newlines in the Document, and are created by Document.find_block() or Cursor.block(), and the blocks() iterator of both Cursor and Document.

Unlike Cursor, Block objects do not update their position when the document is changed. You should use Blocks while iterating but throw them away after applying changes to a Document.

Blocks can be compared: blocks originating from the same document compare equal when they point to the same position. You can also use the <, <=, > and >= operators.

is_first()[source]#

True if this is the first block.

is_last()[source]#

True if this is the last block.

property block_number#

The number of this block in the document.

The first block has number 0.

next_block()[source]#

The next block if available.

previous_block()[source]#

The previous block if available.

tokens()[source]#

Convenience method returning a tuple with all Tokens that are in or overlap this block.

The Document must have the WorkerDocumentMixin class mixed in (i.e. have the get_root() method).