The document module
Document and Cursor form the basis of document handling in the parce package.
A Document contains a text string that is mutable via item and slice methods.
If you make modifications while inside a context (using the Python context manager protocol), the modifications are only applied when the context exits for the last time.
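The deferred-edit behaviour described above can be illustrated with a minimal stand-alone sketch. It does not use parce: `BatchedText` is a hypothetical class, and two details are assumptions of the sketch rather than documented parce behaviour. Edits are given in coordinates of the original text and must not overlap, and pending edits are dropped when the outermost context exits with an exception.

```python
class BatchedText:
    """Hypothetical stand-in (not parce's Document) that defers slice edits."""

    def __init__(self, text=""):
        self._text = text
        self._depth = 0      # number of currently active ``with`` blocks
        self._pending = []   # (start, end, replacement) tuples

    def text(self):
        return self._text

    def __enter__(self):
        self._depth += 1
        return self

    def __exit__(self, exc_type, exc, tb):
        self._depth -= 1
        if self._depth == 0:
            if exc_type is None:
                self._apply()
            else:
                self._pending.clear()   # assumption: drop edits on error
        return False

    def __setitem__(self, key, replacement):
        # Only plain slices are supported in this sketch.
        start, end, _ = key.indices(len(self._text))
        self._pending.append((start, end, replacement))
        if self._depth == 0:
            self._apply()   # outside any context: apply immediately

    def _apply(self):
        # Apply from the back so that earlier offsets stay valid.
        for start, end, replacement in sorted(self._pending, reverse=True):
            self._text = self._text[:start] + replacement + self._text[end:]
        self._pending.clear()


d = BatchedText("hi there, folks!")
with d:
    with d:                      # nested context: nothing is applied yet
        d[0:2] = "hello"
    unchanged = d.text()         # still the original text
    d[10:15] = "world"
result = d.text()                # both edits applied on the final exit
```

The nesting counter is what makes "exits for the last time" work: only the outermost `with` block triggers the actual modification.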
For tokenized documents (see parce.Document), parce inherits from this base class (see the work module).
You can use a Cursor to keep track of positions in a document. The position (and selection) of a Cursor is adjusted when the text in the document is changed.
You can use the various find_block() and blocks() methods to iterate over a Document on a line-by-line basis.
- class AbstractDocument(text='', url=None, encoding=None)
Bases: AbstractMutableString
Base class for a Document.
A Document is like a mutable string, but understands Cursor and Block.
- modified = False
Whether this document is modified.
- block_separator = '\n'
The separator to use for block boundaries (newline).
- url = None
Can be set to the URL this document is loaded from.
- encoding = None
Can be set to the encoding used to read/write this document.
- revision()
Return the revision number.
This number is incremented by one on every document change.
- find_block(position)
Return a Block representing the text line (block) at position.
A position larger than the document’s length just returns the last block. (A document always has at least one block.)
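The lookup rule can be sketched on a plain string. This is an illustration only, not parce's implementation: `find_block_range` is a hypothetical helper, and returning the range without the trailing separator is an assumption of the sketch.

```python
def find_block_range(text, position, separator="\n"):
    """Return (pos, end) of the line containing ``position``.

    Hypothetical helper; the returned range excludes the separator.
    """
    # A position past the end of the text falls in the last block.
    position = min(position, len(text))
    before = text.rfind(separator, 0, position)
    start = 0 if before == -1 else before + len(separator)
    after = text.find(separator, position)
    end = len(text) if after == -1 else after
    return start, end


text = "ab\ncd\nef"
middle = find_block_range(text, 4)      # inside "cd"
beyond = find_block_range(text, 100)    # clamps to the last line, "ef"
empty = find_block_range("", 0)         # an empty text still has one block
```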
- find_block_by_number(number)
Return the Block for text line number.
The first block has number 0. Returns None when the document has fewer blocks than the specified number. Negative numbers count backwards from the end.
Avoid this method and block_count() where you can; they are potentially expensive for large documents. Prefer find_block() and Block.next_block() or Block.previous_block() for iteration.
- block_count()
Return the number of blocks (lines) in this document.
This counts the number of occurrences of block_separator in the full text, plus 1. A document always has at least one block.
Avoid this method and find_block_by_number() where you can; they are potentially expensive for large documents. Prefer find_block() and Block.next_block() or Block.previous_block() for iteration.
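The counting rule, and why it is potentially expensive, can be shown on a plain string: counting separators scans the whole text, while stepping from one block to the next only looks at the current line. The helpers `count_blocks` and `iter_block_ranges` below are hypothetical and do not reproduce parce's code.

```python
def count_blocks(text, separator="\n"):
    """Number of separators plus one; scans the full text each time."""
    return text.count(separator) + 1


def iter_block_ranges(text, separator="\n"):
    """Lazily yield (pos, end) per line, one search per step."""
    pos = 0
    while True:
        end = text.find(separator, pos)
        if end == -1:
            yield pos, len(text)    # the last block runs to the end of the text
            return
        yield pos, end
        pos = end + len(separator)


text = "one\ntwo\n"
n = count_blocks(text)                    # two separators, so three blocks
ranges = list(iter_block_ranges(text))    # the last block is empty
```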
- blocks(start=0, end=None)
Yield Blocks, starting at position start, ending at end.
Start defaults to 0, end to None, which means iterate to the last block.
- replace(old, new, start=0, end=None, count=0)
Replace occurrences of old with new in region start->end.
If count > 0, specifies the maximum number of occurrences to be replaced.
- re_sub(pattern, replacement, start=0, end=None, count=0, re_flags=0)
Replace regular expression matches of pattern with replacement.
The pattern may be a string or a compiled regexp pattern object. Backreferences are allowed. The region can be set with start and end. If count > 0, it specifies the maximum number of occurrences to be replaced.
The replacement argument can also be a function, which is then called with the match object and should return the replacement string.
If the pattern is a string, it is compiled to a regular expression object using the specified re_flags.
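The same argument semantics can be tried out on a plain string with Python's standard `re` module. The function `re_sub_region` is a hypothetical helper written for this illustration, not parce's implementation. Because it slices the text before matching, anchors and lookbehinds at the region boundaries may behave differently than in a real document.

```python
import re


def re_sub_region(text, pattern, replacement, start=0, end=None,
                  count=0, re_flags=0):
    """Substitute matches of ``pattern`` inside ``text[start:end]`` only."""
    if isinstance(pattern, str):
        # re_flags only matter when the pattern still has to be compiled.
        pattern = re.compile(pattern, re_flags)
    if end is None:
        end = len(text)
    changed = pattern.sub(replacement, text[start:end], count=count)
    return text[:start] + changed + text[end:]


# A function replacement receives the match object.
lengths = re_sub_region("a1 b22 c333", r"\d+",
                        lambda m: str(len(m.group())), start=3)
# A string replacement may contain backreferences.
swapped = re_sub_region("ab", r"(a)(b)", r"\2\1")
# count limits the number of replacements.
first_only = re_sub_region("x x x", "x", "y", count=1)
```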
- class Document(text='', url=None, encoding=None)
Bases: AbstractDocument, MutableString, Observable
A basic Document with undo and modified status.
This Document implements AbstractDocument by holding the text in a hidden _text attribute. It adds support for undo/redo and has a modified() state.
It also inherits from Observable and emits the following events:
"text_change" (position, removed, added):
emitted with position, removed, added arguments whenever the text changes
"text_changed":
emitted directly after the previous event, but without arguments
"modification_changed" (bool):
emitted when the modified() state changes; True means the document was modified
"undo_available" (bool):
emitted when the availability of undo() changes
"redo_available" (bool):
emitted when the availability of redo() changes.
- undo_redo_enabled = True
- property modified
Read or set whether the text is modified; normally this happens automatically.
- class AbstractTextRange(document, pos, end)
Bases: object
Base class for Cursor and Block.
The text range is denoted by the pos and end attributes.
Provides the comparison operators ==, !=, >, <, >=, <=, based on the pos attribute. The ranges must refer to the same Document.
- pos
The (start) position.
- end
The end position (for Cursor, this may be None).
- token()
Convenience method returning the Token at our pos.
The Document must have the WorkerDocumentMixin class mixed in (i.e. have the token() method).
- tokens()
Convenience method yielding all Tokens that are in or overlap this text range.
The Document must have the WorkerDocumentMixin class mixed in (i.e. have the get_root() method).
- class Cursor(document, pos=0, end=-1)
Bases: AbstractTextRange
Describes a certain range (selection) in a Document.
You may change the pos and end attributes yourself. Both must be integers; end may also be None, denoting the end of the document.
As long as you keep a reference to the Cursor, its positions are updated when the document changes. When text is inserted at pos, the position remains the same. But when text is inserted at the end of a cursor, the end position (if not None) moves along with the new text. E.g.:
>>> from parce.document import Document, Cursor
>>> d = Document('hi there, folks!')
>>> c = Cursor(d, 8, 8)
>>> with d:
...     d[8:8] = 'new text'
...
>>> c.pos, c.end
(8, 16)
You can also use a Cursor as key while editing a document:
>>> c = Cursor(d, 8, 8)
>>> with d:
...     d[c] = 'new text'
You cannot alter the document via the Cursor. All move and select methods return the cursor again, so they can be chained:
>>> c = Cursor(d).select_all()
>>> c.pos, c.end
(0, None)
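The position-tracking rule for insertions can be written down as a small pure function. This is a simplified model for illustration only: `adjust_for_insert` is a hypothetical name, it covers pure insertions (no removed text), and the handling of insertions strictly inside or before the range is an assumption that follows from the rule stated above.

```python
def adjust_for_insert(pos, end, at, length):
    """Return the new (pos, end) after ``length`` characters are inserted
    at offset ``at``. Simplified model: insertions only.
    """
    # Text inserted exactly at pos leaves pos where it is;
    # text inserted before pos pushes it to the right.
    new_pos = pos + length if at < pos else pos
    if end is None:
        new_end = None              # None keeps meaning "end of document"
    else:
        # Text inserted at (or before) end moves end along with it.
        new_end = end + length if at <= end else end
    return new_pos, new_end


same_spot = adjust_for_insert(8, 8, 8, 8)      # mirrors the doctest above
before = adjust_for_insert(8, 12, 2, 3)        # whole range shifts right
open_end = adjust_for_insert(0, None, 0, 5)    # open-ended selection
```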
- selection()
Return the two-tuple (pos, end) denoting the selected range.
The end value is never None; it is set to the length of the document if the end attribute is None.
- select_start_of_block()
Moves the selection pos to the beginning of the current line.
Returns self.
- select_end_of_block()
Moves the selection end (if not None) to the end of its line.
Returns self.
- lstrip(chars=None)
Move pos to the right, if specified characters can be skipped.
By default whitespace is skipped, like Python’s lstrip() string method. Returns self.
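The effect of lstrip() on pos can be approximated on a plain string. The helper `lstrip_pos` is hypothetical, and limiting the skip to the selected range (so that pos never moves past end) is an assumption of this sketch, not something the text above states.

```python
def lstrip_pos(text, pos, end=None, chars=None):
    """Return the new pos after skipping leading ``chars`` in text[pos:end].

    With chars=None, whitespace is skipped, as with str.lstrip().
    """
    if end is None:
        end = len(text)
    selected = text[pos:end]
    skipped = len(selected) - len(selected.lstrip(chars))
    return pos + skipped


spaces = lstrip_pos("  abc", 0)                 # skips two spaces
custom = lstrip_pos("xxabc", 0, chars="x")      # skips the given characters
bounded = lstrip_pos("ab  cd", 2, 4)            # stops at the end of the range
```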
- class Block(document, pos, end)
Bases: AbstractTextRange
Represents a single line (block) of text in the Document.
Block objects are separated by newlines in the Document, and are created by Document.find_block() or Cursor.block(), and by the blocks() iterator of both Cursor and Document.
Unlike Cursor, Block objects do not update their position when the document is changed. You should use Blocks while iterating but throw them away after applying changes to a Document.
Blocks can be compared: blocks originating from the same document compare equal when they point to the same position. You can also use the <, <=, > and >= operators.
- property block_number
The number of this block in the document.
The first block has number 0.
- tokens()
Convenience method returning a tuple with all Tokens that are in or overlap this block.
The Document must have the WorkerDocumentMixin class mixed in (i.e. have the get_root() method).