Transforming
============

.. currentmodule:: parce.transform

The :mod:`~parce.transform` module provides infrastructure to *transform* a
tree structure or a text to any data structure you wish to create.

The basic idea of transformation is simple: for every Context in a tree
structure, a method of a Transform instance is called. The method has the same
name as the context's lexicon, and is called with an :class:`ItemList` instance
containing the list of children of that context.

Sub-contexts in that list have already been replaced with the result of that
context's lexicon's transformation method, wrapped in an :class:`Item`, so the
:class:`ItemList` consists of instances of either :class:`~parce.tree.Token` or
:class:`Item`. To make it easier to distinguish between the two, the Item class
has an :attr:`~Item.is_token` class attribute, set to False.

Thus, a Transform class can closely mimic a corresponding Language class. If
you want to ignore the output of a particular lexicon, don't define a method
with that name, but set its name to ``None`` in the Transform class definition.

How it works
------------

The actual task of transformation (evaluation) is performed by a
:class:`Transformer`. The Transformer has infrastructure to choose the
Transform class based on the current Language. Using the
:meth:`~Transformer.add_transform` method, you can assign a Transform instance
to a Language class.

There are two convenience functions, :func:`transform_text` and
:func:`transform_tree`.

For example::

    from parce import root, Language, lexicon, default_action
    from parce.action import Delimiter, Number, String
    from parce.transform import Transform, transform_text

    class MyLang(Language):
        @lexicon
        def root(cls):
            yield r'\[', Delimiter, cls.list
            yield r'\d+', Number
            yield r'"', String, cls.string

        @lexicon
        def list(cls):
            yield r'\]', Delimiter, -1
            yield from cls.root

        @lexicon
        def string(cls):
            yield r'"', String, -1
            yield default_action, String

This language definition finds numbers, strings, and lists of those. We want to
convert those to their Python equivalents. So, we create a corresponding
Transform class, with methods having the same name as the lexicons in the
Language definition::

    class MyLangTransform(Transform):
        def root(self, items):
            result = []
            for i in items:
                if i.is_token:
                    # delimiter and quote tokens are skipped
                    if i.action is Number:
                        result.append(int(i.text))  # a Number
                else:
                    result.append(i.obj)            # a list or string
            return result

        def list(self, items):
            return self.root(items)

        def string(self, items):
            return items[0].text     # not the closing quote

Now let's test our Transform!

    >>> transform_text(MyLang.root, '1 2 3 [4 "Q" 6] x 7 8 9')
    [1, 2, 3, [4, 'Q', 6], 7, 8, 9]

It works! Note that the stray ``x`` is ignored, because it is not matched by
any rule. The above function call is equivalent to::

    >>> from parce.transform import Transformer
    >>> t = Transformer()
    >>> t.add_transform(MyLang, MyLangTransform())
    >>> t.transform_text(MyLang.root, '1 2 3 [4 "Q" 6] x 7 8 9')
    [1, 2, 3, [4, 'Q', 6], 7, 8, 9]

Transforming a tree structure
-----------------------------

Using the same Transform class, you can also transform a tree structure::

    >>> from parce.transform import transform_tree
    >>> tree = root(MyLang.root, '1 2 3 [4 "Q" 6] x 7 8 9')
    >>> tree.dump()
    <Context MyLang.root at 0-23 (8 children)>
     ├╴<Token '1' @0:1 (Literal.Number)>
     ├╴<Token '2' @2:3 (Literal.Number)>
     ├╴<Token '3' @4:5 (Literal.Number)>
     ├╴<Token '[' @6:7 (Delimiter)>
     ├╴<Context MyLang.list at 7-15 (4 children)>
     │  ├╴<Token '4' @7:8 (Literal.Number)>
     │  ├╴<Token '"' @9:10 (Literal.String)>
     │  ├╴<Context MyLang.string at 10-12 (2 children)>
     │  │  ├╴<Token 'Q' @10:11 (Literal.String)>
     │  │  ╰╴<Token '"' @11:12 (Literal.String)>
     │  ├╴<Token '6' @13:14 (Literal.Number)>
     │  ╰╴<Token ']' @14:15 (Delimiter)>
     ├╴<Token '7' @18:19 (Literal.Number)>
     ├╴<Token '8' @20:21 (Literal.Number)>
     ╰╴<Token '9' @22:23 (Literal.Number)>
    >>> transform_tree(tree)
    [1, 2, 3, [4, 'Q', 6], 7, 8, 9]

.. note::

   Note that :func:`transform_tree` gets the root lexicon from the root
   element and, if you didn't specify a Transform yourself, automatically
   finds the corresponding Transform class. It does so by looking in the same
   module as the root lexicon's language for a Transform subclass with the
   same name with ``"Transform"`` appended (see
   :meth:`Transformer.find_transform`).

Examples of Transform classes can be found in the :mod:`~parce.lang.css`,
:mod:`~parce.lang.csv` and the :mod:`~parce.lang.json` modules.

Calculator example
------------------

As a proof of concept, below is a simplistic calculator; it can be found in
:file:`tests/calc.py`:

.. literalinclude:: ../../tests/calc.py

Test it with::

    >>> from parce.transform import transform_text
    >>> from tests.calc import Calculator   # (from source directory)
    >>> transform_text(Calculator.root, " 1 + 1 ")
    2
    >>> transform_text(Calculator.root, " 1 + 2 * 3 ")
    7
    >>> transform_text(Calculator.root, " 1 * 2 + 3 ")
    5
    >>> transform_text(Calculator.root, " (1 + 2) * 3 ")
    9

Integration with TreeBuilder
----------------------------

It is easy to keep a transformed structure up-to-date when a tree changes. The
Transformer caches the result of every transform method using a weak reference
to the Context that yielded that result. So when modifications to a text are
small, in most cases the Transformer can apply the necessary changes to the
transformed result very quickly.

When the TreeBuilder changes the tree, it emits the event ``"invalidate"``
with the youngest node that has its children changed (i.e. tokens or contexts
were added or removed). The Transformer then knows that that context and all
its ancestors need to be recomputed, and removes them from its cache. During
transformation all newly added contexts are evaluated as well, because their
transformations can't be found in the cache.

.. note::

   Contexts that only changed position are not recomputed. If you want your
   transformed structure to know the position in the text, you should store
   references to the corresponding tokens in your structure. The ``pos``
   attribute of the Tokens that move is adjusted by the tree builder, so they
   still point to the right position after an update of the tree.

When the tree builder is about to insert the modified tree part in the original
tree, it emits the ``"replace"`` event. The transformer reacts by interrupting
any current job that might be busy computing the transformed result.

Finally, when the tree builder emits ``"finished"``, the transformer rebuilds
the transformed result, reusing as much as possible of the previously cached
transform results for Contexts that did not change.

A single Transformer can be used for multiple transformation jobs for multiple
documents or tree builders, even at the same time. It shares the added
Transform instances between multiple jobs and documents. If your Transform
classes keep internal state, that might not be desirable; in that case you can
use a separate Transformer for every document or tree.

One way to automatically run a Transformer from a TreeBuilder is using the
:meth:`Transformer.connect_treebuilder` method, which sets up all needed
connections. Here is an example::

    >>> from parce.lang.json import Json
    >>> from parce.treebuilder import TreeBuilder
    >>> from parce.transform import Transformer
    >>>
    >>> b = TreeBuilder(Json.root)
    >>> t = Transformer()
    >>> t.connect_treebuilder(b)
    >>>
    >>> b.rebuild('{"key": [1, 2, 3, 4, 5]}')
    >>> t.result(b.root)
    {'key': [1, 2, 3, 4, 5]}
    >>> b.rebuild('{"key": [1, 2, 3, 4, 5, 6, 7, 8]}', False, 22, 0, 9)
    >>> t.result(b.root)
    {'key': [1, 2, 3, 4, 5, 6, 7, 8]}

The call to :meth:`TreeBuilder.rebuild() <.treebuilder.TreeBuilder.rebuild>`
might seem overwhelming: we instruct it to re-parse the text, starting at
position 22 with 0 characters removed and 9 added. And now the transform is
automatically updated.

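
The caching and invalidation described in this section can be sketched in
plain Python, independent of *parce*. This is only an illustration under
assumptions: ``Node``, ``CachingTransformer`` and the ``computed`` counter are
hypothetical names that are not part of the *parce* API, and the real
Transformer additionally reacts to tree builder events and can be interrupted.

```python
import weakref


class Node:
    """Hypothetical stand-in for a Context: a parent link plus children.

    Children are either plain ints (playing the role of tokens) or Nodes.
    """

    def __init__(self, *children):
        self.parent = None
        self.children = list(children)
        for child in self.children:
            if isinstance(child, Node):
                child.parent = self


class CachingTransformer:
    """Toy transformer: turns a Node tree into nested lists, with a cache."""

    def __init__(self):
        # Weak keys: an entry disappears once its node is garbage collected.
        self._cache = weakref.WeakKeyDictionary()
        self.computed = 0  # counts the nodes that were actually (re)computed

    def result(self, node):
        """Return the transformed value, reusing cached sub-results."""
        try:
            return self._cache[node]
        except KeyError:
            pass
        value = [
            self.result(c) if isinstance(c, Node) else c
            for c in node.children
        ]
        self.computed += 1
        self._cache[node] = value
        return value

    def invalidate(self, node):
        """Forget the cached result of node and of all its ancestors."""
        while node is not None:
            self._cache.pop(node, None)
            node = node.parent


inner = Node(4, 5)
sibling = Node(7)
tree = Node(1, inner, sibling)

t = CachingTransformer()
first = t.result(tree)      # computes all three nodes
inner.children.append(6)    # the youngest node whose children changed
t.invalidate(inner)         # drops inner and tree, keeps sibling
second = t.result(tree)     # recomputes only inner and tree
```

Here ``first`` is ``[1, [4, 5], [7]]`` and ``second`` is
``[1, [4, 5, 6], [7]]``; the result for ``sibling`` is taken from the cache the
second time. Keeping such a cache in sync by hand, like the explicit
``rebuild()`` call above, means tracking every change yourself.
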
However, it is *much* easier to use the ``Document`` feature provided by
*parce*, because that keeps track of the text and its modifications, and can
automatically keep the tokenized tree and the transformed result up to date.
So head on to the next chapter!