The regex module

Utility module with functions to construct or manipulate regular expressions.


Convert the word list to an optimized regular expression.


>>> import parce.regex
>>> parce.regex.words2regexp(['opa', 'oma', 'mama', 'papa'])
>>> parce.regex.words2regexp(['car', 'cdr', 'caar', 'cadr', 'cdar', 'cddr'])
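An optimized pattern is measured against the obvious unoptimized one: a plain alternation of the escaped words. The following is a minimal sketch of that baseline using only the standard `re` module. The name `naive_words2regexp` is hypothetical and is not part of parce. It can serve as a reference when checking that an optimized pattern matches exactly the same words.

```python
import re


def naive_words2regexp(words):
    """Hypothetical baseline: a plain, unoptimized alternation of the words."""
    # Longest words first, so that a shorter word never hides a longer one
    # that starts with it when the pattern is used with re.match()/re.search().
    ordered = sorted(set(words), key=lambda w: (-len(w), w))
    return "|".join(map(re.escape, ordered))


pattern = naive_words2regexp(['opa', 'oma', 'mama', 'papa'])
print(pattern)  # mama|papa|oma|opa
```

An optimized pattern factors out shared prefixes and suffixes instead of repeating them in every alternative, but should accept the same set of strings as this baseline.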

Return a string with adjacent characters grouped into ranges.


>>> parce.regex.make_charclass(('a', 'd', 'b', 'f', 'c'))

Supplying a string is also supported:

>>> parce.regex.make_charclass("abcdefghjklmnop")

Special characters are properly escaped.
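The grouping of adjacent characters can be illustrated with a small sketch. It assumes that "adjacent" means consecutive Unicode code points, that runs of three or more characters become a range, and that the result is the body of a character class without the surrounding brackets. The name `group_chars` is hypothetical; parce's actual output format may differ.

```python
import re


def group_chars(chars):
    """Hypothetical sketch: group runs of consecutive characters into ranges."""
    codes = sorted(set(map(ord, chars)))
    pieces = []
    i = 0
    while i < len(codes):
        j = i
        # Extend the run while the next code point is exactly one higher.
        while j + 1 < len(codes) and codes[j + 1] == codes[j] + 1:
            j += 1
        if j - i >= 2:
            # Three or more consecutive characters: write them as a range.
            pieces.append(re.escape(chr(codes[i])) + "-" + re.escape(chr(codes[j])))
        else:
            # One or two characters: list them individually.
            pieces.extend(re.escape(chr(c)) for c in codes[i:j + 1])
        i = j + 1
    return "".join(pieces)


print(group_chars(('a', 'd', 'b', 'f', 'c')))  # a-df
print(group_chars("abcdefghjklmnop"))          # a-hj-p
```

`re.escape()` also escapes `-`, `]`, `^` and `\`, so the result stays valid when wrapped in `[...]`.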


Return (words, suffix), where suffix is the common suffix.

If there is no common suffix, words is returned unchanged and suffix is an empty string. If there is a common suffix, it is chopped off the returned words. Example:

>>> parce.regex.common_suffix(['opa', 'oma', 'mama', 'papa'])
(['op', 'om', 'mam', 'pap'], 'a')
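The behaviour described above can be reproduced with the standard library, as a minimal sketch: `os.path.commonprefix()` compares strings character by character, so applying it to the reversed words yields the reversed common suffix. The name `split_common_suffix` is hypothetical and this is not necessarily how parce implements `common_suffix()`.

```python
import os.path


def split_common_suffix(words):
    """Hypothetical sketch: return (words, suffix) as described above."""
    # commonprefix() works on arbitrary strings, not only on paths.
    suffix = os.path.commonprefix([w[::-1] for w in words])[::-1]
    if not suffix:
        return words, ""  # no common suffix: words unchanged
    return [w[:-len(suffix)] for w in words], suffix


print(split_common_suffix(['opa', 'oma', 'mama', 'papa']))
# (['op', 'om', 'mam', 'pap'], 'a')
```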

Convert an unambiguous regexp to a plain string.

If the regular expression is unambiguous and can be converted to a plain string, return it. Otherwise, None is returned.

The returned string can be used with "".find(), which is faster than searching with the regular expression. Examples:

>>> parce.regex.to_string(r"a.e")
>>> parce.regex.to_string(r"a\.e")
>>> parce.regex.to_string(r"a\ne")

The first call returns None, because the dot matches any character, so the pattern does not stand for one fixed string.
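To show why some patterns convert and others do not, here is a simplified sketch that handles only literal characters, escaped punctuation and a few single-character escapes such as `\n`. It assumes no inline flags and no verbose mode. The name `literal_or_none` is hypothetical; parce's `to_string()` may accept more patterns than this.

```python
# Hypothetical, simplified sketch; assumes no flags and no verbose mode.
# Characters that have a special meaning when unescaped in a pattern.
_META = set(".^$*+?{}[]|()")
# Escape sequences that stand for exactly one specific character.
_ESCAPES = {"n": "\n", "t": "\t", "r": "\r", "f": "\f", "v": "\v"}


def literal_or_none(expr):
    out = []
    i = 0
    while i < len(expr):
        c = expr[i]
        if c == "\\":
            if i + 1 == len(expr):
                return None  # dangling backslash
            nxt = expr[i + 1]
            if nxt in _ESCAPES:
                out.append(_ESCAPES[nxt])
            elif not nxt.isalnum():
                out.append(nxt)  # escaped punctuation such as \. or \\
            else:
                return None  # \d, \w, \b, back-references: no single string
            i += 2
        elif c in _META:
            return None  # unescaped metacharacter: pattern is ambiguous
        else:
            out.append(c)
            i += 1
    return "".join(out)


needle = literal_or_none(r"a\.e")
print("xxa.eyy".find(needle))  # 2
```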

make_trie(words, reverse=False)

Return a dict-based radix trie structure from a list of words.

If reverse is set to True, the trie is made in backward direction, from the end of the words.
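A dict-based radix trie can be sketched as follows: first build a plain character trie, then merge chains of single-child nodes into multi-character edge labels. The names `build_trie` and `_compress`, and the use of a `None` key to mark the end of a word, are assumptions for illustration; parce's internal representation may differ.

```python
def build_trie(words, reverse=False):
    """Hypothetical sketch of a dict-based radix trie; a None key marks a word end."""
    root = {}
    for word in words:
        if reverse:
            word = word[::-1]  # walk the word from its end
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[None] = True  # a word ends at this node
    return _compress(root)


def _compress(node):
    """Merge chains of single-child nodes into multi-character edge labels."""
    result = {}
    for key, child in node.items():
        if key is None:
            result[None] = True
            continue
        label = key
        # Follow the chain while the child has exactly one outgoing edge
        # and no word ends there.
        while len(child) == 1 and None not in child:
            next_key, grandchild = next(iter(child.items()))
            label += next_key
            child = grandchild
        result[label] = _compress(child)
    return result


print(build_trie(['opa', 'oma', 'mama', 'papa']))
# {'o': {'pa': {None: True}, 'ma': {None: True}}, 'mama': {None: True}, 'papa': {None: True}}
```

In this sketch, with `reverse=True` the edge labels are stored end-first, so they read right to left.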

trie_to_regexp_tuple(node, reverse=False)

Convert the trie node to a tuple of regular expression parts.

A part is either a plain string expression or a frozenset instance. A frozenset instance denotes a group of alternative expressions, and consists of plain string expressions or other tuples. If None is also present in the frozenset, the expression is optional.
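The tuple format can be made concrete with a sketch that walks a radix trie and emits parts as described: plain strings for shared edges, and a frozenset of alternatives (strings or nested tuples, plus `None` when the group is optional) where the trie branches. It assumes the hypothetical trie layout of a nested dict with a `None` key marking a word end, and it ignores the reverse parameter. The name `trie_to_parts` is hypothetical.

```python
def trie_to_parts(node):
    """Hypothetical sketch: convert a dict-based radix trie to a parts tuple."""
    keys = list(node)
    if not keys or keys == [None]:
        return ()  # end of word, nothing follows
    if len(keys) == 1:
        # A single edge is a plain string part, followed by the rest.
        label = keys[0]
        return (label,) + trie_to_parts(node[label])
    alternatives = set()
    for key, child in node.items():
        if key is None:
            alternatives.add(None)  # a word may end here: group is optional
            continue
        sub = (key,) + trie_to_parts(child)
        alternatives.add(sub[0] if len(sub) == 1 else sub)
    return (frozenset(alternatives),)


print(trie_to_parts({'ab': {None: True, 'c': {None: True}}}))
# ('ab', frozenset({None, 'c'}))  (frozenset member order may vary)
```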


Convert a tuple to a full regular expression pattern string.

The tuple is described in the trie_to_regexp_tuple() function doc string.
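A minimal sketch of this conversion, assuming the tuple format described above: strings are escaped, nested tuples are concatenated, and a frozenset becomes a non-capturing group, followed by `?` when `None` is a member. The names `parts_to_regexp` and `_part` are hypothetical, and the sketch does not try to produce character classes or other optimizations.

```python
import re


def parts_to_regexp(parts):
    """Hypothetical sketch: concatenate the regexp for each part of the tuple."""
    return "".join(_part(p) for p in parts)


def _part(part):
    if isinstance(part, str):
        return re.escape(part)
    if isinstance(part, tuple):
        return parts_to_regexp(part)
    # A frozenset: a group of alternatives, optional if None is a member.
    # Sorting only makes the output deterministic.
    alts = sorted(_part(p) for p in part if p is not None)
    group = "(?:" + "|".join(alts) + ")"
    return group + "?" if None in part else group


parts = (frozenset({('o', frozenset({'pa', 'ma'})), 'mama', 'papa'}),)
print(parts_to_regexp(parts))  # (?:mama|o(?:ma|pa)|papa)
```

When the tuple comes from a radix trie, sibling alternatives start with different characters, so the order of the alternatives within a group does not change what the pattern matches.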