The regex module

Utility module with functions to construct or manipulate regular expressions.

words2regexp(words)

Convert the word list to an optimized regular expression.

Example:

>>> import parce.regex
>>> parce.regex.words2regexp(['opa', 'oma', 'mama', 'papa'])
'(?:mam|pap|o[mp])a'
>>> parce.regex.words2regexp(['car', 'cdr', 'caar', 'cadr', 'cdar', 'cddr'])
'c[ad]{1,2}r'
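
The two example patterns can be sanity-checked with the standard re module. The snippet below does not call parce. It copies the documented output strings verbatim and uses a small helper, accepts_all() (a name introduced here for illustration), to confirm that each pattern fully matches every listed word while rejecting strings outside the list.

```python
import re

# Output strings copied verbatim from the words2regexp() examples
OPA_PATTERN = '(?:mam|pap|o[mp])a'
CXR_PATTERN = 'c[ad]{1,2}r'


def accepts_all(pattern, words):
    """Return True if pattern fully matches every string in words."""
    return all(re.fullmatch(pattern, w) for w in words)


print(accepts_all(OPA_PATTERN, ['opa', 'oma', 'mama', 'papa']))
print(accepts_all(CXR_PATTERN, ['car', 'cdr', 'caar', 'cadr', 'cdar', 'cddr']))
# Strings that are not in the word lists are rejected
print(accepts_all(OPA_PATTERN, ['mapa']))
print(accepts_all(CXR_PATTERN, ['caaar']))
```

re.fullmatch() is used instead of re.search() so that a pattern only counts as matching when it covers the whole word.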

make_charclass(chars)

Return a string for use inside a regular expression character class, with runs of adjacent characters grouped into ranges.

Example:

>>> parce.regex.make_charclass(('a', 'd', 'b', 'f', 'c'))
'a-df'

Supplying a string is also supported:

>>> parce.regex.make_charclass("abcdefghjklmnop")
'a-hj-p'

Special characters are properly escaped.
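
The grouping can be sketched in plain Python. The make_charclass_sketch() function below is a hypothetical illustration, not parce's implementation. It assumes that runs of three or more consecutive code points become a range while shorter runs are listed one by one, and it escapes every character with re.escape(), which is safe inside a character class but may escape more than strictly necessary.

```python
import re


def make_charclass_sketch(chars):
    """Group runs of consecutive code points into ranges (hypothetical sketch)."""
    codes = sorted(set(map(ord, chars)))  # deduplicate and order by code point
    parts = []
    i = 0
    while i < len(codes):
        j = i
        # Extend j while the next code point directly follows the current one
        while j + 1 < len(codes) and codes[j + 1] == codes[j] + 1:
            j += 1
        if j - i >= 2:
            # Assumption: only runs of three or more are written as a range
            parts.append(re.escape(chr(codes[i])) + '-' + re.escape(chr(codes[j])))
        else:
            parts.extend(re.escape(chr(c)) for c in codes[i:j + 1])
        i = j + 1
    return ''.join(parts)


print(make_charclass_sketch(('a', 'd', 'b', 'f', 'c')))  # a-df
print(make_charclass_sketch("abcdefghjklmnop"))          # a-hj-p
```

Because characters such as `-`, `]` and `^` are escaped, the result can be wrapped in square brackets without changing its meaning.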

common_suffix(words)

Return (words, suffix), where suffix is the common suffix.

If there is no common suffix, words is returned unchanged and suffix is an empty string. If there is a common suffix, it is chopped off the returned words.

Example:

>>> parce.regex.common_suffix(['opa', 'oma', 'mama', 'papa'])
(['op', 'om', 'mam', 'pap'], 'a')
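
One way to compute a common suffix is to reverse every word and take the common prefix, for which the standard library already provides os.path.commonprefix() (it compares plain strings character by character). The common_suffix_sketch() below is a hypothetical illustration of that technique, not parce's implementation; it returns the words as a new list, and a word consisting entirely of the suffix becomes an empty string, which may differ from how parce treats that edge case.

```python
import os.path


def common_suffix_sketch(words):
    """Return (words, suffix) via the common prefix of the reversed words."""
    words = list(words)
    suffix = os.path.commonprefix([w[::-1] for w in words])[::-1]
    if not suffix:
        return words, ''
    # Chop the shared suffix off every word
    return [w[:-len(suffix)] for w in words], suffix


print(common_suffix_sketch(['opa', 'oma', 'mama', 'papa']))
# (['op', 'om', 'mam', 'pap'], 'a')
```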

to_string(expr)

Convert an unambiguous regexp to a plain string.

If the regular expression is unambiguous and can be converted to a plain string, return it. Otherwise, None is returned.

The returned string can be used with str.find(), which is typically faster than re.search().

Examples:

>>> parce.regex.to_string(r"a.e")
>>> parce.regex.to_string(r"a\.e")
'a.e'
>>> parce.regex.to_string(r"a\ne")
'a\ne'

The first example returns None because the unescaped dot matches any character, so the expression does not denote a single literal string.
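
A conservative version of this conversion can be sketched as a small scanner. The to_string_sketch() below is hypothetical and deliberately cautious: it returns None for any unescaped metacharacter, for escapes like `\d` or `\b` that are not plain text, and for anything else it does not recognize, so it may reject some expressions that parce would convert. Only a handful of character escapes (`\n`, `\t`, and so on) are translated.

```python
import re

# Characters that make an unescaped expression non-literal (conservative set)
_METACHARS = set('.^$*+?{}[]|()')
# Escapes that stand for a single literal character
_ESCAPES = {'n': '\n', 't': '\t', 'r': '\r', 'f': '\f', 'v': '\v', 'a': '\a'}


def to_string_sketch(expr):
    """Return the literal text of expr, or None if it is not a plain string."""
    out = []
    i = 0
    while i < len(expr):
        c = expr[i]
        if c == '\\':
            if i + 1 >= len(expr):
                return None  # trailing backslash is invalid
            nxt = expr[i + 1]
            if nxt in _ESCAPES:
                out.append(_ESCAPES[nxt])
            elif not nxt.isalnum():
                out.append(nxt)  # escaped punctuation such as \. is literal
            else:
                return None  # \d, \w, \b, backreferences, ... are not plain text
            i += 2
        elif c in _METACHARS:
            return None
        else:
            out.append(c)
            i += 1
    return ''.join(out)


text = 'see a.e here'
literal = to_string_sketch(r"a\.e")
# str.find() and re.search() locate the same position for a literal pattern
print(text.find(literal), re.search(r"a\.e", text).start())
```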

make_trie(words, reverse=False)

Return a dict-based radix trie structure from a list of words.

If reverse is set to True, the trie is made in backward direction, from the end of the words.
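
A dict-based radix trie can be sketched by first building a character trie and then merging chains of single-child nodes into multi-character edges. The make_trie_sketch() below is a hypothetical illustration; its representation (nested dicts keyed by edge labels, with a None key marking the end of a word) is an assumption and is not necessarily the structure parce returns.

```python
def make_trie_sketch(words, reverse=False):
    """Build a radix trie of nested dicts (assumed representation)."""
    root = {}
    for word in words:
        if reverse:
            word = word[::-1]  # build the trie from the end of each word
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[None] = True  # assumed end-of-word marker
    return _compress(root)


def _compress(node):
    """Merge chains of single-child nodes into multi-character edges."""
    result = {}
    for key, child in node.items():
        if key is None:
            result[None] = True
            continue
        label = key
        # Follow the chain while the child has exactly one outgoing edge
        while len(child) == 1 and None not in child:
            k, grandchild = next(iter(child.items()))
            label += k
            child = grandchild
        result[label] = _compress(child)
    return result


print(make_trie_sketch(['opa', 'oma', 'mama', 'papa']))
# {'o': {'pa': {None: True}, 'ma': {None: True}}, 'mama': {None: True}, 'papa': {None: True}}
```

With reverse=True the edge labels read from the end of each word, which is what a search for common suffixes needs.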

trie_to_regexp_tuple(node, reverse=False)

Convert the trie node to a tuple of regular expression parts.

A part is either a plain string expression or a frozenset instance. A frozenset instance denotes a group of alternative expressions, and consists of plain string expressions or other tuples. If None is also present in the frozenset, the expression is optional.
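
To make the tuple format concrete, the hypothetical expand() below enumerates every string a tuple of parts describes. It treats string parts as literal text, which is an assumption (in parce they may already be regular expression fragments). The example tuple is hand-written to mirror the '(?:mam|pap|o[mp])a' pattern from words2regexp(); it was not produced by parce.

```python
from itertools import product


def expand(parts):
    """Yield every string described by a tuple of parts (illustrative only)."""
    choices = []
    for part in parts:
        if isinstance(part, frozenset):
            alts = []
            for alt in part:
                if alt is None:
                    alts.append('')  # None makes the whole group optional
                elif isinstance(alt, tuple):
                    alts.extend(expand(alt))  # nested tuple of parts
                else:
                    alts.append(alt)
            choices.append(alts)
        else:
            choices.append([part])  # a plain string part
    for combo in product(*choices):
        yield ''.join(combo)


# Hand-written tuple mirroring (?:mam|pap|o[mp])a
WORDS_PARTS = (
    frozenset({('mam',), ('pap',), ('o', frozenset({'m', 'p'}))}),
    'a',
)
print(sorted(expand(WORDS_PARTS)))              # ['mama', 'oma', 'opa', 'papa']
print(sorted(expand(('ab', frozenset({'c', None})))))  # ['ab', 'abc']
```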

build_regexp(r)

Convert a tuple to a full regular expression pattern string.

The tuple format is described in the docstring of trie_to_regexp_tuple().
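
A recursive conversion can be sketched as follows. The build_regexp_sketch() below is a hypothetical illustration, not parce's implementation: it assumes string parts are literal text (so it escapes them with re.escape()), renders a group of single characters as a character class, wraps other alternatives in a non-capturing group, and appends `?` when None is present. Alternatives are sorted for deterministic output, so the order can differ from parce's, as the example shows.

```python
import re


def build_regexp_sketch(parts):
    """Convert a tuple of parts to a regular expression string (sketch)."""
    pieces = []
    for part in parts:
        if not isinstance(part, frozenset):
            pieces.append(re.escape(part))  # assumption: string parts are literal
            continue
        optional = None in part
        alts = [a for a in part if a is not None]
        if len(alts) == 1 and isinstance(alts[0], str) and len(alts[0]) == 1:
            group = re.escape(alts[0])  # a lone character needs no brackets
        elif alts and all(isinstance(a, str) and len(a) == 1 for a in alts):
            # Several single characters: use a character class such as [mp]
            group = '[' + ''.join(re.escape(a) for a in sorted(alts)) + ']'
        else:
            exprs = sorted(
                build_regexp_sketch(a) if isinstance(a, tuple) else re.escape(a)
                for a in alts
            )
            group = '(?:' + '|'.join(exprs) + ')'
        pieces.append(group + '?' if optional else group)
    return ''.join(pieces)


parts = (frozenset({('mam',), ('pap',), ('o', frozenset({'m', 'p'}))}), 'a')
print(build_regexp_sketch(parts))  # (?:mam|o[mp]|pap)a
print(build_regexp_sketch(('ab', frozenset({'c', None}))))  # abc?
```

The sketch yields '(?:mam|o[mp]|pap)a' rather than the '(?:mam|pap|o[mp])a' shown for words2regexp(); both patterns match the same four words.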