The registry module

Registry of language definitions.

Instead of importing language definitions directly, you can use a Registry to manage and find language definitions.

The registry stores the fully qualified name of a root lexicon, for example "parce.lang.css.Css.root". This qualified name must contain at least two dots, which separate the module name, the class name and the name of the root lexicon.
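To illustrate the naming rule, here is a minimal sketch that splits a qualified name into its three parts. The helper name split_qualname is hypothetical and not part of parce; it only shows why two dots are the minimum.

```python
def split_qualname(qualname):
    """Split 'package.module.Class.lexicon' into (module, class, lexicon).

    Hypothetical helper, not part of parce.
    """
    # Split from the right: the last two dots delimit class and lexicon name,
    # everything before them is the (possibly dotted) module name.
    parts = qualname.rsplit(".", 2)
    if len(parts) != 3:
        raise ValueError("expected at least two dots in %r" % qualname)
    return tuple(parts)


print(split_qualname("parce.lang.css.Css.root"))
# ('parce.lang.css', 'Css', 'root')
```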

With the register() function you can register your own language definitions at runtime and make them available through parce. For convenience, the bundled languages in parce.lang are automatically registered in the global registry.

The global registry is available as the module-level variable registry. You can also create and populate your own Registry.

registry

The global default parce Registry.

class Entry(name, desc, section, author, aliases, filenames, mimetypes, guesses)

Bases: tuple

Used to store entries in the Registry dict, using the qualified name of the root lexicon as the key.

aliases

A list of other names this lexicon can be found under.

author

The author.

desc

A short description.

filenames

A list of tuples (pattern, weight). A pattern is a plain filename or a filename with globbing characters, e.g. "Makefile" or "*.c", and the weight is a floating point value indicating the probability that the root lexicon should be chosen for this filename (0..1 range).

guesses

A list of tuples (regexp, weight). The first 5000 characters of the contents are matched against the regular expression, and when it matches, the weight is added to the already computed weight for this root lexicon.

mimetypes

A list of tuples (mimetype, weight). A mimetype is a string like "text/css", and the weight is a floating point value indicating the probability that the root lexicon should be chosen for this mimetype (0..1 range).

name

A human-readable name for the file type.

section

The section, e.g. for grouped display in a menu. (If the section is empty, the entry need not be shown in a menu.)
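As a rough illustration of how the (pattern, weight) and (regexp, weight) lists can be read, the sketch below computes a filename weight and a contents weight. It is not parce's implementation. Two details are assumptions: that the highest matching filename weight wins, and that the regular expression is applied with re.search. The function names are hypothetical.

```python
import fnmatch
import re


def filename_weight(filename, patterns):
    """Return the highest weight of all (pattern, weight) tuples that match.

    Assumption: the best matching pattern wins; 0.0 means no match.
    """
    return max(
        (weight for pattern, weight in patterns
         if fnmatch.fnmatchcase(filename, pattern)),
        default=0.0,
    )


def contents_weight(contents, guesses, limit=5000):
    """Sum the weights of all (regexp, weight) tuples matching the start of contents.

    Only the first `limit` characters are inspected. Using re.search here
    is an assumption.
    """
    head = contents[:limit]
    return sum(weight for regexp, weight in guesses if re.search(regexp, head))


patterns = [("Makefile", 1.0), ("*.c", 0.8)]
guesses = [(r"<\?xml", 0.5)]
print(filename_weight("main.c", patterns))
print(contents_weight('<?xml version="1.0"?>', guesses))
```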

class Registry(fallback=None)

Bases: dict

Registry of language definitions.

The Registry is based on the Python dictionary class, and maps fully qualified lexicon names (such as "parce.lang.css.Css.root") to Entry tuples.

You can specify another Registry as fallback on construction, or set the fallback attribute later. The find() method uses this fallback, if set.

fallback = None

Another Registry the find() method can use.

copy()

Return a copy of this Registry. Any fallback is reused, not copied.

add(lexicon_name, *, name=None, desc=None, section='', author='', aliases=(), filenames=(), mimetypes=(), guesses=(), inherit=None)

Register or update a Language’s root lexicon for particular filenames (or filename patterns), particular mime types, or based on the contents of the file.

The arguments:

lexicon_name

The fully qualified name of a root lexicon, e.g. "parce.lang.css.Css.root". Must contain at least two dots.

name

A human-readable name for the file type (required unless inherit is set)

desc

A short description (required unless inherit is set)

aliases

An optional list of other names this lexicon can be found under.

filenames

A list of tuples (pattern, weight). A pattern is a plain filename or a filename with globbing characters, e.g. "Makefile" or "*.c", and the weight is a floating point value indicating the probability that the root lexicon should be chosen for this filename (0..1 range).

mimetypes

A list of tuples (mimetype, weight). A mimetype is a string like "text/css", and the weight is a floating point value indicating the probability that the root lexicon should be chosen for this mimetype (0..1 range).

guesses

A list of tuples (regexp, weight). The first 5000 characters of the contents are matched against the regular expression, and when it matches, the weight is added to the already computed weight for this root lexicon.

inherit

If not None, it should be the fully qualified name of an existing lexicon. The corresponding entry is then removed and merged with the current entry. Using a non-existing entry raises a KeyError.

This method simply creates an Entry tuple with all the arguments and stores it using the lexicon name as key.
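The storing behaviour described above can be sketched with a dict subclass and a namedtuple. This is a simplified stand-in, not the parce source: MiniRegistry is a hypothetical name, the inherit argument is left out, the ValueError for a name with fewer than two dots is an assumption, and "mypackage.mylang.MyLang.root" is a made-up lexicon name.

```python
from collections import namedtuple

# Same field order as the Entry class documented above.
Entry = namedtuple(
    "Entry", "name desc section author aliases filenames mimetypes guesses")


class MiniRegistry(dict):
    """Hypothetical, reduced registry: maps qualified names to Entry tuples."""

    def add(self, lexicon_name, *, name, desc, section="", author="",
            aliases=(), filenames=(), mimetypes=(), guesses=()):
        # Assumption: reject names that cannot be split into
        # module, class and lexicon name.
        if lexicon_name.count(".") < 2:
            raise ValueError("need at least two dots: %r" % lexicon_name)
        self[lexicon_name] = Entry(
            name, desc, section, author,
            list(aliases), list(filenames), list(mimetypes), list(guesses))


reg = MiniRegistry()
reg.add(
    "mypackage.mylang.MyLang.root",      # made-up qualified name
    name="MyLang",
    desc="A hypothetical language",
    filenames=[("*.myl", 1.0)],
)
print(reg["mypackage.mylang.MyLang.root"].name)
```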

suggest(filename=None, mimetype=None, contents=None)

Return a list of registered language definitions, sorted on relevance.

The filename has the most weight. If two entries have the same weight, the mimetype is considered; if they are still tied, the contents are examined using some heuristics.

Every item in the returned list is the fully qualified name of the root lexicon, e.g. "parce.lang.css.Css.root".
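One way to picture the tie-breaking order is to sort on a tuple of weights, since Python compares tuples element by element. The sketch below is only one reading of the description, not the actual algorithm in parce; the function name rank, the weights and the two "mypkg" lexicon names are invented for illustration.

```python
def rank(candidates):
    """Sort qualified names on (filename, mimetype, contents) weight, best first.

    `candidates` maps a qualified name to a tuple of three weights.
    Tuples compare left to right, so the mimetype weight only matters
    when the filename weights are equal, and so on.
    """
    return sorted(candidates, key=candidates.get, reverse=True)


# Invented weights, for illustration only.
candidates = {
    "parce.lang.css.Css.root": (1.0, 0.0, 0.0),
    "mypkg.a.Alpha.root": (1.0, 0.5, 0.0),
    "mypkg.b.Beta.root": (0.8, 1.0, 1.0),
}
print(rank(candidates))
```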

qualname(name)

Find a fully qualified lexicon name for the specified name.

First tries to find an exact match on the name attribute, then on the aliases, then a case-insensitive match, and then does the same for the Language class name.
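The lookup order can be sketched as a series of passes over the entries, each pass less strict than the previous one. This is an interpretation of the description, not the parce source: find_qualname is a hypothetical function, returning None for a miss is an assumption, and the names and aliases in the sample data are invented.

```python
def class_name(qualname):
    """Return the class part of 'module.Class.lexicon'."""
    return qualname.rsplit(".", 2)[1]


def find_qualname(entries, name):
    """Return the first qualified name matching `name`, or None.

    `entries` maps a qualified name to a (name, aliases) tuple.
    """
    lowered = name.lower()
    passes = [
        lambda q, n, a: n == name,                    # exact name
        lambda q, n, a: name in a,                    # exact alias
        lambda q, n, a: n.lower() == lowered          # case-insensitive
        or lowered in (x.lower() for x in a),
        lambda q, n, a: class_name(q) == name,        # exact class name
        lambda q, n, a: class_name(q).lower() == lowered,
    ]
    for matches in passes:
        for qualname, (entry_name, aliases) in entries.items():
            if matches(qualname, entry_name, aliases):
                return qualname
    return None


# Invented sample data.
entries = {
    "parce.lang.css.Css.root": ("CSS", ["stylesheet"]),
    "parce.lang.xml.Xml.root": ("XML", []),
    "mypkg.tex.Latex.root": ("LaTeX document", []),
}
print(find_qualname(entries, "xml"))
```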

static lexicon(qualname)

Import the module and return the actual lexicon.

E.g., for the fully qualified name "parce.lang.css.Css.root", imports the parce.lang.css module and returns the Css.root lexicon.
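The import step can be approximated with importlib from the standard library. The sketch below is not the parce implementation, and load_attribute is a hypothetical name. To stay runnable without parce it resolves a standard library name with the same module.Class.attribute shape.

```python
import importlib


def load_attribute(qualname):
    """Import the module part of `qualname` and return Class.attribute.

    Hypothetical helper illustrating the idea behind lexicon().
    """
    module_name, cls_name, attr_name = qualname.rsplit(".", 2)
    module = importlib.import_module(module_name)
    return getattr(getattr(module, cls_name), attr_name)


# A standard library name with the same shape as "parce.lang.css.Css.root".
decode = load_attribute("json.decoder.JSONDecoder.decode")
print(decode.__qualname__)
```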

find(name=None, filename=None, mimetype=None, contents=None)

Convenience method to find a root lexicon, either by language name, or by filename, mimetype and/or contents.

If you specify a name, tries to find the language with that name (using qualname()), ignoring the other arguments.

If you don’t specify a name, but instead one or more of the other arguments, tries to find the language based on filename, mimetype or contents (using suggest()).

If a language is found, returns the root lexicon (using lexicon()). If no language could be found, the fallback registry is consulted, if set. Ultimately, None is returned (which can also be used as root lexicon, resulting in an empty token tree).
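The fallback behaviour can be pictured as a chain of dictionaries. The sketch below is heavily simplified and not the parce code: ChainedLookup is a hypothetical class that maps plain names directly to strings, whereas the real find() resolves names through qualname() and suggest().

```python
class ChainedLookup(dict):
    """Hypothetical dict that consults a fallback when a key is missing."""

    def __init__(self, fallback=None):
        super().__init__()
        self.fallback = fallback

    def find(self, name):
        if name in self:
            return self[name]
        if self.fallback is not None:
            # Not found here: delegate to the fallback, if set.
            return self.fallback.find(name)
        return None  # ultimately nothing was found


base = ChainedLookup()
base["css"] = "Css.root"

mine = ChainedLookup(fallback=base)
mine["mylang"] = "MyLang.root"      # made-up language

print(mine.find("mylang"), mine.find("css"), mine.find("unknown"))
```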

Examples:

>>> from parce.registry import registry as r
>>> r.find("xml")
Xml.root
>>> r.find(contents='{"key": 123;}')
Json.root
>>> r.find(filename="style.css")
Css.root

by_section()

Return a dictionary mapping section name to a dict with all entries in that section.
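The shape of the returned dictionary can be sketched as follows. This is an illustration, not the parce source: group_by_section and the reduced Item tuple are hypothetical, the section names are invented, and whether entries with an empty section are included in the real result is not stated above (this sketch keeps them).

```python
from collections import namedtuple

# Reduced stand-in for Entry with only the fields needed here.
Item = namedtuple("Item", "name section")


def group_by_section(registry):
    """Return {section: {qualname: entry}} for all entries in `registry`."""
    result = {}
    for qualname, entry in registry.items():
        result.setdefault(entry.section, {})[qualname] = entry
    return result


# Invented section names.
registry = {
    "parce.lang.css.Css.root": Item("CSS", "Web"),
    "parce.lang.xml.Xml.root": Item("XML", "Data"),
    "parce.lang.json.Json.root": Item("JSON", "Data"),
}
for section, entries in group_by_section(registry).items():
    print(section, sorted(entries))
```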

register(lexicon_name, **kwargs)

Register a lexicon in the global registry.

For all the arguments, see Registry.add().