The registry module

Registry of language definitions.

Instead of importing language definitions directly, you can use a Registry to manage and find language definitions.

The registry stores the fully qualified name for a root lexicon, for example "parce.lang.css.Css.root". This qualified name should have at least 2 dots, to separate module name, class name and the name of the root lexicon.
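The two-dot requirement can be illustrated with a short, self-contained sketch. The helper name split_lexicon_name is hypothetical and not part of parce; it only shows how such a name breaks down into its three parts, assuming the last two dots are the separators.

```python
def split_lexicon_name(qualname):
    """Split a fully qualified lexicon name into (module, class, lexicon).

    Hypothetical helper for illustration; not part of parce.
    """
    # Split on the last two dots only, so the module part may itself be dotted.
    parts = qualname.rsplit(".", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError("expected at least two dots in " + repr(qualname))
    return tuple(parts)


module_name, class_name, lexicon = split_lexicon_name("parce.lang.css.Css.root")
# module_name == "parce.lang.css", class_name == "Css", lexicon == "root"
```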

Use find() to find a language definition by name, or suggest() to find a language definition for a particular file type. There is basic functionality to pick a language definition based on file name, mime type and/or the contents of the file.

Using the register() function it is possible to register your own language definitions at runtime and make them available through parce. As a service, the bundled languages in parce.lang are automatically registered in the global registry.

The global registry is stored in the module-level variable registry. You can also create and populate your own Registry.

class Item(name, desc, aliases, filenames, mimetypes, guesses)

Bases: tuple

property aliases

Alias for field number 2

property desc

Alias for field number 1

property filenames

Alias for field number 3

property guesses

Alias for field number 5

property mimetypes

Alias for field number 4

property name

Alias for field number 0
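The field numbers listed above are those of a named tuple. As a rough illustration, the following sketch builds an equivalent type with collections.namedtuple; the values given for the CSS entry are made up for the example and are not necessarily what parce registers.

```python
import collections

# Stand-in with the same field order as documented above (assumption: the
# real Item behaves like a plain named tuple).
Item = collections.namedtuple(
    "Item", "name desc aliases filenames mimetypes guesses")

css = Item(
    name="CSS",
    desc="Cascading Style Sheets",   # illustrative description
    aliases=[],
    filenames=[("*.css", 1)],        # (pattern, weight)
    mimetypes=[("text/css", 1)],     # (mimetype, weight)
    guesses=[],
)

# Fields are reachable both by attribute name and by position.
same_name = css.name == css[0]
same_filenames = css.filenames == css[3]
```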

class Registry

Bases: dict

Registry of language definitions.

The Registry is based on the Python dictionary class, and maps fully qualified lexicon names (such as "parce.lang.css.Css.root") to Item tuples.

register(lexicon_name, *, name, desc, aliases=[], filenames=[], mimetypes=[], guesses=[])

Register or update a Language’s root lexicon so that it can be selected by filename (or filename pattern), by mime type, or by the contents of the file.

The arguments:

lexicon_name

The fully qualified name of a root lexicon, e.g. "parce.lang.css.Css.root". Must contain at least two dots.

name

A human-readable name for the file type

desc

A short description

aliases

An optional list of other names this lexicon can be found under.

filenames

A list of tuples (pattern, weight). A pattern is a plain filename or a filename with globbing characters, e.g. "Makefile" or "*.c", and the weight is a floating point value indicating the probability that the root lexicon should be chosen for this filename (0..1 range).

mimetypes

A list of tuples (mimetype, weight). A mimetype is a string like "text/css", and the weight is a floating point value indicating the probability that the root lexicon should be chosen for this mime type (0..1 range).

guesses

A list of tuples (regexp, weight). The first 5000 characters of the contents are matched against the regular expression, and when it matches, the weight is added to the already computed weight for this root lexicon.

This method simply creates an Item tuple with all the arguments and stores it using the lexicon name as key.
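What "creates an Item tuple and stores it under the lexicon name" amounts to can be sketched as follows. SketchRegistry is a hypothetical stand-in, not the parce class, and the lexicon name "mypackage.ini.Ini.root" is invented for the example. The sketch uses tuple defaults instead of the list defaults shown in the signature above.

```python
import collections

Item = collections.namedtuple(
    "Item", "name desc aliases filenames mimetypes guesses")


class SketchRegistry(dict):
    """Hypothetical stand-in: maps lexicon names to Item tuples."""

    def register(self, lexicon_name, *, name, desc, aliases=(),
                 filenames=(), mimetypes=(), guesses=()):
        # Registering the same lexicon name again replaces the old entry.
        self[lexicon_name] = Item(
            name, desc, list(aliases), list(filenames),
            list(mimetypes), list(guesses))


reg = SketchRegistry()
reg.register(
    "mypackage.ini.Ini.root",           # invented lexicon name
    name="INI",
    desc="INI configuration file",
    aliases=["cfg"],
    filenames=[("*.ini", 1.0), ("*.cfg", 0.5)],
    guesses=[(r"^\[\w+\]", 0.5)],       # regexp tried on the file contents
)
```

Because the registry is a dictionary, the stored entry can then be read back with reg["mypackage.ini.Ini.root"].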

suggest(filename=None, mimetype=None, contents=None)

Return a list of registered language definitions, sorted on relevance.

The filename has the most weight. If two definitions have the same weight, the mimetype is considered; if they are still equal, the contents are examined with some heuristics.

Every item in the returned list is the fully qualified name of the root lexicon, e.g. "parce.lang.css.Css.root".
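One possible reading of this ordering is sketched below. It is not parce's implementation: the function rank, the entry layout and the "pkg..." lexicon names are all hypothetical, and ranking by a (filename, mimetype, contents) tuple is an assumption about how the weights combine. Filename patterns are matched with fnmatch, and regular expressions are tried on the first 5000 characters of the contents.

```python
import fnmatch
import re


def rank(entries, filename=None, mimetype=None, contents=None):
    """Return lexicon names sorted on relevance (hypothetical sketch)."""
    scored = []
    for lexicon_name, entry in entries.items():
        # Best weight among the filename patterns that match.
        file_w = max((w for pattern, w in entry["filenames"]
                      if filename and fnmatch.fnmatch(filename, pattern)),
                     default=0)
        # Best weight among the mime types that are equal.
        mime_w = max((w for mtype, w in entry["mimetypes"]
                      if mtype == mimetype), default=0)
        # Sum of the weights of all regexps found in the first 5000 chars.
        text_w = sum(w for regexp, w in entry["guesses"]
                     if contents and re.search(regexp, contents[:5000]))
        key = (file_w, mime_w, text_w)
        if any(key):  # drop entries that did not match at all
            scored.append((key, lexicon_name))
    # Tuples compare element by element: filename first, contents last.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [lexicon_name for key, lexicon_name in scored]


entries = {
    "pkg.xml.Xml.root": {
        "filenames": [("*.xml", 1.0)],
        "mimetypes": [("text/xml", 1.0)],
        "guesses": [(r"^<\?xml", 0.5)],
    },
    "pkg.html.Html.root": {
        "filenames": [("*.html", 1.0), ("*.xml", 0.5)],
        "mimetypes": [("text/html", 1.0)],
        "guesses": [(r"<!DOCTYPE html", 0.5)],
    },
    "pkg.css.Css.root": {
        "filenames": [("*.css", 1.0)],
        "mimetypes": [("text/css", 1.0)],
        "guesses": [],
    },
}

by_filename = rank(entries, filename="page.xml")
by_contents = rank(entries, contents="<!DOCTYPE html><html>")
```

In this sketch, an entry that matches neither the filename, the mime type nor the contents is left out of the result entirely.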

find(name)

Find a fully qualified lexicon name for the specified name.

First tries to find an exact match on the name attribute, then on the aliases, then a case-insensitive match, and finally does the same for the Language class name.
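The lookup order described here might look roughly like the following sketch. It is an interpretation, not parce's code: find_name, the simplified (name, aliases) entries and the "pkg..." lexicon names are hypothetical, and returning None when nothing matches is an assumption.

```python
def find_name(registry, name):
    """Look up a lexicon name by language name (hypothetical sketch)."""
    # Pass 1 is exact (str leaves the text unchanged), pass 2 ignores case.
    for fold in (str, str.lower):
        target = fold(name)
        # Names are tried before aliases.
        for lexicon_name, (lang_name, aliases) in registry.items():
            if fold(lang_name) == target:
                return lexicon_name
        for lexicon_name, (lang_name, aliases) in registry.items():
            if target in [fold(alias) for alias in aliases]:
                return lexicon_name
    # Last resort: the class name embedded in the qualified lexicon name.
    for fold in (str, str.lower):
        target = fold(name)
        for lexicon_name in registry:
            class_name = lexicon_name.rsplit(".", 2)[-2]
            if fold(class_name) == target:
                return lexicon_name
    return None


registry = {
    "pkg.css.Css.root": ("CSS", ["stylesheet"]),
    "pkg.js.JavaScript.root": ("JavaScript", ["js", "ecmascript"]),
    "pkg.scheme.SchemeLily.root": ("Scheme", []),
}

by_alias = find_name(registry, "js")
by_class = find_name(registry, "SchemeLily")
```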

register(lexicon_name, **kwargs)

register() a lexicon in the global registry.

suggest(filename=None, mimetype=None, contents=None)

suggest() zero or more lexicons from the global registry.

find(name)

find() a lexicon by name from the global registry.

root_lexicon(lexicon_name)

Import the module and return the root lexicon.

E.g., for the lexicon_name "parce.lang.css.Css.root", this imports the parce.lang.css module and returns the Css.root lexicon.
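The general technique of resolving a dotted name to an object can be sketched with the standard importlib module. The function resolve is hypothetical and only mirrors the description above; to stay runnable without parce installed, the example resolves a name from the standard library instead of a lexicon.

```python
import importlib


def resolve(qualname):
    """Import the module part of qualname and return the named attribute.

    Hypothetical sketch; assumes the form "module.path.ClassName.attribute".
    """
    module_name, class_name, attr_name = qualname.rsplit(".", 2)
    module = importlib.import_module(module_name)
    return getattr(getattr(module, class_name), attr_name)


# Standard-library stand-in for a name such as "parce.lang.css.Css.root".
fromkeys = resolve("collections.OrderedDict.fromkeys")
```

A name with fewer than two dots makes the three-way unpacking fail with a ValueError in this sketch, which matches the requirement that a lexicon name contain at least two dots.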