The registry module#
Registry of language definitions.
Instead of importing language definitions directly, you can use a Registry to manage and find language definitions.
The registry stores the fully qualified name for a root lexicon, for example "parce.lang.css.Css.root". This qualified name should have at least 2 dots, to separate module name, class name and the name of the root lexicon.
Using the register() function it is possible to register your own language definitions at runtime and make them available through parce.
As a service, the bundled languages in parce.lang are automatically registered in the global registry.
The global registry is in the registry module variable.
You can also create and populate your own Registry.
- class Entry(name, desc, section, author, aliases, filenames, mimetypes, guesses)#
Bases: tuple
Used to store entries in the Registry dict, using the qualified name of the root lexicon as the key.
- aliases#
A list of other names this lexicon can be found under.
- author#
The author.
- desc#
A short description.
- filenames#
A list of tuples (pattern, weight). A pattern is a plain filename or a filename with globbing characters, e.g. "Makefile" or "*.c", and the weight is a floating point value indicating the probability that the root lexicon should be chosen for this filename (0..1 range).
- guesses#
A list of tuples (regexp, weight). The first 5000 characters of the contents are matched against the regular expression, and when it matches, the weight is added to the already computed weight for this root lexicon.
- mimetypes#
A list of tuples (mimetype, weight). A mimetype is a string like "text/css", and the weight is a floating point value indicating the probability that the root lexicon should be chosen for this mimetype (0..1 range).
- name#
A human-readable name for the file type.
- section#
The section, e.g. for grouped display in a menu. (If the section is empty, the entry need not be shown in a menu.)
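As a rough illustration of the fields listed above, the sketch below builds a stand-in tuple type with the same field order and looks up the best filename weight with `fnmatch`. `EntrySketch` and `filename_weight` are hypothetical names and the field values are made up for the example; the real `Entry` class and parce's own matching may differ in detail.

```python
import fnmatch
from collections import namedtuple

# Hypothetical stand-in with the same field order as the documented Entry.
EntrySketch = namedtuple(
    "EntrySketch",
    "name desc section author aliases filenames mimetypes guesses",
)


def filename_weight(entry, filename):
    """Return the highest weight of any pattern matching filename, else 0.0."""
    return max(
        (weight for pattern, weight in entry.filenames
         if fnmatch.fnmatchcase(filename, pattern)),
        default=0.0,
    )


# Illustrative values only; not copied from parce's bundled definitions.
css = EntrySketch(
    name="CSS",
    desc="Cascading Style Sheets",
    section="Web",
    author="",
    aliases=[],
    filenames=[("*.css", 1.0)],
    mimetypes=[("text/css", 1.0)],
    guesses=[],
)
```

Because the stand-in is a plain tuple, fields can be read by name (`css.filenames`) or by position, which is the same access pattern the `Entry` description implies.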
- class Registry(fallback=None)[source]#
Bases: dict
Registry of language definitions.
The Registry is based on the Python dictionary class, and maps fully qualified lexicon names (such as "parce.lang.css.Css.root") to Entry tuples.
You can specify another Registry as fallback on construction, or set the fallback attribute later. The find() method uses this fallback, if set.
- add(lexicon_name, *, name=None, desc=None, section='', author='', aliases=(), filenames=(), mimetypes=(), guesses=(), inherit=None)[source]#
Register or update a Language’s root lexicon for a particular filename (patterns), particular mime types or based on contents of the file.
The arguments:
lexicon_name
The fully qualified name of a root lexicon, e.g. "parce.lang.css.Css.root". Must contain at least two dots.
name
A human-readable name for the file type (required unless inherit is set).
desc
A short description (required unless inherit is set).
aliases
An optional list of other names this lexicon can be found under.
filenames
A list of tuples (pattern, weight). A pattern is a plain filename or a filename with globbing characters, e.g. "Makefile" or "*.c", and the weight is a floating point value indicating the probability that the root lexicon should be chosen for this filename (0..1 range).
mimetypes
A list of tuples (mimetype, weight). A mimetype is a string like "text/css", and the weight is a floating point value indicating the probability that the root lexicon should be chosen for this mimetype (0..1 range).
guesses
A list of tuples (regexp, weight). The first 5000 characters of the contents are matched against the regular expression, and when it matches, the weight is added to the already computed weight for this root lexicon.
inherit
If not None, it should be the fully qualified name of an existing lexicon. The corresponding entry is then removed and merged with the current entry. Using a non-existing entry raises a KeyError.
This method simply creates an Entry tuple with all the arguments and stores it using the lexicon name as key.
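To make the "create a tuple and store it under the lexicon name" step concrete, here is a minimal sketch of a dict subclass with an `add()` method. `MiniRegistry` and `EntrySketch` are hypothetical names, the `inherit` argument is left out, and raising `ValueError` for a name with fewer than two dots is an assumption of this sketch, not documented parce behaviour.

```python
from collections import namedtuple

# Hypothetical stand-in for the documented Entry tuple.
EntrySketch = namedtuple(
    "EntrySketch",
    "name desc section author aliases filenames mimetypes guesses",
)


class MiniRegistry(dict):
    """Hypothetical dict-based registry; not parce's implementation."""

    def add(self, lexicon_name, *, name, desc, section="", author="",
            aliases=(), filenames=(), mimetypes=(), guesses=()):
        # The qualified name needs module, class and lexicon parts.
        # (Assumption: the error type is chosen for this sketch only.)
        if lexicon_name.count(".") < 2:
            raise ValueError("expected at least two dots: %r" % lexicon_name)
        # Store one tuple per root lexicon, keyed by its qualified name.
        self[lexicon_name] = EntrySketch(
            name, desc, section, author,
            list(aliases), list(filenames), list(mimetypes), list(guesses),
        )


reg = MiniRegistry()
reg.add(
    "parce.lang.css.Css.root",
    name="CSS",
    desc="Cascading Style Sheets",  # illustrative description
    filenames=[("*.css", 1)],
    mimetypes=[("text/css", 1)],
)
```

Calling `add()` again with the same qualified name simply overwrites the stored tuple, which matches the "register or update" wording above.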
- suggest(filename=None, mimetype=None, contents=None)[source]#
Return a list of registered language definitions, sorted on relevance.
The filename has the most weight. If two entries have the same filename weight, the mimetype is looked at; if they are still tied, the contents are examined with some heuristics.
Every item in the returned list is the fully qualified name of the root lexicon, e.g. "parce.lang.css.Css.root".
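One way to picture the ranking described above is to score each entry as a (filename, mimetype, contents) triple and sort on that triple, so that later elements only break ties in earlier ones. This is a sketch under assumptions, not parce's algorithm: the entries are plain dicts with invented patterns and weights, `re.search` is used for the content guesses, and entries that score zero everywhere are dropped.

```python
import fnmatch
import re


def score(entry, filename=None, mimetype=None, contents=None):
    """Return a (filename, mimetype, contents) weight triple for one entry."""
    fw = mw = gw = 0.0
    if filename is not None:
        fw = max((w for pat, w in entry["filenames"]
                  if fnmatch.fnmatchcase(filename, pat)), default=0.0)
    if mimetype is not None:
        mw = max((w for mt, w in entry["mimetypes"] if mt == mimetype),
                 default=0.0)
    if contents is not None:
        head = contents[:5000]  # only the first 5000 characters are examined
        gw = sum(w for rx, w in entry["guesses"] if re.search(rx, head))
    return fw, mw, gw


def suggest(entries, filename=None, mimetype=None, contents=None):
    """Return qualified names with a non-zero score, best match first."""
    scored = [(score(e, filename, mimetype, contents), qualname)
              for qualname, e in entries.items()]
    scored = [(s, q) for s, q in scored if any(s)]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [q for s, q in scored]


# Invented sample data for the sketch.
entries = {
    "parce.lang.css.Css.root": {
        "filenames": [("*.css", 1.0)],
        "mimetypes": [("text/css", 1.0)],
        "guesses": [],
    },
    "parce.lang.xml.Xml.root": {
        "filenames": [("*.xml", 1.0)],
        "mimetypes": [("text/xml", 1.0)],
        "guesses": [(r"^\s*<\?xml\b", 0.8)],
    },
}
```

Sorting on a tuple gives the tie-breaking order for free; a real implementation could equally combine the weights into a single number.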
- qualname(name)[source]#
Find a fully qualified lexicon name for the specified name.
First, tries to find the exact match on the name attribute, then the aliases, then a case insensitive match, and then the same for the Language class name.
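The lookup order can be read in more than one way; the sketch below shows one possible reading, trying each test over all entries before moving on to the next. `find_qualname` and the sample entries (including the `mypkg.lang.Tiny.root` name) are hypothetical and only illustrate the cascade, not parce's actual code.

```python
def find_qualname(entries, name):
    """Return the first qualified name matching name, or None."""

    def class_name(qualname):
        # "parce.lang.css.Css.root" -> "Css"
        return qualname.split(".")[-2]

    lowered = name.lower()
    tests = (
        # exact match on the name field
        lambda q, e: e["name"] == name,
        # exact match on an alias
        lambda q, e: name in e["aliases"],
        # case-insensitive match on the name field
        lambda q, e: e["name"].lower() == lowered,
        # case-insensitive match on an alias
        lambda q, e: lowered in (a.lower() for a in e["aliases"]),
        # exact, then case-insensitive, match on the class name
        lambda q, e: class_name(q) == name,
        lambda q, e: class_name(q).lower() == lowered,
    )
    for test in tests:
        for qualname, entry in entries.items():
            if test(qualname, entry):
                return qualname
    return None


# Invented sample data for the sketch.
entries = {
    "parce.lang.css.Css.root": {"name": "CSS", "aliases": ["stylesheet"]},
    "mypkg.lang.Tiny.root": {"name": "Tiny Language", "aliases": []},
}
```

Running the stricter tests first means an exact hit on one entry always wins over a looser hit on another.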
- static lexicon(qualname)[source]#
Import the module and return the actual lexicon.
E.g., for the fully qualified qualname "parce.lang.css.Css.root", imports the parce.lang.css module and returns the Css.root lexicon.
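Resolving such a name amounts to splitting off the last two components and importing the rest as a module. A minimal sketch with `importlib` follows; `resolve` is a hypothetical helper, and it is demonstrated on a standard-library name so that it runs without parce installed.

```python
import importlib


def resolve(qualname):
    """Import the module part of qualname and return the named attribute."""
    # Raises ValueError on unpacking if qualname has fewer than two dots.
    module_name, class_name, attr_name = qualname.rsplit(".", 2)
    module = importlib.import_module(module_name)
    return getattr(getattr(module, class_name), attr_name)


# Same shape as "parce.lang.css.Css.root": module, class, attribute.
fromkeys = resolve("collections.OrderedDict.fromkeys")
```

Deferring the import until a name is resolved is what lets a registry list many languages without loading any of their modules up front.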
- find(name=None, filename=None, mimetype=None, contents=None)[source]#
Convenience method to find a root lexicon, either by language name, or by filename, mimetype and/or contents.
If you specify a name, tries to find the language with that name (using qualname()), ignoring the other arguments.
If you don’t specify a name, but instead one or more of the other arguments, tries to find the language based on filename, mimetype or contents (using suggest()).
If a language is found, returns the root lexicon (using lexicon()). If no language could be found, the fallback registry is consulted, if set. Ultimately, None is returned (which can also be used as root lexicon, resulting in an empty token tree).
Examples:
>>> from parce.registry import registry as r
>>> r.find("xml")
Xml.root
>>> r.find(contents='{"key": 123;}')
Json.root
>>> r.find(filename="style.css")
Css.root
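The fallback behaviour can be pictured as a chain of lookups that ends in None. The sketch below models only that chaining, with plain strings in place of lexicons; `ChainedLookup` is a hypothetical class, not the parce `Registry`.

```python
class ChainedLookup(dict):
    """Hypothetical name lookup that defers to a fallback mapping."""

    def __init__(self, *args, fallback=None, **kwargs):
        super().__init__(*args, **kwargs)
        self.fallback = fallback  # may also be assigned after construction

    def find(self, name):
        if name in self:
            return self[name]
        if self.fallback is not None:
            return self.fallback.find(name)
        return None  # nothing found anywhere in the chain


base = ChainedLookup({"xml": "Xml.root", "css": "Css.root"})
custom = ChainedLookup({"mylang": "MyLang.root"}, fallback=base)
```

Here `custom.find("css")` is answered by `base`, while an unknown name falls through both mappings and yields None.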
- register(lexicon_name, **kwargs)[source]#
Register a lexicon in the global registry.
For all the arguments, see Registry.add().