Providence Utils: Config
Providence config is a special config format for generating providence message objects. The reason for having a special config format is to solve the "structured + modular" problem. If you're not familiar with it, here is a short summary.
The Problem of Strongly Typed Structured Config
When choosing a config language (or config markup language), you need to consider what you use the config for. But we would like to have four main properties:
schema-defined
: The config has a strict schema that can be used to validate it, e.g. in testing pipelines. But the schema needs to be at least as modular as the config itself, and possible to define in a remote / distributed fashion, e.g. different git repositories contain different parts of the schema.
structured
: The config must have structure that can be utilized to group and pass on a part of the whole. E.g. if you need 3 database connections, but each should just produce a DBI instance, that is a good use for structured config: each connection has its own part of the config, and those parts are internally identical.
type-safe
: The code that uses the config should be type-safe, so I know at compile-time if the value I look at is an integer or a string. This is also a way to ensure that config values must be defined in the schema before use.
modular
: The config must support being split into multiple files that can be combined or merged into a single "config" while respecting the "type" of the config items. The modular part has two variants: include and extend. Included config is something that is included as a part of the parsed config, and extended config is modifications on the same structure.
In addition we want to avoid a couple of misfeatures that make it difficult to
follow what is going on with the config: arithmetics and scripting. If you
need logic to generate your config, you should do that in a proper programming
or scripting language, and not bake it into the config itself. But if you
really need scripting, you should use java, groovy or similar.
- PS: This is one of the pitfalls of Google's bcl/gcl configuration language, which gives the config writer full arithmetic and scripting control. That in turn tended to make the config files almost unreadable, and managing them a pretty unpopular task.
If you want to have a schema-defined, structured, type-safe config, you can
choose thrift, protobuf or providence serialized formats. All of these
fulfill the first three properties natively, and they all support a "simple"
modularization by merging or "overloading" messages. Ignoring the "readable and
writable config format" part, there is still the problem of modularizing a
schema-defined system like that: Where is the definition for each module, and
how do you merge them?
And here lies the whole reason for providence-config.
If you want a config that is type-safe all the way from the written config file given to the parser to the code that uses that config, using a model builder like providence should be a no-brainer. But when the config is this strictly defined, making it truly modular becomes a non-trivial problem: We don't want a single central definition of "the config", but want to build up the config for each application from modules used by the various parts it contains: its libraries and local code.
This is the providence project, so unsurprisingly providence is used as the
base definition. And since providence supports modular schema definitions, its
config system needs to support modularity of the config in the same way.
Config compatibility
Providence is by its very definition backward compatible, and if reading in non-strict modes, it is also forward compatible. Forward compatible means that you can add new fields and references, and update the config files with content from those fields without breaking older users of the same config files.
Config file Syntax
In general the config files use the suffix .cfg, but .pvd should be fine
too; in practice it should be irrelevant. Here is an overview of the
providence config file syntax:
- Comments follow the 'shell comment' syntax: a comment starts with a # and ends at the newline. Within that line, everything is allowed. The comment can start anywhere except inside a string literal.
- The config has three parts, which must come in this order, where the first two are optional.
  - The includes: Other config files included with an alias, so they can be referenced from the config.
  - The defines: A set of values that can also be referenced in the config.
  - The message: The 'content' of the config itself.
- The includes section is a set of recursively included config files. Each file is given an alias. E.g. include "other.cfg" as o will make the 'o' reference point to the content of the "other.cfg" config. Files referenced in the include statements MUST be relative to the directory of the including file.
- The defines follow a simple 'map' syntax where the key must be a simple identifier /[_a-zA-Z][_a-zA-Z0-9]*/, and the value a simple value (number, string, enum, boolean). The value may also be a declared (known) enum, using the double qualified identifier syntax package.Name.VALUE.
- The message is a providence message, and is declared with the qualified typename (package.Name) and the content, following this syntax: TYPENAME (':' EXTEND)? '{' FIELD_VALUE* '}', where the FIELD_VALUE part follows the 'pretty' serializer syntax (using the '=' field-value separator), with added support for references as values. The references can only point to defined values or to content from included config files. Messages have four specific modes of value specification, determined by what comes after the field name.
  - '=' means "overwrite with", otherwise the parent (extended) message values will be used as the base message.
  - After the '=' there can be an optional reference to use as the base message instead of the default one.
  - The extension content is delimited by the '{' and '}' chars.
  - At least one of the reference and the extension content must be present.
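The identifier rule for define keys can be checked with a regular expression. The following is a minimal sketch, assuming the pattern is meant to allow any number of trailing characters (a trailing '*') and to match the whole key. DefineKeys and isValidKey are hypothetical names for illustration, not part of providence.

```java
import java.util.regex.Pattern;

/** Hypothetical helper (not part of providence): validates keys used in a def block. */
final class DefineKeys {
    // Assumption: the identifier rule from the text, with a trailing '*'
    // so that keys longer than two characters are accepted.
    private static final Pattern IDENTIFIER =
            Pattern.compile("[_a-zA-Z][_a-zA-Z0-9]*");

    private DefineKeys() {}

    /** Returns true if the whole key is a simple identifier. */
    static boolean isValidKey(String key) {
        return key != null && IDENTIFIER.matcher(key).matches();
    }
}
```

With this rule, keys such as name1 and other_num are accepted, while keys that start with a digit or contain a '.' are rejected.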
A note on field values
Note that both messages and maps can be extended, but lists and sets cannot
(at least not yet). This is because managing lists and sets is a bit more
complicated when it comes to making modifications explicit, truly visible
and not confusing. E.g. if you need to remove a single element from a list
with slice(), there is no way of showing which element is actually removed
without referencing it with .remove(value), and with syntax like that we get
into the whole world of "scripting".
Example of config syntax:
    include "filepath" as alias
    def {
      name1 = "value"
      number = 12345.6789
    }
    def other_num = 4321
    def alias = package.Struct {
      key = "value"
    }
    package.Struct : alias {
      key = name1
      key2 = 321
      # Extending the existing message
      substruct {
        sub = "value"
      }
      # Overwriting with a new message
      substruct = {
        sub = "value"
      }
      # Replacing with a reference
      substruct = alias.sub2
      # Replacing with a reference that is extended.
      substruct = alias.sub2 {
        sub = "value"
      }
    }
Printing the config with the pvdcfg command gives output that matches the pretty-print output from providence:
    $ pvdcfg -I . print myfile.cfg
    {
      key = "value"
      key2 = 321
      substruct = {
        sub = "value"
      }
    }
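The difference between extending a field and overwriting it can be illustrated with plain maps. This is only a sketch of the semantics described in this section, assuming flat messages with no nested merging. It is not how providence implements it, and FieldModes, extend and overwrite are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustration only: models a message as a map of field name to value. */
final class FieldModes {
    private FieldModes() {}

    /** Models 'field { ... }': start from the parent's values and apply the changes on top. */
    static Map<String, Object> extend(Map<String, Object> parent,
                                      Map<String, Object> changes) {
        Map<String, Object> result = new LinkedHashMap<>(parent);
        result.putAll(changes);
        return result;
    }

    /** Models 'field = { ... }': the parent's values are ignored entirely. */
    static Map<String, Object> overwrite(Map<String, Object> parent,
                                         Map<String, Object> content) {
        return new LinkedHashMap<>(content);
    }
}
```

In this model, extending keeps every parent field that the changes do not mention, while overwriting keeps only what is written in the new content.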
Note that when using the resolveConfig("name", parent) method, the main struct
in the config file cannot itself inherit from (or extend) anything. The config
system also does not keep a reference to the resulting config supplier, as it
cannot know whether the parent is the same as, or different from, the one used
in other resolveConfig calls for the same file. In this case the config
supplier must be cached by the caller, who probably only needs to set this up
once at program startup.
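Caching the supplier on the caller side can be as simple as a map keyed by file name. The sketch below uses java.util.function.Supplier as a stand-in for the providence config supplier, and a resolver function as a stand-in for the actual resolveConfig call. SupplierCache is a hypothetical name and not part of providence.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.Supplier;

/** Hypothetical caller-side cache: resolves each config name at most once. */
final class SupplierCache<T> {
    private final Map<String, Supplier<T>> suppliers = new ConcurrentHashMap<>();
    // Stand-in for the real lookup, e.g. a lambda wrapping a resolveConfig call.
    private final Function<String, Supplier<T>> resolver;

    SupplierCache(Function<String, Supplier<T>> resolver) {
        this.resolver = resolver;
    }

    /** Returns the cached supplier for the name, resolving it on first use. */
    Supplier<T> get(String name) {
        return suppliers.computeIfAbsent(name, resolver);
    }
}
```

A program would typically create one such cache at startup and hand out the same supplier instance to every component that needs that config.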
Java Interface
The interface for using this config in code should be fairly easy to use. In order to read a simple config structure, you can use a code snippet like this:
    class Loader {
        public Named load() {
            WritableTypeRegistry reg = new SimpleTypeRegistry();
            // Sadly all types need to be registered, so a utility that
            // registers all subtypes is needed.
            reg.registerRecursive(Named.kDescriptor);
            reg.registerRecursive(From.kDescriptor);
            ProvidenceConfig cfg = new ProvidenceConfig(reg);
            return cfg.getConfig("myfile.cfg");
        }
    }
Includes
Files referenced in the include statements must be relative to the directory of the including file.
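Resolving an include relative to the including file can be sketched with java.nio.file. This is an illustration of the rule only, not the providence parser's own code. IncludePaths and resolveInclude are hypothetical names.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper showing how an include path relates to the including file. */
final class IncludePaths {
    private IncludePaths() {}

    /**
     * Resolves the include path against the directory of the including file
     * and removes redundant "." and ".." elements.
     */
    static Path resolveInclude(Path includingFile, String includePath) {
        return includingFile.resolveSibling(includePath).normalize();
    }

    public static void main(String[] args) {
        Path including = Paths.get("conf", "app", "main.cfg");
        // The include is looked up next to main.cfg, not in the working directory.
        System.out.println(resolveInclude(including, "other.cfg"));
    }
}
```

So an include of "other.cfg" inside conf/app/main.cfg refers to conf/app/other.cfg, regardless of the directory the program was started from.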
Advanced Usage
It is possible to get more out of the configs by handling the config suppliers directly. This enables the program to react to config updates, and to always have the latest version of the config available.
    class Program implements ConfigListener<Service,Service._Field> {
        ProvidenceConfig providenceConfig;
        Service service;
        public Program() {
            WritableTypeRegistry reg = new SimpleTypeRegistry();
            // Sadly all types need to be registered, so a utility that
            // registers all subtypes is needed.
            reg.registerRecursive(Named.kDescriptor);
            reg.registerRecursive(From.kDescriptor);
            reg.registerRecursive(Service.kDescriptor);
            this.providenceConfig = new ProvidenceConfig(reg);
            ConfigSupplier<Service,Service._Field> serviceSupplier =
                    providenceConfig.resolveConfig("my_service.cfg");
            serviceSupplier.addListener(this);
            this.service = serviceSupplier.get();
        }
        @Override
        public void onConfigChange(@Nonnull Service update) {
            this.service = update;
            // and react to the actual changes...
        }
    }
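The listener pattern used above can also be shown in isolation. The following is a minimal plain-Java stand-in, not the providence ConfigSupplier: UpdatingSupplier, addListener and update are hypothetical names, and java.util.function types take the place of the providence interfaces.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;
import java.util.function.Supplier;

/** Hypothetical stand-in for a supplier that notifies listeners on change. */
final class UpdatingSupplier<T> implements Supplier<T> {
    private final List<Consumer<T>> listeners = new CopyOnWriteArrayList<>();
    // volatile so that readers on other threads see the latest value.
    private volatile T current;

    UpdatingSupplier(T initial) {
        this.current = initial;
    }

    @Override
    public T get() {
        return current;
    }

    void addListener(Consumer<T> listener) {
        listeners.add(listener);
    }

    /** Stores the new value, then notifies every registered listener. */
    void update(T next) {
        current = next;
        for (Consumer<T> listener : listeners) {
            listener.accept(next);
        }
    }
}
```

A consumer can either call get() whenever it needs the value, or register a listener and keep its own copy, as the Program class does.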
There are also other config suppliers available to make the providence config system more powerful.
- FixedConfigSupplier: Just provides some config message as a config supplier. It will never change, and never triggers config listeners.
- ResourceConfigSupplier: Loads a system resource and provides it as a config supplier. The config never changes, and never triggers config listeners.
- ReferenceConfigSupplier: Uses a parent config and finds a reference (contained) message within the parent using a reference path. The path is the '.' concatenation of the field names. This supplier will forward changes in the parent config, but will not check for local changes.
- OverrideConfigSupplier: Takes a parent config and overrides it with values based on an override value map. Can only override "leaf" values, not whole messages. Uses the same reference path as the ReferenceConfigSupplier, and tries as best it can to parse the string value given. Handy for overriding some values based on command line args or similar.
In addition there is a config supplier meant to be used in testing, called
TestConfigSupplier. It exposes a testUpdate method that triggers updates
the same way as the other updating configs.
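The reference path used by ReferenceConfigSupplier and OverrideConfigSupplier is a '.' concatenation of field names. How such a path walks into a nested structure can be sketched with plain maps standing in for messages. ReferencePaths and lookup are hypothetical names, and the real suppliers operate on providence messages rather than maps.

```java
import java.util.Map;
import java.util.Optional;

/** Illustration only: follows a dotted path through nested maps. */
final class ReferencePaths {
    private ReferencePaths() {}

    /** Returns the value at the path, or empty if any step is missing. */
    static Optional<Object> lookup(Map<String, Object> root, String path) {
        Object current = root;
        for (String field : path.split("\\.")) {
            if (!(current instanceof Map)) {
                // Tried to descend into a leaf value.
                return Optional.empty();
            }
            current = ((Map<?, ?>) current).get(field);
            if (current == null) {
                return Optional.empty();
            }
        }
        return Optional.of(current);
    }
}
```

In this model a path like db.host first selects the db field of the root and then the host field inside it. A path that continues past a leaf value yields nothing.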