Configuration Models

Lately I have been noodling on the best general-purpose application configuration model and mechanism I can find; that is, trying to find something general purpose that I don’t think is miserably bad. Specifically, I mean configuration of heterogeneous applications in a hosted/SaaS/distributed/blah type of system.

I’ve worked with all kinds of stuff – not everything under the sun, for sure, but quite a few models. For purposes of this discussion, I’ll start with the one I have been working with most closely for the last six years – Ning’s Galaxy Repository.

Galaxy Classic: The Discussion Baseline

Galaxy uses a hierarchical configuration repository. See the “Gepo” and “Config Path” parts of the readme for a description of how it works. The mechanism does work pretty well, and it has gotten us a long way, but it has some horrible warts that I would like to avoid. (Really, go read that page if you are not familiar with Galaxy and want any of the rest of this to make sense. It’s worth it; Galaxy has been a Force for Good.)

The first wart is that the Galaxy agent relies on the configuration repository to determine the binary to install. It takes the configuration path for the deployment (I told you, just go read that doc and come back) and constructs a URL for the binary.

The second wart is that, after finding the binary in the config repo, the agent passes information about that config repo to the deployment bundle when it deploys it. The agent has a hard dependency on the config repo, and it leaks that dependency into the deployment bundle, which then uses its knowledge of the configuration repository to pull down the application configuration.

Proofpoint’s Galaxy: Another Take

Some very good folks at Proofpoint, who all worked with (and in Martin’s case, helped build) Ning’s Galaxy, reimplemented it to fix a number of behaviors that were suitable at Ning but not at Proofpoint. One of the changes was to the configuration mechanism. The deployment command receives a set of coordinates for the binary and a set for the configuration. These coordinates are converted to two URLs by the time they reach the agent.

The configuration URL, in the Proofpoint system, references a resource which is a set of (path, URL) pairs. When the agent deploys the binary bundle, it then pulls down each resource specified in the configuration resource, and puts it at the path specified for it in the deployment.
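As a rough sketch of the (path, URL) idea, and not Proofpoint’s actual code: the function below takes a mapping of paths to URLs, fetches each one, and writes it under the deployment directory. The names `materialize_config`, `fetch_url`, and `deploy_root` are made up for illustration, and the check that a path stays inside the deployment is an assumption about what a careful agent would want.

```python
from pathlib import Path
from typing import Callable, Dict, List
from urllib.request import urlopen


def fetch_url(url: str) -> bytes:
    """Default fetcher: read the resource body over HTTP(S)."""
    with urlopen(url) as response:
        return response.read()


def materialize_config(
    pairs: Dict[str, str],
    deploy_root: Path,
    fetch: Callable[[str], bytes] = fetch_url,
) -> List[Path]:
    """Fetch every URL and write it to its path inside deploy_root."""
    root = deploy_root.resolve()
    written = []
    for config_path, url in pairs.items():
        # Paths such as "/env/instance.conf" are relative to the deployment.
        target = deploy_root / config_path.lstrip("/")
        # Assumption: refuse paths that would land outside the deployment.
        if root not in target.resolve().parents:
            raise ValueError(f"path escapes the deployment: {config_path}")
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(fetch(url))
        written.append(target)
    return written
```

Passing the fetcher in as a parameter keeps the agent ignorant of where configuration lives; it only ever sees URLs.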

In this model, the agent just receives URLs, though it does understand the resources those URLs point to. It understands the binary URL because that points to a Galaxy package, and deploying those is what the agent does. It understands the configuration URL because it needs to pull down the configuration files.

The Proofpointers are considering reworking that configuration mechanism so that the configuration URL points to a tarball containing all of the config files needed, which is expanded to a known location (probably /env/) inside the deployment. This would remove one layer of indirection and make the configuration a write-once artifact.
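The tarball variant could look something like the sketch below. This is only a guess at the shape of it: `expand_config_tarball` is a hypothetical name, the `env` default just follows the "probably /env/" above, and rejecting links and escaping paths is an added assumption rather than anything Proofpoint has specified.

```python
import tarfile
from pathlib import Path


def expand_config_tarball(tarball: Path, deploy_root: Path, subdir: str = "env") -> Path:
    """Unpack a configuration tarball into <deploy_root>/<subdir>."""
    target = deploy_root / subdir
    target.mkdir(parents=True, exist_ok=True)
    base = target.resolve()
    with tarfile.open(tarball) as archive:
        for member in archive.getmembers():
            dest = (base / member.name).resolve()
            # Assumption: nothing in the tarball may land outside the target.
            if dest != base and base not in dest.parents:
                raise ValueError(f"member escapes target: {member.name}")
            if member.isdir():
                dest.mkdir(parents=True, exist_ok=True)
            elif member.isfile():
                dest.parent.mkdir(parents=True, exist_ok=True)
                dest.write_bytes(archive.extractfile(member).read())
            else:
                # Links, devices, and the like have no place in plain config.
                raise ValueError(f"unsupported member type: {member.name}")
    return target
```

Compared with fetching each (path, URL) pair, the agent makes one request and the whole configuration is a single immutable artifact.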

Sculptor: Yet Another Take

In parallel, I have been experimenting with another Galaxy implementation named Sculptor, built in particular to play nicely with Atlas.

Sculptor also uses (path, URL) pairs, but these pairs are specified as part of the environment (so properties of the agent), as part of the deployment, or both. You can watch a screencast which uses the environment configuration, but the deployment side is just in the noodling stages. In Sculptor, a deployment resource (submitted to the agent) would look something like:

{
    "url":"http://static.skife.org/echo-0.0.2.tar.gz",
    "name":"Echo Server",
    "configuration": {
      "/env/instance.conf":"http://waffles/DEP-123/echo/echo.conf",
      "/env/something.else":"http://pancakes/master/global.config"
    }
}

This is remarkably similar to the Proofpoint model, but it was arrived at bottom up, to play nicely with how Atlas works and to be able to support Ning’s applications. I like that they evolved in similar directions.
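Since the pairs can come from the environment, the deployment, or both, something has to combine them. Here is a minimal sketch of one way to do that. It is not Sculptor’s implementation: the function name is hypothetical, and letting the deployment win when both define the same path is purely an assumption.

```python
import json
from typing import Dict


def effective_configuration(
    environment_pairs: Dict[str, str], deployment_json: str
) -> Dict[str, str]:
    """Combine environment-level and deployment-level (path, URL) pairs."""
    deployment = json.loads(deployment_json)
    merged = dict(environment_pairs)
    # Assumption: on a path conflict the deployment's URL replaces the
    # environment's; the post does not say which side should win.
    merged.update(deployment.get("configuration", {}))
    return merged


deployment_resource = """
{
    "url": "http://static.skife.org/echo-0.0.2.tar.gz",
    "name": "Echo Server",
    "configuration": {
        "/env/instance.conf": "http://waffles/DEP-123/echo/echo.conf",
        "/env/something.else": "http://pancakes/master/global.config"
    }
}
"""

# Hypothetical environment-level pair, standing in for agent properties.
environment = {"/env/something.else": "http://pancakes/master/default.config"}

pairs = effective_configuration(environment, deployment_resource)
```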

Using Configuration

In all of these cases, the configuration tends to be very simple key-value pairs. Application-specific configuration files, such as httpd.conf, usually have values interpolated from the key-value pairs coming from configuration. In Ning’s case, this happens at deployment time, but in Proofpoint and Sculptor, it happens when the service is started.
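To make the interpolation step concrete, here is a small sketch using Python’s `string.Template`. None of the three systems necessarily works this way: the `key = value` file format, the key names, and the template text are all made up for the example.

```python
from string import Template
from typing import Dict


def parse_key_values(text: str) -> Dict[str, str]:
    """Parse simple 'key = value' lines, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a key=value line: {line!r}")
        values[key.strip()] = value.strip()
    return values


def render(template_text: str, values: Dict[str, str]) -> str:
    """Fill ${key} placeholders; a missing key raises KeyError."""
    return Template(template_text).substitute(values)


# Hypothetical keys and a hypothetical httpd.conf fragment.
config = parse_key_values("# echo service\nhttp_port = 8080\nserver_name = echo.example.invalid\n")
rendered = render("Listen ${http_port}\nServerName ${server_name}\n", config)
```

One wrinkle with this particular choice: `string.Template` treats `$` as special, so a literal dollar sign in the application file has to be written as `$$`.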

Under any of these models you can put application-formatted configuration in the repository (and reference it by URL), but the application-specific format tends to be strongly tied to the implementation details of the service, and spreading that knowledge around to different artifacts generally makes things more difficult.

Help!

To return to the beginning: I want to pick one general-purpose model. Sculptor embraces where my mind is right now, but it is an untested model. No one uses Sculptor (yet), so its mechanism and details are unproven. I’d really appreciate feedback, thoughts, and war stories on this stuff. Thank you!