Rethinking Some Galaxy Core Assumptions

Galaxy has been very successful, in multiple companies, but I think it can actually be simplified and made more powerful. The first touches on the heart of Galaxy, the second on the often-argued configuration aspect.

RC Scripts

I described the heart of Galaxy as a tarball with an rc script. I think it likely that the rc script should give way to a simple run script. The difference is that the run script doesn’t daemonize – it simply keeps running. An implementation will probably need to have a controller process which does daemonize (or defer to upstart or its ilk for process management).

While writing an rc script is far from rocket surgery, it turns out that the nuances are annoying to implement again, and again, and again. The main nuance is daemonizing things correctly. I’d prefer to provide that for software rather then force applications to get it right. Many app servers handle daemonization well, but they also all (that I know of, anyway) provide a mechanism for running in the foreground as well.

Unfortunately, a run script model makes a no-install implementation much trickier. The lore on daemonizing from bash is tricky, but even assuming bash is tricky. Using something like daemonize is nice, but then it requires an installation. Grrr. This is an implemenation problem though, and requiring some kind of installation on the appserver side may be worth it for simplifying the model.

Configuration

In a moment of blinding DUH, I came back to environment variables for environmental information. I mean, it works for everyone on Heroku or Cloud Foundry.

There has been a trend in Galaxy implementations (and elsewhere) to use purely file based configuration. This is great for application configuration, but is meh for environmental configuration. This has lead to most Galaxy implementation supporting some model of URL to Path mapping, for placing configuration files into deployed applications. These mechanisms are a great way to provide escape hatch/override configuration but plays against the goal of making deployments self contained, which I like. This punts on going all the way, which Dan likes to advocate, to putting environment information into the deploy bundle, but I am not sold on this myself :-)

Regardless, a general purpose implementation probably needs to support both env var configuration and file based, but you can certainly recommend one way of making use of it.