What is Galaxy?

At Ning, in 2006, Martin and I wrote a deployment tool called Galaxy. Since that time I know of at least three complete reimplementations, two major forks, and half a dozen more partial reimplementations. In a bizarre twist of fate, I learned yesterday from Adrian that my friend James also has a clean-room implementation, built using Fabric, called Process Manager. Holy shit.

Beyond reimplementations and forks from ex-Ninglets who are using a Galaxy derivative, I frequently hear from ex-Ninglets who are not and wish they could. We clearly got something right, it seems. Fascinatingly, folks all seem to focus on different aspects of Galaxy in terms of what they love about it. They also tend to have a common set of complaints about how Ning’s version worked, and have adapted theirs to accommodate them.

To me, the heart of Galaxy is the concept of the galaxy bundle: a tarball with the application and its dependencies, coupled with an RC script at a known location inside the bundle. Given such a bundle, a Galaxy implementation is then the tooling for deploying and managing those bundles across a set of servers. From personal and second-hand experience, this simple setup can keep things happy well into the thousands of servers.
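To make the bundle idea concrete, here is a rough sketch in Python of what "a tarball with an RC script at a known location" might look like as a check. The `bin/control` path, the function names, and the file contents are all assumptions made up for illustration; the only thing taken from the description above is that the script lives at some agreed-upon path inside the tarball.

```python
import io
import tarfile

# Assumed location of the RC script inside a bundle. This path is
# hypothetical; the only requirement is that it is a known, fixed location.
RC_SCRIPT_PATH = "bin/control"


def make_bundle(files):
    """Build an in-memory gzipped tarball from {path: (bytes, mode)}."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for path, (data, mode) in files.items():
            info = tarfile.TarInfo(name=path)
            info.size = len(data)
            info.mode = mode
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def is_valid_bundle(bundle_bytes, rc_path=RC_SCRIPT_PATH):
    """Return True if the tarball has an executable regular file at rc_path."""
    with tarfile.open(fileobj=io.BytesIO(bundle_bytes), mode="r:gz") as tar:
        try:
            member = tar.getmember(rc_path)
        except KeyError:
            # No entry at the agreed-upon path, so this is not a bundle.
            return False
        return member.isfile() and bool(member.mode & 0o111)


# A bundle with the application, a dependency, and the RC script.
good = make_bundle({
    "bin/control": (b"#!/bin/sh\necho \"$1\"\n", 0o755),
    "lib/app.jar": (b"placeholder", 0o644),
})

# The same payload without the RC script.
bad = make_bundle({"lib/app.jar": (b"placeholder", 0o644)})

print(is_valid_bundle(good), is_valid_bundle(bad))
```

The point of the sketch is that the contract is tiny: anything that can unpack a tarball and run one script at a fixed path can deploy and manage the application, and everything else is tooling layered on top.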

To many others, the heart of Galaxy seems to be the tooling itself, and the fairly nice way of managing applications separately from servers. At least one major user even ignores the idea of putting the applications and their dependencies in the bundle, and uses Galaxy to install RPMs! (I personally think this approach is not so great, but the person doing it is one of the best engineers I know, so I am happy to believe I may be wrong.)

Different folks have also drawn the line of what the Galaxy implementation should manage in quite different places. In the original implementation, Galaxy included bundle and configuration repositories, along with how those repos were structured, an agent to control the application on the server, a console to keep track of it all, and a command line tool to query and take actions on the system. On the other hand, the Proofpoint/Airlift implementation weakens the contracts on configuration (in a good way), requires a Maven repository for bundles, supports an arbitrary number of applications per host, and has Galaxy handle server provisioning as well as application deployment. The Ness (and, I believe, Metamarkets) implementation changes the configuration contract significantly, also supports several applications per host, and includes much more local server state in what Galaxy itself manages.

The other (generally minor) implementations and experiments have taken it in quite a few different directions, ranging from Pierre’s reimplementation using Erlang and CouchDB, to my reimplementation with no agents or console.

There seems to be an awful lot of experimentation around the concepts in Galaxy, which is awesome! Unfortunately, only the original implementation is very well documented at this point, so it is tough to use Galaxy unless you have used it before (hence my shock at James even knowing about it). I guess it’s time to start documenting and try to save other folks some work!