Some JDBI 3 Noodling

I grabbed the most recent build of jdk8 w/ lambdas tonight and started noodling on jdbi 3, which will require Java 8.

Set<Something> things = jdbi.withHandle(h -> {
    h.execute("insert into something (id, name) values (?, ?)", 1, "Brian");
    h.execute("insert into something (id, name) values (?, ?)", 2, "Steven");

    return h.query("select id, name from something")
            .map(rs -> new Something(rs.getInt(1), rs.getString(2)))
            .into(new HashSet<Something>());
});

assertThat(things).isEqualTo(ImmutableSet.of(new Something(1, "Brian"),
                                             new Something(2, "Steven")));

The Stream interface is kind of heavy to implement as it stands right now, and I couldn't get IDEA 12 and the JDK to agree on valid syntax. Neither one will let me omit the <Something> in the .into(new HashSet<Something>()); line, even though the most recent State of the Collections implies I should be able to.

It would be really nice if the lambda syntax sugar would quietly drop return values when auto-converting to a Block without the { ... }. I had to make some things accept a Function rather than a Block even though I ignore the return value, and that will then bite you when you pass something that doesn't have a return value. Java has side effects; sometimes we call a function which returns a value just for the side effects.

All told, I like the changes so far quite a bit despite my quibbles :-)


Go is PHP for the Backend

I’ve had the opportunity to use Go for my most recent project at work. The stuff I’ve done in Go is a minimally distributed system (two types of servers, tens of instances max) optimized for byte slinging throughput. The relationship with Go started out a bit rocky but got turned around.

After using it for a couple weeks, I described Go to my friend David as “PHP for the backend.” Despite my pretty low opinion of PHP, this was intended as a compliment. Regardless of the quality of the execution of PHP, the intent seems to have been to get out of your way and make building web pages easy. Go feels like that but for services. PHP is horribly inconsistent, breaks all the rules about programming language design, and is infuriating. Despite all that, it’s still the most widely used language for building web apps.

Go is rather similar – it is inconsistent, ignores most of what a modern programming language is supposed to include, and isn't whitespace sensitive, except to disallow whitespace in reasonable places (say, a newline before a {). It offers nice first class functions, but then cripples them with a strong type system which seems to ignore everything that has been done with type systems for the last couple of decades. You cannot even write a proper map(...) because Go is strongly typed with no type parameterization. Go really wants you to use tabs.
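
To make that concrete, here is a sketch (hypothetical code, not from any library) of about the best a general map can do without type parameters – traffic in interface{} and make callers cast:

package main

import "fmt"

// mapSlice is as general as map gets without type parameters:
// everything is interface{}, so all static type information is lost.
func mapSlice(xs []interface{}, f func(interface{}) interface{}) []interface{} {
    out := make([]interface{}, len(xs))
    for i, x := range xs {
        out[i] = f(x)
    }
    return out
}

func main() {
    nums := []interface{}{1, 2, 3}
    doubled := mapSlice(nums, func(x interface{}) interface{} {
        return x.(int) * 2 // the cast; get it wrong and it fails at runtime
    })
    fmt.Println(doubled) // [2 4 6]
}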

To top it off, errors are return values. They are return values which are easy to ignore. Idiomatic Go is to have several lines of boilerplate after every single function invocation which can possibly fail.
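
A minimal sketch of what that looks like in practice (standard library only; writing a file is just a stand-in for any fallible work):

package main

import (
    "fmt"
    "os"
)

// Every call that can fail returns an error, and idiomatic Go
// checks it explicitly, right there, every single time.
func writeGreeting(path string) error {
    f, err := os.Create(path)
    if err != nil {
        return err
    }
    defer f.Close()

    if _, err := fmt.Fprintln(f, "hello"); err != nil {
        return err
    }
    return nil
}

func main() {
    if err := writeGreeting("/tmp/greeting.txt"); err != nil {
        fmt.Fprintln(os.Stderr, err)
        os.Exit(1)
    }
}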

I got really annoyed, flamed Go on Twitter, and went for a walk. When I came back, several friends, in particular Toby, had commented in IM about my issues, pointing out ways of handling what I was annoyed by. They were all very reasonable, but basically came down to something along the lines of, “Go doesn’t do what you are trying to do; there are some brutal hacks to approximate it, like how you do functional-ish programming in Java, but you are fighting the system.”

Calmed down, I stepped back. I know of folks having great success with Go, and it offers a lot that I want (native code, UNIX friendly, higher level than C, lower level than Python or Ruby, garbage collected, strongly typed, good performance, good concurrency support, etc), so I tried to stop programming my way, and start programming Go’s way.

Go has a way of programming. Go is totally optimized for that way of programming, in fact. Programming any way other than Go’s way, with Go, will be that recipe for frustration I bounced my skull against. Go’s way is not pretty to someone indoctrinated with the modern functional aesthetic, but it works, and works well. Really well.

Go’s inconsistencies and limitations hold together, bizarrely enough. They steer code towards particular structures and behavior that are good. Based on my limited experience (I am still a Go novice; I have been using it in anger for only about three weeks), Go seems to be as much, or more, about how it is used as about how the language itself works. It seems to be optimized for solving design issues in a particular way, and for solving issues around programming (again, in a particular way), rather than for being a maximally expressive or powerful language.

This, of course, should not have been a surprise to me. Every presentation, description of purpose, etc, about Go says this. I had read them and said, “that makes sense, sure.” I still went into it looking at the language and wanting to use the language to solve the problems I had in the way I conceptualized them. That failed. When I adopted Go’s way of working (as I slowly started to see it) things succeeded. I also relearned some fundamental things I already knew but had apparently forgotten.

I look forward to using Go more.


Private Apt Repos in S3

Setting up a private apt repository in S3 is actually not too bad. This HOWTO sets up two minimal repositories, one public and one private. You need both. All work is to be done on a Debian or Ubuntu machine with an architecture matching what you are deploying to (i.e., amd64).

Kyle Shank did the heavy lifting for us by making an s3:// transport scheme for apt. Sadly, that package isn’t in any reliable public repos I know of, so to be safe this HOWTO will have you host it in a repo you control.

The process is therefore three steps: setting up the public repo to hold the s3 apt handler, installing that handler, and then setting up a private repo which uses authenticated s3 connections to access your debs.

The first repo is a public repo which exists to hold the apt-transport-s3 package. Check out a fork of apt-s3 and build it using make deb. You will probably want to nest the checkout in a dedicated directory, as the build drops the deb one directory up from the one you build from. Go figure. It requires that libapt-pkg-dev and libcurl4-openssl-dev be installed; see the README for details, it is pretty good.

Once you have that built, you’ll need to put it into the public repo. Doing this looks like:

$ mkdir repos
$ cd repos
$ mkdir -p public-repo/binary
$ cp ~/src/borkage/apt-transport-s3_1.1.1ubuntu2_amd64.deb public-repo/binary
$ cd public-repo
$ dpkg-scanpackages binary /dev/null | gzip -9c > binary/Packages.gz
$ dpkg-scansources binary /dev/null | gzip -9c > binary/Sources.gz

We made a repos directory which will hold both of our repos, then made a public-repo/binary directory for our binary artifacts. We copy in our apt-transport-s3 deb and build both package and source indexes. Be sure not to add a trailing slash to the binary bit in the dpkg-* incantations; it will throw off locations in the index. We build a Sources.gz, which will be empty, so that using add-repository doesn’t freak out.

We now have a local copy of our repo, yay! We want to push this up to s3, so make yourself a bucket for the public repo; I’ll call the demo one demo-public-repo. We’re going to sync this up using s3cmd. You should install it and configure it:

$ sudo apt-get install s3cmd
$ s3cmd --configure

Follow the instructions when configuring.

Now we’ll use s3cmd to sync our repo up:

$ cd repos
$ s3cmd sync -P public-repo/ s3://demo-public-repo

Note the -P – we need the artifact herein to be public so that we can install it.

Okay, the private repo will be just like the public repo, except we’ll use a different bucket and not make it world readable:

$ cd repos
$ mkdir -p private-repo/binary
$ cp ~/src/secret-stuff/target/secret_0.0.1.deb private-repo/binary
$ cd private-repo
$ dpkg-scanpackages binary /dev/null | gzip -9c > binary/Packages.gz
$ dpkg-scansources binary /dev/null | gzip -9c > binary/Sources.gz
$ cd ..
$ s3cmd sync private-repo/ s3://demo-private-repo

Note that this time we sync without -P – these artifacts stay private. Now log into the host which needs to use the private repo and add the following line to /etc/apt/sources.list:

deb http://s3-us-west-2.amazonaws.com/demo-public-repo binary/

Of course, your URL will vary – find the HTTP URL for your bucket root and use that. This one happens to be in us-west-2 (Oregon); yours will most likely not be.

Once it is added, install apt-transport-s3 via:

$ sudo apt-get update
$ sudo apt-get install -y --force-yes apt-transport-s3

We need the --force-yes as we didn’t sign the deb.

Now, the magic: this package allows us to add a repo URL of the form:

deb s3://<access-key>:[<secret-key>]@s3-us-west-2.amazonaws.com/demo-private-repo binary/

to /etc/apt/sources.list, where you, again, fill in the right region and bucket information, and replace <access-key> and <secret-key> with your actual AWS access and secret keys. The brackets above are not indicating that the secret key is optional; you need to include the brackets. They are there to disambiguate the secret key if any characters show up that confuse things.

You can now update your apt indices again and then go install things from your private repo!
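
For example, assuming the deb we copied in earlier provides a package named secret (and --force-yes again, as it is also unsigned):

$ sudo apt-get update
$ sudo apt-get install -y --force-yes secret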


Rethinking Some Galaxy Core Assumptions

Galaxy has been very successful, in multiple companies, but I think it can actually be simplified and made more powerful in two ways. The first touches on the heart of Galaxy, the second on the often-argued configuration aspect.

RC Scripts

I described the heart of Galaxy as a tarball with an rc script. I think it likely that the rc script should give way to a simple run script. The difference is that the run script doesn’t daemonize – it simply keeps running. An implementation will probably need to have a controller process which does daemonize (or defer to upstart or its ilk for process management).
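
A sketch of what such a run script might look like (the layout and app here are hypothetical):

#!/bin/sh
# stay in the foreground; daemonization is the controller's problem
cd "$(dirname "$0")/.." || exit 1
exec java -jar lib/my-service.jar etc/config.properties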

While writing an rc script is far from rocket surgery, it turns out that the nuances are annoying to implement again, and again, and again. The main nuance is daemonizing things correctly. I’d prefer to provide that for software rather than force applications to get it right. Many app servers handle daemonization well, but they also all (that I know of, anyway) provide a mechanism for running in the foreground as well.

Unfortunately, a run script model makes a no-install implementation much trickier. The lore around daemonizing correctly from bash is tricky, and even assuming bash is present is tricky. Using something like daemonize is nice, but then it requires an installation. Grrr. This is an implementation problem though, and requiring some kind of installation on the appserver side may be worth it for simplifying the model.
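
For instance, a controller could lean on daemonize for the hard part, something like (flags from memory, so double check against its man page):

$ daemonize -p /var/run/my-service.pid -o /var/log/my-service.log /deploy/my-service/bin/run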

Configuration

In a moment of blinding DUH, I came back to environment variables for environmental information. I mean, it works for everyone on Heroku or Cloud Foundry.

There has been a trend in Galaxy implementations (and elsewhere) to use purely file based configuration. This is great for application configuration, but is meh for environmental configuration. It has led to most Galaxy implementations supporting some model of URL-to-path mapping for placing configuration files into deployed applications. These mechanisms are a great way to provide escape hatch/override configuration, but they play against the goal of making deployments self contained, which I like. This punts on going all the way, which Dan likes to advocate, to putting environment information into the deploy bundle, but I am not sold on this myself :-)

Regardless, a general purpose implementation probably needs to support both env var and file based configuration, but you can certainly recommend one way of making use of it.
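
As a sketch of what supporting both might look like, here is a toy lookup in Go – the names, the precedence, and the key=value file format are all hypothetical, not taken from any Galaxy implementation:

package main

import (
    "bufio"
    "fmt"
    "os"
    "strings"
)

// lookup prefers the environment (environmental information) and falls
// back to a simple key=value config file (application configuration).
func lookup(key, file string) (string, error) {
    if v := os.Getenv(key); v != "" {
        return v, nil
    }
    f, err := os.Open(file)
    if err != nil {
        return "", err
    }
    defer f.Close()
    scan := bufio.NewScanner(f)
    for scan.Scan() {
        if strings.HasPrefix(scan.Text(), key+"=") {
            return strings.TrimPrefix(scan.Text(), key+"="), nil
        }
    }
    return "", fmt.Errorf("%s not configured", key)
}

func main() {
    addr, err := lookup("LISTEN_ADDR", "etc/config")
    if err != nil {
        fmt.Fprintln(os.Stderr, err)
        os.Exit(1)
    }
    fmt.Println("would listen on", addr)
}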


What is Galaxy?

At Ning, in 2006, Martin and I wrote a deployment tool called Galaxy. Since that time I know of at least three complete reimplementations, two major forks, and half a dozen more partial reimplementations. In a bizarre twist of fate, I learned yesterday from Adrian that my friend James also has a clean room implementation using Fabric, called Process Manager. Holy shit.

Beyond reimplementations and forks from ex-Ninglets who are using a Galaxy derivative, I frequently hear from ex-Ninglets who are not and wish they could. We clearly got something right, it seems. Fascinatingly, folks all seem to focus on different aspects of Galaxy in terms of what they love about it. They also tend to have a common set of complaints about how Ning’s version worked, and have adapted theirs to accommodate them.

To me, the heart of Galaxy is the concept of the galaxy bundle, a tarball with the application and its dependencies coupled with an RC script at a known location inside the bundle. Given such a bundle, a Galaxy implementation is then the tooling for deploying and managing those bundles across a set of servers. From personal, and second hand, experience this simple setup can keep things happy well into the thousands of servers.
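
For illustration, listing a bundle might look something like this – exactly where the script lives is the known-location contract each implementation picks, and these paths are made up:

$ tar tzf my-service-1.0.tgz
my-service-1.0/bin/rc
my-service-1.0/lib/my-service.jar
my-service-1.0/lib/some-dependency.jar
my-service-1.0/etc/config.properties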

To many others, the heart of Galaxy seems to be the tooling itself, and the fairly nice way of managing applications separately from servers. At least one major user even ignores the idea of putting the applications and their dependencies in the bundle, and uses Galaxy to install RPMs! (I personally think this approach is not so great, but the person doing it is one of the best engineers I know, so I am happy to believe I may be wrong.)

Different folks have also drawn the line of what the Galaxy implementation should manage in quite different places. In the original implementation, Galaxy included bundle and configuration repositories (along with how those repos were structured), an agent to control the application on the server, a console to keep track of it all, and a command line tool to query and take actions on the system. On the other hand, the Proofpoint/Airlift implementation weakens the contracts on configuration (in a good way), requires a Maven repository for bundles, supports an arbitrary number of applications per host, and has Galaxy handle server provisioning as well as application deployment. The Ness (and, I believe, Metamarkets) implementation changes the configuration contract significantly, also supports several applications per host, and includes much more local server state in what Galaxy itself manages.

The other (generally minor) implementations and experiments have taken it in quite a few different directions, ranging from Pierre’s reimplementation using Erlang and CouchDB, to my reimplementation with no agents or console.

There seems to be an awful lot of experimentation around the concepts in Galaxy, which is awesome! Unfortunately, only the original implementation is very well documented at this point, so it is tough to use Galaxy unless you have used it before (hence my shock at James even knowing about it). I guess it’s time to start documenting and try to save other folks some work!