RPC Over SSH and Domain Sockets

I really like using SSH for authentication and authorization when possible – it is very configurable, well understood, and more secure than anything I am likely to design. It is also generally pretty easy to have applications communicate over SSH. A nice model is to have the server listen on a domain socket in a directory with appropriate permissions, and have clients connect over SSH and netcat to talk to it.

Logically, on the client it is:

$ ssh server.example.com /usr/bin/nc -U /tmp/foo

And voila, your client (or shell in this case) is connected to the remote domain socket. After finding Jeff Hodges’s wonderful writeup on go.crypto/ssh I sat down to make Go do this internally. It was fun, and pretty straightforward.

The server is just a net/rpc server which listens on a domain socket and responds with a greeting:

package main

import (
	"fmt"
	"log"
	"net"
	"net/rpc"
	"os"
	"os/signal"
	"syscall"
)

// rpc response
type Response struct {
	Greeting string
}

// rpc request
type Request struct {
	Name string
}

// rpc host struct thing
type Greeter struct{}

// our remotely invocable function
func (g *Greeter) Greet(req Request, res *Response) (err error) {
	res.Greeting = fmt.Sprintf("Hello %s", req.Name)
	return
}

// start up rpc listener at path
func ServeAt(path string) (err error) {
	rpc.Register(&Greeter{})

	listener, err := net.Listen("unix", path)
	if err != nil {
		return fmt.Errorf("unable to listen at %s: %s", path, err)
	}

	go rpc.Accept(listener)
	return
}

// ./server /tmp/foo
func main() {
	path := os.Args[1]

	err := ServeAt(path)
	if err != nil {
		log.Fatalf("failed: %s", err)
	}
	defer os.Remove(path)

	// block until we are signalled to quit
	wait()
}

func wait() {
	// signal.Notify wants a buffered channel; note that SIGKILL cannot
	// be caught, so we listen for SIGTERM instead
	signals := make(chan os.Signal, 1)
	signal.Notify(signals, syscall.SIGINT, syscall.SIGTERM, syscall.SIGHUP)
	<-signals
}

The client is the fun part. It establishes an SSH connection to the server host, then fires off a Session against netcat, attaches an RPC client to that session, and does its stuff!

package main

import (
	"code.google.com/p/go.crypto/ssh"
	"fmt"
	"io"
	"log"
	"net"
	"net/rpc"
	"os"
	"strings"
)

// RPC response container
type Response struct {
	Greeting string
}

// RPC request container
type Request struct {
	Name string
}

// It would be nice if ssh.Session were an io.ReadWriter
// proposal submitted :-)
type NetCatSession struct {
	*ssh.Session // provides Close()
	writer io.Writer
	reader io.Reader
}

// io.Reader
func (s NetCatSession) Read(p []byte) (n int, err error) {
	return s.reader.Read(p)
}

// io.Writer
func (s NetCatSession) Write(p []byte) (n int, err error) {
	return s.writer.Write(p)
}

// given the established ssh connection, start a session running netcat and
// return an io.ReadWriteCloser appropriate for rpc.NewClient(...)
func StartNetCat(client *ssh.ClientConn, path string) (rwc *NetCatSession, err error) {
	session, err := client.NewSession()
	if err != nil {
		return
	}

	cmd := fmt.Sprintf("/usr/bin/nc -U %s", path)
	in, err := session.StdinPipe()
	if err != nil {
		return nil, fmt.Errorf("unable to get stdin: %s", err)
	}

	out, err := session.StdoutPipe()
	if err != nil {
		return nil, fmt.Errorf("unable to get stdout: %s", err)
	}

	err = session.Start(cmd)
	if err != nil {
		return nil, fmt.Errorf("unable to start '%s': %s", cmd, err)
	}

	return &NetCatSession{session, in, out}, nil
}


// ./client localhost:/tmp/foo Brian
func main() {
	parts := strings.Split(os.Args[1], ":")
	host := parts[0]
	path := parts[1]
	name := os.Args[2]


	// SSH setup, we assume current username and use the ssh agent
	// for auth
	agent_sock, err := net.Dial("unix", os.Getenv("SSH_AUTH_SOCK"))
	if err != nil {
		log.Fatalf("sorry, this example requires the ssh agent: %s", err)
	}
	defer agent_sock.Close()

	config := &ssh.ClientConfig{
		User: os.Getenv("USER"),
		Auth: []ssh.ClientAuth{
			ssh.ClientAuthAgent(ssh.NewAgentClient(agent_sock)),
		},
	}
	ssh_client, err := ssh.Dial("tcp", fmt.Sprintf("%s:22", host), config)
	if err != nil {
		log.Fatalf("Failed to dial: %s", err)
	}
	defer ssh_client.Close()


	// Establish session to netcat talking to the domain socket
	s, err := StartNetCat(ssh_client, path)
	if err != nil {
		log.Fatalf("unable to start netcat session: %s", err)
	}


	// now comes the RPC!
	client := rpc.NewClient(s)
	defer client.Close()

	req := &Request{name}
	var res Response

	err = client.Call("Greeter.Greet", req, &res)
	if err != nil {
		log.Fatalf("error in rpc: %s", err)
	}
	fmt.Println(res.Greeting)
}

And there it is! This isn’t exactly library code, but it bundles up the technique nicely.

I really like using domain sockets and SSH for “operational” stuff. The slight overhead of firing up extra processes on the server and hopping between TCP and unix sockets doesn’t usually matter, and you get lots of nice, well-understood, configurable security for your sessions.

In this case I’m using SSH as a library; in the past I have shelled out to ssh in order to take advantage of client-side SSH configuration as well. Which makes the most sense varies, of course :-)
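If you go the shelling-out route instead, a minimal sketch looks something like this – the host and socket path are the same assumptions as in the example above, and the RPC calls are unchanged:

package main

import (
	"io"
	"log"
	"net/rpc"
	"os/exec"
)

// bundle the child's stdin and stdout into the io.ReadWriteCloser that
// rpc.NewClient wants; Read, Write, and Close come from the embedded values
type pipeRWC struct {
	io.WriteCloser // ssh's stdin
	io.Reader      // ssh's stdout
}

func main() {
	cmd := exec.Command("ssh", "server.example.com", "/usr/bin/nc", "-U", "/tmp/foo")

	in, err := cmd.StdinPipe()
	if err != nil {
		log.Fatalf("unable to get stdin: %s", err)
	}
	out, err := cmd.StdoutPipe()
	if err != nil {
		log.Fatalf("unable to get stdout: %s", err)
	}
	if err := cmd.Start(); err != nil {
		log.Fatalf("unable to start ssh: %s", err)
	}

	client := rpc.NewClient(pipeRWC{in, out})
	defer client.Close()

	// client.Call("Greeter.Greet", ...) exactly as in the client above
}

Either way, running the pair looks like this: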

$ ./server /tmp/foo &
[1] 46206
$ ./client localhost:/tmp/foo "brave SSH world"
Hello brave SSH world
$ 

I put the project into a gist you can clone and noodle with if you like :-)


Some JDBI 3 Noodling

I grabbed the most recent build of jdk8 w/ lambdas tonight and started noodling on jdbi 3, which will require Java 8.

Set<Something> things = jdbi.withHandle(h -> {
    h.execute("insert into something (id, name) values (?, ?)", 1, "Brian");
    h.execute("insert into something (id, name) values (?, ?)", 2, "Steven");

    return h.query("select id, name from something")
            .map(rs -> new Something(rs.getInt(1), rs.getString(2)))
            .into(new HashSet<Something>());
});

assertThat(things).isEqualTo(ImmutableSet.of(new Something(1, "Brian"),
                                             new Something(2, "Steven")));

The Stream interface is kind of heavy to implement as it stands right now, and I couldn’t get IDEA 12 and the JDK to agree on valid syntax. Neither one wants to let me omit the <Something> in the .into(new HashSet<Something>()); line, which the most recent State of the Collections implies I should be able to.

It would be really nice if the lambda syntax sugar would quietly drop return values when it is auto-converting to a Block without the { ... }. I had to make some things accept a Function rather than a Block even though I ignore the return value, and that will then bite you when you pass something that doesn’t return a value. Java has side effects; sometimes we call a function which returns a value just for the side effects.

All told, I like the changes so far quite a bit despite my quibbles :-)


Go is PHP for the Backend

I’ve had the opportunity to use Go for my most recent project at work. The stuff I’ve done in Go is a minimally distributed system (two types of servers, tens of instances max) optimized for byte slinging throughput. The relationship with Go started out a bit rocky but got turned around.

After using it for a couple weeks, I described Go to my friend David as “PHP for the backend.” Despite my pretty low opinion of PHP, this was intended as a compliment. Regardless of the quality of the execution of PHP, the intent seems to have been to get out of your way and make building web pages easy. Go feels like that but for services. PHP is horribly inconsistent, breaks all the rules about programming language design, and is infuriating. Despite all that, it’s still the most widely used language for building web apps.

Go is rather similar – it is inconsistent, ignores anything a modern programming language is supposed to include, doesn’t use whitespace, except to disallow it in reasonable places (say, as a newline before a {). It offers nice first class functions, but then cripples them by having a strong type system which seems to ignore everything that has been done with type systems for the last couple decades. You cannot even write a proper map(...) because Go is strongly typed with no type parameterization. Go really wants to use tabs.
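To make the map(...) complaint concrete, here is roughly the choice you are left with (a sketch – with no type parameters you either write one of these per element type, or erase the types with interface{} and assert at every call site):

// one map per element type...
func mapStrings(in []string, f func(string) string) []string {
	out := make([]string, len(in))
	for i, s := range in {
		out[i] = f(s)
	}
	return out
}

// ...or give up static typing and push type assertions onto every caller
func mapAny(in []interface{}, f func(interface{}) interface{}) []interface{} {
	out := make([]interface{}, len(in))
	for i, v := range in {
		out[i] = f(v)
	}
	return out
}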

To top it off, errors are return values. They are return values which are easy to ignore. Idiomatic Go is to have several lines of boilerplate after every single function invocation which can possibly fail.

I got really annoyed, flamed Go on Twitter, and went for a walk. When I came back, several friends, Toby in particular, had commented in IM about my issues, pointing out ways of trying to handle what I was being annoyed by. They were all very reasonable, but basically came down to something along the lines of, “Go doesn’t do what you are trying to do; there are some brutal hacks to approximate it, like how you do functional-ish programming in Java, but you are fighting the system.”

Calmed down, I stepped back. I know of folks having great success with Go, and it offers a lot that I want (native code, UNIX friendly, higher level than C, lower level than Python or Ruby, garbage collected, strongly typed, good performance, good concurrency support, etc), so I tried to stop programming my way, and start programming Go’s way.

Go has a way of programming. Go is totally optimized for that way of programming, in fact. Programming any way other than Go’s way, with Go, will be that recipe for frustration I bounced my skull against. Go’s way is not pretty to someone indoctrinated with the modern functional aesthetic, but it works, and works well. Really well.

Go’s inconsistencies and limitations hold together, bizarrely enough. They steer code towards particular structures and behaviors that are good. Based on my limited experience (I am still a Go novice; I have been using it in anger for only about three weeks), Go seems to be as much, or more, about how it is used as about how the language itself works. It seems to be optimized for solving design issues a particular way, and for solving issues around programming (again, a particular way), rather than for being a maximally expressive or powerful language.

This, of course, should not have been a surprise to me. Every presentation, description of purpose, etc, about Go says this. I had read them and said, “that makes sense, sure.” I still went into it looking at the language and wanting to use the language to solve the problems I had in the way I conceptualized them. That failed. When I adopted Go’s way of working (as I slowly started to see it) things succeeded. I also relearned some fundamental things I already knew but had apparently forgotten.

I look forward to using Go more.


Private Apt Repos in S3

Setting up a private apt repository in S3 is actually not too bad. This HOWTO sets up two minimal repositories, one public and one private. You need both. All work is to be done on a Debian or Ubuntu machine with an architecture matching what you are deploying to (i.e., amd64).

Kyle Shank did the heavy lifting for us by making an s3:// transport scheme for apt. Sadly, that package isn’t in any reliable public repos I know of, so to be safe this HOWTO will have you host it in a repo you control.

The process therefore has three steps: setting up a public repo to hold the s3 apt handler, installing that handler, and then setting up a private repo which uses authenticated S3 connections to access your debs.

The first repo is a public repo which exists to hold the apt-transport-s3 package. Check out a fork of apt-s3 and build it using make deb. You will probably want to nest the checkout in a dedicated directory, as the build drops the deb one directory up from the one you build from. Go figure. It requires libapt-pkg-dev and libcurl4-openssl-dev to be installed; see the README for details, it is pretty good.

Once you have that built, you’ll need to put it into our public repo. Doing this looks like:

$ mkdir repos
$ cd repos
$ mkdir -p public-repo/binary
$ cp ~/src/borkage/apt-transport-s3_1.1.1ubuntu2_amd64.deb public-repo/binary
$ cd public-repo
$ dpkg-scanpackages binary /dev/null | gzip -9c > binary/Packages.gz
$ dpkg-scansources binary /dev/null | gzip -9c > binary/Sources.gz

We made a repos directory which will hold both of our repos, then made a public-repo/binary directory for our binary artifacts. We copy in our apt-transport-s3 deb and build both package and source indexes. Be sure not to add a trailing slash to the binary bit in the dpkg-* incantations; it will throw off locations in the index. We build a Sources.gz, which will be empty, so that add-apt-repository doesn’t freak out.

We now have a local copy of our repo, yay! We want to push this up to S3, so make yourself a bucket for the public repo; I’ll call the demo one demo-public-repo. We’re going to sync this up using s3cmd, so install and configure it:

$ sudo apt-get install s3cmd
$ s3cmd --configure

Follow the instructions when configuring.

Now we’ll use s3cmd to sync our repo up:

$ cd repos
$ s3cmd sync -P public-repo/ s3://demo-public-repo

Note the -P – we need the artifact herein to be public so that we can install it.

Okay, the private repo will be just like the public repo, except we’ll use a different bucket and not make it world readable:

$ cd repos
$ mkdir -p private-repo/binary
$ cp ~/src/secret-stuff/target/secret_0.0.1.deb private-repo/binary
$ cd private-repo
$ dpkg-scanpackages binary /dev/null | gzip -9c > binary/Packages.gz
$ dpkg-scansources binary /dev/null | gzip -9c > binary/Sources.gz
$ cd ..
$ s3cmd sync -P private-repo/ s3://demo-private-repo

Now log into the host which needs to use the private repo and add the following line to /etc/apt/sources.list:

deb http://s3-us-west-2.amazonaws.com/demo-public-repo binary/

Of course, your URL will vary – find the HTTP URL for your bucket root and use that. This one happens to be in us-west-2 (Oregon); yours will most likely not be.

Once it is added, install apt-transport-s3 via:

$ sudo apt-get update
$ sudo apt-get install -y --force-yes apt-transport-s3

We need the --force-yes as we didn’t sign the deb.

Now, the magic: this allows us to add a repo URL of the form:

deb s3://<access-key>:[<secret-key>]@s3-us-west-2.amazonaws.com/demo-private-repo binary/

to /etc/apt/sources.list, where you, again, fill in the right region and bucket information, and replace <access-key> and <secret-key> with your actual AWS access and secret keys. The brackets above do not indicate that the secret key is optional; you need to include them. They are there to disambiguate the secret key in case it contains characters that would otherwise confuse the parsing.

You can now update your apt indices again and then go install things from your private repo!


Rethinking Some Galaxy Core Assumptions

Galaxy has been very successful, in multiple companies, but I think it can actually be simplified and made more powerful. I have two changes in mind: the first touches on the heart of Galaxy, the second on the often-argued configuration aspect.

RC Scripts

I described the heart of Galaxy as a tarball with an rc script. I think it likely that the rc script should give way to a simple run script. The difference is that the run script doesn’t daemonize – it simply keeps running. An implementation will probably need to have a controller process which does daemonize (or defer to upstart or its ilk for process management).

While writing an rc script is far from rocket surgery, it turns out that the nuances are annoying to implement again, and again, and again. The main nuance is daemonizing things correctly. I’d prefer to provide that for software rather than force applications to get it right. Many app servers handle daemonization well, but they also all (that I know of, anyway) provide a mechanism for running in the foreground as well.

Unfortunately, a run script model makes a no-install implementation much trickier. Daemonizing from bash is tricky lore, and even assuming bash is available is itself shaky. Using something like daemonize is nice, but then it requires an installation. Grrr. This is an implementation problem though, and requiring some kind of installation on the appserver side may be worth it for simplifying the model.
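As a sketch of how small that installed piece could be, the controller process mentioned above might amount to little more than running the run script in the foreground and restarting it when it dies – the path and restart policy here are hypothetical:

package main

import (
	"log"
	"os"
	"os/exec"
	"time"
)

func main() {
	// supervise the deploy bundle's run script, which stays in the
	// foreground and never daemonizes itself
	for {
		cmd := exec.Command("./run")
		cmd.Stdout = os.Stdout
		cmd.Stderr = os.Stderr
		if err := cmd.Run(); err != nil {
			log.Printf("run script exited: %s", err)
		}
		time.Sleep(time.Second) // don't spin if it dies immediately
	}
}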

Configuration

In a moment of blinding DUH, I came back to environment variables for environmental information. I mean, it works for everyone on Heroku or Cloud Foundry.

There has been a trend in Galaxy implementations (and elsewhere) to use purely file-based configuration. This is great for application configuration, but is meh for environmental configuration. It has led to most Galaxy implementations supporting some model of URL-to-path mapping for placing configuration files into deployed applications. These mechanisms are a great way to provide escape-hatch/override configuration, but they play against the goal of making deployments self-contained, which I like. This punts on going all the way – putting environment information into the deploy bundle, which Dan likes to advocate – but I am not sold on that myself :-)

Regardless, a general purpose implementation probably needs to support both env var and file-based configuration, but you can certainly recommend one way of making use of it.
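As a minimal sketch of supporting both – the lookup order, key, and path conventions here are my own assumptions, not anything Galaxy specifies – the environment provides the baseline and a dropped-in file serves as the override/escape hatch:

package config

import (
	"io/ioutil"
	"os"
	"strings"
)

// Lookup prefers a configuration file dropped into the deploy (the
// override/escape hatch) and falls back to an environment variable;
// both the key and path conventions are hypothetical
func Lookup(envKey, filePath string) (value string, ok bool) {
	if b, err := ioutil.ReadFile(filePath); err == nil {
		return strings.TrimSpace(string(b)), true
	}
	if v := os.Getenv(envKey); v != "" {
		return v, true
	}
	return "", false
}

Usage would be something like Lookup("HTTP_PORT", "env/http_port"), with both names being whatever convention you settle on.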