Embedding Clojure

There is some information out there on embedding Clojure in Java, but it isn’t the easiest to find, and the examples don’t tend to come with explanations, so… here is yet another!

Let’s take a silly example and say we want to embed Clojure as a validation language on something, so that it looks something like this:

public class Thing
{
    private int num = 0;
    
    @Validate("(> num 0)")
    public void setNum(@Name("num") Integer num) {
        this.num = num;
    }
    
    
    @Validate("(< first second)")
    public void setInOrder(@Name("first") Integer first,
                           @Name("second") Integer second) {
        this.num = first + second;
    }
}

We want the validation function, expressed in the @Validate annotation, to be invoked on every call to the method, with each parameter bound to the name given in its @Name annotation. For the second method, that means ensuring that first is less than second. Because the validation runs on every invocation of the validated method, it needs to be really fast.

While fairly contrived, and rather absurd, it makes a nice example :-)
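The two annotations themselves are never shown here, so the following is only a minimal sketch of how they might be declared. The element name value() matches the v.value() and ((Name) a).value() calls used later on, but the retention and target settings are assumptions, and AnnotationSketch and Sample are hypothetical names used purely for illustration. Runtime retention is what makes the annotations visible to getAnnotation and getParameterAnnotations.

```java
import java.lang.annotation.Annotation;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

// Assumed declaration: carries the Clojure expression to check for a method.
@Retention(RetentionPolicy.RUNTIME)   // must survive to runtime to be read reflectively
@Target(ElementType.METHOD)
@interface Validate {
    String value();
}

// Assumed declaration: names a parameter so the expression can refer to it.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.PARAMETER)
@interface Name {
    String value();
}

final class AnnotationSketch {
    // Hypothetical annotated class, mirroring Thing.setNum.
    static class Sample {
        @Validate("(> num 0)")
        public void setNum(@Name("num") Integer num) { }
    }

    // Looks up a declared method, converting the checked exception.
    static Method find(Class<?> type, String name, Class<?>... params) {
        try {
            return type.getDeclaredMethod(name, params);
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Returns the validation expression, or null if the method is not annotated.
    static String expressionOf(Method m) {
        Validate v = m.getAnnotation(Validate.class);
        return v == null ? null : v.value();
    }

    // Returns the @Name of the parameter at the given index, or null if it has none.
    static String paramName(Method m, int index) {
        for (Annotation a : m.getParameterAnnotations()[index]) {
            if (a instanceof Name) {
                return ((Name) a).value();
            }
        }
        return null;
    }
}
```

With the default (class-file) retention the annotations would compile fine but be invisible to reflection, so the interceptor would never find anything to validate.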

What we’d like to do is hold a reference to an otherwise anonymous Clojure function (we don’t want to pollute the global namespace) and invoke it on every method call with some kind of method interceptor.

We can create the Clojure function reference with something like:

public IFn define(String func) throws Exception {
  String formish = String.format("(fn [val] (true? %s))", func);
  // load evaluates the form and hands back the resulting function object
  return (IFn) clojure.lang.Compiler.load(new StringReader(formish));
}

/* ... */

IFn fn = define("(> val 0)");

assertTrue((Boolean) fn.invoke(7));

The Clojure compiler (inconveniently in Java 6) is named Compiler and provides a handy load(Reader) function which reads and evaluates everything the Reader supplies, returning whatever the last form evaluates to. In this case we get back a function which wraps our validation expression in a test for true. In this example the passed-in value has a hard-coded name, val, which is unfortunate, but can be worked around.

We can invoke this function directly via one of its invoke methods – it has a ton of overloads for different argument counts.

This approach will generate a Java class (well, a .class anyway) implementing our function.

To wrap the behavior of a class, rather than an interface, and in a performant way, we’ll break out the ever-scary-but-awesome CGLIB and create a runtime extension of the class being validated. CGLIB is fast, but you pay for that with some gnarly low-level-feeling hackery. Not as low as ASM, though :-)

Our object factory looks like:

public <T> T build(Class<T> type) throws Exception {
    Enhancer e = new Enhancer();
    e.setSuperclass(type);
    List<Callback> callbacks = new ArrayList<Callback>();
    callbacks.add(NoOp.INSTANCE);
    final Map<String, Integer> callback_mapping = new HashMap<String, Integer>();
    int count = 0;
    for (Method method : type.getDeclaredMethods()) {
        if (method.isAnnotationPresent(Validate.class)) {
            callbacks.add(new Handler(method));
            callback_mapping.put(method.getName(), ++count);
        }
        else {
            callback_mapping.put(method.getName(), 0);
        }
    }
    e.setCallbacks(callbacks.toArray(new Callback[callbacks.size()]));
    e.setCallbackFilter(new CallbackFilter()
    {
        public int accept(Method method) {
            Integer i = callback_mapping.get(method.getName());
            if (i == null) {
                return 0;
            }
            else {
                return i;
            }
        }
    });
    return (T) e.create();
}

Which is pretty gnarly. Basically, for methods without the @Validate annotation it provides a no-op callback that passes straight through to the parent class, and for methods with the annotation it delegates to a special Callback. We do some callback filter hackery so that nothing has to be dispatched dynamically at runtime (as a reflection proxy would), which lets CGLIB generate a method body that invokes our handler directly. None of this is really what we are interested in, though. The Handler has the good stuff, so let’s see it.
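One thing worth noting about the factory: the callback mapping is keyed on method.getName(), so two overloads of the same method would share a single entry and whichever is visited last wins. A possible variation, sketched here with plain reflection and no CGLIB, keys the map on the name plus the parameter types instead. CallbackIndexSketch, key, indexCallbacks, indexFor, and the nested Validate stand-in are all hypothetical names for this illustration.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

final class CallbackIndexSketch {
    // Hypothetical stand-in for the @Validate annotation.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface Validate { String value(); }

    // Name plus parameter types, so overloads get distinct keys.
    static String key(Method m) {
        return m.getName() + Arrays.toString(m.getParameterTypes());
    }

    // Index 0 is reserved for the pass-through callback; every validated
    // method gets its own index, starting at 1.
    static Map<String, Integer> indexCallbacks(Class<?> type) {
        Map<String, Integer> mapping = new HashMap<String, Integer>();
        int count = 0;
        for (Method m : type.getDeclaredMethods()) {
            mapping.put(key(m), m.isAnnotationPresent(Validate.class) ? ++count : 0);
        }
        return mapping;
    }

    // What a callback filter's accept(Method) could return: unknown methods fall back to 0.
    static int indexFor(Map<String, Integer> mapping, Method m) {
        Integer i = mapping.get(key(m));
        return i == null ? 0 : i;
    }

    // Looks up a declared method, converting the checked exception.
    static Method find(Class<?> type, String name, Class<?>... params) {
        try {
            return type.getDeclaredMethod(name, params);
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Hypothetical class with one validated and one unvalidated overload.
    static class Sample {
        @Validate("(> num 0)")
        public void set(Integer num) { }

        public void set(String raw) { }
    }
}
```

For the Thing class in this post the difference does not matter, since setNum and setInOrder have distinct names.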

private static class Handler implements MethodInterceptor
{
    private final IFn fn;
    private int[] boundParamOffsets;
    private final Method method;

    private Handler(Method m) throws Exception {
        final Validate v = m.getAnnotation(Validate.class);
        Annotation[][] panno = m.getParameterAnnotations();
        ArrayList<String> names = new ArrayList<String>();
        List<Integer> counts = new ArrayList<Integer>();
        for (int i = 0; i < panno.length; i++) {
            Annotation[] param_annotations = panno[i];
            for (Annotation a : param_annotations) {
                if (a instanceof Name) {
                    names.add(((Name) a).value());
                    counts.add(i);
                }
            }
        }
        final StringBuilder args = new StringBuilder();
        for (String name : names) {
            if (name != null) {
                args.append(name).append(" ");
            }
        }
        String form = format("(fn [%s] (false? %s))",
                                                args.toString(),
                                                v.value());
        fn = (IFn) load(new StringReader(form));
        this.boundParamOffsets = new int[counts.size()];
        Class[] arglets = new Class[this.boundParamOffsets.length];
        for (int i = 0; i < this.boundParamOffsets.length; i++) {
            arglets[i] = Object.class;
            this.boundParamOffsets[i] = counts.get(i);
        }
        method = fn.getClass().getDeclaredMethod("invoke", arglets);
    }

    public Object intercept(Object o, 
                            Method method, 
                            Object[] objects, 
                            MethodProxy proxy) throws Throwable {
        Object[] args = new Object[boundParamOffsets.length];
        for (int i = 0; i < boundParamOffsets.length; i++) {
            args[i] = objects[boundParamOffsets[i]];
        }
        final Boolean bad = (Boolean) this.method.invoke(fn, args);
        if (bad) {
            throw new IllegalArgumentException("Failed validation!");
        }
        return proxy.invokeSuper(o, objects);
    }
}

YIKES! Okay, this is where it gets ugly, though it is a darned nice example of why macros are handy… if your language supports em. Java doesn’t, so it’s ugly.

We’ll step through it, though. The constructor wants to build three things: the Clojure function (fn), an array of offsets into the parameter list for the named parameters (boundParamOffsets), and the Java Method for the correct invoke overload on the Clojure function, imaginatively named method, so we can invoke it via reflection. (Okay, we could optimize one step further and create concrete invoker classes which do it without reflection, but it’s getting late.)

The first chunk of the constructor finds the relevant annotations and builds up a list of their names, in the order they appear, as well as the offsets, in that same order. Luckily, we are going to define a wrapper function which binds them, so we can control the order.

    final Validate v = m.getAnnotation(Validate.class);

    Annotation[][] panno = m.getParameterAnnotations();
    ArrayList<String> names = new ArrayList<String>();
    List<Integer> counts = new ArrayList<Integer>();
    for (int i = 0; i < panno.length; i++) {
        Annotation[] param_annotations = panno[i];
        for (Annotation a : param_annotations) {
            if (a instanceof Name) {
                names.add(((Name) a).value());
                counts.add(i);
            }
        }
    }
    final StringBuilder args = new StringBuilder();
    for (String name : names) {
        if (name != null) {
            args.append(name).append(" ");
        }
    }

At the end of that chunk, the names are joined into the space-separated form in which they’ll be embedded into our function template:

    String form = format("(fn [%s] (false? %s))",
                                            args.toString(),
                                            v.value());
    fn = (IFn) load(new StringReader(form));

This looks a lot like the earlier example of function definition, the main difference being that we are templating in the names of the arguments now, so that they match what is expected.
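To make the templating concrete, here is a small sketch of just the string-building step, pulled out into a helper so the generated form can be inspected without loading it. FormSketch and buildForm are hypothetical names; the template itself is the one used in the constructor.

```java
import java.util.Arrays;
import java.util.List;

final class FormSketch {
    // Builds the source text of an anonymous Clojure function whose
    // parameters are the given names and whose body reports true when
    // the validation expression evaluates to false.
    static String buildForm(List<String> names, String expression) {
        StringBuilder args = new StringBuilder();
        for (String name : names) {
            if (args.length() > 0) {
                args.append(' ');   // separate names with a single space
            }
            args.append(name);
        }
        return String.format("(fn [%s] (false? %s))", args, expression);
    }
}
```

For setInOrder, buildForm(Arrays.asList("first", "second"), "(< first second)") produces (fn [first second] (false? (< first second))), which is the text that would be handed to load.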

The last bit of the constructor,

    this.boundParamOffsets = new int[counts.size()];
    Class[] arglets = new Class[this.boundParamOffsets.length];
    for (int i = 0; i < this.boundParamOffsets.length; i++) {
        arglets[i] = Object.class;
        this.boundParamOffsets[i] = counts.get(i);
    }
    method = fn.getClass().getDeclaredMethod("invoke", arglets);

just stores off the bound parameter offsets and grabs a handle on the Method, straightforward… compared to the rest, anyway.

The rest of the fun stuff happens on method invocation:

    public Object intercept(Object o, 
                            Method method, 
                            Object[] objects, 
                            MethodProxy proxy) throws Throwable {
        Object[] args = new Object[boundParamOffsets.length];
        for (int i = 0; i < boundParamOffsets.length; i++) {
            args[i] = objects[boundParamOffsets[i]];
        }
        final Boolean bad = (Boolean) this.method.invoke(fn, args);
        if (bad) {
            throw new IllegalArgumentException("Failed validation!");
        }
        return proxy.invokeSuper(o, objects);
    }

In this case, we build up the array of arguments to pass to the Clojure function, invoke it, and if it returns true (meaning the validation expression was false), we raise an exception. Otherwise, we pass the invocation on to the parent class.
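As for the aside about skipping Method.invoke: one option on Java 7 and later, which is not what this post does, is a java.lang.invoke.MethodHandle bound to the function object. The sketch below uses a hypothetical stand-in class, InOrder, with an invoke(Object, Object) method in place of a real compiled Clojure function, and assumes the target class is accessible to the lookup. HandleSketch, bind, and call are likewise hypothetical names.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

final class HandleSketch {
    // Hypothetical stand-in for a compiled function object: exposes a
    // two-argument invoke taking and returning Object.
    static final class InOrder {
        public Object invoke(Object first, Object second) {
            return Boolean.valueOf(((Integer) first) < ((Integer) second));
        }
    }

    // Finds invoke(Object x arity) on the target, binds the receiver, and
    // adapts the handle so it accepts its arguments as one Object[].
    static MethodHandle bind(Object target, int arity) {
        try {
            MethodHandle mh = MethodHandles.lookup().findVirtual(
                    target.getClass(), "invoke", MethodType.genericMethodType(arity));
            return mh.bindTo(target).asSpreader(Object[].class, arity);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Calls the bound handle with the collected arguments.
    static Object call(MethodHandle handle, Object[] args) {
        try {
            return handle.invoke(args);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }
}
```

The handle would be built once in the constructor, in place of the getDeclaredMethod call, and call would replace this.method.invoke(fn, args) in intercept. Whether that is measurably faster here is something I have not checked.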

Whoo! It’s a fair number of hoops to jump through, but the performance requirement is to blame for a lot of them. The code can probably be simplified a lot, but… it works, and is damned fast. In testing it, the Clojure IFn invocation was nearly indistinguishable from a pure-Java Callable invocation in the informal microbenchmarks I ran. Not really surprising considering it is generating a class to implement the function… Adding a type hint on the arguments actually led to the Clojure one being frequently faster… somehow, I am not sure how. Need to pull out javap to see what is being generated, but I digress :-)

The key takeaway from this exercise, for me, was a mind shift in how I think about Clojure and Java interop. I am used to working with languages which embed and have their own runtime, like Lua. Clojure doesn’t get embedded, it just coexists. It doesn’t have its own runtime – you invoke functions and manipulate state (not that the final form we see here does that) directly via APIs which look like Java reflection or references, respectively. I really like it.

To wrap up, here is the test case for the whole thing:

package org.skife.example;

import static org.testng.Assert.assertEquals;
import org.testng.annotations.BeforeMethod;
import org.testng.annotations.Test;

public class TestValidatemajig
{
    private Thing t;

    @BeforeMethod
    public void setUp() throws Exception {
        t = new Validatemajig().build(Thing.class);
    }

    @Test(expectedExceptions = IllegalArgumentException.class)
    public void testValidationFailure() throws Exception {
        t.setNum(-1);
    }

    @Test
    public void testValidationSuccess() throws Exception {
        t.setNum(1);
        assertEquals(1, t.num);
    }

    @Test(expectedExceptions = IllegalArgumentException.class)
    public void testMultipleParamFailure() throws Exception {
        t.setInOrder(2, 1);
    }

    @Test
    public void testMultipleParamSuccess() throws Exception {
        t.setInOrder(1, 2);
        assertEquals(3, t.num);
    }

    public static class Thing
    {
        private int num = 0;

        @Validate("(> num 0)")
        public void setNum(@Name("num") Integer num) {
            this.num = num;
        }

        @Validate("(< first second)")
        public void setInOrder(@Name("first") Integer first,
                               @Name("second") Integer second) {
            this.num = first + second;
        }
    }
}

Hopefully more Clojure fun to come, though after wrestling with doing Clojure from Java, I may just switch to Clojure and do some Java from that angle.