In Clauses 21 Dec 2011

The most common feature request I get for jDBI, going back at least seven years now, is for automagic in-clause expansion, or the equivalent. Sadly, there is not correct general case solution for this. There are lots of solutions, but what is the right thing to do is very context dependent. Let’s look at some options.

Database Specific Functionality

The first option is to use database specific functionality to achieve this. The easiest of these I know of is with PostgreSQL, which has very nice sql array support.

We’ll start with how we want to use the feature – ideally we could just bind a collection, but in this case we need to know the type of things in the collection, and we should handle empty collections, so we’ll do something like:

ImmutableSet<String> rs = h.createQuery("select name from something where id = any (:ids)")
                           .map(StringMapper.FIRST)
                           .bind("ids", arrayOf(Integer.class, 2, 3))
                           .list(ImmutableSet.class);

The bind is using a helper function to create an instance of SqlArray which just captures the information being bound:

public class SqlArray<T>
{
    private final Object[] elements;
    private final Class<T> type;

    public SqlArray(Class<T> type, Collection<T> elements) {
        this.elements = Iterables.toArray(elements, Object.class);
        this.type = type;
    }

    public static <T> SqlArray<T> arrayOf(Class<T> type, T... elements) {
        return new SqlArray<T>(type, asList(elements));
    }

    public static <T> SqlArray<T> arrayOf(Class<T> type, Iterable<T> elements) {
        return new SqlArray<T>(type, elements);
    }

    public Object[] getElements()
    {
        return elements;
    }

    public Class<T> getType()
    {
        return type;
    }
}

When binding a SqlArray it will wind up invoking the Handle#bind(String, Object) method. This is fine as we will intercept the actual binding with an ArgumentFactory. The one here is a toy, specialized solely for binding arrays of integers, but it should be straightforward to generalize:

public class PostgresIntegerArrayArgumentFactory implements ArgumentFactory<SqlArray<Integer>>
{
  public boolean accepts(Class<?> expectedType, Object value, StatementContext ctx)
  { 
    return value instanceof SqlArray
           && ((SqlArray)value).getType().isAssignableFrom(Integer.class);
  }

  public Argument build(Class<?> expectedType, 
                        final SqlArray<Integer> value, 
                        StatementContext ctx)
  {
    return new Argument()
    {
      public void apply(int position, 
                        PreparedStatement statement, 
                        StatementContext ctx) throws SQLException
        {
          // in postgres no need to (and in fact cannot) free arrays
          Array ary = ctx.getConnection()
                         .createArrayOf("integer", value.getElements());
          statement.setArray(position, ary);
        }
    };
  }
}

We need to register our argument factory on the DBI, Handle, or SqlStatement for it to be used, we’ll just put it on the handle:

Handle h = dbi.open();

h.registerArgumentFactory(new PostgresIntegerArrayArgumentFactory());
h.registerContainerFactory(new ImmutableSetContainerFactory());

While we were at it we registered a ContainerFactory that knows how to build Guava’s ImmutableSet as well. That is pretty irrelevant to this example, but hey, why not ;-)

The argument factory produces an actual Argument instance used to bind arguments to the the prepared statement. PostgreSQL is the only database I know of which supports the Connection#createArrayOf call, so this is very non-portable, but it works great in PostgreSQL!

Oracle supports something similar, but you need to use Oracle specific APIs and oracle.sql.ARRAY instances. In the Oracle case you have to pre-declare the array type in the database first, and as it stores the array in the database, free it after the call.

I don’t know of anything in MySQL to provide this behavior.

SQL Templating

Much more portable, but requiring more care is rewriting the SQL to provide the right bind slots in the sql statement. If we want to use the same SQL as above, but with an in clause, it would look like select name from something where id in (:ids), to which we want to bind two values, 2 and 3. We need to somehow expand the the statement to select name from something where id in (:id_1, :id_2). We don’t usually know how many elements will be in the in clause, so… blah.

In jDBI I like to handle this by using StringTemplate statement groups to externalize my SQL. Aside from providing nice grouping, the templating makes the problem of rewriting the sql much more tractable!

We’ll start, again, with what we want the API to look like:

@ExternalizedSqlViaStringTemplate3
@RegisterContainerMapper(ImmutableSetContainerFactory.class)
public interface DAO
{
  @SqlQuery
  ImmutableSet<String> findNamesForIds(@BindIn("names") List<Integer> names);
}

Note that the @ExternalizedSqlViaStringTemplate3 annotation on the Dao type. This is included with jDBI and tells the sql object to look for sql statements in a StringTemplate Group File loaded from the classpath such that the name is <ClassName>.sql.stg in a directory matching the package of the sql object type. If the above is org.skife.v2.docs.DAO templates for the actual SQL will be looked for on the classpath at /org/skife/v2/docs/DAO.sql.stg.

The group template file, in this case, looks like:

group DAO;

findNamesForIds(names) ::= <<
  select name from something where id in (<names>)
>>

Now, looking at this template, select name from something where id in (<names>) we see something new, the <names> bit. This is stringtemplate, not argument binding. We see the parameter to the template in the template signature as well, findNamesForIds(names).

Looking at the sql object definition, we also see @BindIn("names") instead of the more familiar @Bind("names"). The @BindIn is a custom binding annotation which knows how to work with StringTemplate 3 (specifically version 3, as there is now a version 4 available as well). Let’s look at @BindIn to see how it works:

/* 
  Please excuse the use of # in place of @, pygments + 
  maruku is totally confused by the code otherwise -brianm
*/

#Retention(RetentionPolicy.RUNTIME)
#SqlStatementCustomizingAnnotation(BindIn.CustomizerFactory.class)
#BindingAnnotation(BindIn.BindingFactory.class)
public #interface BindIn
{
    String value();

    public static final class CustomizerFactory implements SqlStatementCustomizerFactory
    {
        public SqlStatementCustomizer createForMethod(Annotation annotation, Class sqlObjectType, Method method)
        {
            throw new UnsupportedOperationException("Not supported on method!");
        }

        public SqlStatementCustomizer createForType(Annotation annotation, Class sqlObjectType)
        {
            throw new UnsupportedOperationException("Not supported on type");
        }

        public SqlStatementCustomizer createForParameter(Annotation annotation,
                                                         Class sqlObjectType,
                                                         Method method,
                                                         Object arg)
        {
            Collection<?> coll = (Collection<?>) arg;
            BindIn in = (BindIn) annotation;
            final String key = in.value();
            final List<String> ids = new ArrayList<String>();
            for (int idx = 0; idx < coll.size(); idx++) {
                ids.add("__" + key + "_" + idx);
            }

            StringBuilder names = new StringBuilder();
            for (Iterator<String> i = ids.iterator(); i.hasNext();) {
                names.append(":").append(i.next());
                if (i.hasNext()) {
                    names.append(",");
                }
            }
            final String ns = names.toString();

            return new SqlStatementCustomizer()
            {
                public void apply(SQLStatement q) throws SQLException
                {
                    q.define(key, ns);
                }
            };
        }
    }

    public static class BindingFactory implements BinderFactory
    {
        public Binder build(Annotation annotation)
        {
            final BindIn in = (BindIn) annotation;
            final String key = in.value();

            return new Binder()
            {

                public void bind(SQLStatement q, Annotation bind, Object arg)
                {
                    Iterable<?> coll = (Iterable<?>) arg;
                    int idx = 0;
                    for (Object s : coll) {
                        q.bind("__" + key + "_" + idx++, s);
                    }
                }
            };
        }
    }
}

We see that @BindIn is actually both a binding annotation and a statement customizing annotation. This allows it to manipulate the statement and to control how arguments get bound.

As a statement customizing annotation,

public SqlStatementCustomizer createForParameter(Annotation annotation,
                                                 Class sqlObjectType,
                                                 Method method,
                                                 Object arg)
{
    Collection<?> coll = (Collection<?>) arg;
    BindIn in = (BindIn) annotation;
    final String key = in.value();
    final List<String> ids = new ArrayList<String>();
    for (int idx = 0; idx < coll.size(); idx++) {
        ids.add("__" + key + "_" + idx);
    }

    StringBuilder names = new StringBuilder();
    for (Iterator<String> i = ids.iterator(); i.hasNext();) {
        names.append(":").append(i.next());
        if (i.hasNext()) {
            names.append(",");
        }
    }
    final String ns = names.toString();

    return new SqlStatementCustomizer()
    {
        public void apply(SQLStatement q) throws SQLException
        {
            q.define(key, ns);
        }
    };
}

This customizer has access to the collection passed as a parameter, so it traverses it building up the expanded binding list. In this case, if it is passed the list [2, 3] it will build the string :__names_0, :__names_1 and define it on the statement context.

The StringTemplate3 statement locator uses values defined on the statement context as arguments to the template, so in this case the template will receive that string as the names argument, and the actual SQL will be select name from something where id in (:__names_0, :__names_1).

The binding annotion piece,

public static class BindingFactory implements BinderFactory
{
    public Binder build(Annotation annotation)
    {
        final BindIn in = (BindIn) annotation;
        final String key = in.value();

        return new Binder()
        {
            public void bind(SQLStatement q, Annotation bind, Object arg)
            {
                Iterable<?> coll = (Iterable<?>) arg;
                int idx = 0;
                for (Object s : coll) {
                    q.bind("__" + key + "_" + idx++, s);
                }
            }
        };
    }
}

Receives the argument being bound, which will be what was passed in, and it performs the same operation to generate the binding names, then it actually binds them.

Which To Use?

In general, if your database supports it I would suggest using the database specific mechanism which works. Sadly, the only database which supports it well, which I have used enough to have an opinion on, is PostgreSQL. If anyone knows similar techniques for other databases, please tell us in the comments.

In all other cases, templating your SQL is the only real option. This can cause problems with statement caches (each different SQL is usually a different statement (though you can fuzz this in Oracle)), you need to align your statement locator with your templating, and you need to be careful about how big your in clause literal gets, but it works pretty well in the 80% or cases.

Some Atlas Thoughts ❯❯ Home | Archives ❮❮ Library Versioning

Brian's Waste of Time About Archive Feed

In Clauses 21 Dec 2011

Database Specific Functionality

SQL Templating

Which To Use?