Wednesday, 20 July 2011

Now Offically an OpenJDK Contributor

Being someone who's interested in Open Source and very focused on performance, I've been looking at the Java & the JVM and wondering about what I would most like to see added.  IMHO one of the fundamental missing pieces is support for complex stack allocated types.  This is not a new idea.  C# calls them structs. In C, C++ and Go stack & heap allocation is orthogonal to the definition of the type.

An implementation for Java is described by John Rose (some time ago) on his blog.  While this is presented as a feature that would benefit alternative languages on the JVM, I think this could bring something significant to Java as well (not that I actually know what such a language feature would look like, I'm not a language designer).  It would improve the efficiency of a some very useful design patterns that tend to be avoided (by me anyway) due to the cost of heap allocation.  E.g. a number of classes in LMAX's code base consist of aggregations of 64 bit integers.

public class OrderInstruction {
    public OrderInstruction(long accountId,
                            long instrumentId, 
                            long clientOrderId,
                            long stopPrice,
                            long price,
                            long quantity) {
        // ...
    }
}

While very efficient, it can lead to a number of errors as it is very easy to screw up an assignment wouldn't prevent assigning the accountId to the price.  However, with support for tuples/value types/structs then it would be possible to do the following (I've borrowed from the C# syntax):

public struct AccountId {
    public AccountId(long id) {
        // ...
    }
}

public struct InstrumentId {
    public InstrumentId(long id) {
        // ...
    }
}

public class OrderInstruction {
    public OrderInstruction(AccountId accountId,
                            InstrumentId instrumentId, 
                            ClientOrderId clientOrderId,
                            Price stopPrice,
                            Price price,
                            Quantity quantity) {
        // ...
    }
}

Given any reasonable implementation the structs Account, InstrumentId, etc, would compile down to a long (and be no less efficient).  It would enforce type safety preventing accidental incorrect assignment.  Another benefit is that is allows seamless extension of the types, with far less impact.  E.g. if we need to extend the AccountId to include a realm field, no change to the OrderInstruction would be required.

public struct AccountId {
    public AccountId(int realm, long id) {
        // ...
    }
}

This pattern is known as Tiny Types.  Sometimes it's the small things that matter.

The other nice feature would support for more complex and suitably efficient numeric types, e.g:

public struct Rational {
    public Rational(long numerator, long denominator) {
        // ....
    }
}

// This would a favourite at LMAX
public struct Decimal {
    public Decimal(long value, int exponent) {
        // ...
    }
}

We could even steal some features from Google Go:

public struct Slice {
    public Slice(byte[] array, int offset, int length) {
        // ....
    }
}

Pairs, Tuples and Option types could also be implemented efficiently too.  Instances of these types would be passed and assigned by copy.  The memory for these types would be either allocated on the stack or inline within a object when laid out on the heap. This could have (I hope) a significant improvement in spatial cache locality and reduce the pressure on the garbage collector.

I think this feature is so important that I have started implementing it myself as part of the MLVM project.  My first patches have now been accepted into MLVM.  I'm not sure if or when this will end up in the JDK proper or if the Java language will support it, but it has to start somewhere.