Wednesday, 20 July 2011

Now Offically an OpenJDK Contributor

Being someone who's interested in Open Source and very focused on performance, I've been looking at the Java & the JVM and wondering about what I would most like to see added.  IMHO one of the fundamental missing pieces is support for complex stack allocated types.  This is not a new idea.  C# calls them structs. In C, C++ and Go stack & heap allocation is orthogonal to the definition of the type.

An implementation for Java is described by John Rose (some time ago) on his blog.  While this is presented as a feature that would benefit alternative languages on the JVM, I think this could bring something significant to Java as well (not that I actually know what such a language feature would look like, I'm not a language designer).  It would improve the efficiency of a some very useful design patterns that tend to be avoided (by me anyway) due to the cost of heap allocation.  E.g. a number of classes in LMAX's code base consist of aggregations of 64 bit integers.

public class OrderInstruction {
    public OrderInstruction(long accountId,
                            long instrumentId, 
                            long clientOrderId,
                            long stopPrice,
                            long price,
                            long quantity) {
        // ...
    }
}

While very efficient, it can lead to a number of errors as it is very easy to screw up an assignment wouldn't prevent assigning the accountId to the price.  However, with support for tuples/value types/structs then it would be possible to do the following (I've borrowed from the C# syntax):

public struct AccountId {
    public AccountId(long id) {
        // ...
    }
}

public struct InstrumentId {
    public InstrumentId(long id) {
        // ...
    }
}

public class OrderInstruction {
    public OrderInstruction(AccountId accountId,
                            InstrumentId instrumentId, 
                            ClientOrderId clientOrderId,
                            Price stopPrice,
                            Price price,
                            Quantity quantity) {
        // ...
    }
}

Given any reasonable implementation the structs Account, InstrumentId, etc, would compile down to a long (and be no less efficient).  It would enforce type safety preventing accidental incorrect assignment.  Another benefit is that is allows seamless extension of the types, with far less impact.  E.g. if we need to extend the AccountId to include a realm field, no change to the OrderInstruction would be required.

public struct AccountId {
    public AccountId(int realm, long id) {
        // ...
    }
}

This pattern is known as Tiny Types.  Sometimes it's the small things that matter.

The other nice feature would support for more complex and suitably efficient numeric types, e.g:

public struct Rational {
    public Rational(long numerator, long denominator) {
        // ....
    }
}

// This would a favourite at LMAX
public struct Decimal {
    public Decimal(long value, int exponent) {
        // ...
    }
}

We could even steal some features from Google Go:

public struct Slice {
    public Slice(byte[] array, int offset, int length) {
        // ....
    }
}

Pairs, Tuples and Option types could also be implemented efficiently too.  Instances of these types would be passed and assigned by copy.  The memory for these types would be either allocated on the stack or inline within a object when laid out on the heap. This could have (I hope) a significant improvement in spatial cache locality and reduce the pressure on the garbage collector.

I think this feature is so important that I have started implementing it myself as part of the MLVM project.  My first patches have now been accepted into MLVM.  I'm not sure if or when this will end up in the JDK proper or if the Java language will support it, but it has to start somewhere.

10 comments:

Ismael Juma said...

Hi Michael,

Thanks for tackling this! As you say, it's a very useful feature for situations where maximum performance is needed. I'll be following it with interest.

Best,
Ismael

gnash said...

C# structs are not always allocated on the stack. Is actually much more complicated than that.

http://blogs.msdn.com/b/ericlippert/archive/2010/09/30/the-truth-about-value-types.aspx

Michael Barker said...

Good point and the same would follow for Java. E.g. using a value type in a generic collection would required it to be boxed onto the heap.

The post provides some excellent technical detail, however I disagree with his conclusion. The notion that the implementation doesn't matter grates against the notion of Mechanical Sympathy, which is essential when building high performance software. I think I'll need to put together a separate post on that.

George Moschovitis said...

This is a great feature, I hope to see this in Java sooner, rather than later. Is there a place where we can vote for / support this?

mikera said...

Great that you are taking this on - it's been my No.1 item on my wish list for the JVM for quite some time!

Hopefully the same approach can be applied to efficiently allow multiple return values? I've wished for this many times when writing simulations or graphical code that returns 3D vectors..... having to allocate a new object or pass around a pre-allocated object to act as temporary storage is trult ugly....

George Moschovitis said...

Perhaps you should propose something like this the the Ceylon guys or other JVM language designers. Getting this into Java might take ages...

Michael Barker said...

@mikera Yes, although I have a few more steps to work through before I get there. John Rose's proposal suggests a new byte code VRETURN that would support multiple return values.

Michael Barker said...

@George There was a bit of interest from one of the JRuby guys on the MLVM list, so there is some interest from some the of the language guys.

You're right it could take a little while to push it through into Java.

M. Bana said...

Wonderful!

Can you talk about how you went about navigating around the codebase and how you figured out how everything worked?
How long did it take, BTW?

Michael Barker said...

I've still got quite a bit to do. It's taken quite a while so far mostly 'cause I'm snatching weekends and evenings to work on it. I haven't really scratched the surface of the JDK/Hotspot, but I will at some point put together a blog on my experiences.