Friday, 27 July 2012

Disruptor v3 - Faster Hopefully

I've been working sporadically on the next major revision of the Disruptor, but still making steady progress. I've merged my experimental branch into the main line and I'm now working on ensuring comprehensive test coverage and re-implementing the Disruptor DSL.

As a matter of course I've been running performance tests to ensure that we don't regress. While I've not been focusing on performance, just on some refactoring and simplification, I got a nice surprise. The new version is over twice as fast for the simple 1P1C (one producer, one consumer) test case: approximately 85M ops/sec versus 35M ops/sec. This is on my workstation, which has an Intel(R) Xeon(R) CPU E5620 @ 2.40GHz.
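
For readers unfamiliar with this kind of test, here is a minimal sketch of how a 1P1C throughput pass can be timed. It is not the Disruptor test itself: it uses a plain java.util.concurrent.ArrayBlockingQueue as a stand-in, and the class name, queue size and iteration count are all hypothetical. The point it illustrates is that the clock stops only once the consumer has received every message, not when the producer has finished publishing.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;

public class OneToOneThroughputSketch {
    // Hypothetical sizes, chosen only for illustration.
    static final int QUEUE_SIZE = 64 * 1024;
    static final long ITERATIONS = 1_000_000L;

    /** Runs one timed pass and returns the measured operations per second. */
    static long runPass() throws InterruptedException {
        BlockingQueue<Long> queue = new ArrayBlockingQueue<>(QUEUE_SIZE);
        CountDownLatch done = new CountDownLatch(1);
        long[] sum = new long[1];

        // Single consumer: drains exactly ITERATIONS messages, then signals completion.
        Thread consumer = new Thread(() -> {
            try {
                for (long i = 0; i < ITERATIONS; i++) {
                    sum[0] += queue.take();
                }
                done.countDown();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();

        // Single producer: the calling thread publishes every message.
        long start = System.nanoTime();
        for (long i = 0; i < ITERATIONS; i++) {
            queue.put(i);
        }
        // Stop the clock only after the consumer has seen the last message.
        done.await();
        long elapsedNanos = System.nanoTime() - start;
        consumer.join();

        // Sanity check that nothing was lost or duplicated.
        long expected = ITERATIONS * (ITERATIONS - 1) / 2;
        if (sum[0] != expected) {
            throw new IllegalStateException("Expected " + expected + " but got " + sum[0]);
        }
        return ITERATIONS * 1_000_000_000L / elapsedNanos;
    }

    public static void main(String[] args) throws InterruptedException {
        // Early runs include JIT warm-up, so later runs are the ones worth comparing.
        for (int run = 0; run < 5; run++) {
            System.out.printf("Started queue run %d%n", run);
            System.out.printf("Queue=%,d ops/sec%n", runPass());
        }
    }
}
```

Repeating the pass several times in one JVM matters because the first runs are typically slower while the JIT compiler warms up.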

Current released version (2.10.1)

[barkerm@snake disruptor-2]$ taskset -c 4-8,12-15 ant througput:single-test
througput:single-test:
    [junit] Running com.lmax.disruptor.OnePublisherToOneProcessorUniCastThroughputTest
    [junit] Started Disruptor run 0
    [junit] Disruptor=21,141,649 ops/sec
    [junit] Started Disruptor run 1
    [junit] Disruptor=20,597,322 ops/sec
    [junit] Started Disruptor run 2
    [junit] Disruptor=33,233,632 ops/sec
    [junit] Started Disruptor run 3
    [junit] Disruptor=32,883,919 ops/sec
    [junit] Started Disruptor run 4
    [junit] Disruptor=33,852,403 ops/sec
    [junit] Started Disruptor run 5
    [junit] Disruptor=32,819,166 ops/sec
    [junit] Started Disruptor run 6

Current trunk

[barkerm@snake disruptor]$ taskset -c 4-8,12-15 ant througput:single-test
througput:single-test:
    [junit] Running com.lmax.disruptor.OnePublisherToOneProcessorUniCastThroughputTest
    [junit] Started Disruptor run 0
    [junit] Disruptor=23,288,309 ops/sec
    [junit] Started Disruptor run 1
    [junit] Disruptor=23,573,785 ops/sec
    [junit] Started Disruptor run 2
    [junit] Disruptor=86,805,555 ops/sec
    [junit] Started Disruptor run 3
    [junit] Disruptor=87,183,958 ops/sec
    [junit] Started Disruptor run 4
    [junit] Disruptor=86,956,521 ops/sec
    [junit] Started Disruptor run 5
    [junit] Disruptor=87,260,034 ops/sec
    [junit] Started Disruptor run 6
    [junit] Disruptor=88,261,253 ops/sec
    [junit] Started Disruptor run 7
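
To put a number on the difference, the sketch below averages the steady-state runs from the two outputs above and takes the ratio. Treating runs 0 and 1 as warm-up and leaving them out is my assumption here, and the class and method names are made up for illustration. On these figures the means come out at about 33.2M ops/sec for 2.10.1 and about 87.3M ops/sec for trunk, a speedup of roughly 2.6x.

```java
import java.util.Arrays;

public class SpeedupSketch {
    // Runs 2-5 of the released version (2.10.1), copied from the output above.
    static final long[] RELEASED = {33_233_632L, 32_883_919L, 33_852_403L, 32_819_166L};
    // Runs 2-6 of current trunk, copied from the output above.
    static final long[] TRUNK = {86_805_555L, 87_183_958L, 86_956_521L, 87_260_034L, 88_261_253L};

    /** Arithmetic mean of the given ops/sec figures. */
    static double mean(long[] runs) {
        return Arrays.stream(runs).average().orElse(Double.NaN);
    }

    /** Ratio of mean trunk throughput to mean released throughput. */
    static double speedup() {
        return mean(TRUNK) / mean(RELEASED);
    }

    public static void main(String[] args) {
        System.out.printf("Released mean: %,.0f ops/sec%n", mean(RELEASED));
        System.out.printf("Trunk mean:    %,.0f ops/sec%n", mean(TRUNK));
        System.out.printf("Speedup:       %.2fx%n", speedup());
    }
}
```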

There is still more to come. In addition to the improvements for the single-producer use case, the next version will include a new multiple-producer algorithm that will replace the two we currently have and work better than both of them in all scenarios.

5 comments:

Jamie Allen said...

Awesome. Nice work, Mike!

Zahid Qureshi said...

How have you improved the performance?

Michael Barker said...

Nothing specific; mostly code clean-up, refactoring and simplification. The performance boost was a surprise.

Anonymous said...

Just to confirm:

1) you are measuring the time it takes for all messages to make it to the consumer as opposed to the time it takes for the producer to get rid of all the messages, correct?

2) you are using hyper-threading, in other words, producer and consumer are pinned to the same core which makes their communication much faster through the L1 cache, correct?

Michael Barker said...

Hi Chris,

1) Yes.

2) No, I've set the taskset to allow a range of CPUs, primarily to avoid running on the core that is handling OS interrupts. The actual cores that the threads run on are chosen by the OS.