The design I'm going to use is, instead of having a single Disruptor with a MultiProducerSequencer, to have a Disruptor per producer, each with a SingleProducerSequencer. In order to have all of the events channel into a single EventHandler I will need to implement a custom EventProcessor that is capable of polling multiple SequenceBarriers/DataProviders. This custom MultiBufferBatchEventProcessor can be seen within the com.lmax.disruptor.support package inside of the performance tests. The key component of this class is its main loop.
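Roughly, that loop polls each barrier in turn, drains whatever has been published on that ring buffer, and then yields. The following is only a simplified sketch of the idea (the class and method names are illustrative, not the actual code from the support package):

```java
import com.lmax.disruptor.AlertException;
import com.lmax.disruptor.DataProvider;
import com.lmax.disruptor.EventHandler;
import com.lmax.disruptor.Sequence;
import com.lmax.disruptor.SequenceBarrier;

// Illustrative sketch of a multi-buffer event processor; not the actual
// MultiBufferBatchEventProcessor from the performance test support package.
final class MultiBufferEventProcessorSketch<T> implements Runnable
{
    private final DataProvider<T>[] providers;
    private final SequenceBarrier[] barriers;
    private final Sequence[] sequences;
    private final EventHandler<T> handler;
    private volatile boolean running = true;

    MultiBufferEventProcessorSketch(DataProvider<T>[] providers,
                                    SequenceBarrier[] barriers,
                                    EventHandler<T> handler)
    {
        this.providers = providers;
        this.barriers = barriers;
        this.handler = handler;
        this.sequences = new Sequence[providers.length];
        for (int i = 0; i < sequences.length; i++)
        {
            sequences[i] = new Sequence(-1L);
        }
    }

    // One Sequence per ring buffer; used as the gating sequence for that buffer.
    Sequence[] getSequences()
    {
        return sequences;
    }

    @Override
    public void run()
    {
        while (running)
        {
            try
            {
                for (int i = 0; i < barriers.length; i++)
                {
                    // waitFor(-1) with a yielding strategy returns immediately
                    // with the highest published sequence on this buffer.
                    long available = barriers[i].waitFor(-1);
                    long next = sequences[i].get() + 1;

                    // Drain everything currently available from this buffer
                    // into the single user-supplied handler.
                    for (long s = next; s <= available; s++)
                    {
                        handler.onEvent(providers[i].get(s), s, s == available);
                    }
                    sequences[i].set(available);
                }
                Thread.yield();
            }
            catch (AlertException e)
            {
                running = false;
            }
            catch (Exception e)
            {
                throw new RuntimeException(e);
            }
        }
    }

    void halt()
    {
        running = false;
        for (SequenceBarrier barrier : barriers)
        {
            barrier.alert();
        }
    }
}
```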
Central to this approach is that the event processor has two arrays passed into its constructor:
- DataProvider[]: the array of ring buffers to read data from.
- SequenceBarrier[]: the barriers supplied by each of the ring buffers.
From these the event processor constructs an array of Sequences that are used to track the processed events from each ring buffer. The event loop iterates through the sequence barriers to determine whether any of the ring buffers have data available to be read. Any available data is passed on to the user-supplied event handler.
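To make the shape of this concrete, here is a rough sketch of how the pieces might be wired together, using the illustrative MultiBufferEventProcessorSketch from above rather than the real class (the buffer size, thread name and trivial handler are made up for the example):

```java
import com.lmax.disruptor.EventHandler;
import com.lmax.disruptor.RingBuffer;
import com.lmax.disruptor.Sequence;
import com.lmax.disruptor.SequenceBarrier;
import com.lmax.disruptor.YieldingWaitStrategy;

public final class OneRingBufferPerProducerSketch
{
    @SuppressWarnings("unchecked")
    public static void main(String[] args)
    {
        final int producers = 3;

        // Single consumer: all three ring buffers feed this one handler.
        EventHandler<long[]> handler =
            (event, sequence, endOfBatch) -> { /* process the three longs */ };

        RingBuffer<long[]>[] buffers = new RingBuffer[producers];
        SequenceBarrier[] barriers = new SequenceBarrier[producers];

        for (int i = 0; i < producers; i++)
        {
            // One single-producer ring buffer per publishing thread.
            buffers[i] = RingBuffer.createSingleProducer(
                () -> new long[3], 1 << 16, new YieldingWaitStrategy());
            barriers[i] = buffers[i].newBarrier();
        }

        MultiBufferEventProcessorSketch<long[]> processor =
            new MultiBufferEventProcessorSketch<>(buffers, barriers, handler);

        // Gate each ring buffer on the processor's sequence for that buffer so
        // a producer can't wrap entries the consumer hasn't processed yet.
        Sequence[] consumerSequences = processor.getSequences();
        for (int i = 0; i < producers; i++)
        {
            buffers[i].addGatingSequences(consumerSequences[i]);
        }

        new Thread(processor, "multi-buffer-consumer").start();
    }
}
```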
To test this I've created the ThreeToThreeSequencedThroughputTest, which starts three producers and one consumer. One of the aspects of the code supplied in the InfoQ article is that each train has a capacity of three longs and one "op" is measured as the movement of one long. To make the test more comparable I used an array of three longs as the entry within the ring buffer and multiplied the total number of events moved between threads by 3 to calculate the total "ops".
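For illustration, the per-entry payload and the "ops" accounting described above amount to something like the following sketch (the event and time counts are made-up numbers, not figures from the test itself):

```java
import com.lmax.disruptor.RingBuffer;
import com.lmax.disruptor.YieldingWaitStrategy;

public final class ThreeLongEntrySketch
{
    public static void main(String[] args)
    {
        // Each ring buffer entry is an array of three longs, mirroring the
        // three-long "train" capacity from the InfoQ article.
        RingBuffer<long[]> ringBuffer = RingBuffer.createSingleProducer(
            () -> new long[3], 1024, new YieldingWaitStrategy());

        // Publishing one entry moves three longs between threads.
        long sequence = ringBuffer.next();
        try
        {
            long[] entry = ringBuffer.get(sequence);
            entry[0] = 1L;
            entry[1] = 2L;
            entry[2] = 3L;
        }
        finally
        {
            ringBuffer.publish(sequence);
        }

        // Throughput accounting: total "ops" = events moved * 3 longs per event.
        long eventsMoved = 20_000_000L;       // made-up event count
        long elapsedNanos = 150_000_000L;     // made-up elapsed time (0.15 s)
        long opsPerSec = (eventsMoved * 3L * 1_000_000_000L) / elapsedNanos;
        System.out.printf("%,d ops/sec%n", opsPerSec);
    }
}
```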
Test Results (Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz)
Disruptor:
Run 0, Disruptor=390,738,060 ops/sec
Run 1, Disruptor=387,931,034 ops/sec
Run 2, Disruptor=397,058,823 ops/sec
Run 3, Disruptor=394,160,583 ops/sec
Run 4, Disruptor=396,767,083 ops/sec
Run 5, Disruptor=394,736,842 ops/sec
Run 6, Disruptor=396,767,083 ops/sec
Railway/Train:
ops/sec = 243,141,801
ops/sec = 302,695,445
ops/sec = 283,096,862
ops/sec = 273,670,298
ops/sec = 268,340,387
ops/sec = 264,802,500
ops/sec = 262,258,028
Clearly this approach has some merit. I am considering adding this to the main Disruptor distribution. However, there are still a couple of design issues I need to work through first.
The sequence value passed to the handler is the value from the source ring buffer, so it may go up and down as the events flow through, and I'm not sure whether this will be an issue for users. It feels like a fairly minor thing, so hopefully I won't need to worry about it. The event processor only supports yielding at the moment; I need to figure out how to correctly incorporate the wait strategies, as there will be some niggles with the blocking wait strategy and consumer wake-up.
This is an idea that we mentioned on the mailing list some time ago and I initially wasn't particularly interested, but given the appreciable benefit from this approach I think I will take the idea more seriously.
5 comments:
What kind of back-off strategy are you thinking of using? For an SPSC queue some combination of busy spin with pause (__mm_pause in gcc, don't know the Java equivalent), exponential back-off + sleep, or even yield works fine. None of them seem ideal when doing a round-robin between multiple queues.
At the moment it just does a yield(). I don't think that I will use a specific back-off strategy for this case. If I can reuse the existing WaitStrategy interface in the right way, then you can use the PhasedBackoffWaitStrategy that is already provided by the Disruptor. Note that at LMAX in production we use either blocking, sleeping or yielding, and don't make use of back-off at this point in time.
As an aside, there is no equivalent of __mm_pause in Java. It is on my wish list.
How does this approach compare to a single Disruptor with multiple producers, and a consumer which does not round-robin? It seems like this approach ought to be faster, especially for the writers.
It's about an order of magnitude faster than the single Disruptor approach if the number of publishers is small (<10). As each new producer requires a new instance of the ring buffer, it doesn't scale as well from a memory use perspective.