Thursday 19 March 2020

Flow Control in Aeron

One of my more recent projects has led me to become more involved in the Aeron project.  If you are unaware of Aeron, then head over to the Github site and check it out.  At its core is an reliable messaging system that works over UDP, Multicast UDP and IPC.  It also contains an archiving feature for recording and replay and (still under active development) an implementation of the Raft protocol for clustering.  Did I mention that it was fast too.

I've spent the last few weeks buried in the various strategies the Aeron has for flow control.  Specifically modifying the existing flow control strategies and adding more flexible configuration on a per channel basis.  Before I jump into that it would be useful to cover a little background first.

What is flow control?


Within a distributed system the purpose of flow control is to limit the rate of a sender so that is does not overrun it's associated receiver.  UDP does not come with any form of flow control, therefore it is easy to create a sender that will out pace the receiver, leading to message loss.  There are a number of different forms of flow control, but I'm going to focus on the sliding window flow control protocol used by TCP and Aeron. The sliding window protocol requires that the sender maintain a buffer of data (referred to as a window).  The size of this window will typically communicated from the receiver to the sender as part of the protocol.  With a bi-directional protocol like TCP the size of the window is communicated in each TCP segment header.  This is the amount of data that the sender can transmit to the receiver before having to wait until an acknowledgement is received.  If the application thread on the receiver side is busy and does not read the data from the socket and the sender continues to transmit, the window size value will decrease until it reaches 0, at which time the sender must stop and wait for an acknowledgement with a non-zero window size before sending again.  There is a lot more networking theory around sizing the flow control window in order to get full utilisation of the network.  But I will leave that as an exercise for the reader.

With Aeron and UDP unicast it is very similar to TCP, however Aeron is a unidirectional protocol where the receivers send status messages to indicate to the sender that it is ready to receive data and how much.  The status message indicates where the subscriber is up to using the consumption term id and consumption term offset for a specific channel/stream/session triple.  The receiver window value is the amount of data that can be sent from that position before the sender needs to stop and wait for a new status message indicating the the receiver is able to consume more data.  The size of the receiver window is at most of ¼ of the term size and at least the size of the MTU (maximum transfer unit).

However, one of the neat features of Aeron is that it supports multicast (and multi-destination-cast, for which the same rules will apply), where there are multiple receivers for the same publication.  In this situation how do we determine what values should be used for the flow control window?  This is a question that has no one right answer, so Aeron provides a number of configuration options and it is also possible to plug in your own strategy.

In fact Aeron is the only tool that supports UDP multicast messaging with dynamic flow control (that we're aware of).

Max Flow Control


The simplest and fastest form of multicast flow control is a strategy where we take the maximum position of all of the receivers and use that value to derive limit that the sender can use for publication.  This means any receivers that are not keeping up with the fastest one may fall behind and experience packet loss.

Min Flow Control


This is the inverse of the max flow control strategy, where instead we take minimum of all of the available receivers.  This will prevent slower nodes (as long as they are still sending status messages) from falling behind.  However this strategy does run the risk that the slower nodes can hold up the rest of the receivers by causing back pressure slowing the publisher.  Because this strategy needs to track all of the individual receivers and their positions, it also must handle the case that a node has disappeared altogether.  E.g. it has been shutdown or crashed.  This is handled via a timeout (default 2s, but configurable).  If status messages for a receiver have not been seen that period of time, that receiver is ejected from the flow control strategy and the publisher is allowed to move forward.

Tagged Flow Control (previously known as Preferred Flow Control)


Tagged flow control is a strategy that attempts to mitigate some of the short comings of the min flow control strategy.  It works by using a min flow control approach, but only for a subset of receivers that are tagged to be included in the flow control group.  The min flow control strategy is a special case of this strategy where are all receivers are considered to be in the group.

Configuring Flow Control


One of the new features that came with Aeron 1.26.0 was the ability to control the flow control strategy directly from the channel URI allowing for fine grained control over each publication and subscription.  Defaults can also be specified on the media driver context.  On the publication side the channel can be specified as:

The min and max flow control settings for the publication are the simplest, but the tagged one starts to get a little bit interesting.  The ,g:1001 specifies that the group tag is 1001 and any receiver that want to be involved in flow control for this publication will need to specify that group tag.  The subscription channel URI show how to ensure that the receiver sends the appropriate group tag so that it will be included in the publishers flow control group.

The tagged flow control strategy is really useful for receiving from a channel where there are a number of different types of subscribers that have different reliability requirements.  A good example is where there is a flow of events that needs to go to a gateway service to be sent out to users, perhaps via HTTP and also needs to go to a couple of archiving services to store the data redundantly in a database.   It may be possible for the gateway nodes to easily deal with message loss, either by reporting an error to the user or re-requesting the data.  However it may not be possible for the archiving service nodes to do so.  In this case the publication would specify the tagged flow control strategy and the subscriptions on the archiving services would use gtag parameter to ensure that they are included in the flow control group.  The gateway services could leave the gtag value unset and not impact the flow control on the publisher.

While being able to include just the important subscribers into a flow control group so that they aren't overrun by the publisher is useful, there would still be an issue.  If both of our archiving services happened to be down eventually their receivers would be timed out and removed from the group.  Wouldn't it be great if we could require that a group contain a certain number of tagged receivers before the publication can report that it is connected.  That way we could ensure that our archiving service nodes were up before we started publishing data.

Flow Control Based Connectivity


Turns that this is also now possible with the release of 1.26.0.  For both the tagged flow control and the min flow control strategies we can specify a group minimum size that must be met before a publication can be considered connected.  This is independent of to the requirement that there needs to be one connected subscriber.  Therefore the default value for this group minimum size is 0.  Like the strategy and the flow control group, the group minimum size can be specified on the channel URI.

In both of these cases the group minimum size is set to 3.  For the min flow control strategy we would need at least 3 connected receivers, for the tagged flow control strategy we would need at least 3 connected receivers with tag 1001 and any receivers without the tag are disregarded.

Time Outs


One last new feature available on the channel URI configuration is the ability to specify the length of the timeout for the min and tagged flow control strategies.  As mentioned the earlier this will default to 2s, but can be set to any value.  Some care should be taken in specifying this value, if it is too short then receivers may frequently timeout during normal running.  Status messages are emitted at least once every 200 ms (more if necessary), so any shorter than that would not be useful.  Too long and a failed receiver could result in a significant back pressure stall on the publisher.  Setting this for min and tagged flow control strategies:


Summary


As mentioned earlier the idea of using flow control to provide dynamic back pressure for a multicast messaging bus is a unique and powerful feature of Aeron.  Being able to configure these settings on a per publication provides a an extra level of flexibility that to help our users to build the system that they need.  Any questions, come over to Gitter channel and chat to us.

No comments: