Saturday 11 April 2020

I Heard a Rumour...

Where Aeron catches up on the goss.

The Need For Naming

A few months ago a pull request appeared on Aeron's GitHub site that added the ability to ask Aeron to resolve or re-resolve host names to IP addresses.  In cloud environments, especially when using Kubernetes, it is not uncommon for a node that fails to restart with the same host name but a different IP address.  Unfortunately for Aeron this could make life difficult, as it would resolve IP addresses up front and stick with them for the lifetime of the media driver.  This is particularly bad when we consider nodes that are part of Aeron Cluster, where we expect nodes to come and go from the cluster over time.

It became very clear that we needed a plan that would allow Aeron to use logical names instead of IP addresses as endpoint identifiers and re-resolve those names appropriately.  We didn't end up using the supplied pull request and came up with an alternative solution that was a better fit with some of Aeron's longer term goals (I say we, it was mostly Todd Montgomery - I just did the C port).

As DNS can often be a source of odd network latency issues, we didn't want a name resolution solution that was entirely reliant on default system name resolution.  So we have also included a mechanism for resolving names that works entirely within Aeron.


The first thing we needed to tackle was re-resolving IP addresses when peer nodes went away and came back with a different address.  Fortunately we already have indicators within the existing protocol that allow the media driver to detect when nodes have died.  Aeron continually sends data frames or heartbeats (sender to receiver) and status messages (receiver to sender) during normal running.  We can use the absence of these messages as a signal that a node (one identified by name rather than IP address) needs to be re-resolved.

Periodically the sender and receiver will scan their endpoints to see if any have been missing regular updates and, if those endpoints were identified by a name, trigger a re-resolution.  The simple solution here would be to re-resolve the name to an address in place, e.g. using getaddrinfo.  However, one of the reasons that Aeron is incredibly fast (did I mention that already?) is that it has a very principle-based approach to its design.  One of the principles is "Non-blocking IO in the message path".  This is to avoid any unusual stalls caused by the processing of blocking IO operations.  The call to resolve a host name can block for extended periods of time (BTW, if you are ever using an app on an otherwise fast machine and it stalls for weird periods of time, it is worth asking the question: is DNS causing the problem?).  Therefore we want to offload name resolution from our sender and receiver threads (the message path) onto the conductor, where we can perform the slower blocking operations.
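To make the hand-off a little more concrete, here is a minimal sketch of the idea in Java.  It is not Aeron's code: the class names, the timeout and the queue are all assumptions made for illustration.  The sender/receiver side only compares timestamps and enqueues a request; the conductor side is the only place where the (potentially blocking) resolver is called.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.function.Function;

// Hypothetical sketch: none of these types are Aeron classes.
final class ReResolutionSketch {
    /** An endpoint that was configured by name rather than by IP address. */
    static final class NamedEndpoint {
        final String name;
        final int port;
        volatile InetSocketAddress address;
        volatile long lastActivityNs;

        NamedEndpoint(String name, int port, InetSocketAddress address, long nowNs) {
            this.name = name;
            this.port = port;
            this.address = address;
            this.lastActivityNs = nowNs;
        }
    }

    // Requests travel from the sender/receiver thread to the conductor thread.
    private final Queue<NamedEndpoint> requests = new ConcurrentLinkedQueue<>();
    private final long timeoutNs; // assumed liveness timeout, not an Aeron default

    ReResolutionSketch(long timeoutNs) {
        this.timeoutNs = timeoutNs;
    }

    /** Sender/receiver side: never blocks, only checks timestamps and enqueues. */
    int scanForStaleEndpoints(List<NamedEndpoint> endpoints, long nowNs) {
        int queued = 0;
        for (NamedEndpoint endpoint : endpoints) {
            if (nowNs - endpoint.lastActivityNs > timeoutNs) {
                requests.offer(endpoint);
                endpoint.lastActivityNs = nowNs; // avoid re-queuing on every scan
                queued++;
            }
        }
        return queued;
    }

    /** Conductor side: the potentially blocking lookup happens only here. */
    int processRequests(Function<String, InetAddress> blockingResolver) {
        int resolved = 0;
        NamedEndpoint endpoint;
        while ((endpoint = requests.poll()) != null) {
            InetAddress ip = blockingResolver.apply(endpoint.name);
            if (ip != null) {
                endpoint.address = new InetSocketAddress(ip, endpoint.port);
                resolved++;
            }
        }
        return resolved;
    }
}
```

In a real driver the resolver passed to processRequests would wrap something like InetAddress.getByName; passing it in as a function simply keeps the sketch easy to exercise without touching DNS.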

Name Resolvers

It was apparent very early on that we could make the resolution of names an abstract concept.  Using DNS and host names is the most obvious solution, but it would be interesting to allow names to come from other sources.  E.g. we could name individual media drivers and use those names in our channel configuration.  This allows a couple of neat behaviours.  All of the configuration for naming can be self-contained within Aeron itself, independent of DNS (which may require configuration of a separate system), and we could also allow names to resolve to more than just IP addresses, e.g. host and port pairs or maybe direct to MAC addresses* in the future.

* Bonus points if you can figure out why this might be useful.

To support this, both the Java and C media drivers have the concept of a name resolver, with 2 production implementations: default (host name based) and driver, where the media drivers are responsible for managing the list of names.  With the driver based name resolution we need a mechanism to communicate the names between the instances of the media driver across the network.

Enter the Gossip Protocol

To allow driver names to propagate across the network, Aeron supports a gossip-style protocol, where we have an additional frame type (resolution frame) that contains mappings of names to addresses.  Currently, only IPv4 and IPv6 addresses are supported, but there is scope for adding others later.

To make this work, for each media driver we specify 3 things: the name for the media driver (this will default to the host name when not specified), a bootstrap neighbour to send initial name resolutions to, and a resolver interface.  The most important option is the resolver interface, as specifying this will enable the driver name resolution.  It also determines which network interface to use to send and receive resolution frames and is the address reported to the neighbours for self-resolutions.  This can also be a wildcard address, in which case the neighbours will use the source address of the received resolution frames to identify that node.

On start each of the nodes will have an empty set of neighbour nodes and a bootstrap neighbour.  Every 1s the driver name resolver will send out a self-resolution, i.e. tell all the nodes that it knows about what its own name and address are.  This will be sent (via UDP) to all of its known neighbour nodes and the bootstrap node (if not already in the neighbour list).  Because the neighbour list is initially empty, messages will only be sent to the bootstrap neighbour on the first pass.  The bootstrap neighbour can be specified using a host name and the driver name resolver will ensure that it is re-resolved periodically in case it too has died and come back with a different IP address.
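The target selection described above can be sketched as follows.  This is an illustration only: the class is hypothetical, addresses are compared as plain strings for simplicity, and the actual sending of frames is left out.  The one thing taken from the text is the 1s interval and the rule that the bootstrap neighbour is added only when it is not already a known neighbour.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of choosing who receives a self-resolution; not Aeron code.
final class SelfResolutionSchedule {
    static final long SELF_RESOLUTION_INTERVAL_MS = 1_000;

    private final String bootstrapNeighbour;
    private final Set<String> neighbours = new LinkedHashSet<>();
    private long nextSendDeadlineMs = 0;

    SelfResolutionSchedule(String bootstrapNeighbour) {
        this.bootstrapNeighbour = bootstrapNeighbour;
    }

    void addNeighbour(String address) {
        neighbours.add(address);
    }

    /**
     * Returns the addresses that should be sent a self-resolution now,
     * or an empty list if the interval has not yet elapsed.
     */
    List<String> poll(long nowMs) {
        if (nowMs < nextSendDeadlineMs) {
            return new ArrayList<>();
        }
        nextSendDeadlineMs = nowMs + SELF_RESOLUTION_INTERVAL_MS;

        List<String> targets = new ArrayList<>(neighbours);
        if (!neighbours.contains(bootstrapNeighbour)) {
            targets.add(bootstrapNeighbour); // only the bootstrap on the first pass
        }
        return targets;
    }
}
```

Passing the current time in as an argument, rather than reading a clock inside the method, keeps the sketch deterministic and easy to step through by hand.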

As a result of this the driver name resolvers will start to receive resolution frames.  The name/address entries from these frames will be added to a cache and the neighbour list.  If the resolution frame has come through as a notification of a self-resolution we update a last activity timestamp for that node.

Every 2s, the media driver will send its cache of name/address pairs to all of its neighbours, so eventually all of the nodes will know about all of the others as the name/address entries are shared around the cluster.  At the higher layer, when the conductor tries to resolve a name supplied as an address on a channel URI, it will call the driver name resolver first, which can resolve the name from its cache, handing off to the default resolver if the name is not found.
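The cache-first lookup with a hand-off to the default resolver could look something like the sketch below.  The class and method names are made up for illustration and are not the ones used in the media driver; the fallback is passed in as a function so that it can stand in for whatever the default (host name based) resolver does.

```java
import java.net.InetAddress;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: try the gossip-fed cache first, then fall back.
final class CacheFirstResolver {
    private final Map<String, InetAddress> cache = new HashMap<>();
    private final Function<String, InetAddress> fallback;

    CacheFirstResolver(Function<String, InetAddress> fallback) {
        this.fallback = fallback;
    }

    /** Called for each name/address entry found in a received resolution frame. */
    void onResolutionEntry(String name, InetAddress address) {
        cache.put(name, address);
    }

    /** Resolves from the cache if possible, otherwise delegates to the fallback. */
    InetAddress resolve(String name) {
        InetAddress cached = cache.get(name);
        return cached != null ? cached : fallback.apply(name);
    }
}
```

Because this all runs on a single thread in the sketch (the conductor, in the terms of this post), a plain HashMap is enough and no locking is needed.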

Periodically the cache and the neighbour list will be checked to see if we are still receiving self-resolutions for a particular node.  If the last activity timestamp hasn't been updated recently enough then the entries are evicted from the cache and neighbour list under the assumption that the neighbour has died.
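As a rough illustration of the timestamp and eviction logic, here is a small sketch.  The names and the timeout value are assumptions, and the cache and neighbour list are collapsed into a single map for brevity.  The point it tries to show is that only a self-resolution refreshes the activity timestamp; an entry relayed by a third party does not prove the named node is still alive.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Hypothetical sketch of liveness tracking for gossiped names; not Aeron code.
final class NeighbourTable {
    private final Map<String, Long> lastActivityMsByName = new HashMap<>();
    private final long timeoutMs; // assumed value, chosen by the caller

    NeighbourTable(long timeoutMs) {
        this.timeoutMs = timeoutMs;
    }

    /** Records an entry; only self-resolutions refresh the activity timestamp. */
    void onResolutionEntry(String name, boolean isSelfResolution, long nowMs) {
        if (isSelfResolution) {
            lastActivityMsByName.put(name, nowMs);
        } else {
            lastActivityMsByName.putIfAbsent(name, nowMs);
        }
    }

    /** Removes entries whose owner has been silent for longer than the timeout. */
    int evictStale(long nowMs) {
        int evicted = 0;
        Iterator<Map.Entry<String, Long>> it = lastActivityMsByName.entrySet().iterator();
        while (it.hasNext()) {
            if (nowMs - it.next().getValue() > timeoutMs) {
                it.remove();
                evicted++;
            }
        }
        return evicted;
    }

    boolean contains(String name) {
        return lastActivityMsByName.containsKey(name);
    }
}
```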

All of this is happening on the conductor thread so that it will not impact the performance of the sender and the receiver.  This is primarily designed for small clusters of nodes, as all nodes will be gossiping to all other nodes once the resolutions have propagated across the network.  It is not designed for large scale, system-wide name resolution.  However, it is a very new feature and we expect it to evolve over time as users experiment with it.

Write your own

With a lot of the algorithms within Aeron it is often not possible to pick a single implementation, so we offer the ability to provide your own implementation (e.g. flow control, congestion control).  Name resolution fits into that model as well.  There is an interface for the Java Driver and a function pointer based struct on the C driver that can be implemented by a user.  So if there is a custom name resolution strategy that you would prefer to use, it can be plugged in quite easily.

If you look carefully, you will notice that there is a 2-phase approach to resolving a name.  There is a lookup method and a resolve method.  The lookup method takes a name and returns a host name and UDP port pair, whereas the resolve function takes in the host name portion of that pair and returns an internet address.  The additional parameter name is there so the resolver can distinguish between an endpoint and a control address.
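The two phases can be sketched like this.  The interface below is a simplified stand-in written for this post, not the actual Java driver interface (check the Aeron source for the real signatures), and the "host:port" string format is an assumption.  For brevity the sketch does not handle bracketed IPv6 literals.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;

// Hypothetical, simplified interface; the real Aeron interface may differ.
interface TwoPhaseResolver {
    /** Phase 1: map a logical name to a "host:port" string. */
    String lookup(String name, String paramName);

    /** Phase 2: map the host portion to an internet address. */
    InetAddress resolve(String hostName, String paramName);
}

final class TwoPhaseSketch {
    /** Runs lookup, splits the result into host and port, then runs resolve. */
    static InetSocketAddress toSocketAddress(
        TwoPhaseResolver resolver, String name, String paramName) {
        String hostAndPort = resolver.lookup(name, paramName);
        int separator = hostAndPort.lastIndexOf(':');
        if (separator < 0) {
            throw new IllegalArgumentException("expected host:port but got " + hostAndPort);
        }

        String host = hostAndPort.substring(0, separator);
        int port = Integer.parseInt(hostAndPort.substring(separator + 1));

        // paramName (e.g. "endpoint" or "control") lets the resolver treat the two differently.
        return new InetSocketAddress(resolver.resolve(host, paramName), port);
    }
}
```

Splitting the work this way means a resolver can map a short logical name to a full host and port in the first phase, while still leaving the host-to-address step to whichever mechanism suits it best.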


While perhaps not a ground-breaking feature, it is a useful one.  It manages to provide the convenience of name-based resolution without compromising on the latency goals of Aeron.  It is supported in both the Java (1.26.0) and C (1.27.0) media drivers.  Feedback is always welcome, and check out the wiki for more information.
