In a previous article we talked about our plans to move from ZooKeeper (the network manifest protocol the network originally used) to Consul, a more scalable and flexible network-wide manifest.
Architecturally speaking, the network manifest is relatively straightforward. All network services connect to Consul and, based on a distributed instruction set, decide which peers to interact with and which customer applications to run.
The move from ZooKeeper to Consul was predominantly driven by performance, but with an understanding that Consul opened up important opportunities that would be explored further at a later stage. I’ll try to cover those now.
🔗First approach: persist everything, worry later
We store the manifest in two places: Services, a catalogue of connected machines, and KV, or _Key Value_, an index of objects structured like a filesystem.
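To picture the KV side, think of a tree of slash-delimited keys, much like paths on a filesystem. The layout below is purely illustrative, not our actual schema:

```
network/
  gateways/gw-07/metadata
  hosts/host-123/metadata
  stargates/sg-lv1/metadata
```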
In the first iteration of our integration with Consul, each Host, Gateway or Stargate registered a Service on the network and wrote metadata to KV. We persisted this data - all of it - which meant that services all appeared to be online even if they weren’t, and their metadata was preserved indefinitely.
This posed a number of issues, including the unbounded growth of the data footprint and the difficulty of distinguishing between connected and disconnected devices. To patch this problem we had all applications periodically write health data, including a last-active timestamp, to determine whether a machine was likely to still be connected. We then used this data to prune services that were no longer online. This was a fairly heavy-handed approach: it worked perfectly on testnet, but when we migrated it to mainnet we saw an impact on latency.
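Roughly, each heartbeat entry looked something like this (the field names here are illustrative, not our actual schema):

```json
{
  "device_id": "host-123",
  "status": "online",
  "last_active": "2019-06-14T10:24:00Z"
}
```

A pruning job then compared `last_active` against a cutoff window and deregistered anything that had fallen behind - which is exactly the periodic write load that hurt us at mainnet scale.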
Those who have used Consul will know that it ships with a check component. After realising that the original approach had scalability issues, we moved to attaching health checks to services. There are two types of health check in use:

1. gRPC checks, for Gateway and Stargate - which both require public IPs - where Consul uses the standard gRPC health check protocol to communicate directly with the device.
2. TTL checks, which are useful when the device is not publicly reachable.
Rather than Consul contacting the device, the device contacts Consul: first to provide a definition of the health check, and then periodically to confirm it is still meeting those conditions.
With checks attached to the Service we no longer need to prune anything ourselves: Consul is instructed to dismiss and remove any service with failing checks.
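As a rough sketch (the service names, IDs, addresses and timings here are hypothetical), a gRPC-checked service registration might look like this, with Consul told to deregister the service once its check has been critical for a period:

```json
{
  "service": {
    "name": "stargate",
    "id": "stargate-lv1",
    "port": 19000,
    "check": {
      "grpc": "10.0.0.5:19000",
      "interval": "10s",
      "deregister_critical_service_after": "1m"
    }
  }
}
```

A TTL-checked Host swaps the `grpc` and `interval` fields for something like `"ttl": "30s"`; the device must then report in through the agent's check-update endpoint before the TTL lapses, or the check fails and the service is removed.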
🔗Persisted KV: everything everywhere
The first iteration of the Network Explorer, which was used internally, did not store device data and was only able to display metadata available to it through Consul. This didn’t immediately cause any issues and the index size remained relatively small, but after deploying to mainnet we noticed some potential restrictions in the design.
When a KV entry is updated in Consul, the update needs to propagate through the network and achieve consensus before the entry can be updated again. Where the Consul servers are topologically close and latency is low, the issue is manageable. Occasionally a periodic update might take a few milliseconds more, but who’s counting? When you scale up and out, and the number of replications required for consensus increases along with latency, things start to get uncomfortable.
This led us to question why a Stargate in Las Vegas should care about the Hosts connected to a Gateway in Botswana – and whether any data should be distributed by default outside what is effectively a series of VPCs. Consul allows each agent to specify the name of its datacenter, and up until now every Stargate was using the default, dc1.
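A minimal sketch of what that looks like in a Consul agent configuration - the datacenter and hostname are hypothetical - with the Stargates still joined over the WAN so cross-datacenter queries remain possible:

```json
{
  "datacenter": "stargate-lv1",
  "server": true,
  "retry_join_wan": ["stargate-bw1.example.net"]
}
```

With this in place, KV writes made through the Las Vegas Stargate stay in `stargate-lv1` and never need consensus from servers on other continents.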
🔗Ephemeral KV: housekeeping in the network
We refactored Stargate to operate as its own Consul datacenter, while remaining connected so that ACLs and other distributed indexes stay intact. Host can run on all sorts of devices, and as hardware is increasingly portable we must allow for changes in location over time. This means that a single device running Host could potentially be writing KV data to a different Stargate – and therefore a different datacenter – every day. (Because, you know, we’re all aspirational jet-setters :) ) We therefore need to manage what happens when a Host migrates. We don’t want to leave a legacy of metadata in each Stargate, but we do want the Network Explorer to remember the device. Tricky, right?
There are ways to make Consul KV ephemeral, and the most reliable is by using sessions. Sessions are attached to health checks, which are in turn attached to services. This means that a failing health check can simultaneously deregister a service and invalidate a session. When writing to KV it is possible to attach a session key to the write operation and create a lock on the data, as well as instructing Consul to delete the KV data at the point that the session terminates.
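A sketch of that flow, with hypothetical IDs: create a session via `PUT /v1/session/create` with a body along these lines, then append `?acquire=<session-id>` to each KV write:

```json
{
  "Name": "host-123",
  "Behavior": "delete",
  "Checks": ["serfHealth", "service:host-123"]
}
```

Because `Behavior` is `delete`, any key the session has acquired is deleted automatically when one of the listed checks fails and the session is invalidated - no pruning job required.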
🔗Replicating data across Stargates
Now that devices are no longer persisted and their data is both ephemeral and confined to a single Stargate, the Network Explorer’s job becomes more complicated.
It needs an eye on the entire network, but when dealing with multiple Stargates the challenge is to unify all Services and KV. Replicate is a Consul application designed specifically for this job. By defining the source Stargates and KV prefixes, we are now able to run a single Consul service alongside the Network Explorer to provide a single version of the truth without forcing Stargates on mainnet to distribute their own data.
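A sketch of what a consul-replicate configuration for this might look like - the prefixes, datacenter names and address are illustrative, not our production values:

```hcl
# Pull each Stargate's device prefix into the Explorer's local Consul.
consul {
  address = "127.0.0.1:8500"
}

prefix {
  source      = "devices@stargate-lv1"
  destination = "network/stargate-lv1"
}

prefix {
  source      = "devices@stargate-bw1"
  destination = "network/stargate-bw1"
}
```

The `source@datacenter` form tells Replicate which Stargate to watch, and each prefix lands under its own destination so the Explorer can distinguish where a device last wrote.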
🔗Persisting offline nodes
Hosts require a minimum of 40% uptime, so frequent periods of downtime are inevitable. Now that data is ephemeral, when a Host disappears, so does its metadata. To combat this, we introduced a data store outside of Consul to persist device metadata, retaining a full manifest of registered hardware regardless of state. In time this data store will become directly addressable and explorable.
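The idea can be sketched in a few lines: an upsert-style registry, keyed by device ID, that outlives Consul's ephemeral KV. Everything here is illustrative - the store, schema and device IDs are assumptions, not our actual implementation:

```python
import sqlite3
import time

# Hypothetical persistent registry living outside Consul.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE devices (
        device_id TEXT PRIMARY KEY,
        metadata  TEXT,
        last_seen REAL
    )
""")

def record_device(device_id, metadata):
    """Upsert metadata so the device is remembered even while offline."""
    conn.execute(
        "INSERT OR REPLACE INTO devices (device_id, metadata, last_seen) "
        "VALUES (?, ?, ?)",
        (device_id, metadata, time.time()),
    )

record_device("host-123", '{"stargate": "lv1"}')
record_device("host-123", '{"stargate": "bw1"}')  # device migrated

rows = conn.execute("SELECT metadata FROM devices").fetchall()
# One row per device; the latest write wins, and the row survives
# the device going offline in Consul.
```

Each time a device registers with a Stargate, the Explorer writes through to a store like this, giving it the full manifest of hardware regardless of current state.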
We’ve reduced global replication and alleviated the latency of global consensus, introduced network-wide housekeeping through ephemeral session-based KV, and persisted valuable metadata outside the network manifest for a long-term view of resource allocation and infrastructure.
The impact on mainnet has been significant, with a marked reduction in latency in what was already a lightning-fast network.