First, some background information about DNS and BGP
We use BGP for DNS, which means a number of Stargates can live behind a single IP subnet. When your browser requests a DNS record from the network, BGP directs the request to the closest Stargate so that the response is returned as quickly as possible. BGP has been around for a long time and is reliable enough to make this process fairly straightforward.
Where BGP doesn’t work well
Stargates are responsible for network coordination as well as DNS, which means the same device that talks to Hosts and Gateways is also responding to DNS requests. Because of this, we initially decided to let Gateways and Hosts find their closest Stargate the same way DNS requests do: through BGP.
BGP is reliable, but not entirely accurate. Sometimes a request made in Singapore will be resolved by the Stargate in Frankfurt. This is okay for a DNS lookup: it adds perhaps 10ms to the response time, which is less than the impact of moving from a wired connection to WiFi. However, when Gateways and Hosts use this lookup method to find their closest Stargate, the issue is greatly magnified. If a Gateway in Singapore connects to a Stargate in Frankfurt, the Hosts that connect to it are likely to also be in Frankfurt, meaning all of those images for Content Distribution travel a great distance. That can occasionally cause a timeout, which is the 504 Gateway Timeout error that some of you have seen.
Big corporations like Google have the same issue, and they resolve it by buying up a lot of the internet’s backbone at huge cost. For the rest of us, that isn’t really a sound solution. BGP just isn’t accurate enough to be relied upon for service discovery, so what we are deploying today is a more innovative fix.
When a Gateway looks for a Stargate, it no longer relies on BGP alone. Instead, it requests a single DNS record from the Stargate that BGP reports as the nearest; this record contains a list of IP addresses for all other Stargates. The Gateway takes these IP addresses and runs some connectivity checks, which allow it to confirm which Stargate is closest in terms of connectivity. Once the checks have completed, the Gateway connects directly to the chosen Stargate.
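To make the selection step a little more concrete, here is a minimal Python sketch of how a client could choose between candidate addresses once it has the list. This is not our actual Gateway code. The function names (`tcp_connect_rtt`, `pick_closest`), the use of TCP connect time as the connectivity check, port 443, and the choice of three attempts per candidate are all assumptions made for illustration. The DNS lookup itself is left out, since the record name and type are not covered here.

```python
import socket
import statistics
import time
from typing import Callable, Iterable, Optional


def tcp_connect_rtt(ip: str, port: int = 443, timeout: float = 1.0) -> Optional[float]:
    """Return the seconds taken to open a TCP connection, or None on failure.

    Hypothetical connectivity check: the port and timeout are assumptions.
    """
    start = time.perf_counter()
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return time.perf_counter() - start
    except OSError:
        # Covers refused connections, unreachable hosts and timeouts.
        return None


def pick_closest(
    candidates: Iterable[str],
    probe: Callable[[str], Optional[float]] = tcp_connect_rtt,
    attempts: int = 3,
) -> Optional[str]:
    """Probe every candidate and return the one with the lowest median RTT.

    Returns None if no candidate answered any probe.
    """
    best_ip: Optional[str] = None
    best_rtt = float("inf")
    for ip in candidates:
        # Take several samples so one slow packet does not decide the result.
        samples = [rtt for rtt in (probe(ip) for _ in range(attempts)) if rtt is not None]
        if not samples:
            continue  # Unreachable candidates are skipped.
        median = statistics.median(samples)
        if median < best_rtt:
            best_ip, best_rtt = ip, median
    return best_ip
```

Passing the probe in as a parameter keeps the selection logic separate from the measurement, so the same loop would work with a different kind of check. The important point is that the decision is based on measured connectivity from the Gateway itself rather than on the route BGP happened to pick.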
For those who have asked “Why does X Gateway connect to Y Stargate when Z is closer?”, now you know.