The first delivery for Edge Storage was the creation of a short term distributed cache to be leveraged by all applications deployed in the network. Each VPC queues and distributes cache results proportionally across every available Host, reducing the number of Hosts and compute units required for your application to scale.
Image A: Picture of 6 Hosts, with 3 hosting an application and all 6 storing cached results for the application
🔗Caching at the Gateway
The logical place for a short term cache is arguable on Gateways (the termination point of requests within the network), but this just doesn’t scale. Let’s imagine you have a single Gateway with 250 Hosts, a reasonable number on the network. The Hosts are each responding to hundreds of requests per second and it’s not abnormal, especially when dealing with rich media, for each payload to be ~100kb. That’s a lot of results to store in a cache, in memory, on a single machine.
Entirely removing the cache from Gateway creates a potential vulnerability when a high volume of requests for a single resource causes an unnecessary flood in the Gateway queue. To protect against this, Gateway operates the lightning cache, a resource specific in-memory cache that adjusts TTL in real time based on the frequency of requests across a moving window, with consideration for available memory and queue size to further optimize the balance between delivery speed and load distribution.
Image B: Gateway cache showing different sized caches based on traffic
By using the same distribution and replication protocol as the cache we were able to facilitate a persisted long term object store, with the added option to distribute and coordinate outside of a single VPC. Though this may appear as a natural progression, long term object storage comes with a fresh set of requirements, most notable the issue of resilience.
Whilst cache corruption or invalidation due to service disruption can cause a significant CPU spike, the data is by nature ephemeral so nothing is lost that can’t be rebuilt. The same is not true for longterm object storage which added a fresh set of considerations to our approach.
Files and cached results come in all shapes and sizes, and so do the machines in the network. Because of this the method of distribution had to be flexible enough to work with the smallest possible device footprint. A machine with a spare 20mb might not seem significant, but if you consider just how many smart devices with SSD or NVM (Non-Volatile Memory) offering fast I/O and infrequent power cycles exist in the average home, the redundancy quickly becomes significant. By breaking up files, the network avoids dismissal of a swarm of low powered contributors, so chunking became an obvious requirement.
Chunk size is another important consideration. Just like typical volume formatting, the optimum size will balance maximised space utilisation and minimised read operations. The current working splits files into 512KB chunks. Smaller files are not chunked, unless a customer specifically requests that an object is secured through forced chunking (see below).
Image C: Gateway shredding files into chunks
After chunking the file, Gateway stores a file hash in its in-memory database before adding entries to its cache queue, a list of distribution jobs based on the minimum replication requirements. Connected Hosts take the file chunks and persist them locally, as well as indexing in an in-memory database. The Gateway persists data in the cache queue until the replication conditions are satisfied, at which point it retains an index but destroys the parts.
Image D: Indexing files as they’re chunked
When the Gateway receives a request for a file it creates an entry in the edge queue for each chunk it needs to rebuild the source. Connected Hosts work together to return all of the required parts before the Gateway returns the payload to the client.
Image E: Rebuilding from a checklist of chunks
🔗Forced chunking on sensitive data
Not all files that we share in the network are sensitive and not all sensitive files are large enough to be chunked. That’s why we introduced the option to force chunking on files that should never exist in their entirety on a single Host.
Image F: Forcing small files to be split when required
Hosts can go offline at any time so to manage any reduction in replication, Gateway stores an index of Host IDs alongside each hash to track the distribution of each chunk. Gateway monitors the health of Hosts via Consul, the distributed network manifest, and should one become unhealthy, the Gateway identifies entries in the hash database that now require redistribution. For each hash, it fetches the chunks from connected Hosts and creates a new entry in the cache queue, thus triggering the replication process.
🔗What happens when a Gateway goes offline?
Whilst the index of hashes appears lost, all of the existing Hosts collaboratively hold all of the hashes and data, so on reconnection to a new Gateway they can quickly begin the rebuild process. The Hosts initialise the process of rebuilding by sending a list of their chunks. Filtering and removing those that it already holds, the Gateway inserts an entry into its hash database along with the ID of the Host from which the list of hashes originated.
After the reindex process is completed or the timeout is exceeded, the Gateways redistribution process takes over to guarantee a satisfactory level of replication.
Image G: Hosts moving from an offline GW to a new GW and rebuilding the index
Edge storage provides a resilient, secure and extremely fast storage solution for caching as close to the point of need as possible and for long term object storage. It is now live in Mainnet and will be moving in to testing next week. We expect to have the first public beta available within the next quarter.