The performance of network applications can be measured not only by how quickly they resolve requests, but also by how well they recover from faults they could not reasonably be held responsible for. What if the hardware is power cycled during a long-running operation, or a network outage causes the pending job handler to throw unreachability errors?
Graceful recovery is often tricky to engineer and test, and in some cases it can hurt performance under stable conditions. Here are some of the questions a Host might need to answer if a local network outage occurs while it is rendering and returning a payload to the Gateway:
- I’m midway through rendering. Should I continue?
- I’ve rendered and have a payload. Should I try to send it anyway?
- I can’t see if my software is up-to-date. Should I presume it is?
- I can’t see my current assigned Gateway. Does it have the same IP?
Granular state conditions like these become steadily harder to manage as application complexity grows. However, if we treat the fundamental application operations as singularities, we can drop any notion of controlled recovery in favour of a fail-fast approach. Stateful tasks become contextually transient, failure is fatal, and recovery is more like rebirth.
To explain how we achieve this, let’s start with an example:
Pump and tank
In this example application we have a pump and a tank. The tank must be open for the pump to work, and the pump must be stopped when the tank is full.
Main
func main() {
	// Create a new tank and pump
	tank := new(Tank)
	pump := new(Pump)
	// If the tank opened successfully, pump the water
	if err := tank.Open(pump); err == nil {
		pump.Start()
		// Pump synchronously so main does not exit before the tank is full
		pump.PumpWater(tank)
	}
}
Pump
const (
	unitOfWater = 1
	maxWater    = 20
)
type Pump struct {
	active bool
	tank   *Tank
}
func (p *Pump) Start() {
	p.active = true
}
func (p *Pump) PumpWater(tank *Tank) {
	// Keep pumping for as long as the pump is active
	for p.active {
		tank.In(unitOfWater)
	}
}
func (p *Pump) Stop() {
	p.active = false
}
Tank
import "fmt"
type Tank struct {
	pump  *Pump
	open  bool
	level int
}
func (t *Tank) In(w int) {
	t.level += w
	// If tank is closed, ask the pump to stop
	if !t.open {
		t.pump.Stop()
	}
	// If there is too much water in the tank, close it and stop pumping
	if t.level >= maxWater {
		t.Close()
		t.pump.Stop()
	}
}
func (t *Tank) Close() {
	t.open = false
}
func (t *Tank) Open(pump *Pump) error {
	// Refuse to open a tank that is already at capacity
	if t.level >= maxWater {
		return fmt.Errorf("water level %d has reached the maximum of %d", t.level, maxWater)
	}
	// Store the pump for reference and open the tank
	t.pump = pump
	t.open = true
	return nil
}
Creating peer components typically requires bidirectional referencing. The tank must be able to tell the pump to stop, and the pump must be able to push water into the tank.
There are four states to be observed:
- Tank open
- Tank closed
- Pump active
- Pump stopped
If we abstract the four states into a global set of states, the number of observable states is reduced:
- Piping water
- Tank full
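One way to picture this reduction, as a minimal sketch rather than the code used in this post: of the four possible combinations of the tank and pump flags, only two are meaningful, and each maps to one global state. The `State` type, its constants, and the `globalState` helper are hypothetical names introduced for illustration, and treating the two in-between combinations as invalid is an assumption.

```go
package main

import "fmt"

// State is a hypothetical type naming the global states of the system.
type State string

const (
	Piping State = "piping" // tank open, pump active
	Full   State = "full"   // tank closed, pump stopped
)

// globalState collapses the two component flags into one global state.
// The second return value is false for combinations that should not
// occur, such as an active pump feeding a closed tank.
func globalState(tankOpen, pumpActive bool) (State, bool) {
	switch {
	case tankOpen && pumpActive:
		return Piping, true
	case !tankOpen && !pumpActive:
		return Full, true
	default:
		return "", false
	}
}

func main() {
	// Walk all four flag combinations and show which ones are valid.
	for _, tankOpen := range []bool{true, false} {
		for _, pumpActive := range []bool{true, false} {
			state, ok := globalState(tankOpen, pumpActive)
			fmt.Printf("open=%t active=%t => state=%q valid=%t\n", tankOpen, pumpActive, state, ok)
		}
	}
}
```

Once the components only ever report one of these global states, neither needs to inspect the other's flags.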
If we then abstract the logic from the components we can remove all contextual references. This is where a Finite State Machine becomes an essential tool.
What is a Finite State Machine?
A Finite State Machine – or FSM for short – is a programmatic map of whitelisted transitions, where each transition triggers a synchronous operation within the context of a State.
- State A can change to B
- State B can change to state C
- State C can change to states A and B
- State B cannot change to state A
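The rules above can be written down as a plain lookup table. The following is a minimal sketch of that idea only; the `allowed` map and `canTransition` helper are hypothetical names, not part of any released library.

```go
package main

import "fmt"

// allowed lists, for each state, the states it may change to.
// It encodes exactly the rules above: A => B, B => C, C => A and C => B.
var allowed = map[string][]string{
	"A": {"B"},
	"B": {"C"},
	"C": {"A", "B"},
}

// canTransition reports whether moving from one state to another is whitelisted.
func canTransition(from, to string) bool {
	for _, next := range allowed[from] {
		if next == to {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(canTransition("A", "B")) // true: A can change to B
	fmt.Println(canTransition("B", "A")) // false: B cannot change to A
}
```

Anything not listed in the table is rejected by default, which is what makes the map a whitelist rather than a blacklist.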
Let's apply that logic to our previous example:
Main
func main() {
	// Create an instance of our state machine
	state := new(FSM)
	// Create tank with state
	tank := &Tank{
		State: state,
	}
	// Create pump
	pump := new(Pump)
	// Create state for "*" => "piping"
	state.NewState().From("*").To("piping").OnEnter(func(st *fsm.State) {
		// If the tank opened successfully, pump the water
		if err := tank.Open(); err == nil {
			pump.Start()
			for !tank.Full() {
				tank.In(pump.PumpWater())
			}
		}
	})
	// Create state for "piping" => "full"
	state.NewState().From("piping").To("full").OnEnter(func(st *fsm.State) {
		// Stop the pump
		pump.Stop()
	})
	// Start the machine by entering the "piping" state
	state.Transition("piping")
}
Pump
const unitOfWater = 1
type Pump struct {
	active bool
}
func (p *Pump) Start() {
	p.active = true
}
func (p *Pump) PumpWater() int {
	// If the pump is active
	if p.active {
		return unitOfWater
	}
	return 0
}
func (p *Pump) Stop() {
	p.active = false
}
Tank
import "fmt"
const maxWater = 20
type Tank struct {
	State *FSM
	level int
}
func (t *Tank) In(w int) {
	t.level += w
	// Once the tank reaches capacity, transition state to "full"
	if t.Full() {
		t.State.Transition("full")
	}
}
func (t *Tank) Full() bool {
	return t.level >= maxWater
}
func (t *Tank) Open() error {
	// Refuse to open a tank that is already at capacity
	if t.Full() {
		return fmt.Errorf("water level %d has reached the maximum of %d", t.level, maxWater)
	}
	return nil
}
The state machine is a single shared instance, passed to each component that needs it (here, the tank), allowing that component to request a state change whilst remaining completely decoupled from its siblings. The core state definitions are clean and clearly bound to event-driven states, and this method of abstraction leaves us with a centralised instruction set for all machine states. Because the application components are decoupled, the application can scale in complexity without falling into the dangerous trap of spaghetti code.
One other core aspect of the FSM is its ability to restrict transitions to whitelisted keys, stopping potentially harmful transitions from being triggered by asynchronous subroutines.
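To make that concrete, here is a minimal sketch of a machine that rejects non-whitelisted transitions even when several goroutines race to trigger them. It is not the library described in this post: `Machine`, `NewMachine`, `Allow`, `Current` and the `"idle"` starting state are hypothetical names, and the `"*"` wildcard from the example above is not implemented.

```go
package main

import (
	"fmt"
	"sync"
)

// Machine is a hypothetical, mutex-guarded state machine.
type Machine struct {
	mu      sync.Mutex
	current string
	allowed map[string]map[string]bool // from => to => whitelisted
	onEnter map[string]func()          // handler keyed by destination state
}

func NewMachine(initial string) *Machine {
	return &Machine{
		current: initial,
		allowed: make(map[string]map[string]bool),
		onEnter: make(map[string]func()),
	}
}

// Allow whitelists a transition and optionally registers a handler
// that runs whenever the destination state is entered.
func (m *Machine) Allow(from, to string, enter func()) {
	m.mu.Lock()
	defer m.mu.Unlock()
	if m.allowed[from] == nil {
		m.allowed[from] = make(map[string]bool)
	}
	m.allowed[from][to] = true
	if enter != nil {
		m.onEnter[to] = enter
	}
}

// Transition moves to the requested state only if that move is whitelisted.
// The handler runs synchronously while the lock is held, so in this sketch
// a handler must not call Transition itself.
func (m *Machine) Transition(to string) error {
	m.mu.Lock()
	defer m.mu.Unlock()
	if !m.allowed[m.current][to] {
		return fmt.Errorf("transition %q => %q is not whitelisted", m.current, to)
	}
	m.current = to
	if enter := m.onEnter[to]; enter != nil {
		enter()
	}
	return nil
}

// Current returns the state the machine is in.
func (m *Machine) Current() string {
	m.mu.Lock()
	defer m.mu.Unlock()
	return m.current
}

func main() {
	m := NewMachine("idle")
	m.Allow("idle", "piping", nil)
	m.Allow("piping", "full", func() { fmt.Println("pump stopped") })

	if err := m.Transition("piping"); err != nil {
		fmt.Println("unexpected:", err)
	}

	// Five goroutines all try to enter "full"; "full" => "full" is not
	// whitelisted, so only the first one to take the lock succeeds.
	var wg sync.WaitGroup
	results := make(chan error, 5)
	for i := 0; i < 5; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			results <- m.Transition("full")
		}()
	}
	wg.Wait()
	close(results)

	succeeded := 0
	for err := range results {
		if err == nil {
			succeeded++
		}
	}
	fmt.Println("successful transitions:", succeeded)
}
```

Holding the lock across the check, the state change and the handler is the simplest way to keep them atomic. The trade-off is that handlers cannot trigger further transitions, which a production implementation would need to address.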
Want to use it in your project?
I’m excited to announce that we’ll be releasing the Finite State Machine under the GNU General Public License this month as part of our journey to fully open sourcing the code base of the Edge network.
Keep an eye on https://github.com/dadi