Finite State Machine

The performance of network applications can be measured not only by the speed at which they resolve requests, but also by their ability to recover from faults outside of a reasonable scale of culpability. What if the hardware is power cycled during a long running operation, or a network outage causes the pending job handler to throw unreachability errors?

Graceful recovery can often be tricky to engineer and test, and in some cases can be detrimental to performance under stable conditions. Here are some examples of conditions that a Host might need to consider when rendering and returning a payload to the Gateway when a local network outage occurs:

I'm midway through rendering. Should I continue?
I've rendered and have a payload. Should I try and send it anyway?
I can't see if my software is up-to-date. Should I presume it is?
I can't see my current assigned Gateway. Does it have the same IP?

These examples of granular state conditions become perpetually more unmanageable as application complexity progresses. However if we approach the fundamental application operations and treat them as singularities, we suppress any notion of controlled recovery in favour of a fail fast approach. Stateful tasks become contextually transient, failure is fatal, and recovery is more like rebirth.

To explain how we achieve this, let's start with an example:

Pump and tank

In this example application we have a pump and a tank. The tank must be open for the pump to work, and the pump must be stopped when the tank is full.

Main

func main() {
 
  // Create a new tank and pump
  tank := new(Tank)
  pump := new(Pump)
 
  // If the tank opened successfully, pump the water
  if err := tank.Open(pump); err != nil {
    pump.Start()
    go pump.PumpWater(tank)
  }
}

Pump

const (
  unitOfWater = 1
  maxWater = 20
)
 
type Pump struct {
  active bool
  tank \\\\\\\*Tank
}
 
func (p \\\\\\\*Pump) Start() {
  p.active = true
}
 
func (p \\\\\\\*Pump) PumpWater(tank \\\\\\\*Tank) {
 
  // If the pump is active
  for p.active {
    tank.In(unitOfWater)
  }
}
 
func (p \\\\\\\*Pump) Stop() {
  p.active = false
}

Tank

type Tank struct {
  pump \\\\\\\*Pump
  open bool
  level int
}
 
func (t \\\\\\\*Tank) In(w int) {
  t.level += w
 
  // If tank is closed, ask the pump to stop
  if !t.open {
    t.pump.Stop()
  }
 
  // If there is too much water in the tank, close it and stop pumping
  if t.level >= maxWater {
    t.Close()
    t.pump.Stop()
  }
}
 
func (t \\\\\\\*Tank) Open(pump \\\\\\\*Pump) error {
 
  // Store the pump for reference and open the tank
  t.pump = pump
  t.open = true
  if t.level >= maxWater {
    return fmt.Errorf("Water level %d exceeds maximum inits %d", t.level, maxWater)
  }
  return nil
}

Creating peer components typically requires bidirectional referencing. The tank must be able to tell the pump to stop, and the pump must be able to tell the tank to open.

There are four states to be observed:

Tank open
Tank closed
Pump active
Pump stopped

If we abstract the four states into a a global set of states, the number of observable states is reduced:

Piping water
Tank full

If we then abstract the logic from the components we can remove all contextual references. This is where a Finite State Machine becomes an essential tool.

What is a Finite State Machine?

A Finite State Machine – or FSM for short – is a programmatic map of whitelisted transitions, where each transition triggers a synchronous operation within the context of a State.

State A can change to B
State B can change to state C
State C can change to states A and B
State B cannot change to state A

Lets apply that logic to our previous example:

Main

func main() {
 
  // Create an instance of our state machine
  state := new(FSM)
 
  // Create tank with state
  tank := &Tank{
    State: state,
  }
 
  // Create pump
  pump := new(Pump)
 
  // Create state for "\\\\\\\*" => "piping"
  state.NewState().From("\\\\\\\*").To("piping").OnEnter(func(st \\\\\\\*fsm.State) {
 
    // If the tank opened successfully, pump the water
    if err := tank.Open(); err != nil {
      pump.Start()
      for !tank.Full() {
        tank.In(pump.PumpWater())
      }
    }
  })
 
  // Create state for "piping" => "full"
  state.NewState().From("piping").To("full").OnEnter(func(st \\\\\\\*fsm.State) {
 
    // Stop the pump and close the tank
    pump.Stop()
  })
}

Pump

const unitOfWater = 1
 
type Pump struct {
  active bool
}
 
func (p \\\\\\\*Pump) Start() {
  p.active = true
}
 
func (p \\\\\\\*Pump) PumpWater() int {
 
  // If the pump is active
  if p.active {
    return unitOfWater
  }
 
  return 0
}
 
func (p \\\\\\\*Pump) Stop() {
  p.active = false
}

Tank

const maxWater = 20
 
type Tank struct {
  State \\\\\\\*FSM
  level int
}
 
func (t \\\\\\\*Tank) In(w int) {
 
  // If the tank is at capacity transition state to "full"
  if t.Full() {
    t.State.Transition("full")
  } else {
    t.level += w
  }
}
 
func (t \\\\\\\*Tank) Full() bool {
  return t.level >= maxWater
}
 
func (t \\\\\\\*Tank) Open() error {
  if !t.Full() {
    return fmt.Errorf("Water level %d exceeds maximum inits %d", t.level, maxWater)
  }
  return nil
}

The state machine as a singleton is passed to each class, allowing them to instruct a state change whilst completely decoupled from their siblings. The core state definitions are clean and clearly bound to event driven states and this method of abstraction leaves us with a centralised instruction set for all machine states. It decouples the application components, meaning as the application scales in complexity we don't fall into a dangerous trap of spaghetti code.

One other core aspect of the FSM is its ability to restrict transitions to whitelisted keys, stopping potentially harmful transitions from being triggered by asynchronous subroutines.

Want to use it in your project?

I'm excited to announce that we'll be releasing the Finite State Machine under the GNU General Public License this month as part of our journey to fully open sourcing the code base of the Edge network.

Keep an eye on https://github.com/dadi