Thread by @tgraf__: What is Maglev? A Thread.

Thomas Graf

tgraf__

What is Maglev? A Thread.

tl;dr: Maglev provides HA for network load-balancers.

If you are in the cloud, then you are likely already using it. This is how Google and others make load-balancing reliable and scalable with commodity Linux servers.

Original paper:
https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/44824.pdf

tl;dr:
- Avoid special hardware needs
- Simple ECMP + Linux Machine + Maglev = Replace expensive hardware LB
- In use at Google since 2008

The problem 1/3:
Running any service at scale requires a load-balancer. Each LB will hash or round-robin and select a destination. For scale and HA purposes, you will need multiple (many) load-balancers. All packets of a network connection need to end up at the same destination.
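As a minimal sketch of the hashing case, assume a connection is identified by its 5-tuple and that the load-balancer picks a destination from a stable hash of it. The backend addresses and the names `flow_hash` and `pick_backend` are hypothetical, chosen only for illustration.

```python
import hashlib

# Hypothetical backend addresses, for illustration only.
BACKENDS = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]

def flow_hash(src_ip, src_port, dst_ip, dst_port, proto):
    """Stable hash of the connection 5-tuple (identical on every machine)."""
    key = f"{src_ip}:{src_port}>{dst_ip}:{dst_port}/{proto}".encode()
    return int.from_bytes(hashlib.sha256(key).digest()[:8], "big")

def pick_backend(five_tuple, backends=BACKENDS):
    """Select a destination from the flow hash."""
    return backends[flow_hash(*five_tuple) % len(backends)]

flow = ("198.51.100.7", 40312, "203.0.113.10", 443, "tcp")

# Every packet of a connection carries the same 5-tuple,
# so every packet maps to the same destination.
assert pick_backend(flow) == pick_backend(flow)
```

This only holds while the backend list stays the same; plain modulo hashing reshuffles most flows when a backend is added or removed.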

The problem 2/3: You can either 1) require all packets of a connection to always be handled by the same load-balancer, so that the same decision is made every time, or 2) ensure that all load-balancers make the *same* decision.

The problem 3/3: Assume you do 1). When a load-balancer dies, you *have* to re-balance its connections to another one. The new LB will likely make a different decision. The connection dies, and users notice.
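To make the failure mode concrete, here is a toy sketch. It assumes each load-balancer selects round-robin and remembers its choice in a local connection table. The class `RoundRobinLB` and the backend names are hypothetical.

```python
class RoundRobinLB:
    """Toy load-balancer: round-robin selection plus a local connection table."""

    def __init__(self, backends):
        self.backends = backends
        self.next_index = 0
        self.conntrack = {}  # flow -> backend; state local to this instance

    def forward(self, flow):
        # New connections get the next backend; known ones reuse the stored choice.
        if flow not in self.conntrack:
            self.conntrack[flow] = self.backends[self.next_index % len(self.backends)]
            self.next_index += 1
        return self.conntrack[flow]

backends = ["b0", "b1", "b2"]
lb_a, lb_b = RoundRobinLB(backends), RoundRobinLB(backends)

# lb_a sees three connections and pins each one to a backend.
first = [lb_a.forward(f) for f in ("flow-1", "flow-2", "flow-3")]
before = first[1]

# lb_a dies; packets of flow-2 are re-balanced to lb_b, which has no state for it.
after = lb_b.forward("flow-2")
print(before, after)  # prints "b1 b0"
```

The backend that held the connection state (`b1`) stops receiving the packets, and the new backend (`b0`) has never seen the connection, so it is reset.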

The fix: Maglev.

All load-balancers in a fleet make the same decision on where to load-balance to, with very high probability. You control the probability. You can lose a load-balancer and TCP connections will survive. This is called consistent hashing.
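The following is a simplified sketch of a Maglev-style lookup table, loosely following the population scheme described in the paper: each backend gets a preference order over the table slots, derived from an offset and a skip value, and backends take turns claiming their next free slot. The hash function (SHA-256 with a salt), the table size of 251, and the names `build_table` and `lookup` are assumptions made for this example, not what Google or Cilium use.

```python
import hashlib

def _h(name, salt):
    """Deterministic hash so every load-balancer computes identical values."""
    digest = hashlib.sha256(f"{salt}:{name}".encode()).digest()
    return int.from_bytes(digest[:8], "big")

def build_table(backends, size=251):
    """Populate a Maglev-style lookup table. `size` must be a prime number."""
    if not backends:
        raise ValueError("need at least one backend")
    backends = sorted(backends)  # same order on every load-balancer
    offset = {b: _h(b, "offset") % size for b in backends}
    skip = {b: _h(b, "skip") % (size - 1) + 1 for b in backends}
    next_j = {b: 0 for b in backends}
    table = [None] * size
    filled = 0
    while True:
        for b in backends:
            # Walk this backend's preference list until a free slot is found.
            slot = (offset[b] + next_j[b] * skip[b]) % size
            while table[slot] is not None:
                next_j[b] += 1
                slot = (offset[b] + next_j[b] * skip[b]) % size
            table[slot] = b
            next_j[b] += 1
            filled += 1
            if filled == size:
                return table

def lookup(table, flow):
    """Map a flow identifier to a backend via the table."""
    return table[_h(flow, "flow") % len(table)]

backends = ["b0", "b1", "b2", "b3", "b4"]
lb_a = build_table(backends)
lb_b = build_table(list(reversed(backends)))  # a second, independent load-balancer
flow = "198.51.100.7:40312>203.0.113.10:443/tcp"

# Both load-balancers build the same table, so they pick the same backend.
assert lb_a == lb_b
assert lookup(lb_a, flow) == lookup(lb_b, flow)

# Remove a backend and count how many slots change owner.
smaller = build_table(backends[:-1])
moved = sum(1 for x, y in zip(lb_a, smaller) if x != y)
print(f"{moved} of {len(lb_a)} slots changed")
```

No state is shared between `lb_a` and `lb_b`; agreement comes purely from running the same deterministic computation on the same backend list. The paper reports that a larger table reduces how many slots move when the backend set changes, at the cost of memory and build time, which is one way to read "you control the probability".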

The bonus:

Because all load-balancers make the same decision, the complexity of steering all packets of a connection to the same load-balancer can be removed. All packets can go to any load-balancer. Simple ECMP can be used. The network becomes simple and easy to manage.

Further reading:

Cilium 1.9 added support for Maglev so you can easily implement Kubernetes services of type LoadBalancer with Maglev. https://cilium.io/blog/2020/11/10/cilium-19#maglev-load-balancing

