What is Maglev? A Thread.
tl;dr: Maglev provides HA for network load-balancers.
If you are in the cloud, then you are likely already using it. This is how Google and others make load-balancing reliable and scalable with commodity Linux servers.
Original paper:
https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/44824.pdf
tl;dr:
- Avoid special hardware needs
- Simple ECMP + Linux Machine + Maglev = Replace expensive hardware LB
- In use at Google since 2008
The problem 1/3:
Running any service at scale requires a load-balancer. Each LB will hash or round-robin and select a destination. For scale and HA purposes, you will need multiple (many) load-balancers. All packets of a network connection need to end up at the same destination.
The problem 2/3:
You have two options: 1) require all packets of a connection to always be handled by the same load-balancer, so the same decision is made every time, or 2) ensure that all load-balancers make the *same* decision.
The problem 3/3:
Assume you do 1). When a load-balancer dies, you *have* to re-balance to another one. The new LB will likely make a different decision. The connection dies, and users notice.
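To make the failure mode concrete, here is a minimal sketch. It assumes each load-balancer mixes some private state into its choice; the per-LB seed below is a hypothetical stand-in for things like round-robin counters or differently ordered backend lists. The backend names and flow tuples are made up for illustration.

```python
import hashlib

BACKENDS = ["backend-a", "backend-b", "backend-c", "backend-d"]


def pick_backend(flow: tuple, lb_seed: str) -> str:
    """Pick a backend from the flow 5-tuple plus a per-LB seed (hypothetical)."""
    digest = hashlib.sha256(f"{lb_seed}|{flow!r}".encode()).digest()
    return BACKENDS[int.from_bytes(digest[:8], "big") % len(BACKENDS)]


# 1000 made-up client connections to one virtual IP.
flows = [
    (f"10.0.{i // 250}.{i % 250 + 1}", 40000 + i, "203.0.113.10", 443, "tcp")
    for i in range(1000)
]

# lb-1 dies and its connections are re-balanced to lb-2.
moved = sum(pick_backend(f, "lb-1") != pick_backend(f, "lb-2") for f in flows)
print(f"{moved} of {len(flows)} connections would land on a different backend")
```

With four backends and independent decisions, roughly three out of four connections end up on a backend that has never seen them.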

The fix: Maglev.
All load-balancers in a fleet make the same decision on where to load-balance to with a very high probability. You control the probability. You can lose a load-balancer and TCP connections will survive. This is called consistent hashing.
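A rough sketch of the idea, following the lookup-table population scheme described in the paper: every backend gets its own permutation of the table slots, and backends take turns claiming their next free slot. The SHA-256 based hash, the table size of 251, and the backend names are assumptions for illustration, not what production Maglev uses.

```python
import hashlib


def _h(value: str, salt: str) -> int:
    """Stand-in hash function (assumption); any well-mixed hash would do."""
    digest = hashlib.sha256(f"{salt}:{value}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def build_table(backends: list[str], size: int = 251) -> list[str]:
    """Fill a lookup table of prime `size`; every LB computes the same table."""
    if not backends:
        raise ValueError("need at least one backend")
    names = sorted(backends)  # identical order on every load-balancer
    offsets = [_h(b, "offset") % size for b in names]
    skips = [_h(b, "skip") % (size - 1) + 1 for b in names]
    next_idx = [0] * len(names)
    table = [None] * size
    filled = 0
    while True:
        for i, name in enumerate(names):
            # Walk backend i's permutation until it hits a free slot.
            slot = (offsets[i] + next_idx[i] * skips[i]) % size
            while table[slot] is not None:
                next_idx[i] += 1
                slot = (offsets[i] + next_idx[i] * skips[i]) % size
            table[slot] = name
            next_idx[i] += 1
            filled += 1
            if filled == size:
                return table


def lookup(table: list[str], flow: tuple) -> str:
    """Map a connection 5-tuple to a backend via the shared table."""
    return table[_h(repr(flow), "flow") % len(table)]


backends = [f"backend-{i}" for i in range(10)]
lb1 = build_table(backends)
lb2 = build_table(list(reversed(backends)))  # a second LB, different input order
flow = ("10.0.0.7", 40123, "203.0.113.10", 443, "tcp")
print(lookup(lb1, flow) == lookup(lb2, flow))

# Remove one backend and count slots that changed although they did not have to.
survivors = build_table(backends[:-1])
changed = sum(a != b for a, b in zip(lb1, survivors) if a != "backend-9")
print(f"{changed} of {len(lb1)} slots moved between surviving backends")
```

The table size is the knob: per the paper, a larger table means fewer slots change when the set of backends changes, at the cost of more memory and a slower rebuild.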

The bonus:
Because all load-balancers make the same decision, the complexity of steering all packets of a connection to the same load-balancer can be removed. All packets can go to any load-balancer. Simple ECMP can be used. The network becomes simple and easy to manage.
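A toy illustration of why the network can stay dumb. Two simplifications are assumed here: a plain hash-modulo choice stands in for the Maglev lookup table, and a random pick per packet stands in for ECMP. Real ECMP usually hashes per flow, so this is the worst case, where every packet may hit a different load-balancer. The `LoadBalancer` class and all names are hypothetical.

```python
import hashlib
import random

BACKENDS = ["backend-a", "backend-b", "backend-c", "backend-d"]


class LoadBalancer:
    """A stateless LB: the decision depends only on the flow and backend set."""

    def __init__(self, name: str, backends: list[str]) -> None:
        self.name = name
        self.backends = sorted(backends)

    def forward(self, flow: tuple) -> str:
        digest = hashlib.sha256(repr(flow).encode()).digest()
        return self.backends[int.from_bytes(digest[:8], "big") % len(self.backends)]


fleet = [LoadBalancer(f"lb-{i}", BACKENDS) for i in range(4)]
flow = ("10.0.0.7", 40123, "203.0.113.10", 443, "tcp")

rng = random.Random(0)  # fixed seed so the run is repeatable
# Spray 100 packets of one connection across the fleet at random.
destinations = {rng.choice(fleet).forward(flow) for _ in range(100)}
print(destinations)  # a set with exactly one backend
```

No load-balancer keeps per-connection state that the others lack, so it does not matter which one the router picks.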
Further reading:
Cilium 1.9 added support for Maglev so you can easily implement Kubernetes services of type LoadBalancer with Maglev. https://cilium.io/blog/2020/11/10/cilium-19#maglev-load-balancing