Inspired by a conversation with @greg_colvin, time for a thread about the EVM and one particular way it could be improved!
AKA, Nick gripes about bad EVM design decisions.
AKA, why "stack too deep" is a thing in Solidity.
The EVM is a stack machine - that is, it has a stack (let's call it the evaluation stack) and opcodes primarily work by popping items off the stack and pushing items onto it. The `ADD` opcode, for instance, pops two items off the stack, adds them together, and pushes the result.
This is in contrast to the more common register-based VMs, which have a fixed set of registers that opcodes operate on. So your `ADD` instruction might look something like `ADD A B C`, which is equivalent to `A = B + C`.
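To make that concrete, here's a toy sketch in Python - illustrative only, not real VM code - showing the same addition done both ways:

```python
# Stack machine: operands are implicit. ADD pops two items, pushes one.
stack = []
stack.append(2)                          # PUSH 2
stack.append(3)                          # PUSH 3
stack.append(stack.pop() + stack.pop())  # ADD
print(stack)                             # [5]

# Register machine: operands are named explicitly in the instruction.
regs = {"A": 0, "B": 2, "C": 3}
regs["A"] = regs["B"] + regs["C"]        # ADD A B C  ->  A = B + C
print(regs["A"])                         # 5
```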
Why use a stack-based VM? They have a number of advantages: the bytecode is simpler (no operands specifying which registers to use). That makes VMs simpler to write, because instructions are easier to decode, and makes naive VMs faster, too.
It also makes compilers easier to write. Compilers for register-based machines have to allocate data (variables and intermediate results) to registers, which can be a complex task. They also have to handle running out of registers, while a stack is conceptually unlimited.
So why does anyone use register-based VMs? They're much easier to implement in physical hardware (each register becomes an actual component), and they can be more efficient than stack-based VMs when used with a well-written optimising compiler.
Stack-based VMs can spend a lot of time shuffling arguments around the stack, which register-based VMs don't have to do. A register-based VM can access any register at any time, while a stack-based VM can only directly access the top few elements of the stack.
The EVM facilitates this with special opcodes that shuffle data around the stack - DUPn (with n=1-16) duplicates the nth element from the top of the stack, while SWAPn (likewise n=1-16) swaps the top element with the nth element below it. As you can imagine, these get used a lot.
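Here's a sketch of what DUPn and SWAPn do, modelled in Python (indices follow the EVM's convention of counting down from the top of the stack):

```python
def dup(stack, n):
    """DUPn: push a copy of the nth item from the top (n = 1..16)."""
    stack.append(stack[-n])

def swap(stack, n):
    """SWAPn: exchange the top item with the item n slots below it."""
    stack[-1], stack[-(n + 1)] = stack[-(n + 1)], stack[-1]

s = [10, 20, 30]   # the top of the stack is the right-hand end
dup(s, 2)          # DUP2 copies 20  -> [10, 20, 30, 20]
swap(s, 3)         # SWAP3 swaps top with 3rd-below -> [20, 20, 30, 10]
print(s)
```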
In addition to the stack, the EVM also has memory. This is laid out as a linear array starting at address 0 and extending more-or-less infinitely; contracts get charged based on the highest address in memory that they read or write.
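Roughly how that charging rule behaves, as a Python sketch - the quadratic term follows my reading of the Yellow Paper's memory expansion formula, but treat the exact constants as illustrative:

```python
def memory_cost(bytes_used):
    # High-water mark, measured in 32-byte words.
    words = (bytes_used + 31) // 32
    # Linear term plus a quadratic term that dominates at large sizes.
    return 3 * words + words ** 2 // 512

print(memory_cost(64))         # 6: a couple of words is cheap
print(memory_cost(1_000_000))  # 2001098: touching one high address is ruinous
```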
With all the preliminaries out of the way, we can talk about how Solidity stores temporary data, and how the EVM handicaps it.
Solidity stores local variables and function parameters in stack slots. And as we saw earlier, the EVM only allows access to the top 16 slots of the stack - this is where Solidity's local variable / stack-too-deep limit comes from.
You don't even get to have 16 locals, though! Those slots are also used up by intermediate values. Eg, `a * (b + c)` compiles to something like `<get A> <get B> <get C> ADD MUL` - by the time you get to `get C`, the two intermediate values mean you can only reach down 14 slots.
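Tracing that example in Python makes the problem visible - by the `<get C>` step, two pending intermediates are already sitting on top of your locals:

```python
a, b, c = 7, 2, 3
stack = []
stack.append(a)                          # <get A>   stack: [a]
stack.append(b)                          # <get B>   stack: [a, b]
stack.append(c)                          # <get C>   stack: [a, b, c]
                                         # two intermediates above the locals here
stack.append(stack.pop() + stack.pop())  # ADD       stack: [a, b+c]
stack.append(stack.pop() * stack.pop())  # MUL       stack: [a*(b+c)]
print(stack[0])                          # 35
```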
So, one way to fix this would be to make the EVM support access further down the stack - and there are proposals to enable this. But, there is another way!
You've probably noticed that traditional compilers don't suffer from this limitation: you can have more locals than your CPU has registers. This is because compilers for traditional architectures don't store locals in registers or on the evaluation stack; they store them in memory.
Confusingly, this is also called 'the stack', but mechanically it's simply an area of the program's memory dedicated to storing local variables, function arguments, and related data like return addresses. Each function call gets a 'frame', and its variables are stored there.
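A toy model of call frames - the names here are made up for illustration, not any real ABI:

```python
call_stack = []

def enter_function(args):
    # A call pushes a fresh frame holding its arguments and locals.
    call_stack.append({"args": args, "locals": {}})

def leave_function():
    # Returning pops the frame; its locals cease to exist here.
    call_stack.pop()

enter_function({"x": 1})
call_stack[-1]["locals"]["tmp"] = 42   # a local lives inside the frame
print(call_stack[-1]["locals"])        # {'tmp': 42}
leave_function()
```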
Reading and writing memory is slower and more expensive than reading from registers or from the built-in stack of a stack-based VM, so optimising compilers can 'lift' frequently used data out of memory and into registers. The `i` in your inner loop probably never makes it to memory.
So, why don't we just do this on the EVM? Two reasons. First, the EVM's linear memory model gets in the way. Locals, arguments, and so forth aren't the only things we want to store in memory; we also have other data, like dynamically allocated structures.
This data isn't neatly scoped to the duration of a function call the way locals are, so we can't store it with them - if dynamic allocations lived in the frame of the function that created them, they'd stop being valid as soon as that function returned. So we need a separate memory area to put them in.
Traditional architectures solve this by splitting memory into 'the stack' and 'the heap'. The heap contains dynamically allocated values - anything created using `new` or `malloc` in most languages - while the stack stores local variables, parameters, etc.
To avoid having to put a hard limit on the size of each of these memory areas, compilers put them at opposite ends of memory. The stack grows down from the top of memory (highest address) while the heap grows up from the bottom (lowest address) - or the reverse.
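In Python-flavoured pseudocode, the layout looks like this (addresses and sizes are invented):

```python
MEM_TOP = 0x10000
sp = MEM_TOP       # stack pointer starts at the highest address
hp = 0x0           # heap pointer starts at the lowest

def push_frame(size):
    global sp
    sp -= size     # the stack grows toward lower addresses
    return sp

def heap_alloc(size):
    global hp
    addr = hp
    hp += size     # the heap grows toward higher addresses
    return addr

frame = push_frame(64)
obj = heap_alloc(32)
assert hp <= sp    # they only collide when memory is exhausted
print(hex(frame), hex(obj))
```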
This works fine on traditional machines because memory is allocated in 'pages'. You can write to any location, and a page is created as needed to store the data there. The EVM's linear model makes this impossible, though - writing to the last address would use all your gas!
So, one thing we could do is reform the EVM's memory model. Instead of charging based on the highest address, store EVM memory in pages just like a traditional architecture, and charge based on how many pages are allocated. This is actually a pretty easy change!
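A sketch of what page-based accounting might look like - PAGE_SIZE and GAS_PER_PAGE are numbers I made up, not from any EIP:

```python
PAGE_SIZE = 4096
GAS_PER_PAGE = 100
touched_pages = set()

def touch(address):
    # Charge only the first time each page is used, regardless of
    # where in the address space it sits.
    page = address // PAGE_SIZE
    if page not in touched_pages:
        touched_pages.add(page)
        return GAS_PER_PAGE
    return 0

print(touch(0))          # 100: first page allocated
print(touch(2**32))      # 100: a distant address costs one page, not 2^32 words
print(touch(2**32 + 8))  # 0: same page, already paid for
```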
With page-based EVM memory allocation, we could store locals in memory, and all would be well: we'd get rid of "stack too deep" errors and gain a bunch of other benefits besides - for example, it'd be possible to have pointers to local variables, and it'd reduce compiler workload.
But fixing the memory model isn't quite enough - that was only the first of the two reasons. Storing locals in a memory stack is all very well, but you also need to be able to access them efficiently. To load or store a local we need to know where in memory it is, and that depends on the current depth of the stack.
Normally this is handled with a register called the "stack pointer"; it points to the top of the stack, and you load and store locals relative to it. The compiler puts your variable at, say, 4 bytes below the top of the stack frame, and so you can read and write it at SP-4.
We could emulate that on the EVM, but it introduces a lot of overhead: we'd need to keep the stack pointer either on the evaluation stack (confused yet?) or in memory, and either way we'd waste operations fetching and working with it.
So, the final change we need to make is adding support for stack-indexed loads and stores. These would take the form of four new opcodes - two to set and read the stack pointer (these would be used at the start and end of functions), and two to load and store data relative to it.
So, to access a local or function argument, the compiler would use an operation like `MLOADSP` - load a value from memory relative to the stack pointer. This makes local variable access just two operations (the other is PUSHing the offset of the variable).
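Putting it together, here's how the four opcodes might behave, emulated in Python. `MLOADSP` is from above; `SETSP`, `GETSP`, and `MSTORESP` are names I'm assuming for the other three, and the exact semantics are my guess, not a spec:

```python
memory = [0] * 1024
sp = 1024   # stack pointer for the current function's frame

def setsp(value):
    global sp
    sp = value

def getsp():
    return sp

def mstoresp(offset, value):
    memory[sp + offset] = value   # store relative to the stack pointer

def mloadsp(offset):
    return memory[sp + offset]    # load relative to the stack pointer

# Function prologue: carve out a 4-slot frame, then use slot 0 as a local.
setsp(getsp() - 4)
mstoresp(0, 123)          # PUSH 0; MSTORESP  - write local #0
print(mloadsp(0))         # PUSH 0; MLOADSP   - read it back: 123
setsp(getsp() + 4)        # epilogue: release the frame
```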
By reforming how we account for memory and adding these new opcodes, it'd be possible to write compilers that:
- Don't have local variable limits
- Are easier to write
- Waste less gas juggling stack elements around
- Enable new features like pointers to locals.
~