Reversing and debugging EVM smart contracts: How does storage layout work? (Part 3)

In this new publication, we will look at how the different types of variables are stored and handled in EVM memory and in storage.

Whenever we analyze a piece of code, I advise you to debug it with Remix at the same time; you will get a better understanding of what’s happening. If you don’t know how to do that, please check part 1 of this series about reversing smart contracts.

🔴 This is the 3rd part of our series about reversing and debugging EVM smart contracts.

▸ 1. Simple example

We will start by using a very simple example.

Don’t forget to compile the contract below with Solidity 0.8.7 and the optimizer enabled (200 runs).

Deploy it and call the function “modify()”.

// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;
contract Test {
    uint balance;
    uint balance2;
    uint balance3;

    function modify() external {
        balance = 1;
        balance2 = 2;
        balance3 = 3;
    }
}

When we debug the “modify” function with Remix, we can go directly to the body of modify(). The code executed before it (such as the function selector dispatch or the check that no ether was sent with the call) has already run and is not useful for our analysis.

045 JUMPDEST |0x64cf33b8| (this is the function selector left on the stack; we will ignore it)
046 PUSH1 42 |0x42|
048 PUSH1 01 |0x01|0x42|
050 PUSH1 00 |0x00|0x01|0x42|
052 DUP2 |0x01|0x00|0x01|0x42|
053 SWAP1 |0x00|0x01|0x01|0x42|
054 SSTORE |0x01|0x42|
055 PUSH1 02 |0x02|0x01|0x42|
057 SWAP1 |0x01|0x02|0x42|
058 DUP2 |0x02|0x01|0x02|0x42|
059 SWAP1 |0x01|0x02|0x02|0x42|
060 SSTORE |0x02|0x42|
061 PUSH1 03 |0x03|0x02|0x42|
063 SWAP1 |0x02|0x03|0x42|
064 SSTORE |0x42|
065 JUMP ||
066 JUMPDEST ||
067 STOP ||

After we call the function modify, the result is pretty obvious.

At byte 46, the EVM pushes 0x42 (66 in decimal) onto the stack. This is the offset of the end of the contract’s code (066 JUMPDEST, 067 STOP).
When modify() finishes, the EVM will JUMP to this offset.

Between bytes 48 and 54, the EVM SSTOREs the value 1 in slot 0.
Between bytes 55 and 60, the EVM SSTOREs the value 2 in slot 1.
Between bytes 61 and 64, the EVM SSTOREs the value 3 in slot 2.

At byte 65, JUMP transfers execution to byte 66 (0x42), which is the offset saved at the beginning of modify(). The contract then ends its execution with the STOP instruction at byte 67.

You can verify this by running the debugger and inspecting the stack at each step. This code is equivalent to the following inline assembly (Yul):

sstore(0x0,0x1) 
sstore(0x1,0x2)
sstore(0x2,0x3)

So even though our values are small, each variable is a uint256 and takes a full 32-byte slot. They are therefore stored in separate slots, which is expensive: writing a nonzero value into a slot that previously held 0 costs 20,000 gas per slot.

gas cost of the modify function

But if you call the function a second time, the slots already contain these same nonzero values, so the gas cost is much lower (2,200 per SSTORE).
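The following minimal Python sketch shows where these numbers come from. It assumes the post-Berlin rules (EIP-2200 as amended by EIP-2929), ignores gas refunds, and only prices the three SSTOREs of modify(). The helper names are hypothetical. Under these rules, the first write to an empty slot costs 20,000 gas plus 2,100 for the first (cold) access to that slot in the transaction. Rewriting the value a slot already holds costs 2,100 + 100 = 2,200.

```python
# Simplified SSTORE pricing (EIP-2200 as amended by EIP-2929), refunds ignored.
COLD_SLOAD_COST = 2100            # surcharge for the first access to a slot in a tx
WARM_STORAGE_READ_COST = 100      # no-op write, or slot already dirty in this tx
SSTORE_SET_GAS = 20000            # zero -> nonzero
SSTORE_RESET_GAS = 5000 - COLD_SLOAD_COST  # nonzero -> different nonzero (2900)


def sstore_cost(original: int, current: int, new: int, cold: bool) -> int:
    """Gas charged by one SSTORE (hypothetical helper, refunds excluded)."""
    cost = COLD_SLOAD_COST if cold else 0
    if new == current or original != current:
        # Writing the same value, or the slot was already modified in this tx.
        return cost + WARM_STORAGE_READ_COST
    return cost + (SSTORE_SET_GAS if original == 0 else SSTORE_RESET_GAS)


def call_modify(storage: dict) -> int:
    """Simulate one transaction calling modify(): slots 0, 1, 2 <- 1, 2, 3."""
    original = dict(storage)  # storage values at the start of the transaction
    warm = set()
    gas = 0
    for slot, value in [(0, 1), (1, 2), (2, 3)]:
        current = storage.get(slot, 0)
        gas += sstore_cost(original.get(slot, 0), current, value, slot not in warm)
        warm.add(slot)
        storage[slot] = value
    return gas


storage = {}
print(call_modify(storage))  # first call: 66300 (3 * 22100)
print(call_modify(storage))  # second call: 6600 (3 * 2200)
print(storage)               # {0: 1, 1: 2, 2: 3}
```

The rest of the gas shown by Remix comes from the 21,000 base cost and the other instructions executed (dispatch, PUSH, DUP, SWAP…).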

Tip: every instruction costs gas on the EVM. The gas cost of a transaction is the sum of the gas used by all executed instructions, plus the base cost of 21,000 gas (and a small cost per byte of calldata). You can see the gas used by each step in the “steps details” section of the debugger tab:

Here, the SWAP1 instruction takes 3 gas.

If you don’t understand this first part, feel free to read the first or second article of this series, where I explain EVM assembly in more detail: https://medium.com/@TrustChain/list/reversing-and-degugging-evm-smart-contracts-f4dd9195d07b

▸ 2. Using uint8 instead of uint256

So far we have learned nothing new, since we could already reverse that. But what if we used uint8 instead of uint? Is there any difference? Let’s see the results!

// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;
contract Test {
    uint8 balance;
    uint8 balance2;
    uint8 balance3;

    function modify() external {
        balance = 1;
        balance2 = 2;
        balance3 = 3;
    }
    function modify2() external {
        balance2 = 5;
    }
}

As you may already know, a uint8 uses only 1 byte. So three uint8 variables should use only 3 bytes, far less than a single 32-byte slot.

As a result, the three variables combined should use only one slot, right?

Yes, you’re right! Only one SSTORE is performed, and the code is much shorter.

045 JUMPDEST     |function signature (discarded)|
046 PUSH1 00 |0x00|
048 DUP1 |0x00|0x00|
049 SLOAD |0x00|0x00| (on the first call, slot 0 is still empty, so SLOAD returns 0)
050 PUSH3 ffffff |0xffffff|0x00|0x00|
054 NOT |0xffffff...fffff000000|0x00|0x00| (NOT inverts all 32 bytes of Stack(0))
055 AND |0x00|0x00|
056 PUSH3 030201 |0x030201|0x00|0x00|
060 OR |0x030201|0x00|
061 SWAP1 |0x00|0x030201|
062 SSTORE ||
063 STOP ||

Let’s see exactly what happens in this function. As always, don’t forget to use the debugger while you read; you’ll understand the situation better.

At byte 49, SLOAD loads the storage slot whose number is Stack(0), i.e. slot 0. That slot is still empty, so SLOAD pushes 0 and the stack does not change.

The next 3 operations (bytes 50–55) are a bit mysterious:

The EVM pushes 0xffffff and applies NOT to it. Because NOT inverts every bit of Stack(0), this gives 0xffff…ff000000: 29 bytes of 0xff followed by 3 zero bytes.

But after that, it ANDs this value with the result of the previous SLOAD, which was 0x00.

As 0 AND x = 0 for every x, the result is 0x00, and the stack is the same as before byte 50.

After these 6 instructions nothing has changed, which is very strange… We will see why a little further down.

Just after that, at byte 56, 0x030201 is pushed onto the stack. These are obviously our values: balance = 1, balance2 = 2 and balance3 = 3.

At byte 60, Stack(1) is 0, so the OR opcode does nothing here, because 0 OR x = x for every x. The stack stays the same, except that the 0x00 at Stack(1) has been consumed.

After that, SSTORE stores 0x030201 in slot 0. This is what we expected.

Note that 03 02 01 fits in a single 32-byte word, both in storage and on the stack, as we expected!
Therefore, we have proof that these 3 variables share the same storage slot, and as a result less gas is used!

The modify() function of this second smart contract uses only 43,286 gas instead of 87,504. If you’re a smart contract developer, this is proof that packing small variables into fewer storage slots (when possible) can save a lot of gas…

Now let’s call the function modify2() (after modify()). Here is the disassembly of the whole function:

As a quick reminder, modify2() only sets balance2 to 5. balance2 lives in slot 0, in its second byte (byte offset 1), next to balance and balance3.

075 PUSH1 00   |0x00|
077 DUP1 |0x00|0x00|
078 SLOAD |0x030201|0x00| (slot 0 contains 0x030201, as set previously by modify())
079 PUSH2 ff00 |0xff00|0x030201|0x00|
082 NOT |0xfff...fffff00ff|0x030201|0x00|
083 AND |0x000...000030001|0x00|
084 PUSH2 0500 |0x0500|0x000...000030001|0x00|
087 OR |0x000...000030501|0x00|
088 SWAP1 |0x00|0x000...000030501|
089 SSTORE ||
090 STOP ||
  1. First (byte 78), the EVM loads slot 0 of the storage, which contains 0x030201.
  2. Second (bytes 79–82), the EVM applies NOT to 0xff00, which as a 32-byte word gives 0xffff…ff00ff.
  3. Third (byte 83), it ANDs the 2 results, which gives 0x000…00030001 (or simply 0x030001).
    This is slot 0 without the 02 (balance2 in the contract), and this is NORMAL!
    modify2() modifies balance2, so the EVM first needs to erase its previous value without erasing balance and balance3 (which live in the same slot). It “cleans” the word by using the mask 0xffff…ff00ff.
  4. After that, 0x0500 is pushed onto the stack (byte 84), and the OR instruction (byte 87) combines the last 2 results into 0x030501. The goal of the OR is to insert 05 between 03 and 01. Therefore balance2 is modified successfully without altering balance and balance3.

The 0xffff…ff00ff value is called a “mask”.

If we wanted to set balance3 to 5 instead of balance2, we would use the mask 0xffff…ff00ffff (to erase balance3’s byte in slot 0). In the 4th step, we would PUSH 0x050000, because 05 must go at the position where balance3 is placed in storage.
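The mask-then-OR sequence can be modeled in a few lines of Python. This is a sketch, not the compiler’s own code. write_packed and read_packed are hypothetical helpers, and the byte offsets (balance at 0, balance2 at 1, balance3 at 2) are the ones we observed in slot 0. Offsets are counted from the lowest-order byte of the 32-byte word.

```python
WORD_MASK = (1 << 256) - 1  # an EVM word is 256 bits


def write_packed(slot_value: int, offset: int, size: int, new_value: int) -> int:
    """Replace `size` bytes starting at byte `offset` (0 = lowest-order byte)."""
    field = ((1 << (8 * size)) - 1) << (8 * offset)   # e.g. 0xff00 for offset=1, size=1
    mask = ~field & WORD_MASK                          # like PUSH2 ff00 + NOT
    cleared = slot_value & mask                        # like AND: erase the old bytes
    return cleared | ((new_value << (8 * offset)) & field)  # like PUSH2 0500 + OR


def read_packed(slot_value: int, offset: int, size: int) -> int:
    """Extract `size` bytes starting at byte `offset`."""
    return (slot_value >> (8 * offset)) & ((1 << (8 * size)) - 1)


# modify(): balance, balance2 and balance3 written together (3 bytes at offset 0)
slot0 = write_packed(0, 0, 3, 0x030201)
print(f"0x{slot0:06x}")                          # 0x030201

# modify2(): balance2 (offset 1) = 5
print(f"0x{write_packed(slot0, 1, 1, 5):06x}")  # 0x030501

# the variant discussed above: balance3 (offset 2) = 5
print(f"0x{write_packed(slot0, 2, 1, 5):06x}")  # 0x050201
print(read_packed(slot0, 1, 1))                  # 2 (balance2)
```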

This is why you should use smaller types when appropriate: packed into one slot, they take less gas.

But don’t overuse smaller types. Every read or write of a packed variable requires extra masking operations, which also cost gas.

▸ 3. Using different types of data

Let’s see whether this gas-saving trick works only with uint types or also with other Solidity built-in types.

Here is the new smart contract. Compile it with the same settings (0.8.7 and the optimizer set to 200 runs).

// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;
contract Test {
    uint8 balance;
    bytes4 data;
    address addr;
    function modify() external {
        balance = 17;
        data = 0xaaaaaaaa;
        addr = 0x358AA13c52544ECCEF6B0ADD0f801012ADAD5eE3;
    }
}

This smart contract contains a uint8, which takes 1 byte, a bytes4, which takes 4 bytes, and an address, which takes 20 bytes. In total, the modify() function we created writes 25 bytes, which is less than one EVM slot (32 bytes).

Theoretically, these 3 variables should fit in a single storage slot.

Do they?

045 JUMPDEST |function signature discarded|
046 PUSH1 00 |0x00|
048 DUP1 |0x00|0x00|
049 SLOAD |0x00|0x00| (slot 0 in storage contains 0)
050 PUSH1 01 |0x01|0x00|0x00|
052 PUSH1 01 |0x01|0x01|0x00|0x00|
054 PUSH1 c8 |0xc8|0x01|0x01|0x00|0x00|
056 SHL |0x01 followed by 50 zero hex digits|0x01|0x00|0x00| (shifts 0x01 left by 0xc8 = 200 bits, i.e. 50 hex digits)
057 SUB |7 zero bytes, then 25 bytes of 0xff|0x00|0x00|
058 NOT |7 bytes of 0xff, then 25 zero bytes|0x00|0x00| Our mask is created!
059 AND |0x00|0x00|
060 PUSH25 358aa13c52544eccef6b0add0f801012adad5ee3aaaaaaaa11 |0x00...00358aa13...|0x00|0x00|
086 OR |0x00...00358aa13...|0x00|
087 SWAP1 |0x00|0x00...00358aa13...|
088 SSTORE ||
089 STOP ||

As you may have guessed, the answer is yes!

The structure of the function is almost the same as in the last example (with uint8). The only differences are at the beginning, where the mask is created.

As there are 25 bytes, the mask must be 0xff…ff (7 bytes of 0xff) followed by 25 zero bytes before the AND at byte 59.
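The mask construction at bytes 50–58 can be replayed with Python integers. This is only an illustration of the arithmetic, and WORD_MASK emulates the EVM’s 256-bit wrap-around for NOT. One plausible reason for building the mask this way is code size. The PUSH1/PUSH1/PUSH1/SHL/SUB/NOT sequence takes 9 bytes, while a PUSH32 of the full constant would take 33.

```python
WORD_MASK = (1 << 256) - 1  # EVM words are 256 bits; NOT flips all of them

shifted = 0x01 << 0xC8            # PUSH1 01, PUSH1 c8, SHL: 1 << 200
low_25_bytes = shifted - 0x01     # PUSH1 01 (pushed first), SUB: 200 low bits set
mask = ~low_25_bytes & WORD_MASK  # NOT: 7 bytes of 0xff, then 25 zero bytes

print(f"0x{mask:064x}")           # 7 bytes of ff followed by 25 bytes of 00
```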

After that, at byte 60, the 25 bytes containing the values set in the smart contract are pushed (addr = 0x358aa13c52544eccef6b0add0f801012adad5ee3, data = 0xaaaaaaaa, balance = 0x11). At byte 86, they are “ORed” with the result of the AND (the cleared slot).

Finally, they are “SSTOREd” in slot 0.

43,298 gas is used (only 1 SSTORE is performed).

As expected, the gas cost is about as low as for storing the three uint8 (43,286 vs 43,298).

▸ 4. Does the placement of variables matter?

To understand the EVM, the best way is to run as many tests as possible while changing different parameters. This is exactly what we are doing here.

In this example, we will swap the declarations of the addr and data variables.

They should occupy the same storage slot, but their places should be inverted, right?

// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;
contract Test {
    uint8 balance;
    address addr;
    bytes4 data;
    function modify() external {
        balance = 17;
        data = 0xaaaaaaaa;
        addr = 0x358AA13c52544ECCEF6B0ADD0f801012ADAD5eE3;
    }
}

Let’s compile and disassemble. At byte 60 we have:
(the other instructions of the code are, of course, the same)

PUSH25 aaaaaaaa358aa13c52544eccef6b0add0f801012adad5ee311

This is different from last time: “358aa13c52544eccef6b0add0f801012adad5ee3aaaaaaaa11”.

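We can see that the addr and data variables were swapped.

Our hypothesis was right: the places of the two variables were inverted. balance = 17 (0x11 in hex) is still in the first place, as expected. “First” here means the lowest-order byte, which is printed last. Solidity stores the first variable of a slot in its lower-order bytes and packs the following variables towards the higher-order bytes. This is a layout rule of the compiler, not a matter of endianness (the EVM itself is big-endian).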
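The layouts of sections 3 and 4 follow from a simple rule that the Solidity documentation describes for value types. Variables are placed in declaration order, the first item of a slot goes into its lowest-order bytes, and a variable that does not fit in the space left in the current slot starts the next slot. The Python sketch below implements only that simplified rule (no structs, arrays or mappings). layout and pack are hypothetical helpers. The sketch rebuilds both PUSH25 constants.

```python
def layout(variables):
    """Map each (name, size_in_bytes) to (slot, byte_offset), in declaration order."""
    slot, used, result = 0, 0, {}
    for name, size in variables:
        if used + size > 32:          # does not fit: move to the next slot
            slot, used = slot + 1, 0
        result[name] = (slot, used)
        used += size
    return result


def pack(variables, values):
    """Build the contents of slot 0 from the layout (all variables must be in slot 0)."""
    word = 0
    for name, (slot, offset) in layout(variables).items():
        if slot != 0:
            raise ValueError(f"{name} is not in slot 0")
        word |= values[name] << (8 * offset)  # lower offset = lower-order bytes
    return word


values = {
    "balance": 0x11,
    "data": 0xAAAAAAAA,
    "addr": 0x358AA13C52544ECCEF6B0ADD0F801012ADAD5EE3,
}
section3 = [("balance", 1), ("data", 4), ("addr", 20)]  # uint8, bytes4, address
section4 = [("balance", 1), ("addr", 20), ("data", 4)]  # addr and data swapped

print(layout(section3))  # {'balance': (0, 0), 'data': (0, 1), 'addr': (0, 5)}
print(f"{pack(section3, values):x}")  # 358aa13c52544eccef6b0add0f801012adad5ee3aaaaaaaa11
print(f"{pack(section4, values):x}")  # aaaaaaaa358aa13c52544eccef6b0add0f801012adad5ee311
```

The same rule explains why the three uint256 of section 1 end up in slots 0, 1 and 2. Each one takes the full 32 bytes, so the next variable never fits in the current slot.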

▸ 5. How are structs stored?

If we do the same but with structs, what will happen?

// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;
contract Test {
    struct Values {
        uint8 balance;
        address addr;
        bytes4 data;
    }
    Values value;
    function modify() external {
        value.balance = 17;
        value.addr = 0x358AA13c52544ECCEF6B0ADD0f801012ADAD5eE3;
        value.data = 0xaaaaaaaa;
    }
}
Here is the full disassembly of the function:

045 JUMPDEST
046 PUSH1 00 |0x00|
048 DUP1 |0x00|0x00|
049 SLOAD |0x00|0x00|
050 PUSH1 01 |0x01|0x00|0x00|
052 PUSH1 01 |0x01|0x01|0x00|0x00|
054 PUSH1 c8 |0xc8|0x01|0x01|0x00|0x00|
056 SHL |0x01 followed by 50 zero hex digits|0x01|0x00|0x00|
057 SUB |7 zero bytes, then 25 bytes of 0xff|0x00|0x00|
058 NOT |7 bytes of 0xff, then 25 zero bytes|0x00|0x00|
059 AND |0x00|0x00|
060 PUSH25 aaaaaaaa358aa13c52544eccef6b0add0f801012adad5ee311 |0x00...00aaaaaaaa358aa13...|0x00|0x00|
086 OR |0x00...00aaaaaaaa358aa13...|0x00|
087 SWAP1 |0x00|0x00...00aaaaaaaa358aa13...|
088 SSTORE ||
089 STOP ||

The code is exactly the same with a struct.

WARNING: this means that it can sometimes be tricky to tell 3 separate variables apart from a struct.

The gas cost is the same as before too:

Okay, that was too easy for us. So far there isn’t much difference between the listings, and it’s starting to get boring, so we need more of a challenge. But we’re not done!

▸ 6. What about arrays?

How are arrays stored in the EVM? Like structs, or like separate variables?

// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;
contract Test {
    uint[] values;
    // uint value2;  (commented out for now)
    function modify() external {
        values.push(7);
        values.push(8);
    }
}

Nope!

This time, it’s unfortunately longer and more complex. We will need more theory about the storage in the EVM.

Basically, the “values” variable is a dynamic array, whose length can grow without a fixed limit. But this raises a question: what if we declare a value2 variable after the array (like the commented-out line)? In which storage slot should it be stored?

value2 needs to come after “values” and be in slot 1. But the “values” array is dynamic and its size can change, so it’s hard to assign a slot to value2. Hmm…

In reality, value2 is still stored in slot 1 (and the following variables in slots 2, 3 and so on).

But what about slot 0? What does slot 0 contain?

In fact, slot 0 stores the length of the array. In this case, after modify() is executed, slot 0 will contain 2, because two values are pushed into the array.

But where are the values stored?

As the length of the array is stored in slot 0, the values must be stored elsewhere.

Element i (counting from 0) is stored in slot keccak256(0) + i, where keccak256(0) is the hash of the slot number 0 encoded as a 32-byte word:

The first value is stored in slot keccak256(0) + 0
The second: keccak256(0) + 1
The third: keccak256(0) + 2
The n-th: keccak256(0) + (n-1)

keccak256(0) is a very large pseudo-random number and storage has 2^256 slots, so the elements end up far away from the regular variables (slots 0, 1, 2…). In practice they never collide with them, and this solves our issue.

Why 0 in keccak256(0)? Because the array is declared in slot 0. Suppose the contract were declared like this instead:

uint value2;    // value2 is in slot 0
uint[] values;  // the array is in slot 1

The array would then be in slot 1. Its length would be stored in slot 1, and its i-th element (counting from 0) in slot keccak256(1) + i.

Now let’s see how it works in assembly.
Here is the first part of the function modify() in assembly:

048 PUSH1 00 |0x00|
050 DUP1 |0x00|0x00|
051 SLOAD |0x00|0x00|
052 PUSH1 01 |0x01|0x00|0x00|
054 DUP2 |0x00|0x01|0x00|0x00|
055 DUP2 |0x01|0x00|0x01|0x00|0x00|
056 ADD |0x01|0x01|0x00|0x00|
057 DUP4 |0x00|0x01|0x01|0x00|0x00|
058 SSTORE |0x01|0x00|0x00|
059 DUP3 |0x00|0x01|0x00|0x00|
060 DUP1 |0x00|0x00|0x01|0x00|0x00|
061 MSTORE |0x01|0x00|0x00|

At byte 51, the EVM loads storage slot 0. The result is zero because the length of the array is 0.

At byte 56, the EVM adds 1 to the value loaded from slot 0, and at byte 58 it SSTOREs the result in the same slot. The length of the array is now 1.

At byte 61, the EVM MSTOREs 0 at memory offset 0; we will see why later.

062 PUSH1 07 |0x07|0x01|0x00|0x00|
064 PUSH32 290decd9548b62a8d60345a988386fc84ba6bc95484008f6362f93160ef3e563 |hash|0x07|0x01|0x00|0x00|
097 SWAP3 |0x00|0x07|0x01|hash|0x00|
098 DUP4 |hash|0x00|0x07|0x01|hash|0x00|
099 ADD |hash|0x07|0x01|hash|0x00|
100 SSTORE |0x01|hash|0x00|

0x07 is pushed onto the stack, followed by the hash of slot 0, i.e. keccak256 of the 32-byte word 0x00 (it equals 290decd…e563).

The current index in the array is 0 (the old length), so at byte 99 the EVM adds 0 to the hash, which was the fourth value on the stack. This matters because when the EVM adds another value to the array, the index will no longer be 0, so the resulting slot will be different.

Then it SSTOREs 0x07 in the resulting slot (storage[keccak256(0) + 0] = 7).

This is the first element of the array, which was stored successfully.

After this 2nd SSTORE, the same code is repeated to store the value 8, with minor differences.

101 DUP3     |0x00|0x01|hash|0x00|
102 SLOAD |0x01|0x01|hash|0x00|
103 SWAP1 |0x01|0x01|hash|0x00|
104 DUP2 |0x01|0x01|0x01|hash|0x00|
105 ADD |0x02|0x01|hash|0x00|
106 SWAP1 |0x01|0x02|hash|0x00|
107 SWAP3 |0x00|0x02|hash|0x01|
108 SSTORE |hash|0x01|
109 PUSH1 08 |0x08|hash|0x01|
111 SWAP2 |0x01|hash|0x08|
112 ADD |hash+1|0x08|
113 SSTORE ||

At byte 102, SLOAD on slot 0 returns 1, which is the length of the array.

At byte 105, ADD adds 1 to the result: 1 + 1 = 2.

At byte 108, the EVM SSTOREs the result (2) in slot 0. This is the new length of the array.

At byte 112, the EVM adds 1 (the old length, i.e. the index of the new element) to the hash. This is the slot assigned to values[1].

At byte 113, the EVM SSTOREs the value 8 at keccak256(0) + 1, instead of keccak256(0) last time.
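The two push() sequences above can be summarized by a small Python model of storage, using a dict from slot number to value. This sketch relies on simplifying assumptions. push is a hypothetical helper, the elements are uint256 (one slot each), and the base slot is the keccak256 constant pushed by the PUSH32 at byte 64.

```python
# keccak256 of slot number 0 as a 32-byte word, copied from the PUSH32 at byte 64
KECCAK_SLOT_0 = 0x290DECD9548B62A8D60345A988386FC84BA6BC95484008F6362F93160EF3E563


def push(storage: dict, length_slot: int, data_start: int, value: int) -> None:
    """Append a uint256 to a dynamic storage array (simplified model)."""
    length = storage.get(length_slot, 0)               # SLOAD the length
    storage[length_slot] = length + 1                  # SSTORE the new length
    storage[(data_start + length) % 2**256] = value    # SSTORE at keccak256(p) + index


storage = {}
push(storage, 0, KECCAK_SLOT_0, 7)
push(storage, 0, KECCAK_SLOT_0, 8)

for slot, value in storage.items():
    print(hex(slot), value)
```

Each push reads the length, writes length + 1, and writes the value at data_start + old length. This mirrors the SLOAD/ADD/SSTORE pattern of the two listings.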

If we analyze the gas cost of this operation, we see that it is higher than without an array.

87746 gas

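There was 1 SSTORE to store the length of the array (+20,000 gas, plus 2,100 for the first access to the slot).
There were 2 SSTOREs to store the values 7 and 8 (+40,000 gas, plus 2,100 per slot for the first access).
There was 1 more SSTORE to update the length of the array from 1 to 2. Slot 0 had already been modified earlier in the same transaction, so under the current rules (EIP-2200 with EIP-2929) this write costs only 100 gas.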
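The following Python sketch prices the four SSTOREs of this transaction. It uses a simplified model of SSTORE pricing (EIP-2200 as amended by EIP-2929, refunds ignored), and sstore_cost is a hypothetical helper. Under these rules, the third write costs only 100 gas, because it updates a slot that was already modified earlier in the same transaction.

```python
COLD_SLOAD_COST = 2100
WARM_STORAGE_READ_COST = 100
SSTORE_SET_GAS = 20000
SSTORE_RESET_GAS = 5000 - COLD_SLOAD_COST


def sstore_cost(original: int, current: int, new: int, cold: bool) -> int:
    """Gas charged by one SSTORE (hypothetical helper, refunds excluded)."""
    cost = COLD_SLOAD_COST if cold else 0
    if new == current or original != current:
        return cost + WARM_STORAGE_READ_COST  # no-op, or slot already dirty
    return cost + (SSTORE_SET_GAS if original == 0 else SSTORE_RESET_GAS)


# SSTOREs of modify() in execution order, as (slot label, new value)
writes = [("length", 1), ("values[0]", 7), ("length", 2), ("values[1]", 8)]

original = {}  # every slot is empty before the transaction
storage, warm, costs = {}, set(), []
for slot, value in writes:
    current = storage.get(slot, 0)
    costs.append(sstore_cost(original.get(slot, 0), current, value, slot not in warm))
    warm.add(slot)
    storage[slot] = value

print(costs)       # [22100, 22100, 100, 22100]
print(sum(costs))  # 66400
```

The remaining difference from the 87,746 gas reported by Remix would be the 21,000 base cost plus the non-storage instructions.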

We’re almost done… And now, the final boss!

▸ 7. How are mappings stored?

Let’s now talk about mappings (the balances variable of ERC20 tokens is a good example). As with arrays, it is not obvious how to store all the values of a mapping.

A mapping associates keys with values.

The formula for storing values in a mapping is quite similar to the one for arrays. The value for a given key is stored in slot keccak256(key . mapping_slot), where . is the concatenation operator and both key and mapping_slot are padded to 32 bytes.

In this last part, we will verify this statement.

Let’s compile this last contract WITHOUT THE OPTIMIZER (but still solidity 0.8.7) and call the modify() function!

// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;
contract Test {
    mapping(address => uint) balances;
    function modify() external {
        balances[0xbc5D291D2165f130375B94c62211f594dB48fEF2] = 15;
        balances[0x9a8af21Ac492D5055eA7e1e49bD91BC9b5549334] = 55;
    }
}

Here is the disassembly of the first part of the function:

054 PUSH1 0f |0x0f|
056 PUSH1 00 |0x00|0x0f|
058 DUP1 |0x00|0x00|0x0f|
059 PUSH20 bc5d291d2165f130375b94c62211f594db48fef2 |address1|0x00|0x00|0x0f|
080 PUSH20 ffffffffffffffffffffffffffffffffffffffff |0xff..ff|address1|0x00|0x00|0x0f|
101 AND |address1|0x00|0x00|0x0f|
102 PUSH20 ffffffffffffffffffffffffffffffffffffffff |0xff..ff|address1|0x00|0x00|0x0f|
123 AND |address1|0x00|0x00|0x0f|
124 DUP2 |0x00|address1|0x00|0x00|0x0f|
125 MSTORE |0x00|0x00|0x0f|
126 PUSH1 20 |0x20|0x00|0x00|0x0f|
128 ADD |0x20|0x00|0x0f|
129 SWAP1 |0x00|0x20|0x0f|
130 DUP2 |0x20|0x00|0x20|0x0f|
131 MSTORE |0x20|0x0f|
132 PUSH1 20 |0x20|0x20|0x0f|
134 ADD |0x40|0x0f|
135 PUSH1 00 |0x00|0x40|0x0f|
137 SHA3 |keccak256(address1 . 0)|0x0f|
138 DUP2 |0x0f|keccak256(address1 . 0)|0x0f|
139 SWAP1 |keccak256(address1 . 0)|0x0f|0x0f|
140 SSTORE |0x0f|

Between bytes 54 and 59, 0x0f (15 in decimal), 0, 0 (via DUP1) and bc5d…f2 are pushed onto the stack.

  • 0f is the balance of the first address in the modify() function.
  • bc5d…f2 is the address whose balance is set to 15.

Between bytes 80 and 123, this code has no visible effect here. It just ensures that the 12 high-order bytes above the 20-byte address are zero, by ANDing it (twice, since the unoptimized code keeps the redundant cleanup) with the mask 0x000000000000000000000000ffffffffffffffffffffffffffffffffffffffff.

For example, if the word were 0xd0000….bc5d..fd, ANDing it with the mask 0x000…000ffff would remove the d at the beginning.
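The cleaning step is a plain bitwise AND with a 160-bit mask. Here is a minimal Python illustration. The dirty value below is made up: it simulates garbage in the high-order bits of the word.

```python
ADDRESS_MASK = (1 << 160) - 1  # PUSH20 ffff...ff: the 20 low-order bytes set

address1 = 0xBC5D291D2165F130375B94C62211F594DB48FEF2
dirty = (0xD << 252) | address1  # hypothetical word with a stray 0xd in the top nibble

cleaned = dirty & ADDRESS_MASK   # AND keeps only the 20 address bytes
print(hex(cleaned))              # 0xbc5d291d2165f130375b94c62211f594db48fef2
print(cleaned == address1)       # True
```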

At byte 125, it MSTOREs the result (the “cleaned” address) in memory at offset 0x00.

At byte 128, it adds 0x20 to 0x00, and at byte 131 it MSTOREs 0 at that offset (so 0 is stored at 0x20 in memory).

At byte 137, the SHA3 instruction is called with 0 and 0x40 as its parameters on the stack.

This instruction returns keccak256(memory[offset:offset+length]).
Here offset = Stack(0) = 0 and length = Stack(1) = 0x40 (64 bytes), the two values at the top of the stack.

This is the SHA3 (or KECCAK256) of the address 0xbc5d…f2 concatenated with 0.

Then, 15 is stored with SSTORE in the slot given by the result of the SHA3 operation.

If we compare this with the formula, the hashed data is key . mapping_slot, with the key first:

  • key is the address 0xbc5d291d2165f130375b94c62211f594db48fef2, left-padded to 32 bytes and stored in memory from 0x00 to 0x20.
  • mapping_slot, the slot of the balances variable, is of course 0. It is stored in memory from 0x20 to 0x40 because it comes second in the concatenation; this has nothing to do with endianness.

For those who were wondering what memory[0x00:0x40] is for, here is the answer: it is (just) scratch space where the operands are stored before hashing them with keccak256.

As the EVM hashes everything between 0x00 and 0x40, our statement is verified.
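The following self-contained Python sketch checks the formula outside Remix. Python’s hashlib.sha3_256 implements the standardized SHA3-256, whose padding differs from the original Keccak-256 used by the EVM. The sketch therefore includes a small, unoptimized pure-Python Keccak-256, written for readability rather than speed. mapping_value_slot is a hypothetical helper that lays out the bytes exactly as the disassembly does, with the key in 0x00–0x20 and the mapping slot in 0x20–0x40.

```python
MASK64 = (1 << 64) - 1


def _rol(v: int, n: int) -> int:
    """Rotate a 64-bit lane left by n bits."""
    n %= 64
    return ((v << n) | (v >> (64 - n))) & MASK64


def _rc_bit(t: int) -> int:
    """Output bit of the Keccak round-constant LFSR after t steps."""
    r = 1
    for _ in range(t % 255):
        r <<= 1
        if r & 0x100:
            r ^= 0x171
    return r & 1


ROUND_CONSTANTS = [
    sum(_rc_bit(j + 7 * i) << ((1 << j) - 1) for j in range(7)) for i in range(24)
]

# Rotation offsets r[x][y] of the rho step.
ROTATIONS = [[0] * 5 for _ in range(5)]
_x, _y = 1, 0
for _t in range(24):
    ROTATIONS[_x][_y] = ((_t + 1) * (_t + 2) // 2) % 64
    _x, _y = _y, (2 * _x + 3 * _y) % 5


def _keccak_f(a):
    """Keccak-f[1600] permutation on a 5x5 array of 64-bit lanes a[x][y]."""
    for rc in ROUND_CONSTANTS:
        c = [a[x][0] ^ a[x][1] ^ a[x][2] ^ a[x][3] ^ a[x][4] for x in range(5)]
        d = [c[(x - 1) % 5] ^ _rol(c[(x + 1) % 5], 1) for x in range(5)]
        a = [[a[x][y] ^ d[x] for y in range(5)] for x in range(5)]          # theta
        b = [[0] * 5 for _ in range(5)]
        for x in range(5):
            for y in range(5):
                b[y][(2 * x + 3 * y) % 5] = _rol(a[x][y], ROTATIONS[x][y])  # rho + pi
        a = [[b[x][y] ^ (~b[(x + 1) % 5][y] & b[(x + 2) % 5][y])
              for y in range(5)] for x in range(5)]                          # chi
        a[0][0] ^= rc                                                        # iota
    return a


def keccak256(data: bytes) -> bytes:
    """Original Keccak-256 (padding byte 0x01), as computed by the EVM's SHA3 opcode."""
    rate = 136
    padded = bytearray(data) + b"\x01"
    padded += b"\x00" * (-len(padded) % rate)
    padded[-1] |= 0x80
    a = [[0] * 5 for _ in range(5)]
    for start in range(0, len(padded), rate):
        for i in range(rate // 8):
            lane = int.from_bytes(padded[start + 8 * i:start + 8 * i + 8], "little")
            a[i % 5][i // 5] ^= lane
        a = _keccak_f(a)
    return b"".join(a[i % 5][i // 5].to_bytes(8, "little") for i in range(4))


def mapping_value_slot(key: int, mapping_slot: int) -> int:
    """keccak256(key . mapping_slot), both left-padded to 32 bytes."""
    memory = key.to_bytes(32, "big") + mapping_slot.to_bytes(32, "big")  # 0x00..0x40
    return int.from_bytes(keccak256(memory), "big")


print(keccak256(bytes(32)).hex())  # should equal the PUSH32 constant of the array section
print(hex(mapping_value_slot(0xBC5D291D2165F130375B94C62211F594DB48FEF2, 0)))
```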

Here is the gas cost of such an operation:

65594 gas

The cost is LOWER than storing in an array (at least at the beginning). The mapping’s own slot (slot 0) contains nothing and is never modified, so there is one SSTORE fewer.

TrustChain


Smart contract Auditor & Cybersecurity engineer, follow me on Twitter to get more value: https://twitter.com/TrustChain_DEFI