Reversing and debugging THE EVM Part 7

Alain | Web3hackingLabs
10 min read · Jul 17, 2022


In this final part of the first level of reverse engineering, we will look at how a smart contract interacts with other smart contracts. How does the EVM handle that in assembly?

Let’s find out !

🔴 This is the 7th part of our series about reversing and debugging EVM smart contracts; here you can find the previous and next parts:

1. CALL Introduction

Here is our (last) test smart contract:

1. Compile and deploy the 2 contracts, Caller and Called (with Solidity 0.8.7 and the optimizer enabled with 200 runs).

2. Call the function Caller.setAddr(x), with x = the address of the Called contract, so that Caller knows which contract to call.

3. Now we can analyze the function "test()" by calling it and disassembling the transaction in the debugger.

You can find the function by following this link: https://ethervm.io/decompile/ropsten/0x7dCb3545d535D2DF5752F389c34E346f333d907b

2. FULL decompilation

We will start at 0x6D after the jump from the function selector.

006D    5B  JUMPDEST     |0xf8a8fd6d| (discarded)
006E 61 PUSH2 0x006b |0x006b|
0071 60 PUSH1 0x00 |0x00|0x006b|
0073 80 DUP1 |0x00|0x00|0x006b|
0074 54 SLOAD |addr|0x00|0x006b|

At 0x6E, PUSH2 0x006b pushes a return address: 0x6B is where the STOP sits, at the end of the if/else chain of the function dispatcher. The EVM will jump back to this byte once the function is done.

After that, the EVM SLOADs the address stored at slot 0x00, which is of course the _addr variable of the smart contract. After byte 74, the stack is: |addr|0x00|RET| (RET is 0x6b, the pointer to 0x6B).
We can translate this into assembly code:

addr := sload(0x00)

Here is the disassembly of the next part:

0075    60  PUSH1 0x40  |0x40|addr|0x00|0x006b|
0077 80 DUP1 |0x40|0x40|addr|0x00|0x006b|
0078 51 MLOAD |0x80|0x40|addr|0x00|0x006b|
0079 60 PUSH1 0x04 |0x04|0x80|0x40|addr|0x00|0x006b|
007B 81 DUP2 |0x80|0x04|0x80|0x40|addr|0x00|0x006b|
007C 52 MSTORE |0x80|0x40|addr|0x00|0x006b|
007D 60 PUSH1 0x24 |0x24|0x80|0x40|addr|0x00|0x006b|
007F 81 DUP2 |0x80|0x24|0x80|0x40|addr|0x00|0x006b|
0080 01 ADD |0xa4|0x80|0x40|addr|0x00|0x006b|
0081 82 DUP3 |0x40|0xa4|0x80|0x40|addr|0x00|0x006b|
0082 52 MSTORE |0x80|0x40|addr|0x00|0x006b|
0083 60 PUSH1 0x20 |0x20|0x80|0x40|addr|0x00|0x006b|
0085 81 DUP2 |0x80|0x20|0x80|0x40|addr|0x00|0x006b|
0086 01 ADD |0xa0|0x80|0x40|addr|0x00|0x006b|
0087 80 DUP1 |0xa0|0xa0|0x80|0x40|addr|0x00|0x006b|
0088 51 MLOAD |0x00|0xa0|0x80|0x40|addr|0x00|0x006b|

The EVM continues with a series of MLOAD / MSTORE instructions:

  1. At byte 78, it MLOADs the word at 0x40; the result is 0x80 (the free memory pointer). Stack: |0x80|0x40|addr|0x00|RET|
  2. At byte 7C, the EVM MSTOREs 0x04 at the free memory pointer (0x80 here); the stack stays the same: |0x80|0x40|addr|0x00|RET|
  3. Between bytes 7D and 82, the EVM adds 0x24 to 0x80 and MSTOREs the result, 0xa4, at 0x40. This is the new free memory pointer, which makes sense since memory[0x80:0xa4] is now in use. The stack is exactly the same.
  4. Between bytes 83 and 88, the EVM adds 0x20 to 0x80 and MLOADs the word at the result, 0xa0. Of course this memory is still empty, so the stack is: |0x00|0xa0|0x80|0x40|addr|0x00|RET|

Here is the decompilation:

let free_pointer := mload(0x40)              // equals 0x80
mstore(free_pointer, 0x04)
let free_pointer2 := add(free_pointer, 0x24) // 0xa4
mstore(0x40, free_pointer2)
let free_pointer3 := add(free_pointer, 0x20) // 0xa0
let result := mload(free_pointer3)           // equals 0
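To double-check this layout, here is a small Python sketch that replays bytes 0x75 to 0x88 on a toy memory. The helpers mstore/mload are hypothetical stand-ins for the opcodes (32-byte big-endian words), and the initial free memory pointer value 0x80 is assumed, as Solidity sets it at the start of execution. The 4 at 0x80 sits exactly where Solidity would put the length word of a `bytes memory` value; that is my reading of the layout, not something the trace alone proves.

```python
# Toy EVM memory: byte-addressed, read and written in 32-byte big-endian words.
memory = bytearray(0x100)

def mstore(offset: int, value: int) -> None:
    """Like MSTORE: write a 256-bit word at `offset`."""
    memory[offset:offset + 32] = value.to_bytes(32, "big")

def mload(offset: int) -> int:
    """Like MLOAD: read a 256-bit word at `offset`."""
    return int.from_bytes(memory[offset:offset + 32], "big")

mstore(0x40, 0x80)                   # assumed initial free memory pointer

free_pointer = mload(0x40)           # 0x78: 0x80
mstore(free_pointer, 0x04)           # 0x7C: 0x04 at memory[0x80:0xa0]
mstore(0x40, free_pointer + 0x24)    # 0x82: new free pointer 0x80 + 0x20 + 0x04 = 0xa4
result = mload(free_pointer + 0x20)  # 0x88: word at 0xa0, still empty

print(hex(mload(0x40)), hex(mload(0x80)), hex(result))  # 0xa4 0x4 0x0
```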

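Thus, after byte 0x88, memory holds 0xa4 at memory[0x40:0x60] (the free memory pointer) and 0x04 at memory[0x80:0xa0]; memory[0xa0:0xc0] is still zero.

Here is the next part of the disassembly: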
0089  60  PUSH1 0x01  |0x01|0x00|0xa0|0x80|0x40|addr|0x00|.
008B 60 PUSH1 0x01 |0x01|0x01|0x00|0xa0|0x80|0x40|addr|0x00|.
008D 60 PUSH1 0xe0 |0xe0|0x01|0x01|0x00|0xa0|.. (hidden)
008F 1B SHL |0x00..00100..000|0x01|0x00|0xa0|..
0090 03 SUB |0x00..000ff..fff|0x00|0xa0|...
0091 16 AND |0x00|0xa0|...
0092 63 PUSH4 0x03155a67 |0x03155a67|0x00|0xa0|...
0097 60 PUSH1 0xe2 |0xe2|0x03155a67|0x00|0xa0|...
0099 1B SHL |0x0c55699c00000..00|0x00|0xa0|...
009A 17 OR |0x0c55699c00000..00|0xa0|...
009B 90 SWAP1 |0xa0|0x0c55699c00000..00|...
009C 52 MSTORE |0x80|0x40|addr|0x00|RET|
009D 90 SWAP1 |0x40|0x80|addr|0x00|RET|
009E 51 MLOAD |0xa4|0x80|addr|0x00|RET|

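As much of this assembly should look familiar by now, I won't describe every instruction.

  1. Between 89 and 91, the EVM builds the mask (1 << 224) - 1 and ANDs it with the word loaded from 0xa0. This clears the first 4 bytes of the word (where the selector will go) and keeps the other 28. Since the word is 0, the result is 0, so nothing actually changes here (the compiler can't know that this memory is empty; the code could be optimized).
  2. Between 92 and 9A, the EVM computes 0x03155a67 << 0xe2 = 0x0c55699c00000..00 and ORs it into the masked word.
  3. At byte 9C, the EVM stores this word (0x0c55699c00000..00) at 0xa0.
  4. At byte 9E, the EVM MLOADs the free memory pointer (0xa4).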

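Note that 0x03155a67 shifted left by 0xe2 (226) bits equals 0x0c55699c shifted left by 224 bits: 0x0c55699c is the signature (selector) of x(), the function called in the Called contract!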

Decompilation:

mstore(0xa0, 0x0c55699c00000..00)
let free_pointer4 := mload(0x40) // loads 0xa4
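The constant juggling between 0x89 and 0x9A is easy to check with Python integers (no intermediate value here exceeds 256 bits, so they behave like EVM words). This sketch uses variable names of my own and assumes, as in the trace, that the word loaded from 0xa0 is 0:

```python
MASK_224 = (1 << 224) - 1               # 0x89-0x90: PUSH1 1, PUSH1 1, PUSH1 0xe0, SHL, SUB

word = 0                                # value MLOADed from 0xa0 at 0x88
kept = word & MASK_224                  # 0x91 AND: clear the top 4 bytes, keep the low 28
selector_word = 0x03155a67 << 0xe2      # 0x92-0x99: PUSH4, PUSH1 0xe2, SHL
new_word = kept | selector_word         # 0x9A OR: selector in the top 4 bytes

print(new_word.to_bytes(32, "big")[:4].hex())  # 0c55699c
```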

After byte 0x9E, memory[0xa0:0xa4] contains the selector 0x0c55699c.

009F    60  PUSH1 0x01  |0x01|0xa4|0x80|addr|0x00|RET|
00A1 60 PUSH1 0x01 |0x01|0x01|0xa4|0x80|addr|0x00|RET|
00A3 60 PUSH1 0xa0 |0xa0|0x01|0x01|0xa4|0x80|addr|0x00|RET|
00A5 1B SHL |0x0..0100..00|0x01|0xa4|0x80|addr|0x00|RET|
00A6 03 SUB |0x0..0ff..ff|0xa4|0x80|addr|0x00|RET|
00A7 90 SWAP1 |0xa4|0x0..0ff..ff|0x80|addr|0x00|RET|
00A8 92 SWAP3 |addr|0x0..0ff..ff|0x80|0xa4|0x00|RET|
00A9 16 AND |addr (cleaned) |0x80|0xa4|0x00|RET|
00AA 92 SWAP3 |0x00|0x80|0xa4|addr|RET|
00AB 62 PUSH3 0x0f4240 |0x0f4240|0x00|0x80|0xa4|addr|RET|
00AF 92 SWAP3 |0xa4|0x00|0x80|0x0f4240|addr|RET|
00B0 90 SWAP1 |0x00|0xa4|0x80|0x0f4240|addr|RET|
00B1 91 SWAP2 |0x80|0xa4|0x00|0x0f4240|addr|RET|
00B2 61 PUSH2 0x00ba |0x00ba|0x80|0xa4|0x00|0x0f4240|addr|RET|
00B5 91 SWAP2 |0xa4|0x80|0x00ba|0x00|0x0f4240|addr|RET|
00B6 61 PUSH2 0x012d |0x012d|0xa4|0x80|0x00ba|0x00|0x0f4240|addr|RET|
00B9 56 *JUMP |0xa4|0x80|0x00ba|0x00|0x0f4240|addr|RET|

Between 9F and A9, we already know this pattern: the goal is to "clean" the address with (1 << 160) - 1 = 0x000..00ff..ff (20 bytes of 0xff) as a mask.

For example here is a “cleaned” address (in 32 bytes): 0x000000000000000000000000aaC5322e456d45E7b6c452038836C5631C2AeBc0

And here is the same address not cleaned: 0x10000000000b000000000000aaC5322e456d45E7b6c452038836C5631C2AeBc0

The mask zeroes the upper 12 bytes, which removes the "1" and the "b".
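A quick Python check of the cleaning step, using the example "dirty" address above; the mask is the (1 << 160) - 1 value built by the PUSH/SHL/SUB sequence:

```python
ADDRESS_MASK = (1 << 160) - 1   # 0x9F-0xA6: PUSH1 1, PUSH1 1, PUSH1 0xa0, SHL, SUB

dirty = 0x10000000000b000000000000aaC5322e456d45E7b6c452038836C5631C2AeBc0
clean = dirty & ADDRESS_MASK    # 0xA9 AND: keep only the low 20 bytes

print(f"0x{clean:064x}")        # 32-byte view: 12 zero bytes, then the address
```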

Between AA and B1, the EVM only pushes the gas value 0x0f4240 and reorders the stack; nothing interesting happens.

This block ends with a jump to the internal function at 0x012d, with 0xa4 and 0x80 as arguments and 0xba as the return address.

Apart from the "address cleaning" and the selector in memory, there isn't much to say, at least for now…

And what’s the 4 at memory[0x80:0xa0]??? Don’t worry, I’ll explain this later :)

Decompilation:

addr := and(addr, 0xffffffffffffffffffffffffffffffffffffffff) // mask = (1 << 160) - 1
func_012d(0xa4, 0x80) // return values aren't known yet

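Memory now holds 0xa4 at memory[0x40:0x60], 0x04 at memory[0x80:0xa0] and the selector 0x0c55699c at memory[0xa0:0xa4].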

3. What's in the function at 0x12D?

Let’s follow the execution flow:

012D    5B  JUMPDEST     |0xa4|0x80|RET|....
012E 60 PUSH1 0x00 |0x00|0xa4|0x80|RET|....
0130 82 DUP3 |0x80|0x00|0xa4|0x80|RET|....
0131 51 MLOAD |0x04|0x00|0xa4|0x80|RET|....
0132 60 PUSH1 0x00 |0x00|0x04|0x00|0xa4|0x80|RET|....
0134 5B JUMPDEST |0x00|0x04|0x00|0xa4|0x80|RET|....
0135 81 DUP2 |0x04|0x00|0x04|0x00|0xa4|0x80|RET|....
0136 81 DUP2 |0x00|0x04|0x00|0x04|0x00|0xa4|0x80|RET|...
0137 10 LT |0x01|0x00|0x04|0x00|0xa4|0x80|RET|....
0138 15 ISZERO |0x00|0x00|0x04|0x00|0xa4|0x80|RET|....
0139 61 PUSH2 0x014e
013C 57 *JUMPI

We can decompile this piece of assembly as follows:

length := mload(arg2)  // loads 0x04, arg2 = 0x80
i := 0x00              // loop counter
if iszero(lt(i, length)) {
JUMP 0x014e
}

LT takes its first operand from the top of the stack, so the test is i < length: the code JUMPs to 0x14e if the counter i is NOT less than memory[0x80:0xa0] (= 0x04). Otherwise, the following code is executed.
As 0x00 is LESS than 0x04, the code DOESN'T JUMP.
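Since the operand order of LT decides whether the jump happens, here is a hypothetical Python helper, loop_check, that replays 0x135 to 0x13C with a list as the stack (last element = top). It is a sketch of the opcode semantics, not a full EVM:

```python
from typing import Optional

def loop_check(i: int, length: int) -> Optional[int]:
    """Replay 0x135-0x13C with |i|length| on top of the stack; return the JUMPI target or None."""
    stack = [length, i]                          # list end = stack top, so i is on top
    stack.append(stack[-2])                      # 0x135 DUP2 -> length
    stack.append(stack[-2])                      # 0x136 DUP2 -> i
    a, b = stack.pop(), stack.pop()              # 0x137 LT pops a (top = i), then b (length)
    stack.append(1 if a < b else 0)              #       and pushes a < b, i.e. i < length
    stack.append(1 if stack.pop() == 0 else 0)   # 0x138 ISZERO
    stack.append(0x014E)                         # 0x139 PUSH2 0x014e
    dest, cond = stack.pop(), stack.pop()        # 0x13C JUMPI pops destination, then condition
    return dest if cond != 0 else None

print(loop_check(0x00, 0x04))  # None: 0 < 4, fall through to 0x13D
print(loop_check(0x20, 0x04))  # 334 (= 0x14E): 0x20 < 4 is false, jump
```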

013D    60  PUSH1 0x20 |0x20|0x00|0x04|0x00|0xa4|0x80|RET|....
013F 81 DUP2 |0x00|0x20|0x00|0x04|0x00|0xa4|0x80|RET|....
0140 86 DUP7 |0x80|0x00|0x20|0x00|0x04|...
0141 01 ADD |0x80|0x20|0x00|0x04|...
0142 81 DUP2 |0x20|0x80|0x20|0x00|0x04|...
0143 01 ADD |0xa0|0x20|0x00|0x04|...
0144 51 MLOAD |0x0c55699c|0x20|0x00|0x04|...
0145 85 DUP6 |0xa4|0x0c55699c|0x20|0x00|0x04|...
0146 83 DUP4 |0x00|0xa4|0x0c55699c|0x20|0x00|0x04|...
0147 01 ADD |0xa4|0x0c55699c|0x20|0x00|0x04|...
0148 52 MSTORE |0x20|0x00|0x04|...
0149 01 ADD |0x20|0x04|...
014A 61 PUSH2 0x0134 |0x134|0x20|0x04|...
014D 56 *JUMP |0x20|0x04|...

Decompilation:

mstore(add(arg1, i), mload(add(add(arg2, 0x20), i))) // arg1 = 0xa4, arg2 = 0x80, i = 0: copies memory[0xa0:0xc0] to 0xa4
i := add(i, 0x20)
jump 0x134

This code loads the word at memory[0xa0:0xc0], which starts with the signature of the function x(), and MSTOREs it at memory[0xa4:0xc4].

It then adds 0x20 to the loop counter and JUMPs back to 0x134. This time the stack holds |0x20|0x04|... instead of |0x00|0x04|...: the counter is 0x20 while the length is still 0x04.

As a result, the test at 0x137 computes 0x20 < 0x04, which is false, so the code JUMPs to 0x14e.

014E    5B  JUMPDEST  |0x20|0x04|0x00|0xa4|0x80|0xba|0x00|..
014F 81 DUP2 |0x04|0x20|0x04|0x00|0xa4|0x80|0xba|0x00|..
0150 81 DUP2 |0x20|0x04|0x20|0x04|0x00|0xa4|0x80|0xba|0x00|
0151 11 GT |0x01|0x20|0x04|0x00|0xa4|0x80|0xba|0x00|
0152 15 ISZERO |0x00|0x20|0x04|0x00|0xa4|0x80|0xba|0x00|
0153 61 PUSH2 0x015d
0156 57 *JUMPI
0157 60 PUSH1 0x00 |0x00|0x20|0x04|0x00|0xa4|0x80|0xba|0x00|
0159 82 DUP3 |0x04|0x00|0x20|0x04|0x00|0xa4|0x80|0xba|0x00|
015A 85 DUP6 |0xa4|0x04|0x00|0x20|0x04|0x00|0xa4|0x80|...|
015B 01 ADD |0xa8|0x00|0x20|0x04|0x00|0xa4|0x80|...|
015C 52 MSTORE |0x20|0x04|0x00|0xa4|0x80|...|
015D 5B JUMPDEST |0x20|0x04|0x00|0xa4|0x80|...|
015E 50 POP |0x04|0x00|0xa4|0x80|...|
015F 91 SWAP2 |0xa4|0x00|0x04|0x80|...|
0160 90 SWAP1 |0x00|0xa4|0x04|0x80|...|
0161 91 SWAP2 |0x04|0xa4|0x00|0x80|...|
0162 01 ADD |0xa8|0x00|0x80|0xba (RET)|
0163 92 SWAP3 |0xba|0x00|0x80|0xa8|
0164 91 SWAP2 |0x80|0x00|0xba|0xa8|
0165 50 POP
0166 50 POP
0167 56 *JUMP |0xa8|

Decompilation:

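if gt(i, length) {                  // i = 0x20, length = mload(0x80) = 0x04
mstore(add(arg1, length), 0x00)     // arg1 = 0xa4, so this is mstore(0xa8, 0x00)
}
return add(arg1, length)            // 0xa8

First, this code checks whether the loop counter (0x20) is greater than the length (0x04).
If so, 0x00 is stored at memory[first_argument + length] = memory[0xa8:0xc8], which erases whatever the 32-byte copy wrote after the 4 bytes of data.
In both cases, the function then returns first_argument + length = 0xa8 to the return address 0xba.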

Memory is almost the same as before; there is only one difference:
the function signature is now present twice, at memory[0xa0:0xa4] and memory[0xa4:0xa8]. Why is this the case? What is the purpose of this function?

Be patient, I'll answer all your questions :)

This function can be fully decompiled:

function func_012d(arg1, arg2) -> end {  // arg1 = 0xa4 (destination), arg2 = 0x80
    let length := mload(arg2)           // 0x04
    let i := 0
    for { } lt(i, length) { i := add(i, 0x20) } {
        mstore(add(arg1, i), mload(add(add(arg2, 0x20), i)))
    }
    if gt(i, length) { mstore(add(arg1, length), 0) }
    end := add(arg1, length)            // 0xa8
}
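To make sure this reading is right, here is a Python re-implementation of func_012d (my own sketch; the names and the 0x100-byte toy memory are assumptions). It starts from the memory state built before the jump at 0xB9 and should reproduce the doubled selector and the 0xa8 return value:

```python
memory = bytearray(0x100)

def mstore(offset: int, value: int) -> None:
    memory[offset:offset + 32] = value.to_bytes(32, "big")

def mload(offset: int) -> int:
    return int.from_bytes(memory[offset:offset + 32], "big")

def func_012d(dst: int, src: int) -> int:
    """Copy the value at `src` (length word + data) to `dst`; return the end pointer."""
    length = mload(src)                           # 0x131: 0x04
    i = 0                                         # 0x132: loop counter
    while i < length:                             # 0x135-0x13C
        mstore(dst + i, mload(src + 0x20 + i))    # 0x13D-0x148: copy one 32-byte word
        i += 0x20                                 # 0x149
    if i > length:                                # 0x14F-0x156
        mstore(dst + length, 0)                   # 0x157-0x15C: zero what the copy overshot
    return dst + length                           # 0x162: 0xa8

# Memory state before the jump at 0xB9 (taken from the traces above)
mstore(0x40, 0xa4)
mstore(0x80, 0x04)
mstore(0xa0, 0x0c55699c << 224)

end = func_012d(0xa4, 0x80)
print(hex(end), memory[0xa4:0xa8].hex())  # 0xa8 0c55699c
```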

4. CALL INSTRUCTION

The function ends at 0x167, where the EVM jumps back to the RET address (0xba).

00BA    5B  JUMPDEST    |0xa8|0x00|0xf4240|address|0x06b|..
00BB 60 PUSH1 0x00 |0x00|0xa8|0x00|0xf4240|address|0x06b|..
00BD 60 PUSH1 0x40 |0x40|0x00|0xa8|0x00|0xf4240|address|0x06b|..
00BF 51 MLOAD |0xa4|0x00|0xa8|0x00|0xf4240|address|0x06b|..
00C0 80 DUP1 |0xa4|0xa4|0x00|0xa8|0x00|0xf4240|addr|..
00C1 83 DUP4 |0xa8|0xa4|0xa4|0x00|0xa8|0x00|0xf4240|addr|..
00C2 03 SUB |0x04|0xa4|0x00|0xa8|0x00|0xf4240|addr|..
00C3 81 DUP2 |0xa4|0x04|0xa4|0x00|0xa8|0x00|0xf4240|addr|..
00C4 85 DUP6 |0x00|0xa4|0x04|0xa4|0x00|0xa8|0x00|0xf4240|..
00C5 88 DUP9 |addr|0x00|0xa4|0x04|0xa4|0x00|0xa8|0x00|...
00C6 88 DUP9 |0xf4240|addr|0x00|0xa4|0x04|0xa4|0x00|0xa8|..
00C7 F1 CALL |0x01|0xa8|..

As the CALL instruction takes 7 arguments from the stack, the EVM pushes a lot of values, as we can see above. Here is the meaning of each argument (Stack(0) is the top of the stack):

  • Stack(0) = gas forwarded to the call (=0xf4240 = 1,000,000 in decimal)
  • Stack(1) = address of the called smart contract
  • Stack(2) = msg.value (the amount of wei sent = 0)
  • Stack(3) = the offset in memory of the input data, i.e. the msg.data of the called contract (=0xa4)
  • Stack(4) = the length of the input data (=0x04)
  • Stack(5) = the offset in memory where the return data is written (=0xa4)
  • Stack(6) = the maximum length of return data written to memory (=0x00)
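Here is a Python sketch of how CALL consumes these 7 values, replaying the stack from the trace at 0xC7 (list end = top). CallArgs and pop_call_args are hypothetical names, and ADDR is a placeholder reusing the example address from the cleaning section, not necessarily the real Called address:

```python
from typing import NamedTuple

class CallArgs(NamedTuple):
    gas: int
    address: int
    value: int
    args_offset: int
    args_length: int
    ret_offset: int
    ret_length: int

def pop_call_args(stack: list) -> CallArgs:
    """Pop CALL's 7 inputs, top first (`stack[-1]` is the top of the EVM stack)."""
    return CallArgs(*[stack.pop() for _ in range(7)])

ADDR = 0xaaC5322e456d45E7b6c452038836C5631C2AeBc0  # placeholder address

# Stack at 0xC7 from the trace, written bottom -> top
stack = [0xf8a8fd6d, 0x6b, ADDR, 0xf4240, 0x00, 0xa8,
         0x00, 0xa4, 0x04, 0xa4, 0x00, ADDR, 0xf4240]

args = pop_call_args(stack)
print(args.gas, hex(args.args_offset), args.args_length, args.ret_length)  # 1000000 0xa4 4 0
```

After the pops, 0xa8 is on top again, and CALL pushes its success flag above it, which matches the |0x01|0xa8|.. stack shown after 0xC7.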

Looking at memory just before the call, we see that memory[0xa4:0xa4+0x04] = memory[0xa4:0xa8] = 0x0c55699c. This is the signature of the function x() of the Called contract (the signature of Caller.test() is 0xf8a8fd6d, which is still at the bottom of the stack).

As x() takes no arguments, msg.data contains only these 4 bytes.

After the call was done:

  • 0x00 is present in Stack(0) if the CALL was a failure.
  • 0x01 is present in Stack(0) if the CALL was a success.

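Return data is written to memory[Stack(5):Stack(5)+Stack(6)], truncated to the actual size of the returned data. Here Stack(6) = 0, so nothing is copied; the size of the returned data is checked afterwards with RETURNDATASIZE.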
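A small Python model of this rule (the function name is mine): CALL copies at most retLength bytes of the output into memory, while RETURNDATASIZE always reports the full output size. This is presumably why the compiler can pass 0 as the return length and inspect the result afterwards:

```python
def copy_return_data(memory: bytearray, output: bytes, ret_offset: int, ret_length: int) -> int:
    """Model CALL's output handling; return what RETURNDATASIZE would then push."""
    n = min(ret_length, len(output))               # never more than ret_length bytes
    memory[ret_offset:ret_offset + n] = output[:n]
    return len(output)                             # RETURNDATASIZE: full output size

memory = bytearray(0x100)
memory[0xa4:0xa8] = bytes.fromhex("0c55699c")     # call data written before the CALL

print(copy_return_data(memory, b"", 0xa4, 0x00))         # 0: x() returned nothing, as in the trace
print(copy_return_data(memory, bytes(32), 0xa4, 0x00))   # 32: hypothetical output, nothing copied
print(memory[0xa4:0xa8].hex())                           # 0c55699c: memory untouched
```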

5. Step over vs Step In

Don't forget the difference between step over and step in!

In RED: step in (go to the next instruction).

In GREEN: step over (go to the next instruction, but skip over any function being called, whether it is an internal or an external function).

When you reach byte 0xC7, click the green button this time to skip the call (you can also try the red button to see what happens).

6. RETURNDATASIZE

00C8    93  SWAP4          
00C9 50 POP
00CA 50 POP
00CB 50 POP
00CC 50 POP |0x01|0x6b|0xf8a8fd6d|
00CD 3D RETURNDATASIZE |0x00|0x01|0x6b|0xf8a8fd6d|
00CE 80 DUP1 |0x00|0x00|0x01|0x6b|0xf8a8fd6d|
00CF 60 PUSH1 0x00 |0x00|0x00|0x00|0x01|0x6b|0xf8a8fd6d|
00D1 81 DUP2 |0x00|0x00|0x00|0x00|0x01|0x6b|0xf8a8fd.
00D2 14 EQ |0x01|0x00|0x00|0x01|0x6b|0xf8a8fd.
00D3 61 PUSH2 0x00f8
00D6 57 *JUMPI |0x00|0x00|0x01|0x6b|0xf8a8fd.

We haven't seen RETURNDATASIZE before. It simply pushes onto the stack the size, in bytes, of the data returned by the last call.

As the last call didn't return any data, the size is zero, so this instruction pushes 0.

Next, the EVM compares this value to 0 and jumps to 0xf8 if RETURNDATASIZE = 0.

00F8    5B  JUMPDEST  |0x00|0x00|0x01|0x6b|0xf8a8fd.
00F9 50 POP
00FA 50 POP
00FB 50 POP |0x6b|0xf8a8fd.
00FC 56 *JUMP |0xf8a8fd
006B 5B JUMPDEST
006C 00 *STOP

After 0xf8, there isn't much code left: the EVM pops the leftover values, jumps back to 0x6B and just stops…

7. To summarize

Okay… That was very long. How can we summarize all this assembly code?

There are 3 main parts:

In part 2 of this article (between bytes 0x6D and 0xB9), the EVM writes the call data to memory (its length, 0x04, at 0x80 and the selector of x() at 0xa0) and cleans the target address.

In part 3 of this article, the function func_012d copies those 4 bytes from memory[0xa0:0xa4] to memory[0xa4:0xa8], where the CALL reads its input, zeroes the memory that follows and returns the end pointer 0xa8.

In parts 4/5/6 of this article, the EVM pushes all the arguments onto the stack, performs the call and checks the size of the returned data.

8. Other CALL-derived opcodes

  • STATICCALL does the same as CALL, except that it takes no value argument (msg.value is always 0) and the called code can't modify the state: any state-changing operation during the call makes it fail. I won't reverse a contract using STATICCALL here, because it would take too long to write and the result is almost the same as with CALL.
  • DELEGATECALL is also similar to CALL, but the code of the called contract runs in the context of the caller, so all state changes happen in the caller contract.
    (For example, if slot 0x02 is set to 0x10 during a DELEGATECALL, slot 0x02 equals 0x10 in the caller contract, not in the called contract.)
    msg.sender and msg.value stay the same as in the caller. (If addr calls smart contract A, which DELEGATECALLs smart contract B, msg.sender in B is still addr and msg.value remains the same.)
  • CALLCODE is very similar to DELEGATECALL, but msg.sender and msg.value are set by the calling contract. So in the last example, msg.sender would be contract A and msg.value would be chosen by contract A.
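The differences between these opcodes can be summarized in a small Python sketch. Frame and callee_frame are hypothetical names describing the context seen by contract B's code when contract A (itself called by addr with some value) uses each opcode; the sketch ignores balances, gas and the case where A is already running inside a static context:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Frame:
    storage_of: str   # whose storage the executed code reads and writes
    msg_sender: str
    msg_value: int
    read_only: bool

def callee_frame(opcode: str, origin: str, origin_value: int,
                 caller: str, callee: str, value_arg: int = 0) -> Frame:
    """Context of `callee`'s code when `caller` (called by `origin` with `origin_value`) uses `opcode`."""
    if opcode == "CALL":
        return Frame(callee, caller, value_arg, False)
    if opcode == "STATICCALL":       # no value input; state changes are forbidden
        return Frame(callee, caller, 0, True)
    if opcode == "DELEGATECALL":     # no value input; sender and value are inherited
        return Frame(caller, origin, origin_value, False)
    if opcode == "CALLCODE":         # caller's storage, but the caller becomes msg.sender
        return Frame(caller, caller, value_arg, False)
    raise ValueError(f"unknown opcode {opcode}")

# Number of stack inputs of each opcode
STACK_INPUTS = {"CALL": 7, "CALLCODE": 7, "DELEGATECALL": 6, "STATICCALL": 6}

print(callee_frame("DELEGATECALL", "addr", 5, "A", "B"))
```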

9. Conclusion

In these first 7 episodes, we learned almost every instruction of EVM assembly and, more importantly, the methodology for reversing smart contracts.

I hope you enjoyed this series and learned a lot about the EVM!


Written by Alain | Web3hackingLabs

Smart contract Auditor & Cybersecurity engineer, follow me on Twitter to get more value: https://rebrand.ly/twitter_medium