Reversing and debugging EVM Smart contracts : Full Smart Contract layout (Part 6)

Alain | Web3hackingLabs
12 min read · Jul 2, 2022


In this article, we will reverse a full smart contract. The goal of this part is to understand the layout of a complete smart contract, function by function, and to decompile it by hand.

🔴 This is the 6th part of our series about reversing and debugging EVM smart contracts.

This is the smart contract that will be analyzed. Compile it with the following settings:

  • solidity version: 0.8.7
  • optimizer: 200 runs

This smart contract is a bit longer than the previous ones, but don’t worry: the difficulty is barely higher.

Here is the full disassembly of the smart contract : https://ethervm.io/decompile/ropsten/0xd3ac4c6028484a0f101f835e9e5dab72a2fe1b97

(Don’t trust the decompilation. There are some mistakes, which will be highlighted throughout this post; only use the disassembly at the bottom of the page.)

1. Disassembling the function main

In every program (not only on the EVM) there is an entry point: the first piece of code that is executed.

For example, when you write a program in C or C++, the entry point is the function main().

In Solidity this is different: the entry point is the beginning of the smart contract's bytecode.
Every time you call a smart contract on the blockchain, this entry point is executed first. We will call it the function main; it is of course located at byte 0.

By looking at the disassembly between instructions 0 and 17, we can easily deduce that the function main begins with this code:

(We already analyzed the beginning of a smart contract in the first part of this series; feel free to go back to it if you need a refresher.)

By looking between bytes 18 and 36, we can easily see an “else if” chain, which is the function selector:

It’s worth noting that a “switch” statement doesn’t exist in Solidity; the selector is instead compiled as a chain of else if tests.
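
As a rough illustration, here is how such an else if chain could be modelled in Python. This is only a simplified sketch: the selector 0xfb1669ca is quoted later in this post, while the other selector value, the handler labels and the Revert exception are placeholders made up for the example.

```python
# Sketch of a function-selector dispatcher written as an "else if" chain.
# Only 0xfb1669ca comes from the article; SELECTOR_A and the returned labels
# are hypothetical placeholders.

class Revert(Exception):
    """Models the EVM REVERT opcode."""


SELECTOR_A = 0x11111111  # hypothetical placeholder selector
SELECTOR_B = 0xFB1669CA  # selector quoted in the article


def main(calldata: bytes) -> str:
    # A selector is the first 4 bytes of msg.data.
    if len(calldata) < 4:
        raise Revert("calldata shorter than a selector")
    selector = int.from_bytes(calldata[:4], "big")
    if selector == SELECTOR_A:
        return "jump to first handler"
    elif selector == SELECTOR_B:
        return "jump to second handler"
    # No branch matched: the contract reverts.
    raise Revert("unknown selector")
```

Each comparison costs a little gas, so the position of a function in the chain slightly changes the cost of calling it.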

We will reverse what’s inside each else if branch later in this post. First, we need to “map” all the functions of the smart contract and see where they are located in the bytecode.

2. The function layout

A smart contract is ONLY made of functions. Every piece of assembly lies in a function; moreover, the functions are laid out side by side in the smart contract code.

That means that if the code of function A sits between bytes 1 and 5 and the code of function B sits between bytes 6 and 9, function A can’t continue after byte 10: its code would no longer be contiguous, because B separates the two pieces.

Given this information, we will try to reconstruct all the functions of the smart contract. Note that there are “user-created” functions but also “compiler-created” functions.

Please also note that we will use HEXADECIMAL offsets instead of decimal ones in this post.

To do this, we will ONLY look at 3 instructions for now: JUMP, JUMPDEST and JUMPI (and sometimes PUSH).
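
If you want to automate this first pass, a minimal linear sweep is enough. The sketch below is an illustration, not a full disassembler: the function name scan_jumps is made up, and the sample bytes are the block 0x82..0x8C copied from the listing further down in this post.

```python
# Minimal linear-sweep scan that lists JUMPDEST, JUMP and JUMPI offsets.
# PUSH1..PUSH32 (0x60..0x7f) carry 1..32 bytes of immediate data that must be
# skipped, otherwise a data byte equal to 0x5b would be mistaken for JUMPDEST.

JUMP, JUMPI, JUMPDEST = 0x56, 0x57, 0x5B
NAMES = {JUMP: "JUMP", JUMPI: "JUMPI", JUMPDEST: "JUMPDEST"}


def scan_jumps(code: bytes, base: int = 0):
    """Return (offset, mnemonic) pairs for the three control-flow opcodes."""
    found = []
    pc = 0
    while pc < len(code):
        op = code[pc]
        if op in NAMES:
            found.append((base + pc, NAMES[op]))
        if 0x60 <= op <= 0x7F:
            pc += op - 0x5F  # skip the immediate bytes of PUSHn
        pc += 1
    return found


# JUMPDEST, PUSH1 00, PUSH1 8c, DUP3, DUP5, PUSH1 d2, JUMP, JUMPDEST
fragment = bytes.fromhex("5b6000608c828460d2565b")

for offset, name in scan_jumps(fragment, base=0x82):
    print(f"{offset:04X} {name}")
```

Every JUMPDEST reached right after an unconditional JUMP is a good candidate for the start of a new function.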

We know that between byte 0x00 and byte 0x66 (at least) we’re in the function main(), but we don’t know yet where main() ends.

But at byte 0x70…

67 PUSH1 0x64 
69 PUSH1 0x71
6B CALLDATASIZE
6C PUSH1 0x04
6E PUSH1 0xba
70 JUMP
71 JUMPDEST

We can easily see a function call to 0xBA, with the arguments 0x04 and CALLDATASIZE (the opcode that pushes the size of msg.data onto the stack) pushed onto the stack beforehand.

0x71 is the saved return address, where execution continues once the call is done (this is why there is a JUMPDEST at byte 0x71).

So it seems that 0xBA is the address where a new function starts. Let’s go to 0xBA.

00BA JUMPDEST  
00BB PUSH1 0x00
00BD PUSH1 0x20
00BF DUP3
00C0 DUP5
00C1 SUB
00C2 SLT
00C3 ISZERO
00C4 PUSH1 0xcb
00C6 *JUMPI

There is a condition at 0xC6: if it is met, the EVM jumps to 0xCB; otherwise the code continues and reverts shortly after.

00C7 PUSH1 0x00  
00C9 DUP1
00CA *REVERT
00CB JUMPDEST
00CC POP
00CD CALLDATALOAD
00CE SWAP2
00CF SWAP1
00D0 POP
00D1 *JUMP

At 0xD1 the function JUMPs to an unknown destination, which is likely the 0x71 address saved before the call. (It’s possible to verify that by counting the number of elements on the stack at each instruction between 0xBA and 0xD1.) So this is the end of the function which starts at 0xBA.

So the first function we discovered lies between 0xBA and 0xD1; moreover, we know that it takes 2 arguments.

We will name it func_0BA(a,b), as we don’t know the “true” name of the function.

Let’s dive deeper. As a smart contract is only composed of functions, another function must start at 0xD2 (right after 0xD1). Let’s disassemble that:

00D2 JUMPDEST
00D3 PUSH1 0x00
00D5 DUP3
00D6 NOT
00D7 DUP3
00D8 GT
00D9 ISZERO
00DA PUSH1 0xf2
00DC *JUMPI

At 0xDC there is a condition too: the EVM jumps to 0xF2 if it is met, and otherwise executes the code between 0xDD and 0xF1, which reverts.

00DD    63  PUSH4 0x4e487b71  
00E2 60 PUSH1 0xe0
00E4 1B SHL
00E5 60 PUSH1 0x00
00E7 52 MSTORE
00E8 60 PUSH1 0x11
00EA 60 PUSH1 0x04
00EC 52 MSTORE
00ED 60 PUSH1 0x24
00EF 60 PUSH1 0x00
00F1 FD *REVERT
00F2 5B JUMPDEST
00F3 50 POP
00F4 01 ADD
00F5 90 SWAP1
00F6 56 *JUMP

At 0xF6, the function JUMPs to an unknown location, which is likely the saved return address. Moreover, there isn’t any code after that (only the metadata hash, which we talked about in part 4), so this is surely the end of the function.

We will name it func_0D2(). We don’t know (at least for now) how many arguments it takes; we just know that it lies from 0xD2 to 0xF6 in the code.

We’re not done: the space between 0x71 and 0xBA remains to be analyzed.

Let’s go back to 0x71 and continue after the call to the function at 0xBA.

0071    5B  JUMPDEST  
0072 60 PUSH1 0x0a
0074 60 PUSH1 0x7b
0076 82 DUP3
0077 82 DUP3
0078 60 PUSH1 0x82
007A 56 *JUMP
007B 5B JUMPDEST

There is another call at 0x7A, with 2 arguments (each DUP3 pushes 1 value onto the stack), to the function at 0x82. So let’s inspect the function at 0x82.

0082    5B  JUMPDEST  
0083 60 PUSH1 0x00
0085 60 PUSH1 0x8c
0087 82 DUP3
0088 84 DUP5
0089 60 PUSH1 0xd2
008B 56 *JUMP
008C 5B JUMPDEST

..
0092 56 *JUMP

There is a call to 0xD2 at 0x8B with 2 arguments (DUP3 and DUP5 each push 1 value onto the stack). After that, the function JUMPs to an unknown location at byte 0x92.

This is the third function of the smart contract; we will call it func_082(a,b).
It lies between bytes 0x82 and 0x92.

As said before, the byte that follows (0x93) is thus the start of a new function, which we’ll call func_093():

...
009D 60 PUSH1 0xa4
009F 57 *JUMPI
00A0 60 PUSH1 0x00
00A2 80 DUP1
00A3 FD *REVERT
00A4 5B JUMPDEST
00A5 81 DUP2
...

Between 0x9F and 0xA4 there is a condition, using the same layout as before.
But if we continue, there is something strange at 0xB5: a second condition.

00B3    60  PUSH1 0x8c  
00B5 57 *JUMPI

00B6 60 PUSH1 0x00
00B8 80 DUP1
00B9 FD *REVERT

If the condition is met, the EVM JUMPs to 0x8C. But 0x8C is not the start of a function: it’s inside func_082(a,b) (between 0x82 and 0x92).

Is there an error in our hypothesis? Can functions be interleaved?

Fortunately, the answer is NO. This is the “fault” of the optimizer.
It saw that the block that would follow 0xB9 is identical to the block of assembly at 0x8C, so it preferred to JUMP to 0x8C instead of copying this block after 0xB9: shorter bytecode is cheaper to deploy. 0xB9 is thus the end of the function that starts at 0x93.

We have now identified 4 of the 5 functions of the smart contract (the fifth one is main()):

  1. func_082(a,b) 0x82 => 0x92
  2. func_093() 0x93 => 0xB9
  3. func_0BA(a,b) 0xBA => 0xD1
  4. func_0D2() 0xD2 => 0xF6

We can also deduce the layout of the function main(): 0x00 => 0x81. From 0x7B (still in main()) the code continues up to the JUMP at 0x81, and 0x82 is already the start of func_082.
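
The “side by side” hypothesis can be checked mechanically once the ranges are written down. The sketch below assumes the boundaries found in this section, with the fourth function starting at the JUMPDEST at 0x93 (the byte that follows the JUMP at 0x92); the helper names are made up for the example.

```python
# Function ranges recovered above, as (name, first_byte, last_byte).
# func_093 starts at the JUMPDEST that follows the JUMP at 0x92.
LAYOUT = [
    ("main",     0x00, 0x81),
    ("func_082", 0x82, 0x92),
    ("func_093", 0x93, 0xB9),
    ("func_0BA", 0xBA, 0xD1),
    ("func_0D2", 0xD2, 0xF6),
]


def is_contiguous(layout) -> bool:
    """True when every range starts right after the previous one ends."""
    return all(nxt[1] == cur[2] + 1 for cur, nxt in zip(layout, layout[1:]))


def owner_of(offset: int, layout=LAYOUT) -> str:
    """Return the name of the function whose range contains `offset`."""
    for name, start, end in layout:
        if start <= offset <= end:
            return name
    raise ValueError(f"offset {offset:#x} is outside the code")


print(is_contiguous(LAYOUT))  # no gap and no overlap between functions
print(owner_of(0x8C))         # the shared block targeted by the JUMPI at 0xB5
```

A jump target that belongs to another function, like 0x8C here, is a hint that the optimizer merged two identical blocks.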

3. Understanding the code functions

Now that we know where all the functions are located, let’s try to figure out the code of these functions.

3.1 The easiest function: func_0BA(a,b)

We know that it takes 2 arguments on the stack: |a|b|RET|
(RET is the saved return address, where execution continues after the call; we will not always show it.)

I won’t show here the assembly (you can find it in: https://ethervm.io/decompile/ropsten/0x13e566acef92c2ff26688c08cd25ab13f045b195)

  1. It pushes 0x00 and 0x20 |0x20|0x00|a|b|, then DUP3 and DUP5 give |b|a|0x20|0x00|a|b|.
  2. SUB subtracts Stack(1) from Stack(0): |b-a|0x20|0x00|a|b|.
  3. SLT compares Stack(0) to 0x20: 1 is PUSHed to the stack if it’s less than 0x20, and 0 if it’s greater than or equal to 0x20. |b-a < 0x20|0x00|a|b|.
  4. ISZERO is applied, so the stack is |!(b-a < 0x20)|0x00|a|b|. If this value is 1 (so if b-a is greater than or equal to 0x20) the function JUMPs to 0xCB; otherwise it doesn’t JUMP and reverts shortly after.
    We will suppose that YES, it’s greater or equal, so the stack is |1|0x00|a|b| before the JUMPI and |0x00|a|b| after the JUMP to 0xCB.
  5. The EVM POPs Stack(0) at 0xCC, so the stack is |a|b|RET|.
  6. CALLDATALOAD at 0xCD loads the 32 bytes of msg.data starting at offset Stack(0): |msg.data[a:a+0x20]|b|RET|.
  7. SWAP2 and SWAP1 at 0xCE and 0xCF: |b|RET|msg.data[a:a+0x20]|.
  8. After the POP at 0xD0 the stack is |RET|msg.data[a:a+0x20]|, and the JUMP at 0xD1 returns to RET with the data left on the stack.

This code is not very complicated, and it’s present in a lot of smart contracts. Once you have seen the pattern, you can identify it everywhere. You just need practice, don’t worry!

To summarize: func_0BA takes 2 arguments, computes their difference and compares it to 0x20.
It reverts if the difference is lower than 0x20 (32 in decimal); we will see why later.

We can thus “reassemble” the function func_0BA(a,b):

function func_0BA(a,b) {
if (b - a < 0x20) { revert(); } else { return msg.data[a:a+0x20]; }
}
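
To double-check a stack trace like this one, it helps to replay the opcodes on a toy stack. The following Python model is a sketch written for this post, not the output of a tool: index 0 of the list is the top of the stack (same order as the |top|...| notation), and the Revert class and to_signed helper are names invented for the example.

```python
# Tiny stack model of func_0BA (0xBA..0xD1). Index 0 of the list is the top
# of the stack. Reads past the end of calldata are zero-padded, which is how
# CALLDATALOAD behaves.

class Revert(Exception):
    """Models the REVERT at 0xCA."""


def to_signed(x: int) -> int:
    """Interpret a 256-bit word as a two's complement signed integer."""
    return x - (1 << 256) if x >> 255 else x


def func_0BA(calldata: bytes, a: int, b: int, ret: int):
    stack = [a, b, ret]
    stack.insert(0, 0x00)                        # PUSH1 0x00
    stack.insert(0, 0x20)                        # PUSH1 0x20
    stack.insert(0, stack[2])                    # DUP3 copies a
    stack.insert(0, stack[4])                    # DUP5 copies b
    top, second = stack.pop(0), stack.pop(0)
    stack.insert(0, (top - second) % (1 << 256))  # SUB: b - a
    top, second = stack.pop(0), stack.pop(0)
    stack.insert(0, int(to_signed(top) < to_signed(second)))  # SLT
    stack.insert(0, int(stack.pop(0) == 0))      # ISZERO
    if not stack.pop(0):                         # JUMPI to 0xCB not taken
        raise Revert()
    stack.pop(0)                                 # POP the 0x00
    offset = stack.pop(0)                        # CALLDATALOAD
    word = calldata[offset:offset + 32].ljust(32, b"\x00")
    stack.insert(0, int.from_bytes(word, "big"))
    stack[0], stack[2] = stack[2], stack[0]      # SWAP2
    stack[0], stack[1] = stack[1], stack[0]      # SWAP1
    stack.pop(0)                                 # POP drops b
    return stack                                 # [RET, word], then JUMP
```

Calling it with a = 0x04 and b = len(calldata) reproduces the call made from main(): the function either reverts or hands back the first 32-byte argument.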

3.2 func_082(a,b)

This is by far the shortest function in this code. It also takes 2 arguments:

  1. The stack at 0x82 is |a|b|RET|
  2. After the DUP5 it’s: |b|a|0x8c|0x00|a|b|RET|
  3. After func_0D2 is called (it returns 1 value, which we will name x) the stack is |x|0x00|a|b|RET|
  4. After the SWAP4 and SWAP3 the stack is |b|0x00|a|RET|x| and finally, after the 3 POPs, |RET|x|

x, the return value of func_0D2, is returned by func_082.

The purpose of func_082 is just to call func_0D2 with its own arguments and nothing more. Here is the decompilation:

function func_082(a,b) {
return func_0D2(a,b);
}
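
The SWAP4 / SWAP3 / POP sequence at the end of func_082 is the usual “clean up and return” pattern, and it is easy to get wrong by hand. Here is a small Python sketch of it, with index 0 as the top of the stack; the helper names are invented for the example.

```python
# Stack cleanup at 0x8C..0x92, with index 0 as the top of the stack.

def swap(stack, n):
    """SWAPn: exchange the top item with the item n positions below it."""
    stack[0], stack[n] = stack[n], stack[0]


def return_sequence(x, a, b, ret):
    stack = [x, 0x00, a, b, ret]  # state right after func_0D2 returned x
    swap(stack, 4)                # SWAP4 -> [ret, 0, a, b, x]
    swap(stack, 3)                # SWAP3 -> [b, 0, a, ret, x]
    del stack[:3]                 # POP, POP, POP
    return stack                  # [ret, x]; the JUMP then consumes ret
```

The two swaps move the return value to the bottom and the return address just above it, so that the three POPs only discard the local values.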

3.3 func_0D2(a,b)

As said before, func_0D2 is called with 2 arguments: |a|b|RET|

  1. 0x00 is pushed and DUP3 is called: |b|0|a|b|RET|
  2. NOT is called on b (0x0a in our call): |0xfffffffff....ffff5|0|a|b|RET|
  3. DUP3 is called: |a|0xfffffffff....ffff5|0|a|b|RET|
  4. GT is called: |a > 0xfffffffff....ffff5|0|a|b|RET|
  5. If Stack(0) is true (equal to 1), then ISZERO returns 0, so the JUMPI is not taken: the function continues its flow and reverts.
  6. We will suppose a is less than or equal to 0xffff….ff5, so the function JUMPs to 0xF2. At this point the stack is |0|a|b|RET|
  7. POP and ADD are performed, |a+b|RET|, and finally SWAP1: |RET|a+b|
    The function returns a+b, but what is the purpose of the code between steps 1 and 5?

First here is the “decompilation”:

function func_0D2(a,b) {
if (a > ~b) { revert(); } else { return a+b; }
}

The function applies NOT to its second argument and compares the result to the first one. What does it mean?

This check is there to prevent overflows.

We know that NOT(b) = 2²⁵⁶ - 1 - b
for example NOT(0x0) = 0xfffffffffffffffffff…ffff (64 f, because the EVM works with 32-byte words)

If a is bigger than NOT(b), then the sum (a+b) is greater than 0xfffffff…ffffff and thus can’t be contained in a uint256, so there is an overflow.

So the goal of func_0D2 is to add its 2 arguments and to check whether there is an overflow or not.
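
The same check can be written in a few lines of Python, where integers are unbounded and the overflow has to be tested explicitly. This is a sketch of the logic, with names chosen for the example; the only external fact used is that the bytes stored at 0xDD..0xF1 (0x4e487b71 followed by 0x11) form Solidity's Panic(uint256) error with the arithmetic overflow code.

```python
UINT256_MAX = (1 << 256) - 1


class Panic(Exception):
    """Models the Panic(0x11) revert data built between 0xDD and 0xF1."""


def evm_not(x: int) -> int:
    """Bitwise NOT on a 256-bit word, equal to 2**256 - 1 - x."""
    return UINT256_MAX ^ x


def func_0D2(a: int, b: int) -> int:
    # a > NOT(b) is the same condition as a + b > 2**256 - 1.
    if a > evm_not(b):
        raise Panic(0x11)
    return a + b


print(hex(evm_not(0x0A)))  # the 0xff...f5 constant seen in the trace
print(func_0D2(5, 10))
```

Doing the comparison before the ADD matters: after a wrapping addition, the information that an overflow happened would already be lost.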

3.4 The function func_093

At first, between 0x93 and 0xA4, the code is the same as between 0xBA and 0xCB in the first function, so the stack is the same: |0x00|a|b| (func_093 also takes 2 arguments).

  1. Bytes 0xA5 and 0xA6: DUP2 and CALLDATALOAD are called: |msg.data[a:a+0x20]|0x00|a|b|
  2. Bytes 0xA7 to 0xAB: after the 3 PUSHes: |0xa0|0x01|0x01|msg.data[a:a+0x20]|0x00|a|b|
  3. Byte 0xAD: SHL shifts 0x01 to the left by 0xa0 (160 in decimal) bits, that is by 40 hex digits. The result is |0x0000...00100......00|0x01|msg.data[a:a+0x20]|0x00|a|b|
  4. Byte 0xAE: the EVM executes the SUB opcode; the result is
    |0x0000...000ffffff..ffffff|msg.data[a:a+0x20]|0x00|a|b|
  5. Bytes 0xAF to 0xB1: after the DUP2, AND and again DUP2 opcodes the result is:
    |msg.data[a:a+0x20]|msg.data[a:a+0x20] & 0x0000...000ffffff..ffffff|msg.data[a:a+0x20]|0x00|a|b|
  6. Byte 0xB2: EQ is called. If Stack(0) and Stack(1) are equal (the 12 upper bytes of the word are zero), the result is: |1|msg.data[a:a+0x20]|0x00|a|b|
  7. After that the EVM jumps to 0x8C, which ENDs the function.

The purpose of this code is the same as for func_0BA, with one difference.
The code between steps 1 and 7 verifies that msg.data[a:a+0x20] is a valid Ethereum address, of this format:

0x000000000000000000000000abcdef….124 and not something like this:

0x100000000000000000000000abcdef….124.
Here is the decompilation of the function:

function func_093(a,b) {
if (b - a < 0x20) { revert(); } else {
if (msg.data[a:a+0x20] & 0x0000...000ffffff..ffffff == msg.data[a:a+0x20]) {
return msg.data[a:a+0x20];
} else { revert(); }
}
}
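
The mask built by the PUSH / SHL / SUB sequence is simply (1 << 160) - 1, that is 20 bytes of 0xff. A short Python sketch of the address check follows; the function name is chosen for the example.

```python
# 0x01 << 0xa0, minus 0x01: the low 160 bits set, the 96 upper bits clear.
ADDRESS_MASK = (1 << 160) - 1


def is_clean_address(word: int) -> bool:
    """True when the 12 upper bytes of a 32-byte word are all zero."""
    return word & ADDRESS_MASK == word


print(hex(ADDRESS_MASK))
print(is_clean_address(0xABCDEF))
print(is_clean_address((1 << 252) | 0xABCDEF))
```

An address is 20 bytes long but is ABI-encoded on 32 bytes, so anything in the 12 upper bytes means the caller sent a malformed argument.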

We are done with the 4 functions! Now only the main() function remains…

4. What is inside the main() ?

We haven’t analyzed the offsets between 0x37 and 0x82 yet. So let’s GOOOOO!!!

We didn’t analyze what is inside the 2 “else if” branches of the function selector, yet this is the most important part to understand what the 2 external functions do.

4.1 We will start with the first “else if” at 0x37 (from signature 0x13af4035)

0037    5B  JUMPDEST  
0038 60 PUSH1 0x64
003A 60 PUSH1 0x42
003C 36 CALLDATASIZE
003D 60 PUSH1 0x04
003F 60 PUSH1 0x93
0041 56 *JUMP

At first it calls func_093 with 0x04 and CALLDATASIZE.
CALLDATASIZE is the size of msg.data.

As we saw, this function computes the difference between these 2 numbers and reverts if the result is lower than 32.

As 1 argument is encoded on 0x20 bytes (32 in decimal), the purpose of this check is to verify that at least 1 argument was passed when the function was called. After that, func_093 returns msg.data[0x04:0x24], which is the first argument of the blockchain function call.
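
To make the 0x04 and 0x24 offsets concrete, here is a small Python sketch of the calldata layout for a call with a single 32-byte argument. The helper names are made up for the example, and the selector used is the one quoted in this post.

```python
def encode_call(selector: int, arg: int) -> bytes:
    """4-byte selector followed by one argument padded to 32 bytes."""
    return selector.to_bytes(4, "big") + arg.to_bytes(32, "big")


def has_one_argument(calldata: bytes) -> bool:
    # Same test as the bytecode: CALLDATASIZE - 0x04 must not be below 0x20.
    # The EVM uses SLT, a signed comparison, so a negative difference fails
    # too; Python integers give the same result without wrapping.
    return len(calldata) - 0x04 >= 0x20


calldata = encode_call(0xFB1669CA, 7)
print(len(calldata))                            # selector + one word
print(int.from_bytes(calldata[0x04:0x24], "big"))
print(has_one_argument(calldata[:4]))           # selector only
```

The check only looks at the length: it says nothing about whether the 32 bytes hold a sensible value, which is why func_093 adds its own test on top of it.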

Between 0x42 and 0x63, the smart contract SSTOREs the result in slot 0. (We won’t go into details here, as the article is already long…)

So this is the setOwner(_addr) “function”.

At 0x63 it jumps to an unknown destination (in fact it’s 0x64, which is the end of the else if). At 0x65 the smart contract STOPs.

4.2 The second “else if” at 0x66 (from signature 0xfb1669ca)

0066    5B  JUMPDEST  
0067 60 PUSH1 0x64
0069 60 PUSH1 0x71
006B 36 CALLDATASIZE
006C 60 PUSH1 0x04
006E 60 PUSH1 0xba
0070 56 *JUMP

It calls func_0BA with 0x04 and CALLDATASIZE to verify that there is at least 1 argument in msg.data; if so, it returns msg.data[0x04:0x24], which is the first argument of the blockchain call.

msg.data[0x00:0x04] is still the 4-byte signature.

After that it calls func_082, which calls func_0D2:

0071    5B  JUMPDEST  
0072 60 PUSH1 0x0a
0074 60 PUSH1 0x7b
0076 82 DUP3
0077 82 DUP3
0078 60 PUSH1 0x82
007A 56 *JUMP

The arguments are msg.data[0x04:0x24] (the first DUP3) and 10 (the second DUP3).
The result is msg.data[0x04:0x24] + 10.

007B    5B    JUMPDEST  
007C 60 PUSH1 0x01
007E 55 SSTORE
007F 50 POP
0080 50 POP
0081 56 *JUMP

After the end of the function it SSTOREs the result at slot 0x01. At 0x81 it jumps to an unknown address, which is 0x64 too. At 0x65 the smart contract STOPs.

The second else if is obviously setBalance(uint x).

We can also deduce that func_082(a,b) is the internal returnAdd() function, and that func_0D2(a,b) is the overflow-checked addition that the compiler generates for its a + b.
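
Putting everything together, the behaviour we recovered can be summarized by a small Python model. It is only a sketch of what the bytecode does as far as this post analyzed it: the slot numbers and the constant 10 come from the disassembly, the store to slot 0 is simplified, and the mapping of the Python names to the original Solidity names is a guess.

```python
# Behavioural model of the decompiled contract. Slot numbers and the +10
# constant come from the analysis above; the Python names are illustrative.

UINT256_MAX = (1 << 256) - 1
ADDRESS_MASK = (1 << 160) - 1


class Revert(Exception):
    pass


def checked_add(a: int, b: int) -> int:       # models func_0D2
    if a > UINT256_MAX - b:                   # same test as a > NOT(b)
        raise Revert("Panic(0x11)")
    return a + b


def return_add(a: int, b: int) -> int:        # models func_082
    return checked_add(a, b)


class Contract:
    def __init__(self):
        self.storage = {0: 0, 1: 0}

    def set_owner(self, addr: int) -> None:   # branch at 0x37
        if addr & ADDRESS_MASK != addr:       # check done by func_093
            raise Revert("dirty address")
        self.storage[0] = addr                # simplified SSTORE to slot 0

    def set_balance(self, x: int) -> None:    # branch at 0x66
        self.storage[1] = return_add(x, 10)   # SSTORE to slot 1
```

For example, calling set_balance(5) on a fresh Contract leaves 15 in slot 1, which matches the trace of the second else if.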

5. Conclusion

We succeeded in reverse engineering this smart contract. I hope you learned a lot from this post, which was a lot more practical than the 5 others!

We can also notice that the smart contract is not fully optimized: some blocks of code are repeated, like the ones between 0x93 and 0xA4 and between 0xBA and 0xCB.

It’s possible to remove some bytecode from the smart contract to compress it a bit. If you have the time, you can give it a try.


Alain | Web3hackingLabs

Smart contract Auditor & Cybersecurity engineer, follow me on Twitter to get more value: https://rebrand.ly/twitter_medium