Reversing and debugging EVM Smart contracts: Deployment of a smart contract (Part 2)

Alain | Web3hackingLabs
13 min read · Jun 22, 2022

This is the second part of our series of articles about reversing and debugging EVM smart contracts.

In this second part we will analyze what happens when you deploy a smart contract on the blockchain (by clicking the “deploy” button in Remix, for example).

Here is the example contract we will deploy:

pragma solidity ^0.8.0;  
contract Test {
uint balance;
constructor() {
balance = 9;
}
}

Before diving into this tutorial, don’t forget to:

  • Turn on your computer and start Remix.
  • Activate the optimizer with runs set to 1; it helps the compiler produce more efficient code. (Click on “Advanced Configurations” and “enable optimization”.)
  • Compile the code above using Solidity version 0.8.7. (Other versions might yield slightly different code.)
  • Deploy the smart contract in the JavaScript EVM (using the latest version: London).

1. Then we start!

As a quick reminder, to debug a transaction, you must press the “debug” button below after deploying the smart contract:

All the debugging information is located on the left of the screen, where you can see the stack, local variables, state, memory, storage, disassembly and so on.

But before starting to debug, can you answer this question:

Q: After the smart contract deployment, where is the code which we will debug located?

Answer: The code is located in the data field of the transaction, and this is the code which is executed when a smart contract is deployed. Its role is to execute the constructor and to deploy the smart contract on the blockchain.

Now, we can continue:

By default, the debugger shows the constructor code at byte 17. To understand the code before byte 17, let’s click on the left arrow in the image above.

So we land on byte 0:

000 PUSH1 80 | 0x80 |
002 PUSH1 40 | 0x40 | 0x80 |
004 MSTORE ||

We already know these first 3 instructions.

They store 0x80 at address 0x40 in the EVM memory, which is equivalent to the inline assembly:

mstore(0x40,0x80)

This is the free memory pointer, and don’t worry, we will talk about this part later :)

The next opcodes are also already known (seen in part 1 of this series):

005 CALLVALUE |msg.value|
006 DUP1 |msg.value|msg.value|
007 ISZERO |0x01|msg.value|
008 PUSH1 0f |0x0f|0x01|msg.value|
010 JUMPI |msg.value|
011 PUSH1 00 |0x00|msg.value| (reached only if JUMPI doesn't jump to 0f)
013 DUP1 |0x00|0x00|msg.value|
014 REVERT
015 JUMPDEST

Basically, it gets the value of msg.value (the ether sent to the contract) via the CALLVALUE opcode and reverts if that value is strictly greater than zero.

We explained in more detail what is happening here in the first part of this series.

This is equivalent to:

if (msg.value > 0) { revert() }

Since our constructor is not payable, we cannot send funds, which is normal! Things are the same as in the first part, at least until the 14th byte.
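
The byte offsets in the listing (000, 002, 004, 005 and so on) advance by two after each PUSH1, because that opcode is followed by one byte of immediate data. Here is a minimal Python sketch of a disassembler that reproduces those offsets. The OPCODES table and the disassemble helper are illustrative names (not part of Remix), and the table only covers the instructions met in this article.

```python
# Illustrative opcode table: only the instructions used in this article.
OPCODES = {
    0x15: "ISZERO", 0x34: "CALLVALUE", 0x39: "CODECOPY", 0x50: "POP",
    0x52: "MSTORE", 0x55: "SSTORE", 0x57: "JUMPI", 0x5B: "JUMPDEST",
    0x60: "PUSH1", 0x80: "DUP1", 0xF3: "RETURN", 0xFD: "REVERT",
    0xFE: "INVALID",
}


def disassemble(code: bytes):
    """Return a list of (offset, mnemonic, immediate) tuples."""
    out, pc = [], 0
    while pc < len(code):
        name = OPCODES.get(code[pc], f"UNKNOWN_{code[pc]:02x}")
        if name == "PUSH1":
            # PUSH1 carries one byte of immediate data after the opcode.
            imm = code[pc + 1] if pc + 1 < len(code) else None
            out.append((pc, name, imm))
            pc += 2
        else:
            out.append((pc, name, None))
            pc += 1
    return out


# First 16 bytes of this contract's deployment data (bytes 0 to 15).
prologue = bytes.fromhex("6080604052348015600f57600080fd5b")
listing = disassemble(prologue)
for offset, name, imm in listing:
    print(f"{offset:03d} {name}" + (f" {imm:02x}" if imm is not None else ""))
```

A full disassembler would also need PUSH2 to PUSH32, which carry 2 to 32 immediate bytes; they are left out here to keep the sketch short.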

2. Diving deeper

What happens next?

Where is the function signature? Where is our function hub? Of course they are missing: at deployment time there isn’t any function available apart from the constructor!

15 JUMPDEST |0x00|
16 POP ||
17 PUSH1 09 |0x09|
19 PUSH1 00 |0x00|0x09|
21 SSTORE ||

At the 16th byte, the EVM pops the remaining value from the stack. (0)

After that we push 9 and 0 to the stack and call SSTORE. Just before SSTORE, the stack is:
|0x00|0x09|

The SSTORE opcode stores (as the name suggests) Stack(1) in the storage slot Stack(0). (It takes 2 arguments, which are therefore removed from the stack after the execution of SSTORE at byte 21.)

In this case, the EVM stores the value 9 in the first slot (slot number 0), which is equivalent to the inline assembly:

sstore(0x00,0x09)

This is exactly the code in our constructor:

balance = 9

It stores 9 in the variable “balance”, and balance is located in storage.

As this is the first declared variable in our smart contract, the storage slot assigned to “balance” is slot 0. You can check that in the “Storage” section.
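
To make the argument order of SSTORE concrete, here is a small Python sketch of bytes 17 to 21. It assumes the stack is modeled as a list whose last element is the top, and storage as a dictionary; the sstore helper is a hypothetical name used only for this illustration.

```python
def sstore(stack: list, storage: dict) -> None:
    """Pop the slot (top of the stack), then the value, and write to storage."""
    slot = stack.pop()   # Stack(0): storage slot
    value = stack.pop()  # Stack(1): value to store
    storage[slot] = value


stack, storage = [], {}
stack.append(0x09)      # 017 PUSH1 09
stack.append(0x00)      # 019 PUSH1 00
sstore(stack, storage)  # 021 SSTORE
print(storage, stack)
```

Both arguments are consumed, which is why the stack is empty after byte 21.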

The code executed in the EVM when the contract is deployed is very short: we have already reached the end, which is byte 32! (It ends with the RETURN opcode, which halts execution like STOP but also returns data.)

22 PUSH1 3f |0x3f|
24 DUP1 |0x3f|0x3f|
25 PUSH1 22 |0x22|0x3f|0x3f|
27 PUSH1 00 |0x00|0x22|0x3f|0x3f|
29 CODECOPY |0x3f|
30 PUSH1 00 |0x00|0x3f|
32 RETURN ||

The stack is empty after byte 21 (as SSTORE doesn’t keep its 2 arguments on the stack).

3f is pushed, so the stack is | 0x3f |. After that we duplicate 3f and push 22 and 00. The stack is now | 0x00 | 0x22 | 0x3f | 0x3f | after byte 27.

If we look at the documentation, we see that CODECOPY is a special opcode which copies the code currently being executed into the EVM memory.

It takes 3 arguments:

  • Stack(0) is the destination offset in memory. Here Stack(0) = 0, so the code is copied to memory starting at 0x00.
  • Stack(1) is the offset in the code from which to start copying (here 0x22, which is 34 in dec).
  • Stack(2) is the number of bytes to copy (here 0x3f, which is 63 in dec).
    Looking at the stack, this is the code from byte 0x22 (34 in dec) up to, but not including, byte 0x22 + 0x3f = 0x61 (97 in dec).

Indeed, after the execution of this instruction, if we go to the EVM memory state in the debugger, we see that the memory is filled from 0x00 to 0x3e (0x3f bytes in total).

This is the code of our smart contract stored in EVM memory. Everything from byte 0x22 (34 in dec) of the transaction data onward was thus the smart contract code!
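
CODECOPY behaves much like a slice copy. The Python sketch below illustrates the three arguments with a placeholder "code" of 97 bytes in which each byte holds its own offset, so the copied range is easy to read back. The codecopy helper is hypothetical, and real EVM memory grows in 32-byte words, which is ignored here.

```python
def codecopy(memory: bytearray, code: bytes, dest: int, offset: int, size: int) -> None:
    """Copy code[offset:offset+size] to memory[dest:dest+size].

    Reads past the end of the code are padded with zero bytes.
    """
    chunk = code[offset:offset + size].ljust(size, b"\x00")
    if len(memory) < dest + size:
        memory.extend(bytes(dest + size - len(memory)))
    memory[dest:dest + size] = chunk


# Placeholder code (an assumption for readability): byte i has the value i.
code = bytes(range(0x61))
memory = bytearray()

# Same arguments as at byte 29: dest = 0x00, offset = 0x22, size = 0x3f.
codecopy(memory, code, 0x00, 0x22, 0x3F)
print(hex(memory[0]), hex(memory[-1]), hex(len(memory)))
```

The first copied byte comes from offset 0x22 and the last one from offset 0x60, which is why the end offset 0x61 is excluded.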

At byte 32, RETURN is called with Stack(0) = 0x00 and Stack(1) = 0x3f as arguments.

RETURN stops the execution of the code and returns memory[Stack(0):Stack(0)+Stack(1)], which is memory[0x00:0x3f] (end excluded).

This returned value is then stored on the blockchain. In our case this is the smart contract code!

To summarize this first part, this is the transaction data which is executed in order to deploy a smart contract on the blockchain:

The code which deploys the smart contract (byte 0 to 33)
6080604052348015600f57600080fd5b506009600055603f8060226000396000f3fe
The deployed smart contract (byte 34 to 96)
6080604052600080fdfea264697066735822122018fba077a8095159cac22a23ec0b3172b5ab77a14a3cf44bc3107e4049b7dcf264736f6c63430008070033
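
The whole walkthrough of this first contract can be replayed with a tiny interpreter. The following Python sketch is a simplification under stated assumptions: it only implements the opcodes that appear in this deployment code, ignores gas, and the run_creation name is made up for this example. It runs the transaction data above and checks that the returned bytes are the deployed smart contract.

```python
INIT = "6080604052348015600f57600080fd5b506009600055603f8060226000396000f3fe"
RUNTIME = (
    "6080604052600080fdfea2646970667358221220"
    "18fba077a8095159cac22a23ec0b3172b5ab77a14a3cf44bc3107e4049b7dcf2"
    "64736f6c63430008070033"
)


def run_creation(code: bytes, value: int = 0):
    """Run deployment code; return (returned bytes, storage). Raises on REVERT."""
    stack, storage, memory, pc = [], {}, bytearray(), 0

    def mem_write(offset: int, data: bytes) -> None:
        if len(memory) < offset + len(data):
            memory.extend(bytes(offset + len(data) - len(memory)))
        memory[offset:offset + len(data)] = data

    while True:
        op = code[pc]
        pc += 1
        if op == 0x60:    # PUSH1: push the next byte
            stack.append(code[pc])
            pc += 1
        elif op == 0x52:  # MSTORE: write a 32-byte word
            offset, word = stack.pop(), stack.pop()
            mem_write(offset, word.to_bytes(32, "big"))
        elif op == 0x34:  # CALLVALUE
            stack.append(value)
        elif op == 0x80:  # DUP1
            stack.append(stack[-1])
        elif op == 0x15:  # ISZERO
            stack.append(int(stack.pop() == 0))
        elif op == 0x57:  # JUMPI: jump if the condition is non-zero
            dest, cond = stack.pop(), stack.pop()
            if cond:
                pc = dest
        elif op == 0x5B:  # JUMPDEST: no effect
            pass
        elif op == 0x50:  # POP
            stack.pop()
        elif op == 0x55:  # SSTORE
            slot, val = stack.pop(), stack.pop()
            storage[slot] = val
        elif op == 0x39:  # CODECOPY
            dest, offset, size = stack.pop(), stack.pop(), stack.pop()
            mem_write(dest, code[offset:offset + size].ljust(size, b"\x00"))
        elif op == 0xF3:  # RETURN
            offset, size = stack.pop(), stack.pop()
            return bytes(memory[offset:offset + size]), storage
        elif op == 0xFD:  # REVERT
            raise RuntimeError("execution reverted")
        else:
            raise NotImplementedError(hex(op))


deployed, storage = run_creation(bytes.fromhex(INIT + RUNTIME))
print(len(deployed), storage)
```

Calling run_creation with value=1 takes the other branch of the JUMPI and raises, which mirrors the revert of the non-payable constructor.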

Now you know exactly what happens in the EVM when you deploy a smart contract on the blockchain. This is awesome!

3. Let’s try with payable constructors

What if we wrote the constructor as payable? Are there any differences? Let’s see!

Here is our new smart contract. The difference from the previous one is small: we just added the “payable” modifier to the constructor. (Don’t change the settings: Solidity 0.8.7, optimizer runs set to 1.)

pragma solidity ^0.8.0;  
contract Test {
uint balance;
constructor() payable {
balance = 9;
}
}

Don’t forget to send 1 ether to the smart contract at deployment time, by entering 1 in the value field and choosing Ether as the unit.

Here is the full disassembly of the deployment code (the last instruction is at byte 20).

00 PUSH1 80 
02 PUSH1 40
04 MSTORE
05 PUSH1 09
07 PUSH1 00
09 SSTORE
10 PUSH1 3f
12 DUP1
13 PUSH1 16
15 PUSH1 00
17 CODECOPY
18 PUSH1 00
20 RETURN

I think you already recognize this piece of code. No need to show the stack.

Our free memory pointer is still set, but after that there isn’t any verification of msg.value: the EVM goes directly to the constructor code and later copies and returns the smart contract code which will be deployed on the blockchain.

The sole difference is at byte 13, where PUSH1 16 is written instead of PUSH1 22.

It’s because the deployment code is 12 bytes smaller (22 - 16 in hex = 34 - 22 = 12 in decimal). The EVM therefore starts copying code not from byte 34 but from byte 22, as the smart contract code to deploy is located just after the deployment code.
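
The size difference can be double-checked by counting the bytes of the msg.value guard that the payable version no longer contains. This is a small Python check; the hex string is simply the guard's opcodes as they appear in the first contract's deployment code.

```python
# Guard removed by "payable":
# CALLVALUE DUP1 ISZERO PUSH1 0f JUMPI PUSH1 00 DUP1 REVERT JUMPDEST POP
guard = bytes.fromhex("348015600f57600080fd5b50")

non_payable_offset = 0x22  # CODECOPY source offset without payable
payable_offset = 0x16      # CODECOPY source offset with payable

saved = non_payable_offset - payable_offset
print(len(guard), saved)
```

Both numbers agree: the guard is 12 bytes long, and the offset of the contract code moves back by 12 bytes.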

Note also that the assembly code is a lot shorter, but the gas cost is about the same: 89228 without payable and 89036 with it (roughly 190 gas less).

4. Adding arguments to constructor

As there isn’t a lot of difference between payable and “non payable” constructors, let’s move forward! Why not add new arguments to the constructor?

Let’s deploy this smart contract with arguments a = 1, b = 2, msg.value = 1 ether and the same settings as before (optimizer runs set to 1 and Solidity 0.8.7).

pragma solidity ^0.8.0;  
contract Test {
uint balance;
constructor(uint a,uint b) payable {
balance = 9;
}
}

Looking at the debugging tab, bytes 0 to 4 are the same, as expected.

Tip: Every Solidity smart contract starts with mstore(0x40,0x80), which is 0x6080604052 in hex.

After that, the code is slightly different:

005 PUSH1 40 |0x40|
007 MLOAD |0x80|
008 PUSH1 98 |0x98|0x80|
010 CODESIZE |0xd8|0x98|0x80|
011 SUB |0x40|0x80|

MLOAD loads onto the stack the value stored in memory at the address Stack(0); in inline assembly it’s mload(0x40). Therefore 80 is pushed to the stack (because 80 was stored at 0x40 just before).

After that the EVM pushes 98; the stack is now | 0x98 | 0x80 |

CODESIZE doesn’t take any argument from the stack and pushes the size of the code onto the stack. If we inspect the stack, we should see 0xd8 as the new value (provided you compiled exactly the same code with the same settings as me, so that the code’s length is the same).

This is the size of the executed code (hence this is ALSO the size of the transaction data, because the code to execute is located in the transaction data, as said before). The stack is now:
| 0xd8 | 0x98 | 0x80 |

The SUB opcode is then called, which computes Stack(0) - Stack(1). Now the stack is
| 0x40 | 0x80 |. Indeed d8 - 98 = 40 in hex.

012 DUP1     |0x40|0x40|0x80|
013 PUSH1 98 |0x98|0x40|0x40|0x80|
015 DUP4 |0x80|0x98|0x40|0x40|0x80|
016 CODECOPY |0x40|0x80|

After a series of PUSHes and DUPs, CODECOPY at byte 16 copies part of the code being executed into memory.
Here are the arguments of the CODECOPY instruction:
- Stack(0): the offset in memory where the code is copied to.
- Stack(1): the offset in the executed code from which to start copying.
- Stack(2): the number of bytes to copy.

So all the code between 0x98 and 0x98 + 0x40 is copied to memory starting at 0x80.

Did you see the difference in memory?

We see that the 32-byte slot memory[0x80:0x9f] now contains the first parameter. (1)

The same goes for memory[0xa0:0xbf], which contains the second parameter. (2)

The goal of this piece of code (from the 5th byte to the 16th) was thus to copy the arguments of the constructor into memory!
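
Bytes 5 to 16 can be summarized as "everything after offset 0x98 is the arguments, copy it to 0x80". Here is a Python sketch of that arithmetic. The first 0x98 bytes are a zero-filled placeholder (an assumption, since only their length matters here), and the variable names are illustrative.

```python
WORD = 32

# Placeholder for the 0x98 bytes of deployment + contract code.
code_part = bytes(0x98)
# Constructor arguments a = 1 and b = 2, each padded to a 32-byte word.
args = (1).to_bytes(WORD, "big") + (2).to_bytes(WORD, "big")
tx_data = code_part + args

codesize = len(tx_data)     # CODESIZE -> 0xd8
args_len = codesize - 0x98  # SUB      -> 0x40

memory = bytearray(0xC0)
memory[0x80:0x80 + args_len] = tx_data[0x98:0x98 + args_len]  # CODECOPY

a = int.from_bytes(memory[0x80:0xA0], "big")
b = int.from_bytes(memory[0xA0:0xC0], "big")
print(hex(codesize), hex(args_len), a, b)
```

Note that Python slices exclude their end, so memory[0x80:0xA0] covers the bytes 0x80 to 0x9f.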

When you deploy a smart contract, only 2 fields are mandatory in the transaction (apart from the signature): the “from” and “data” fields. “from” contains your address, and “data” contains the smart contract code (and the code to deploy the smart contract, which we are analyzing here) AND the PARAMETERS.

Here is an example:

{
from: "0x1234.....",
data: "[code to execute when deploying smart contract] [smart contract code to deploy] [constructor parameters]"
}

The parameters are located DIRECTLY AFTER all the data containing the smart contract code (and each one is encoded on 32 bytes, which is 0x20 in hex).
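
As an illustration of how such parameters are laid out, here is a minimal Python sketch that encodes unsigned integers as consecutive 32-byte big-endian words. The encode_uint256_args helper is a hypothetical name, and the sketch only covers fixed-size integers like the uint arguments of this contract (dynamic types such as strings are encoded differently).

```python
def encode_uint256_args(*values: int) -> bytes:
    """Encode unsigned integers as consecutive 32-byte big-endian words."""
    return b"".join(v.to_bytes(32, "big") for v in values)


encoded = encode_uint256_args(1, 2)  # a = 1, b = 2
print(len(encoded), encoded.hex())
```

The result is 0x40 bytes long, which is the length computed by the SUB at byte 11.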

Note that the free memory pointer at 0x40 should no longer be 0x80, because the memory at 0x80 is not free anymore.

As the memory starting at 0x80 is now busy (it contains the 2 parameters), 0x40 should point to 0xc0, where the memory is free.

This is the purpose of the code between bytes 17 and 23 (it adds 0x40 to the previous free memory pointer):

017 DUP2 |0x80|0x40|0x80|
018 ADD |0xc0|0x80| add 0x40 to 0x80 (previous free memory pointer loaded at byte 7)
019 PUSH1 40 |0x40|0xc0|0x80| push 40 to the stack
021 DUP2 |0xc0|0x40|0xc0|0x80|
022 SWAP1 |0x40|0xc0|0xc0|0x80|
023 MSTORE |0xc0|0x80| store the result of the addition at 0x40 in memory

The purpose of the free memory pointer is just to point to a free slot in memory: every time this free memory is used, the pointer is moved to another address which is still free. Let’s continue further:
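
The free memory pointer works like a very simple "bump" allocator. Here is a Python sketch of the idea, assuming a fixed-size bytearray as memory; mload, mstore and allocate are illustrative helpers that mimic the opcodes, not an actual EVM API.

```python
FREE_PTR_SLOT = 0x40


def mload(memory: bytearray, offset: int) -> int:
    """Read a 32-byte big-endian word from memory."""
    return int.from_bytes(memory[offset:offset + 32], "big")


def mstore(memory: bytearray, offset: int, value: int) -> None:
    """Write a 32-byte big-endian word to memory."""
    memory[offset:offset + 32] = value.to_bytes(32, "big")


def allocate(memory: bytearray, size: int) -> int:
    """Reserve `size` bytes and return their start offset (hypothetical helper)."""
    start = mload(memory, FREE_PTR_SLOT)
    mstore(memory, FREE_PTR_SLOT, start + size)
    return start


memory = bytearray(0x100)
mstore(memory, FREE_PTR_SLOT, 0x80)  # mstore(0x40, 0x80)
args_start = allocate(memory, 0x40)  # room for the two constructor arguments
print(hex(args_start), hex(mload(memory, FREE_PTR_SLOT)))
```

The compiled code does the same thing in a different order: it first copies the arguments to the old pointer value (0x80) and only then stores the new pointer value (0xc0).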

024 PUSH1 1e |0x1e|0xc0|0x80|
026 SWAP2 |0x80|0xc0|0x1e|
027 PUSH1 29 |0x29|0x80|0xc0|0x1e|
029 JUMP |0x80|0xc0|0x1e| jump to 0x29 (41 in dec)

The code jumps unconditionally to 0x29 (41 in dec).

Loading to the stack

Let’s focus on the third part of the code, now that the arguments 1 and 2 are loaded in memory.

041 JUMPDEST |0x80|0xc0|0x1e|
042 PUSH1 00 |0x00|0x80|0xc0|0x1e|
044 DUP1 |0x00|0x00|0x80|0xc0|0x1e|
045 PUSH1 40 |0x40|0x00|0x00|0x80|0xc0|0x1e|
047 DUP4 |0x80|0x40|0x00|0x00|0x80|0xc0|0x1e|
048 DUP6 |0xc0|0x80|0x40|0x00|0x00|0x80|0xc0|0x1e|
049 SUB |0x40|0x40|0x00|0x00|0x80|0xc0|0x1e|
050 SLT |0x00|0x00|0x00|0x80|0xc0|0x1e|
051 ISZERO |0x01|0x00|0x00|0x80|0xc0|0x1e|
052 PUSH1 3b |0x3b|0x01|0x00|0x00|0x80|0xc0|0x1e|
054 JUMPI |0x00|0x00|0x80|0xc0|0x1e|
055 PUSH1 00
057 DUP1
058 REVERT

Analyzing this assembly with the debugger (between bytes 41 and 54), we see that the smart contract calculates c0 - 80 and verifies that the result is not less than 40 (using the signed comparison SLT).

If the result is less than 40, the EVM reverts; otherwise the execution flow continues and the EVM jumps to 3b (59 in decimal).

80 and c0 are the start offset and the end offset of the 2 arguments stored in memory. (1 and 2)

The EVM works with words of 32 bytes (20 in hex). The goal was just to verify, by subtracting the start offset of the arguments from their end offset in memory, that 2 arguments were indeed loaded into memory for the constructor (a total length of 40 in hex).

If it’s not the case, this means that there are fewer than 2 arguments in the transaction. So the EVM reverts between bytes 55 and 58 by NOT jumping to 59.
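
The check performed by SUB, SLT and ISZERO can be written out as follows. This Python sketch assumes 256-bit words with two's-complement signed comparison, as the EVM uses for SLT; the function names are illustrative.

```python
def to_signed(x: int) -> int:
    """Interpret a 256-bit word as a two's-complement signed integer."""
    return x - (1 << 256) if x >= (1 << 255) else x


def slt(a: int, b: int) -> int:
    """SLT: 1 if a < b as signed 256-bit integers, else 0."""
    return int(to_signed(a) < to_signed(b))


def has_enough_args(start: int, end: int, expected: int = 0x40) -> bool:
    """Mirror bytes 45 to 54: continue only if end - start is not less than expected."""
    length = (end - start) % (1 << 256)  # SUB wraps modulo 2**256
    return slt(length, expected) == 0    # ISZERO(SLT(...)) -> take the jump


print(has_enough_args(0x80, 0xC0))  # two 32-byte arguments
print(has_enough_args(0x80, 0xA0))  # only one 32-byte argument
```

Because the comparison is "less than" and not "equal", extra data after the 2 arguments would not trigger the revert; only missing data does.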

Next, at byte 59 (our last piece of code):

059 JUMPDEST |0x00|0x00|0x80|0xc0|0x1e|
060 POP |0x00|0x80|0xc0|0x1e|
061 POP |0x80|0xc0|0x1e|
062 DUP1 |0x80|0x80|0xc0|0x1e|
063 MLOAD |0x01|0x80|0xc0|0x1e| the first argument was loaded.
064 PUSH1 20 |0x20|0x01|0x80|0xc0|0x1e|
066 SWAP1 |0x01|0x20|0x80|0xc0|0x1e|
067 SWAP2 |0x80|0x20|0x01|0xc0|0x1e|
068 ADD |0xa0|0x01|0xc0|0x1e| add 20 to the offset in memory to load.
069 MLOAD |0x02|0x01|0xc0|0x1e| Our 2 arguments were loaded.
070 SWAP1 |0x01|0x02|0xc0|0x1e|
071 SWAP3 |0x1e|0x02|0xc0|0x01|
072 SWAP1 |0x02|0x1e|0xc0|0x01|
073 SWAP2 |0xc0|0x1e|0x02|0x01|
074 POP |0x1e|0x02|0x01|
075 JUMP |0x02|0x01| the swaps brought 0x1e to the top; jump to 0x1e (30 in dec)

After popping the two 0x00 values from the stack, we are left with | 0x80 | 0xc0 | 0x1e | on the stack after byte 61.

The EVM duplicates 80 and uses MLOAD to load the memory at Stack(0), which is 80. This is the first argument of the constructor that we copied to memory before. (1)

Like every value we load, it is now on the stack.

Next, at byte 64, we need to load the second argument. Since the EVM works with words of 32 bytes (20 in hex), the EVM must load the memory at 80 + 20 = a0 to get the second argument.

This is why the EVM pushes 20 to the stack and swaps some values so that 80 and 20 end up on top.

The EVM adds these 2 numbers at byte 68, which gives a0, and loads the memory at that address onto the stack!

To summarize, this code loads the 2 arguments which were stored in memory.

mload(0x80) 
mload(0xa0)

After that the EVM jumps to byte 30, with 1 and 2 on the stack, and continues the execution by setting the balance to 9 (executing the constructor between bytes 30 and 40) and returning the contract code to be stored on the blockchain (between bytes 76 and 87).

Note that the 2 arguments are POPed away by the constructor at bytes 31 and 32, as we don’t use them in the code :)

030 JUMPDEST |0x02|0x01|
031 POP |0x01|
032 POP ||
033 PUSH1 09 |0x09|
035 PUSH1 00 |0x00|0x09|
037 SSTORE ||
038 PUSH1 4c (76 in dec)
040 JUMP
076 JUMPDEST
077 PUSH1 3f
079 DUP1
080 PUSH1 59
082 PUSH1 00
084 CODECOPY
085 PUSH1 00
087 RETURN

And… we’re done! The smart contract ended its execution.

Conclusion/Summary of the last contract

To summarize the reversing of the last contract:

  1. It stores the free memory pointer, as every smart contract does.
  2. It copies the 2 arguments provided in the transaction data and stores them in memory.
  3. It verifies that we passed at least 2 arguments to the constructor. (not fewer)
  4. It loads the 2 arguments from memory onto the stack.
  5. It executes the constructor by setting balance to 9.
  6. It copies the contract code into memory, returns it and stops the execution.

This is a lot of work for today, but don’t worry, this series is far from over.

In part 3 we will talk about storage and see how we can optimize gas by using the variable type best suited to our use.


Alain | Web3hackingLabs

Smart contract Auditor & Cybersecurity engineer, follow me on Twitter to get more value: https://rebrand.ly/twitter_medium