Reversing and Debugging EVM Smart contracts : 5 Instructions to end/abort the Execution (Part 4)
In the EVM, there are 5 ways in total to end the execution of a smart contract. We will study them in detail in this article. Let’s start NOW!
🔴 This is the 4th part of our series about reversing and debugging EVM smart contracts, here you can find previous & next parts:
- ✅ Reversing and debugging EVM Smart contracts: First steps in assembly (part 1️⃣)
- ✅ Reversing and debugging EVM Smart contracts: Deployment of a smart contract (Part 2️⃣)
- ✅ Reversing and debugging EVM Smart contracts: How the storage layout works? (part 3️⃣)
- Reversing and Debugging EVM Smart contracts: 5 Instructions to end/abort the Execution (part 4️⃣)
- Reversing and debugging EVM Smart contracts: The Execution flow if/else/for/functions (part 5️⃣)
- Reversing and debugging EVM Smart contracts: Full Smart Contract layout (part 6️⃣)
- Reversing and debugging EVM Smart contracts: External Calls and contract deployment (part 7️⃣)
1. STOP
We will start with the simplest opcode present in the EVM.
STOP costs 0 gas and, as the name suggests, it ends the execution of a smart contract without returning any data.
You can disassemble this very simple smart contract to figure out what’s happening. (execution of the function starts at byte 45)
045 JUMPDEST |function signature discarded|
046 PUSH1 33 |0x33|
048 PUSH1 35 |0x35|0x33|
050 JUMP |0x33|
053 JUMPDEST |0x33|
054 JUMP ||
051 JUMPDEST ||
052 STOP ||
After the 2 jumps, execution reaches STOP. The memory contains nothing because no data was stored, and STOP takes no arguments from the stack, thus no data is returned.
It's as simple as that.
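To follow the two jumps by hand, here is a minimal Python sketch of an interpreter that only understands the four opcodes used above. It is an illustration, not a real EVM: the first 45 bytes (the function dispatcher) are replaced by placeholder zeros, and the jump check is simplified to "the target byte must be JUMPDEST".

```python
# Real EVM encodings of the four opcodes used by the snippet.
STOP, JUMP, JUMPDEST, PUSH1 = 0x00, 0x56, 0x5B, 0x60

def run(code: bytes, pc: int):
    """Execute a tiny EVM subset; return (visited offsets, final stack)."""
    stack, trace = [], []
    while True:
        op = code[pc]
        trace.append(pc)
        if op == STOP:
            return trace, stack          # halt, nothing is returned
        if op == PUSH1:
            stack.append(code[pc + 1])   # the next byte is the argument
            pc += 2
        elif op == JUMP:
            dest = stack.pop()
            if code[dest] != JUMPDEST:   # simplified validity check
                raise ValueError(f"bad jump destination {dest}")
            pc = dest
        elif op == JUMPDEST:
            pc += 1
        else:
            raise ValueError(f"unsupported opcode {op:#04x} at {pc}")

# Bytes 45..54 of the function, preceded by 45 placeholder bytes.
body = bytes.fromhex("5b" "6033" "6035" "56" "5b" "00" "5b" "56")
code = bytes(45) + body

trace, stack = run(code, pc=45)
print(trace)  # [45, 46, 48, 50, 53, 54, 51, 52]
print(stack)  # []
```

The trace shows the same order as the listing: 50 jumps to 53, 54 jumps back to 51, and 52 is the STOP.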
2. RETURN opcode
Like STOP, RETURN ends the execution of the smart contract, but unlike STOP it can also return some data. We will compile this Solidity code:
And disassemble the function:
045 JUMPDEST ||
046 PUSH1 08 |0x08| the return value of test()
048 PUSH1 40 |0x40|0x08|
050 MLOAD |0x80|0x08| mload(0x40) mloads the free memory pointer
051 SWAP1 |0x08|0x80|
052 DUP2 |0x80|0x08|0x80|
053 MSTORE |0x80| mstore(0x80,0x08) store the return value in memory[0x80]
054 PUSH1 20 |0x20|0x80|
056 ADD |0xa0|
057 PUSH1 40 |0x40|0xa0|
059 MLOAD |0x80|0xa0|
060 DUP1 |0x80|0x80|0xa0|
061 SWAP2 |0xa0|0x80|0x80|
062 SUB |0x20|0x80|
063 SWAP1 |0x80|0x20|
064 RETURN ||
Between 45 and 50, the EVM pushes the return value 0x08 and executes mload(0x40), which returns 0x80 (the free memory pointer).
Between 51 and 53, the EVM executes mstore(0x80,0x08): 0x80 is the free memory address and 0x08 is the return value of the test function.
Between 54 and 56, the EVM adds 0x20 to the previous result (0x80), which equals 0xa0 (0x20 = 32 in decimal, the size of a single memory slot, as there is only one return value).
Between 57 and 62, it reloads the memory at 0x40 (mload(0x40)) and subtracts the result (0x80) from 0xa0 (the result at byte 56), which gives 0x20.
Nothing very interesting here. At instruction 64, there is 0x08 in the slot 0x80 in memory, and 0x80 and 0x20 are on the stack. What do these 3 values mean?
According to the documentation, when RETURN is called:
Stack(0) = 0x80 should contain the offset in memory of the return data.
Stack(1) = 0x20 should contain the size of the return data, starting from that offset.
This is exactly the case in this smart contract, memory between 0x80 and 0xa0 (= 80 + 20 in hex) contains the return value (8) of the function test.
So the smart contract returns memory[Stack(0):Stack(0)+Stack(1)]
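The same steps can be replayed in a few lines of Python. This is only a sketch: the `memory` bytearray and the `mstore`/`mload` helpers are stand-ins written for this example, and the initial write to 0x40 mimics what Solidity's preamble does before the function runs.

```python
memory = bytearray(0xA0)  # just enough room for one 32-byte word at 0x80

def mstore(offset: int, value: int) -> None:
    """Write a 32-byte big-endian word at `offset`."""
    memory[offset:offset + 32] = value.to_bytes(32, "big")

def mload(offset: int) -> int:
    """Read a 32-byte big-endian word at `offset`."""
    return int.from_bytes(memory[offset:offset + 32], "big")

mstore(0x40, 0x80)                 # free memory pointer set by the preamble
free_ptr = mload(0x40)             # bytes 48-50: 0x80
mstore(free_ptr, 0x08)             # bytes 51-53: store the return value
end = free_ptr + 0x20              # bytes 54-56: 0xa0
size = end - mload(0x40)           # bytes 57-62: 0xa0 - 0x80 = 0x20

# RETURN(offset=0x80, size=0x20) hands back memory[offset:offset+size]
return_data = bytes(memory[free_ptr:free_ptr + size])
print(return_data.hex())
print(int.from_bytes(return_data, "big"))  # 8
```

The returned 32 bytes are the value 8 left-padded with zeros, which is how a single uint is ABI-encoded.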
3. REVERT opcode
Now, let’s modify the smart contract.
Did you spot the difference? Instead of using return, I used revert() with a string as argument (I can’t use numbers in “revert”, the Solidity compiler won’t let me compile).
If you call test(), you should see an error, but debugging is still possible!
Here is the disassembly of the function test:
069 JUMPDEST ||
070 PUSH1 40 |0x40|
072 MLOAD |0x80|
073 PUSH3 461bcd |0x461bcd|0x80|
077 PUSH1 e5 |0xe5|0x461bcd|0x80|
079 SHL |0x08c379a000...000|0x80| binary shift left by 229 bits (e5 in hex), YES a binary shift can modify hex numbers...
080 DUP2 |0x80|0x08c379a000...000|0x80|
081 MSTORE |0x80|
082 PUSH1 20 |0x20|0x80|
084 PUSH1 04 |0x04|0x20|0x80|
086 DUP3 |0x80|0x04|0x20|0x80|
087 ADD |0x84|0x20|0x80|
088 MSTORE |0x80|
089 PUSH1 05 |0x05|0x80|
091 PUSH1 24 |0x24|0x05|0x80|
093 DUP3 |0x80|0x24|0x05|0x80|
094 ADD |0xa4|0x05|0x80|
095 MSTORE |0x80|
096 PUSH5 195a59da1d |0x195a59da1d|0x80|
102 PUSH1 da |0xda|0x195a59da1d|0x80|
104 SHL |0x6569676874000...000|0x80|
105 PUSH1 44 |0x44|0x6569676874000...000|0x80|
107 DUP3 |0x80|0x44|0x6569676874000...000|0x80|
108 ADD |0xc4|0x6569676874000...000|0x80|
109 MSTORE |0x80|
110 PUSH1 00 |0x00|0x80|
112 SWAP1 |0x80|0x00|
113 PUSH1 64 |0x64|0x80|0x00|
115 ADD |0xe4|0x00|
116 PUSH1 40 |0x40|0xe4|0x00|
118 MLOAD |0x80|0xe4|0x00|
119 DUP1 |0x80|0x80|0xe4|0x00|
120 SWAP2 |0xe4|0x80|0x80|0x00|
121 SUB |0x64|0x80|0x00|
122 SWAP1 |0x80|0x64|0x00|
123 REVERT |0x00|
The idea is about the same as for RETURN: the EVM stores the return data in memory and leaves the 2 arguments (offset and size) on the stack. The code is longer but not as complex as it seems.
Between bytes 69 and 72, the free memory pointer is retrieved (mload(0x40) returns 0x80, so the next mstore can write at 0x80).
After that, between 73 and 81, the EVM executes mstore(0x80, 0x08c379a000000000000000000000000000000000000000000000000000000000).
Don’t forget that 0x08c379a000… was obtained by shifting 0x461bcd left by 0xe5 (229) bits.
Memory[0x80:0x84] is thus equal to 0x08c379a0.
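If the shift looks like magic, it can be checked with plain integer arithmetic. A quick Python check (the variable names are only for this example):

```python
shift = 0xE5                       # 229 in decimal
word = 0x461BCD << shift           # what SHL leaves on the stack

as_bytes = word.to_bytes(32, "big")  # the 32-byte word written by MSTORE
first_four = as_bytes[:4]            # Memory[0x80:0x84]

print(shift)              # 229
print(first_four.hex())   # 08c379a0
print(any(as_bytes[4:]))  # False: the remaining 28 bytes are zero
```

The compiler emits PUSH3 plus a shift instead of a full PUSH32 because it takes far fewer bytes of byte-code for the same 32-byte word.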
Between 82 and 88, it does the same: the EVM adds 0x04 to 0x80 (= 0x84) and executes mstore(0x84,0x20). It adds 0x04 because that is the size of the data just stored at 0x80 (the 4 bytes 0x08c379a0), so this new word is stored right after it.
Memory[0x84:0xa4] is now equal to 0x20:
Between bytes 89 and 95, the EVM stores 0x05 in memory the same way as before: mstore(0xa4,0x05).
Memory[0xa4:0xc4] is thus equal to 0x05.
Between 96 and 104, 0x195a59da1d is pushed to the stack and shifted left by 0xda (218) bits, so 0x6569676874000…0000 is on the stack.
If we convert 6569676874 from hex to ASCII (text), we find the string “eight”, which is the return value:
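Here is the same conversion as a short Python check, starting from the constant that is actually pushed at byte 96 (variable names are only for this example):

```python
pushed = 0x195A59DA1D              # PUSH5 argument at byte 96
word = (pushed << 0xDA).to_bytes(32, "big")  # SHL by 0xda = 218 bits

text_bytes = word[:5]              # the 5 left-aligned bytes of the string
print(text_bytes.hex())            # 6569676874
print(text_bytes.decode("ascii"))  # eight
```

The string is left-aligned in its 32-byte slot, which is why the shift is 218 bits: 27 bytes of padding (216 bits) plus 2 extra bits that turn 0x195a59da1d into 0x6569676874.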
Between 105 and 109:
- The EVM adds 0x44 to 0x80 (the free memory pointer) = 0xc4 (all the memory slots before it are already occupied).
- It executes mstore(0xc4,0x6569676874000…000).
The result is thus :
Finally, at instruction 123, the EVM reverts with 0x80 as the starting offset and 0x64 as the size. (These are exactly the same arguments as for the RETURN opcode.)
This means that the return data is located between 0x80 and 0xe4.
The difference is that the EVM returns a lot more information about the revert: not only the text “eight”, as we might guess, but 3 more values.
So what are the 3 unknown values?
- 0x08c379a0 is the Error(string) function signature (its 4-byte selector). Every time someone uses revert with a string argument in a smart contract, the returned data is encoded as a call to this error function.
- 0x20 is the offset (32 bytes, counted from the end of the signature) at which the string data starts.
- 0x05 is the length of the string in bytes.
- 0x6569676874 is the “eight” string.
Basically, it just means that the revert returns the ABI encoding of Error(“eight”) to the caller.
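To make the layout concrete, here is a small Python sketch that rebuilds the 0x64 bytes found in Memory[0x80:0xe4] and decodes them again. The `decode_error_string` helper is written for this example and only handles this one error format.

```python
ERROR_SELECTOR = bytes.fromhex("08c379a0")  # Error(string)

def decode_error_string(data: bytes) -> str:
    """Decode revert data of the form Error(string)."""
    if data[:4] != ERROR_SELECTOR:
        raise ValueError("not an Error(string) payload")
    args = data[4:]                                    # ABI-encoded arguments
    offset = int.from_bytes(args[0:32], "big")         # 0x20
    length = int.from_bytes(args[offset:offset + 32], "big")  # 0x05
    start = offset + 32
    return args[start:start + length].decode("utf-8")

# Same content as Memory[0x80:0xe4] in the walkthrough above.
payload = (
    ERROR_SELECTOR
    + (0x20).to_bytes(32, "big")       # offset of the string data
    + (0x05).to_bytes(32, "big")       # string length
    + b"eight".ljust(32, b"\x00")      # string bytes, right-padded
)

print(hex(len(payload)))               # 0x64
print(decode_error_string(payload))    # eight
```

The total size, 4 + 3 * 32 = 100 = 0x64 bytes, matches the size argument passed to REVERT.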
4. INVALID opcode
Before diving into this opcode, let’s answer a question:
What is the size of a smart contract?
It can range between 1 byte and 24,576 bytes (24 KiB).
A smart contract is composed only of opcodes (like PUSH, POP, DUP, SSTORE, which we already know), and these opcodes are translated directly into binary.
Every instruction without an argument takes 1 byte. For example:
- REVERT is 0xFD
- SELFDESTRUCT is 0xFF
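Only the PUSH instructions carry an argument in the byte-code, so they take 2 or more bytes (DUP1 and SWAP4, for example, work on values already on the stack and are single bytes: 0x80 and 0x93):
- PUSH1 0x80 is 6080 (PUSH1 alone is the first byte, 0x60, and 0x80 is the instruction’s argument)
- PUSH2 0x0100 is 610100 (PUSH2 alone is 0x61)
- PUSH4 0xFFFFFFFF is 63FFFFFFFF (PUSH4 alone is 0x63)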
And the contract byte-code is ONLY the concatenation of the byte-code of all its instructions.
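Because the byte-code is just a concatenation, a disassembler only has to walk the bytes and know how many argument bytes follow each opcode. A minimal Python sketch, with a deliberately incomplete name table (anything it does not know is printed as UNKNOWN):

```python
# Partial opcode table, enough for this illustration only.
NAMES = {0x00: "STOP", 0x56: "JUMP", 0x5B: "JUMPDEST",
         0xF3: "RETURN", 0xFD: "REVERT", 0xFE: "INVALID",
         0xFF: "SELFDESTRUCT"}

def disassemble(code: bytes):
    """Return a list of (offset, text) pairs for the given byte-code."""
    out, pc = [], 0
    while pc < len(code):
        op = code[pc]
        if 0x60 <= op <= 0x7F:               # PUSH1..PUSH32
            n = op - 0x5F                    # number of argument bytes
            arg = code[pc + 1:pc + 1 + n]
            out.append((pc, f"PUSH{n} 0x{arg.hex()}"))
            pc += 1 + n
        elif 0x80 <= op <= 0x8F:             # DUP1..DUP16, no argument
            out.append((pc, f"DUP{op - 0x7F}"))
            pc += 1
        elif 0x90 <= op <= 0x9F:             # SWAP1..SWAP16, no argument
            out.append((pc, f"SWAP{op - 0x8F}"))
            pc += 1
        else:
            out.append((pc, NAMES.get(op, f"UNKNOWN_0x{op:02x}")))
            pc += 1
    return out

listing = disassemble(bytes.fromhex("6080" "80" "93" "fd"))
for offset, text in listing:
    print(f"{offset:03d} {text}")
# 000 PUSH1 0x80
# 002 DUP1
# 003 SWAP4
# 004 REVERT
```

Note how the byte 0x80 appears twice with two different meanings: once as the argument of PUSH1 and once as the DUP1 opcode. Only the position in the stream tells them apart.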
But an issue arises: there are 16*16=256 possible opcode values (00 to FF), but only part of them are assigned (roughly 145; the exact number depends on the hard fork).
The unassigned values are called INVALID opcodes. (0xFE is additionally reserved as the designated INVALID instruction.)
Normally, if you compile your smart contract with Solidity to EVM byte-code, there shouldn’t be any reachable INVALID opcodes, unless there is a bug in the compiler.
But if the EVM (by any means) lands on an invalid opcode, execution aborts immediately: all state changes are reverted and, unlike with REVERT, all the remaining gas is consumed!
That said, some INVALID opcodes can still be present in the smart contract, especially at the end, but this code is unreachable: whatever transaction is sent to the smart contract, the EVM will never execute the code at the end, because there is always a jump before it.
After the JUMP at byte 54, no code can be executed in the image.
But why is there some code after byte 54? What is the purpose of this code? Can we remove all the byte-code after byte 54?
Firstly, this is the hash of the metadata of the compiled smart contract. But which metadata?
When Solidity compiles the smart contract, it automatically generates a JSON file containing all data about the smart contract.
If you go to the compile tab in Remix, click on Compilation Details and then “METADATA” (usually the 2nd entry in the list), you should see all the metadata, which contains:
- The compiler version (0.8.7 in our case)
- The “output” containing abi
- The compilation settings (version, optimizer…)
- Path of the smart contract
This means that 2 identical smart contracts compiled with the same compiler version can have different byte-code (for example if their path or settings differ)! The difference will only be at the end.
But why does the Solidity compiler do this?
According to the Solidity documentation, it’s used for accessing the contract metadata on IPFS or on Swarm. You can learn more here: https://docs.soliditylang.org/en/v0.8.13/metadata.html
Second question: can you remove this chunk of data to save gas?
YES, you can configure that in Remix! Alternatively, you can craft the deployment transaction yourself and manually remove these 52 bytes at the end of the smart contract.
At contract deployment, every byte costs 200 gas. As the IPFS hash of the metadata is 52 bytes long, you can save 10400 gas by disabling the option, which is not that small (a simple transfer costs 21000 gas by comparison).
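The saving can be estimated with a few lines of Python. This sketch relies on the layout described in the Solidity documentation, where the last two bytes of the byte-code give the length of the metadata section that precedes them. The byte-code below is a made-up placeholder (the metadata is 50 zero bytes, not real CBOR), sized so that the whole tail is 52 bytes as in the article.

```python
CODE_DEPOSIT_GAS_PER_BYTE = 200    # deployment cost per byte of code

def split_metadata(code: bytes):
    """Split byte-code into (logic, metadata tail including its length)."""
    meta_len = int.from_bytes(code[-2:], "big")  # 2-byte big-endian length
    tail_len = meta_len + 2                      # include the length itself
    return code[:-tail_len], code[-tail_len:]

# Placeholder contract: a few real-looking opcodes + fake metadata.
logic = bytes.fromhex("6080604052")
fake_metadata = bytes(50)                        # NOT real CBOR
code = logic + fake_metadata + (50).to_bytes(2, "big")

stripped, tail = split_metadata(code)
saved_gas = len(tail) * CODE_DEPOSIT_GAS_PER_BYTE

print(len(tail))    # 52
print(saved_gas)    # 10400
```

On a real contract, read the length from the byte-code itself rather than assuming a fixed tail size, since it can vary between compiler versions and settings.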
5. SELFDESTRUCT opcode
Did you know that you can remove a smart contract from the blockchain by calling one opcode?
Here is the smart contract we will compile and test:
After the disassembly of the test() function, we get:
53 JUMPDEST
54 PUSH1 0x00
56 PUSH20 0xffffffffffffffffffffffffffffffffffffffff
78 AND
79 SELFDESTRUCT
(after byte 79, there is the hash of the metadata, as explained in the previous section)
(No need to show the stack in this example)
Between instructions 53 and 78, 0x0 is ANDed with 0xffffffffffffffffffffffffffffffffffffffff (a mask that keeps the 20 bytes of an address), which leaves 0x0 on the stack.
After byte 78, Stack(0) contains 0x00. At byte 79, the SELFDESTRUCT instruction is called with Stack(0) (the zero address) as argument.
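The AND is a no-op here because the value is already 0, but the same mask matters when the stack word has garbage in its upper bytes. A short Python illustration (the `dirty` value is invented for the example):

```python
ADDRESS_MASK = (1 << 160) - 1      # 20 bytes of 0xff, same as the PUSH20

def to_address(word: int) -> int:
    """Keep only the low 160 bits of a 256-bit stack word."""
    return word & ADDRESS_MASK

print(hex(ADDRESS_MASK))
print(to_address(0x00))            # 0, as in the disassembly above

# Hypothetical word with junk above the 160-bit address.
dirty = (0xDEADBEEF << 160) | 0x1234
print(hex(to_address(dirty)))      # 0x1234
```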
But what is SELFDESTRUCT and why does SELFDESTRUCT take an argument?
SELFDESTRUCT removes the smart contract from the blockchain.
If the smart contract holds some ETH at the time of destruction, these funds cannot simply disappear. As a result, all the funds stored in the smart contract are sent to the address given as argument. That’s why.
But a question arises: what if this address is a smart contract WITHOUT a receive or fallback function (or what if its receive function reverts)? Where do the funds go?
The answer is simple: in this case Ethereum makes an exception. No code of the recipient is executed, so the smart contract still gets the funds even though a normal transfer would revert!
This means that it’s possible to send ETH to a smart contract and force it to accept the funds.
If a smart contract’s logic relies too much on its ETH balance, this can lead to unexpected behavior. This is known as the self-destruct security flaw.
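The difference between a normal transfer and a forced one can be pictured with a toy Python model. Everything here (the `Contract` class, `transfer`, `selfdestruct`) is invented for the illustration and has nothing to do with a real client implementation; it only captures the rule that a normal transfer runs the recipient's code while SELFDESTRUCT does not.

```python
class Contract:
    """Toy account: a balance and a flag saying if receive() accepts ETH."""
    def __init__(self, balance: int = 0, accepts_eth: bool = True):
        self.balance = balance
        self.accepts_eth = accepts_eth

def transfer(sender: Contract, target: Contract, amount: int) -> None:
    """Normal transfer: the target's receive logic runs and may revert."""
    if not target.accepts_eth:
        raise RuntimeError("transfer reverted by recipient")
    sender.balance -= amount
    target.balance += amount

def selfdestruct(source: Contract, beneficiary: Contract) -> None:
    """Forced transfer: no recipient code runs, so nothing can revert."""
    beneficiary.balance += source.balance
    source.balance = 0

victim = Contract(balance=0, accepts_eth=False)
attacker = Contract(balance=5)

try:
    transfer(attacker, victim, 5)
except RuntimeError as err:
    print(err)                 # transfer reverted by recipient

selfdestruct(attacker, victim)
print(victim.balance)          # 5
```

A contract that checks something like "my balance equals the sum of recorded deposits" would now see a mismatch, which is the kind of logic the flaw targets.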
Last question: why is it interesting to use this opcode?
If you’re done with a smart contract and you don’t need it anymore, it can be cheaper to call selfdestruct(address) than to leave the contract alive and transfer the funds manually (using transfer, send or call, for example).
This is the case because selfdestruct(address) frees up space in the blockchain, which was historically rewarded with a gas refund. That refund was removed by EIP-3529 in the London upgrade, so check the gas rules of the network you target.
6. Conclusion
This section was pretty easy. I wanted to show you ALL the possible ways a smart contract execution can be halted. Here is what you learned:
- 5 instructions to halt the contract.
- Some security about self-destruct.
- What the metadata hash of the contract is.
- Return values of REVERT and RETURN
See you next time !