Reversing and debugging EVM Smart contracts: First steps in assembly (part 1)

Alain | Web3hackingLabs
15 min read · Jun 20, 2022


▶ 0. Introduction

In this series of tutorials, we will learn how to debug and reverse EVM (Ethereum Virtual Machine) smart contracts.

You may already know that when a smart contract is not verified, you CANNOT read its Solidity code; only the byte-code is displayed.

illegible smart contract

The issue is that it’s very hard to fully “de-compile” byte-code and reconstruct the Solidity code as it was BEFORE compilation.

But don’t worry, in this series of tutorials I will teach you the techniques you need to reverse smart contracts on the blockchain.

Learning this skill gives you several advantages:

  • You’ll be able to read opaque smart contracts (even if the source code isn’t verified).
  • You’ll have a deep understanding of the EVM and therefore be a better developer/smart contract auditor (and so earn more money).
  • You’ll debug your smart contracts more efficiently and avoid losing a lot of time when there is an error (especially if the high-level error is too generic, like “Execution reverted”).

▶ 1. Introduction

Here is the smart contract we will reverse/debug together:

// SPDX-License-Identifier: UNLICENSED
pragma solidity ^0.8.0;
contract Test {

function test() external { }

function test2() external { }

function test3() external { }
}

Seems pretty easy, right?

Yes, we will start simple for now.

1. Compile it in Remix IDE (Version 0.8.7) https://remix.ethereum.org

Compiling our Smart contract

2. Deploy it (in the JavaScript London VM)

3. Call the test() function and click on the blue debug button to open the debugger.

4. Once it’s done, you should see the debugging tab in Remix.

90% of our work will be located here.

Before diving deeper into the subject, here are some prerequisites. To understand this first part you need:

  • Some solidity experience.
  • Hex numbers and basic computer science.
  • Basics of Remix IDE.
  • Motivation and maybe (a lot of) coffee.

▶ 2. What is byte-code/assembly?

Every smart contract is made of byte-code. For example, this is the byte-code (in hex) of the smart contract we created at the beginning of the article:

0x6080604052348015600f57600080fd5b5060043610603c5760003560e01c80630a8e8e0114604157806366e41cb7146049578063f8a8fd6d146051575b600080fd5b60476059565b005b604f605b565b005b6057605d565b005b565b565b56fea2646970667358221220d28f98515dc0855e1c6f5aa3747ff775f1b8ab6545f14c70641ff9af67c2465164736f6c63430008070033

This byte-code is a sequence of instructions in assembly language: each instruction is encoded as one opcode byte, and PUSH instructions are followed by the bytes they push. You probably already know that the EVM doesn’t understand the Solidity language directly; it only understands these low-level instructions.

At compile time, the role of the compiler is just to translate our Solidity code into assembly code.

Assembly is a very primitive type of “language”: there are only instructions and parameters. For example:

000 PUSH1 80
011 PUSH1 00
013 DUP1

The instruction at byte 00 (the first byte) of the smart contract is PUSH1 80 (6080 in the byte-code).
The instruction at byte 11 is PUSH1 00, with 1 argument which is 00 (6000 in the byte-code).
The instruction at byte 13 is DUP1, without arguments (80 in the byte-code). You can find these bytes near the start of the contract byte-code above.
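To make the mapping between bytes and instructions concrete, here is a minimal Python sketch of a disassembler. It is not what Remix uses internally: the `OPCODES` table and the `disassemble` helper are my own illustrative names, and the table only covers the handful of opcodes that appear in this contract.

```python
# Hand-written subset of the EVM opcode table (illustrative, not complete).
OPCODES = {
    0x00: "STOP", 0x10: "LT", 0x14: "EQ", 0x15: "ISZERO", 0x1c: "SHR",
    0x34: "CALLVALUE", 0x35: "CALLDATALOAD", 0x36: "CALLDATASIZE",
    0x50: "POP", 0x52: "MSTORE", 0x56: "JUMP", 0x57: "JUMPI",
    0x5b: "JUMPDEST", 0x80: "DUP1", 0xfd: "REVERT", 0xfe: "INVALID",
}


def disassemble(code: bytes):
    """Return a list of (offset, mnemonic, immediate_hex) tuples."""
    listing, pc = [], 0
    while pc < len(code):
        op = code[pc]
        if 0x60 <= op <= 0x7f:  # PUSH1..PUSH32 are followed by 1..32 data bytes
            size = op - 0x5f
            listing.append((pc, f"PUSH{size}", code[pc + 1:pc + 1 + size].hex()))
            pc += 1 + size
        else:
            listing.append((pc, OPCODES.get(op, f"UNKNOWN_{op:02x}"), ""))
            pc += 1
    return listing


# First 15 bytes of the contract byte-code shown above.
PREFIX = bytes.fromhex("6080604052348015600f57600080fd")
listing = disassemble(PREFIX)
for offset, name, arg in listing:
    print(f"{offset:03d} {name} {arg}".rstrip())
```

The important detail is the PUSH branch: the bytes that follow a PUSH are data, so the disassembler has to skip them instead of decoding them as instructions.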

In the next part of the tutorial, we will gradually explain what these instructions do internally.
The EVM has more than 100 valid instructions. Some are pretty easy to guess, like ADD/SUB/OR/XOR, but others need a bit more explanation.

🟦 Tip: Every time there is an instruction that you don’t understand, you can go to https://www.ethervm.io/. This website summarizes all the Ethereum instructions and shows their arguments and return values.

▶ 3. Memory in Solidity

You probably already know that there are 3 types of data storage in Solidity.

  1. The storage, which is stored permanently in the blockchain. Each location is identified by a 32-byte number named a “slot”, and each slot holds 32 bytes (or 64 hex digits).
  2. The “memory”, which is wiped out when the execution of the smart contract ends. Each location is identified by an address, which is a hex number.
  3. And the stack, which is a LIFO type of storage, where each slot is identified by a number (starting from 0).

▶ 4. How the LIFO Stack Works?

By default, at the start of the smart contract, the stack is empty: it contains nothing!
There are 2 ways to change the stack directly, by using the PUSH or the POP instruction:

4.1 PUSH

It writes data in the 0th place and pushes every existing value 1 slot further. For example, if we write 0xff to the stack by using the PUSH instruction:

Stack before (3 elems): |Place 0: 0x50|Place 1: 0x17|Place 2: 0x05|
----------------------------
Stack after PUSH ff: |Place 0: 0xff|Place 1: 0x50|Place 2: 0x17|Place 3: 0x05|

0xff was written at Place 0, 0x50 went from place 0 to 1, 0x17 from 1 to 2 and 0x05 from 2 to 3. Now the Stack contains 4 elements instead of 3.

Let’s see another example:

Stack before (0 elems, empty): ||
----------------------------
Stack after PUSH 33: |Place 0: 0x33|

The stack now contains 1 element. And here is a last example:

Stack before (1 elem): |Place 0: 0x33|
----------------------------
Stack after PUSH 00: |Place 0: 0x00|Place 1: 0x33|

The stack now contains 2 elements. It’s as simple as this.

4.2 POP

The POP instruction does the inverse: it deletes the data in the 0th slot and moves every other value back by 1 slot.

Stack before (3 elems): |Place 0: 0x50|Place 1: 0x17|Place 2: 0x05|
----------------------------
Stack after POP (2 elems): |Place 0: 0x17|Place 1: 0x05|

The data in the 0th place was deleted, 0x17 went from place 1 to place 0, and 0x05 went from place 2 to place 1. The stack now contains 2 elements instead of 3.

Here is another example:

Stack before (1 elem): |Place 0: 0x33|
----------------------------
Stack after POP (0 elems, empty): ||

It’s also as simple as that. If you understood this, you understood LIFO type storage, and you can go further :)
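If you prefer code to diagrams, the PUSH and POP examples above can be reproduced with a few lines of Python. This is only a sketch: the `Stack` class is a name I made up for the illustration, and it ignores EVM details such as the 1024-element limit and the 32-byte word size.

```python
class Stack:
    """LIFO stack where index 0 is "Place 0", the most recently pushed value."""

    def __init__(self, items=()):
        self._items = list(items)

    def push(self, value):
        # The new value takes Place 0, every other value moves one place further.
        self._items.insert(0, value)

    def pop(self):
        # Place 0 is removed, every other value moves one place back.
        return self._items.pop(0)

    def __repr__(self):
        return "|" + "|".join(f"0x{value:02x}" for value in self._items) + "|"


stack = Stack([0x50, 0x17, 0x05])
stack.push(0xff)
print(stack)  # |0xff|0x50|0x17|0x05|
removed = stack.pop()
print(stack)  # |0x50|0x17|0x05|
```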

How the stack (LIFO) works

In the EVM (and in other assembly languages), the stack is usually used for storing the arguments and return values of functions and instructions.

In this series of articles, we will note:
Stack(0) = the first value on the stack (at address 0).
Stack(1) = the second value on the stack (at address 1).
Stack(n) = the (n+1)th value on the stack (at address n).
Every time I explain an instruction, I will show the content of the stack in this format: |0x15|0x25|0x00|. Here:

  • 0x15 is Stack(0), the first value on the stack, at address 0
  • 0x25 is Stack(1), the second value on the stack
  • 0x00 is Stack(2)
  • and so on, if there are more values on the stack

▶ 5. First Lines of the Assembly

Once you have understood these concepts, it’s time to start! Click on the button below to restart the execution of the smart contract from the TRUE beginning. (By default, Remix starts the debugging session at the start of the function test(). As some code runs before the function itself, we need to change this.)

If all goes right, the first instructions should pop up. You can navigate between instructions one by one by clicking on these arrows:

The first 3 instructions are:

000 PUSH1 80 | 0x80 |
002 PUSH1 40 | 0x40 | 0x80 |
004 MSTORE ||

The EVM executes PUSH1 80 and PUSH1 40, so the stack looks like:
| 0x40 | 0x80 |

At byte 4, MSTORE takes 2 arguments: Stack(0) and Stack(1).
MSTORE stores the value of Stack(1) in memory, at the address given by Stack(0).

Hence the EVM stores 0x80 at address 0x40 in memory. In the memory part of the debugging tab, you should see:

MSTORE writes a 32-byte word (0x20 in hexadecimal) in big-endian order, so the value occupies the memory between 0x40 and 0x40+0x20 = 0x60 (we can note this as memory[0x40:0x60]).

This is why 0x80 is at the end (0x5f).

The “?????” are just the ASCII representations of the bytes in memory.

The “0x40” slot in memory is known as the free memory pointer in Solidity. It’s used for allocating a new slot in memory when needed. (I will explain later why it’s useful.)

IMPORTANT: Note that after an instruction, all the arguments it used are removed from the stack and replaced by the return value, if there is one.

As MSTORE took 2 arguments from the stack and returns nothing, after the completion of the MSTORE instruction these 2 arguments are removed from the stack.

So the stack now contains nothing.
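Here is a small Python sketch of what MSTORE does in this first step. The `mstore` helper and the `bytearray` used as memory are assumptions made for the illustration; a real EVM also charges gas for memory expansion, which is ignored here.

```python
def mstore(memory: bytearray, offset: int, value: int) -> None:
    """Write value as a 32-byte big-endian word at memory[offset:offset+32]."""
    end = offset + 32
    if len(memory) < end:
        memory.extend(bytes(end - len(memory)))  # memory grows with zero bytes
    memory[offset:end] = value.to_bytes(32, "big")


memory = bytearray()
stack = [0x80, 0x40]   # the last item is Stack(0): PUSH1 80, then PUSH1 40
offset = stack.pop()   # Stack(0) = 0x40, the address
value = stack.pop()    # Stack(1) = 0x80, the value to store
mstore(memory, offset, value)

print(memory[0x40:0x60].hex())  # the 32-byte word written by MSTORE
print(hex(memory[0x5f]))        # 0x80
```

Because the word is big-endian, the 31 bytes from 0x40 to 0x5e are zero and the only non-zero byte sits at 0x5f. Both arguments were popped, so `stack` is empty afterwards.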

▶ 6. MSG.VALUE

005 CALLVALUE |msg.value|
006 DUP1 |msg.value|msg.value|
007 ISZERO |0x01|msg.value|
008 PUSH1 0f |0x0f|0x01|msg.value|
010 JUMPI |msg.value|
011 PUSH1 00 |0x00|msg.value| (if JUMPI doesn't jump to 0f)
013 DUP1 |0x00|0x00|msg.value|
014 REVERT

The CALLVALUE instruction pushes msg.value (the amount of Ether sent to the smart contract) onto the stack.
As we didn’t send any Ether to the smart contract, the stack contains:
| 0x00 |

The DUP1 instruction pushes a copy of Stack(0) onto the stack. We can say that it “duplicates” the first value at the top of the stack.
| 0x00 | 0x00 |

Note that there also exist DUP2, DUP3… DUPn (up to DUP16), which push a copy of the n-th value (Stack(n-1)) onto the stack.

Then the EVM calls ISZERO at byte 7. ISZERO takes 1 argument from the stack (Stack(0)).

As the name suggests, ISZERO checks whether Stack(0) is equal to zero. If yes, the EVM pushes the value “1” (which means True) onto the stack; otherwise it pushes “0”.
| 0x01 | 0x00 |

The EVM also removed the first 0x00 because it was the argument of ISZERO.

After that, at byte 8, the EVM pushes 0x0f onto the stack: | 0x0f | 0x01 | 0x00 |

Next we have a conditional jump (JUMPI): the EVM goes directly to the byte number given by Stack(0) if Stack(1) is not zero. (As Stack(0) = 0f, 15 in decimal, if Stack(1) = 1 the EVM jumps directly to byte 15.)

If not, the EVM continues its path like nothing happened, through the PUSH1 and DUP1 instructions and finally the REVERT instruction at byte 14, which halts execution with an error.

But here all is OK! As Stack(1) = 1, the EVM jumps to 0x0f (which equals 15 in decimal).

Let’s try to understand what happened between byte 5 and byte 14.

Note that we declared the function test() as non-payable. Moreover, the contract doesn’t have a receive() or fallback() function which could handle incoming Ether.

As a result, this contract can’t receive any Ether (apart from one specific case, but it doesn’t matter here), so if we send Ether, it reverts! The code in assembly is equivalent to:

005 CALLVALUE load msg.value
006 DUP1 duplicate msg.value
007 ISZERO verify if msg.value is equal to 0
008 PUSH1 0f push 0f in the Stack (the byte location after the REVERT byte location)
010 JUMPI jump to this location if msg.value is equal to 0
011 PUSH1 00 push 00 in the Stack
013 DUP1 duplicate 00 in the Stack
014 REVERT revert the execution

In Solidity, this is equivalent to:

if (msg.value > 0) {
revert();
} else {
// Jump to byte 15
}

So this second part of the code just verifies that no Ether was sent to the contract; otherwise it reverts.

At byte 15, the stack is now: | 0x00 | (as JUMPI uses 2 arguments from the stack, the EVM removed them).
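To double-check the stack evolution of bytes 5 to 10, here is a hedged Python sketch that replays these instructions one by one. The `callvalue_guard` function is a hypothetical helper written for this article, not something the compiler or Remix provides.

```python
def callvalue_guard(msg_value: int) -> str:
    """Replay bytes 5..10 of the contract; the last list item is Stack(0)."""
    stack = []
    stack.append(msg_value)               # 005 CALLVALUE
    stack.append(stack[-1])               # 006 DUP1
    stack.append(int(stack.pop() == 0))   # 007 ISZERO
    stack.append(0x0f)                    # 008 PUSH1 0f
    # 010 JUMPI consumes Stack(0) (destination) and Stack(1) (condition).
    destination, condition = stack.pop(), stack.pop()
    if condition != 0:
        return f"jump to byte {destination}, stack size {len(stack)}"
    return "revert at byte 14"


print(callvalue_guard(0))  # jump to byte 15, stack size 1
print(callvalue_guard(1))  # revert at byte 14
```

The remaining stack element after the jump is the original copy of msg.value, which is exactly what the POP at byte 16 removes.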

▶ 7. CALLDATASIZE

015 JUMPDEST     | 0x00 |
016 POP ||
017 PUSH1 04 | 0x04 |
019 CALLDATASIZE | msg.data.size | 0x04 |
020 LT | msg.data.size < 0x04 |
021 PUSH1 3c | 0x3c | msg.data.size < 0x04 |
023 JUMPI || (JUMPI takes 2 arguments)
060 JUMPDEST ||
061 PUSH1 00 |0x00|
063 DUP1 |0x00|0x00|
064 REVERT ||

JUMPDEST does nothing. It just marks a location that a JUMP or JUMPI instruction is allowed to target. If the EVM jumps to an address which is not marked as “JUMPDEST” (like byte 16, which is POP), then execution automatically fails and the transaction is reverted.
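As an illustration, the following Python sketch lists the valid jump destinations of our contract. The `valid_jump_destinations` helper is my own name for it, and `RUNTIME` is the byte-code shown in section 2, cut just after the `fe` byte (the metadata that follows is not executable code).

```python
# Runtime byte-code of the Test contract, up to and including the INVALID (fe) byte.
RUNTIME = bytes.fromhex(
    "6080604052348015600f57600080fd"
    "5b5060043610603c57"
    "60003560e01c"
    "80630a8e8e0114604157"
    "806366e41cb714604957"
    "8063f8a8fd6d14605157"
    "5b600080fd"
    "5b60476059565b00"
    "5b604f605b565b00"
    "5b6057605d565b00"
    "5b565b565b56fe"
)


def valid_jump_destinations(code: bytes) -> set:
    """Collect the offsets of JUMPDEST (0x5b) opcodes, skipping PUSH data."""
    destinations, pc = set(), 0
    while pc < len(code):
        op = code[pc]
        if op == 0x5b:
            destinations.add(pc)
        if 0x60 <= op <= 0x7f:   # PUSH1..PUSH32: skip the immediate bytes
            pc += op - 0x5f
        pc += 1
    return destinations


destinations = valid_jump_destinations(RUNTIME)
print(sorted(destinations))
print(16 in destinations)  # False: byte 16 is POP
print(77 in destinations)  # False: byte 77 is 0x5b, but it is the data of a PUSH1
```

Byte 77 is a nice corner case: its value is 0x5b, but it is the argument of the PUSH1 at byte 76, so it does not count as a JUMPDEST.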

Next, the EVM POPs the top element of the stack and PUSHes 04, so there is only one element in the stack after byte 17: | 0x04 |

The EVM then calls CALLDATASIZE, which pushes msg.data.size (the size of the data field in the Ethereum transaction below; msg.data.length in Solidity). The stack is now:
| 0x04 | 0x04 |
(When a function is called without arguments, msg.data.size = 4. These 4 bytes are called the function “signature”, also known as the function selector.)

Raw transaction in Ethereum

For example, here msg.data is equal to “0x12345678” and msg.data.size = 4 (8 hex digits).

Later, at byte 20, the EVM calls LT (less than), which compares the 2 values on the stack (if Stack(0) < Stack(1) it writes 1, and 0 otherwise).

In our case it’s false! 4 is not less than 4 (the LT operator is strict).

So the EVM DOESN’T jump to 3c (as Stack(0) = 3c and Stack(1) = 0); it continues the execution flow like nothing happened.

But if CALLDATASIZE was less than 4 (like 0, 1, 2 or 3), then Stack(1) = 1 and the EVM would jump to 0x3c (60 in decimal) and… REVERT!

Here is what is happening:

015 JUMPDEST     
016 POP pop
017 PUSH1 04 store 0x04 in the stack
019 CALLDATASIZE get msg.data.size in the stack
020 LT verify if msg.data.size < 0x04
021 PUSH1 3c push 0x3c (60 in dec)
023 JUMPI jump to 60 if msg.data.size < 0x04
060 JUMPDEST
061 PUSH1 00
063 DUP1
064 REVERT revert the execution

That means that msg.data can’t be shorter than 4 bytes, and you will understand why in the next section!

if (msg.data.size < 4) { revert(); }
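The operand order of LT is easy to get wrong, so here is a short Python sketch of bytes 17 to 23. The `lt` and `calldata_guard` helpers are hypothetical names used only for this illustration.

```python
def lt(stack: list) -> None:
    """EVM LT: pop a = Stack(0) and b = Stack(1), push 1 if a < b else 0."""
    a, b = stack.pop(), stack.pop()
    stack.append(int(a < b))


def calldata_guard(calldata: bytes) -> str:
    """Replay bytes 17..23 of the contract; the last list item is Stack(0)."""
    stack = [0x04]                 # 017 PUSH1 04
    stack.append(len(calldata))    # 019 CALLDATASIZE
    lt(stack)                      # 020 LT: is calldatasize < 4 ?
    stack.append(0x3c)             # 021 PUSH1 3c
    destination, condition = stack.pop(), stack.pop()  # 023 JUMPI
    if condition:
        return f"jump to byte {destination} and revert"
    return "continue at byte 24"


print(calldata_guard(bytes.fromhex("f8a8fd6d")))  # continue at byte 24
print(calldata_guard(b"\x01\x02\x03"))            # jump to byte 60 and revert
```

Since CALLDATASIZE is pushed last, it is Stack(0) when LT runs, so the comparison really is "size < 4" and not the other way around.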

▶ 8. Function Selector

Once all the prior verifications are done, we need to call the function test() and execute its code. But there are several functions in our contract (test(), test2() and test3()), so how does the EVM figure out which function it must execute?

This is the role of the function selector!

Here is the disassembly of the next steps:

024 PUSH1 00 |0x00| (the stack was previously empty in byte 23)
026 CALLDATALOAD |0xf8a8fd6d00..56 zeros..00|
027 PUSH1 e0 |0xe0|0xf8a8fd6d00..56 zeros..00|
029 SHR |0xf8a8fd6d|
030 DUP1 |0xf8a8fd6d|0xf8a8fd6d|
031 PUSH4 0a8e8e01 |0x0a8e8e01|0xf8a8fd6d|0xf8a8fd6d|
036 EQ |0x0|0xf8a8fd6d|
037 PUSH1 41 |0x41|0x0|0xf8a8fd6d|
039 JUMPI |0xf8a8fd6d|
040 DUP1 |0xf8a8fd6d|0xf8a8fd6d|
041 PUSH4 66e41cb7 |0x66e41cb7|0xf8a8fd6d|0xf8a8fd6d|
046 EQ |0x0|0xf8a8fd6d|
047 PUSH1 49 |0x49|0x0|0xf8a8fd6d|
049 JUMPI |0xf8a8fd6d|
050 DUP1 |0xf8a8fd6d|0xf8a8fd6d|
051 PUSH4 f8a8fd6d |0xf8a8fd6d|0xf8a8fd6d|0xf8a8fd6d|
056 EQ |0x1|0xf8a8fd6d|
057 PUSH1 51 |0x51|0x1|0xf8a8fd6d|
059 JUMPI |0xf8a8fd6d|

You may already know what the function signature (or selector) is in Ethereum: it’s the first 4 bytes of the hash of the function name followed by its parameter types in parentheses. For test() it’s:

bytes4(keccak256("test()")) = 0xf8a8fd6d
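If you want to recompute this value yourself, here is a self-contained Python sketch. Python's `hashlib.sha3_256` is NOT the same function (SHA-3 uses a different padding byte than the original Keccak used by Ethereum), so the sketch includes a compact pure-Python Keccak-256. The function names (`rol64`, `keccak_f1600`, `keccak256`, `selector`) are my own, and the code favors readability over speed.

```python
MASK64 = (1 << 64) - 1


def rol64(value: int, shift: int) -> int:
    """Rotate a 64-bit lane to the left."""
    shift %= 64
    return ((value << shift) | (value >> (64 - shift))) & MASK64


def keccak_f1600(lanes):
    """Apply the 24 rounds of the Keccak-f[1600] permutation to a 5x5 lane matrix."""
    lfsr = 1
    for _ in range(24):
        # theta
        c = [lanes[x][0] ^ lanes[x][1] ^ lanes[x][2] ^ lanes[x][3] ^ lanes[x][4] for x in range(5)]
        d = [c[(x + 4) % 5] ^ rol64(c[(x + 1) % 5], 1) for x in range(5)]
        lanes = [[lanes[x][y] ^ d[x] for y in range(5)] for x in range(5)]
        # rho and pi
        x, y = 1, 0
        current = lanes[x][y]
        for t in range(24):
            x, y = y, (2 * x + 3 * y) % 5
            current, lanes[x][y] = lanes[x][y], rol64(current, (t + 1) * (t + 2) // 2)
        # chi
        for y in range(5):
            row = [lanes[x][y] for x in range(5)]
            for x in range(5):
                lanes[x][y] = row[x] ^ (~row[(x + 1) % 5] & row[(x + 2) % 5])
        # iota: round constants are generated on the fly with an 8-bit LFSR
        for j in range(7):
            lfsr = ((lfsr << 1) ^ ((lfsr >> 7) * 0x71)) % 256
            if lfsr & 2:
                lanes[0][0] ^= 1 << ((1 << j) - 1)
    return lanes


def keccak256(data: bytes) -> bytes:
    """Keccak-256 as used by Ethereum (rate 136 bytes, padding 0x01 ... 0x80)."""
    rate = 136
    padded = bytearray(data) + bytes(rate - len(data) % rate)
    padded[len(data)] ^= 0x01
    padded[-1] ^= 0x80
    state = bytearray(200)
    for start in range(0, len(padded), rate):
        for i in range(rate):
            state[i] ^= padded[start + i]
        lanes = [[int.from_bytes(state[8 * (x + 5 * y):8 * (x + 5 * y) + 8], "little")
                  for y in range(5)] for x in range(5)]
        lanes = keccak_f1600(lanes)
        for x in range(5):
            for y in range(5):
                state[8 * (x + 5 * y):8 * (x + 5 * y) + 8] = lanes[x][y].to_bytes(8, "little")
    return bytes(state[:32])


def selector(signature: str) -> str:
    """First 4 bytes of keccak256(signature), as a hex string."""
    return "0x" + keccak256(signature.encode()).hex()[:8]


print(selector("test()"))
```

In practice you would use a library or an online tool for this; the point here is only to show that the selector is nothing more than a truncated hash of the text "test()".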

CALLDATALOAD takes 1 argument, Stack(0), as an offset into msg.data, and replaces it on the stack with the 32 bytes of msg.data starting at that offset.

In this case it loads the first 32 bytes of msg.data (because Stack(0) = 0).

But there are only 4 bytes (as said before), so the missing bytes are read as zeros and the stack looks like:
| 0xf8a8fd6d00..56 zeros..00 |

The next opcodes are PUSH1 e0 at byte 27 and SHR at byte 29. SHR takes 2 arguments: it shifts the value Stack(1) to the right (>>) by Stack(0) bits (e0 here). Before SHR, the stack is:

|0xe0|0xf8a8fd6d00..56 zeros..00|

Here is the detailed calculation with SHR (you can skip it if you want):

A place in the stack is 32 bytes = 256 bits long.
In binary, Stack(1) = 11111000101010001111110101101101 followed by 224 zeros.
e0 = 224 in decimal, so we shift 224 times to the right:

0 times : 11111000101010001111110101101101 + 224 zeros
1 time : 0 + 11111000101010001111110101101101 + 223 zeros
2 times : 00 + 11111000101010001111110101101101 + 222 zeros
224 times : 224 zeros + 11111000101010001111110101101101
= 0x00..56 zeros..00f8a8fd6d

Thus the result stored on the stack is 0x00..56 zeros..00f8a8fd6d (which is equivalent to 0xf8a8fd6d).
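The same computation fits in a few lines of Python, since Python integers handle 256-bit values natively. This is just a sketch: the variable names are mine, and the zero padding done with `ljust` mimics how CALLDATALOAD reads bytes past the end of msg.data as zeros.

```python
calldata = bytes.fromhex("f8a8fd6d")  # msg.data for a call to test()

# CALLDATALOAD(0): 32 bytes starting at offset 0, padded with zeros on the right.
word = int.from_bytes(calldata[0:32].ljust(32, b"\x00"), "big")

# SHR with Stack(0) = 0xe0: shift right by 224 bits, keeping the top 32 bits.
shift = 0xe0
function_selector = word >> shift

print(f"{word:#066x}")         # the full 32-byte word
print(256 - shift)             # 32
print(hex(function_selector))  # 0xf8a8fd6d
```

Shifting by 224 bits out of 256 leaves exactly 32 bits, which is why the compiler uses e0 here: 32 bits is the size of the 4-byte signature.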

After the DUP1 opcode, the stack looks like | 0xf8a8fd6d | 0xf8a8fd6d |

It’s worth noting that this is our test() signature, and it’s normal! The signature of the function is always present in the first 4 bytes of the transaction data.

In an Ethereum transaction, we don’t directly send the name of the function to execute, but only the 4-byte signature.

At byte 31, the EVM pushes a 4-byte value onto the stack: 0a8e8e01

| 0x0a8e8e01 | 0xf8a8fd6d | 0xf8a8fd6d |

and calls EQ, which compares Stack(0) and Stack(1).

These 2 values are obviously not equal, so they are replaced by 0:
| 0x0 | 0xf8a8fd6d |

And so we don’t JUMP to 0x41, 65 in decimal (there is a PUSH1 41 and a JUMPI just after that).

The EVM does exactly the same with 0x66e41cb7 (bytes 40 to 49), which is not equal to 0xf8a8fd6d either.

And finally the EVM does the same with 0xf8a8fd6d, which this time is equal to 0xf8a8fd6d! So we jump to 0x51 (81 in decimal), which is the beginning of the test() function.

081 JUMPDEST |0xf8a8fd6d|
082 PUSH1 57 |0x57|0xf8a8fd6d|
084 PUSH1 5d |0x5d|0x57|0xf8a8fd6d|
086 JUMP |0x57|0xf8a8fd6d|
087 JUMPDEST |0xf8a8fd6d|
088 STOP ||
093 JUMPDEST |0x57|0xf8a8fd6d|
094 JUMP |0xf8a8fd6d|

You can easily analyse the last 8 instructions executed in our test() function.

It just performs a series of JUMPs (from byte 86 to byte 93, then from byte 94 back to byte 87), and at the end of the function the STOP opcode is reached, which halts the execution of the contract without throwing an error.
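To see the whole execution path at once, here is a toy EVM interpreter in Python that only supports the opcodes used by this contract. It is a rough sketch under several assumptions: the `run` function and its return values are my own invention, there is no gas accounting, and the jump check only looks at the destination byte (it does not exclude PUSH data as a real EVM does).

```python
# Runtime byte-code of the Test contract, up to and including the INVALID (fe) byte.
CODE = bytes.fromhex(
    "6080604052348015600f57600080fd5b5060043610603c57"
    "60003560e01c80630a8e8e0114604157806366e41cb714604957"
    "8063f8a8fd6d146051575b600080fd"
    "5b60476059565b005b604f605b565b005b6057605d565b00"
    "5b565b565b56fe"
)


def run(code: bytes, calldata: bytes = b"", callvalue: int = 0):
    """Execute code and return (outcome, list of executed byte offsets)."""
    stack, memory, trace, pc = [], bytearray(), [], 0  # last stack item is Stack(0)
    while pc < len(code):
        op = code[pc]
        trace.append(pc)
        if 0x60 <= op <= 0x7f:                       # PUSH1..PUSH32
            size = op - 0x5f
            stack.append(int.from_bytes(code[pc + 1:pc + 1 + size], "big"))
            pc += 1 + size
            continue
        if op == 0x00:                               # STOP
            return "STOP", trace
        if op == 0xfd:                               # REVERT
            return "REVERT", trace
        if op in (0x56, 0x57):                       # JUMP / JUMPI
            destination = stack.pop()
            condition = stack.pop() if op == 0x57 else 1
            if condition:
                if destination >= len(code) or code[destination] != 0x5b:
                    return "INVALID JUMP", trace
                pc = destination
                continue
        elif op == 0x52:                             # MSTORE
            offset, value = stack.pop(), stack.pop()
            memory.extend(bytes(max(0, offset + 32 - len(memory))))
            memory[offset:offset + 32] = value.to_bytes(32, "big")
        elif op == 0x34:                             # CALLVALUE
            stack.append(callvalue)
        elif op == 0x36:                             # CALLDATASIZE
            stack.append(len(calldata))
        elif op == 0x35:                             # CALLDATALOAD
            offset = stack.pop()
            stack.append(int.from_bytes(calldata[offset:offset + 32].ljust(32, b"\x00"), "big"))
        elif op == 0x80:                             # DUP1
            stack.append(stack[-1])
        elif op == 0x50:                             # POP
            stack.pop()
        elif op == 0x15:                             # ISZERO
            stack.append(int(stack.pop() == 0))
        elif op == 0x10:                             # LT
            a, b = stack.pop(), stack.pop()
            stack.append(int(a < b))
        elif op == 0x14:                             # EQ
            stack.append(int(stack.pop() == stack.pop()))
        elif op == 0x1c:                             # SHR
            shift, value = stack.pop(), stack.pop()
            stack.append(value >> shift if shift < 256 else 0)
        elif op != 0x5b:                             # JUMPDEST does nothing
            raise ValueError(f"unsupported opcode {op:#04x} at byte {pc}")
        pc += 1
    return "END OF CODE", trace


outcome, trace = run(CODE, calldata=bytes.fromhex("f8a8fd6d"))
print(outcome)     # STOP
print(trace[-8:])  # [81, 82, 84, 86, 93, 94, 87, 88]
```

The last 8 offsets of the trace are the instructions of test() in the order they are executed, which is not the order in which they appear in the byte-code. You can also call `run(CODE, callvalue=1)` or `run(CODE, calldata=b"")` to watch the two REVERT paths from the previous sections.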
All this code behaves like a switch in programming:

0xf8a8fd6d is the signature of the “test()” function.
0x0a8e8e01 and 0x66e41cb7 are the signatures of the two other functions, test2() and test3().

If the signature in the transaction data matches one of these signatures, the EVM executes the function’s code by jumping to its location (0x41, 0x49 and 0x51 in the code).

Otherwise, if the signature in the transaction data doesn’t match any function signature in the code, the EVM would run the fallback function, but there is no such function in our smart contract (at least in this part)! As a result, the EVM reverts and the story ends here.

This is the code after byte 59 (the end of the function selector switch):

060 JUMPDEST
061 PUSH1 00
063 DUP1
064 REVERT

We can thus reconstruct the full code of the smart contract:

mstore(0x40, 0x80)
if (msg.value > 0) { revert(); }
if (msg.data.size < 4) { revert(); }
bytes4 selector = msg.data[0x00:0x04]
switch (selector) {
case 0x0a8e8e01: // JUMP to 41 (65 in dec) stop()
case 0x66e41cb7: // JUMP to 49 (73 in dec) stop()
case 0xf8a8fd6d: // JUMP to 51 (81 in dec) stop()
default: revert();
}
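For readers who are more comfortable with a scripting language, the same reconstruction can be written as a small Python function. This is only a model of the control flow: `SELECTORS` and `dispatch` are hypothetical names, and the function returns a description of what the EVM does instead of really executing anything.

```python
# Selector -> jump destination, as read from the PUSH4 / PUSH1 pairs in the byte-code.
SELECTORS = {
    0x0a8e8e01: 0x41,
    0x66e41cb7: 0x49,
    0xf8a8fd6d: 0x51,
}


def dispatch(calldata: bytes, msg_value: int = 0) -> str:
    """High-level model of the contract: checks first, then the selector switch."""
    if msg_value > 0:                   # bytes 5..14: no function is payable
        return "revert"
    if len(calldata) < 4:               # bytes 15..23: not enough data for a selector
        return "revert"
    selector = int.from_bytes(calldata[:4], "big")
    destination = SELECTORS.get(selector)
    if destination is None:             # bytes 60..64: no fallback function
        return "revert"
    return f"jump to byte {destination}, then stop"


print(dispatch(bytes.fromhex("f8a8fd6d")))  # jump to byte 81, then stop
print(dispatch(bytes.fromhex("12345678")))  # revert
```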

We’re done!

▶ 9. Conclusion

We managed to learn:

  • Some basic EVM assembly.
  • How the EVM executes smart contracts.
  • Which code is executed before the execution of the function.
  • How a LIFO stack works.
  • Basic use of the Remix debugger.
  • The function selectors.
  • And a lot more…

This is enough for the 1st part of this series about reversing and debugging smart contracts. I hope you learned a lot here.

See you in the next part!
