handmade.network » Wiki » Tutorial/x64 Assembly

Setting Up

This tutorial assumes you're running a modern 64-bit UNIX-like operating system on an x86-64 (x64) processor, with access to libc and a linker.

The assembler we will use is called NASM. On Arch Linux, you can install it with the command sudo pacman -S nasm.

Once you have installed it, you can check that it works by running the command nasm --version.

Hello, world!

Let's write a program to test our setup. Don't worry about how it works yet; I'll explain later.

[bits 64] 
[section .text] 

[global main]
[extern printf]

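main:
    push rbx                      ; rbx is callee-saved, so preserve it (this also aligns the stack for the call)
    xor  rax,rax                  ; al = 0: printf receives no floating-point arguments
    mov  rdi,hello_world_string   ; first argument: address of the string
    mov  rbx,printf
    call rbx
    xor  rax,rax                  ; return 0 from main
    pop  rbx                      ; restore rbx before returning
    ret

hello_world_string:
    db "Hello, world!", 10, 0     ; 10 is a newline, 0 terminates the string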

And here's our build script:

# Replace assembly.s with the name of your source file.
nasm -felf64 -Fdwarf -o assembly.o assembly.s

# Replace gcc with the name of the linker you want to use. Make sure to link the program against libc.
# If the linker rejects or warns about relocations when building a PIE, try adding -no-pie.
gcc -o assembly assembly.o

# Run the executable.
./assembly

If you run that, you should see the following printed out:

Hello, world!

Concepts

As we'll be learning assembly, there are some basic concepts about the functioning of processors that we need to learn.

Instructions

Every program you compile in a high-level language is ultimately converted to a list of instructions. A processor instruction is the simplest command a programmer can give the processor. Instructions, along with their operands (parameters), are encoded in your program's text segment, a read-only part of your executable that is loaded into memory when your program starts. The processor begins at the program's entry point and executes each instruction in turn (or at least appears to), until it encounters an instruction that tells it, sometimes conditionally, to branch to a different part of the program. In x64, each instruction performs only a simple task: incrementing a value in memory, comparing the values of two registers (you'll soon learn what they are), branching to another part of the program, and so on.
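To make the idea of a conditional branch concrete, here is a minimal sketch of a program that adds up the numbers 5 down to 1. It uses two instructions this tutorial hasn't introduced yet (dec, which subtracts one, and jnz, which branches only if the last result was not zero), and the label name add_next is just something I picked for illustration.

```nasm
[bits 64]
[section .text]

[global main]

main:
    xor  rax,rax        ; rax = 0, the running total
    mov  rcx,5          ; rcx = 5, the loop counter
add_next:
    add  rax,rcx        ; total = total + counter
    dec  rcx            ; counter = counter - 1
    jnz  add_next       ; branch back to add_next while the counter is not zero
    ret                 ; return the total (15) from main
```

If you build this with the same script as before and then run echo $? straight after it finishes, the shell should print 15, because the value main returns becomes the exit status of the program.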

Registers

A register is simply a small storage location on the processor. Registers are similar to read-writable memory locations, except for a few key characteristics:

  • They can be accessed far more quickly than main memory, because they are located inside the processor itself.
  • There are far fewer registers than bytes of memory.
  • Some registers have special functions or meaning to the processor. This means some registers cannot be directly modified or viewed by the executing program.
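As a rough sketch of the difference between a register and a memory location, the program below keeps a value in memory, copies it into a register to work on it, and writes it back. The name counter and the use of a .data section with rel addressing are my own choices for the example, not something the rest of this tutorial depends on.

```nasm
[bits 64]

[global main]

[section .data]
counter:
    dq 40                    ; 8 bytes of read-writable memory holding the value 40

[section .text]
main:
    mov  rax,[rel counter]   ; copy the value from memory into the rax register
    add  rax,2               ; do the arithmetic in the register
    mov  [rel counter],rax   ; store the result back to memory
    mov  rax,[rel counter]   ; read it back to show the store took effect
    ret                      ; return 42 from main
```

The square brackets mean "the memory at this address"; without them, the operand is a register or a plain number. Running echo $? after the program should print 42.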

Example

Here's a short program to demonstrate what we've learnt.

mov  rax,10
mov  rbx,20
add  rax,rbx

This assembly program adds 10 and 20 together, leaving the result, 30, in rax. Each line represents one instruction: the mnemonic at the start of the line identifies the instruction, and the text that follows describes its operands (parameters). rax and rbx are general-purpose registers, meaning they can be used to store any values the programmer wants without affecting the execution of the processor.

Let's suppose this program gets loaded into memory at address 0x100000. This is a hex dump of that location in memory, with each instruction enclosed by []. You may notice that bytes 0x100003 and 0x10000A contain the numeric values 10 and 20: where do you think they came from?

|--------------------------------------------------------------------------|
|         | -0  -1  -2  -3  -4  -5  -6  -7  -8  -9  -a  -b  -c  -d  -e  -f |
|--------------------------------------------------------------------------|
|0x10000- |[48  c7  c0  0a  00  00  00][48  c7  c3  14  00  00  00][48  01 |
|0x10001- | d8]                                                            |
|--------------------------------------------------------------------------|
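If you'd like to convince yourself that those bytes really are the program, here is a small sketch that writes them out by hand with db instead of using mnemonics. I've wrapped them in a main that saves and restores rbx so the result can be returned as the exit status; that wrapper is my addition. I use raw bytes because an assembler is free to pick a different, equivalent encoding for the same mnemonic.

```nasm
[bits 64]
[section .text]

[global main]

main:
    push rbx                                      ; rbx is callee-saved, so preserve it
    db 0x48, 0xc7, 0xc0, 0x0a, 0x00, 0x00, 0x00   ; mov  rax,10  (0x0a = 10)
    db 0x48, 0xc7, 0xc3, 0x14, 0x00, 0x00, 0x00   ; mov  rbx,20  (0x14 = 20)
    db 0x48, 0x01, 0xd8                           ; add  rax,rbx
    pop  rbx                                      ; restore rbx
    ret                                           ; return rax (30) from main
```

The processor can't tell the difference between these bytes and the mnemonic version. Running echo $? after the program should print 30.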

Since the processor needs to keep track of which instruction to execute next, it needs (you guessed it) a register to store that instruction's address. On x64 this is the instruction pointer register, RIP.

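Now, let's take a look at how the value stored in each register changes as our program is executed. Each row shows the register values just after that instruction has executed, so RIP already holds the address of the next instruction.

Assembly            Address         RIP               RAX             RBX
mov  rax,10         0x100000        0x100007          10              uninitialised
mov  rbx,20         0x100007        0x10000E          10              20
add  rax,rbx        0x10000E        0x100011          30              20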

From this, you should be able to work out what the MOV and ADD instructions do:

  • mov <a>,<b>: Copies the value of <b> into <a>.
  • add <a>,<b>: Adds the value of <b> to <a>, storing the result in <a>.

And thus, a simple computation is performed by the processor.
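To see the result for yourself, here is a sketch that wraps the three instructions in a complete program and prints the sum. It reuses the printf calling pattern from the Hello, world! program; the name format_string and the %ld format are my own choices for the example.

```nasm
[bits 64]
[section .text]

[global main]
[extern printf]

main:
    push rbx                  ; rbx is callee-saved, so preserve it (this also aligns the stack)
    mov  rax,10
    mov  rbx,20
    add  rax,rbx              ; rax now holds 30
    mov  rdi,format_string    ; first argument to printf: the format string
    mov  rsi,rax              ; second argument to printf: the sum
    xor  rax,rax              ; al = 0: printf receives no floating-point arguments
    mov  rbx,printf
    call rbx
    xor  rax,rax              ; return 0 from main
    pop  rbx                  ; restore rbx
    ret

format_string:
    db "%ld", 10, 0
```

Built with the same script as before, this should print 30.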