
Introduction to x86_64 programming

The steps in this tutorial have been tested on a 64-bit Arch Linux installation. It contains many simplifications to make it easier to understand, and it expects you to have a good understanding of C and a basic understanding of process architecture.

A more gentle introduction can be found here: Tutorial/x64 Assembly

Setting up

First, we need to install an assembler. The choice of assembler matters because there are two main syntaxes for x86(_64) assembly: Intel and AT&T.

For example, to move the value in register rbx to register rax:

  • Intel: mov rax, rbx
  • AT&T: movq %rbx, %rax

I personally prefer Intel syntax, so that's what this tutorial will use.

Hence, the assembler we will use is NASM, as it:

  • Uses Intel syntax
  • Is free (BSD license)
  • Has powerful preprocessing capabilities

To install it on Arch Linux, type

pacman -S nasm

We will also need gcc and binutils to do linking for us, so type

pacman -S gcc binutils

Hello, world!

To test the assembler, first create a file called hello.s with the following contents.

bits 64
section .text
global main
extern puts

main:
    mov    rdi,string
    mov    rax,puts
    call   rax
    xor    rax,rax
    ret

string:
    db    "hello",0

You can then assemble, link and run with:

nasm -felf64 hello.s -o hello.o
gcc hello.o -o hello
chmod +x hello
./hello

And you should see something like:

$ ./hello
hello

How it works

Let's go through the hello world program line by line to see what's happening.

bits 64

This is a directive that tells the assembler our program is 64-bit, i.e. it uses x86_64. The assembler will therefore output x86_64 machine code when it assembles our program.

section .text

A program is split into different sections where different data can be stored. The .text section is used to store machine code. We'll also use it to store the hello world string.

global main

Assembly has functions just like any C program! This directive tells the assembler to make the main function we define global, i.e. visible outside this file. This means that when our program is linked and run, the C startup code will be able to find where to start.

extern puts

To print a message to the screen, we need the C standard library function puts. This directive tells the assembler that we won't define the function puts in our program - it has to come from a library instead.

main:

This is a label. It assigns a name ("main") to an address in memory. In this case, it will be where the machine code following the label will be found in memory when the program is run. This means it can act like a function - we have named the starting address of a block of machine code.

mov rdi,string

This is the first instruction in our program. The previous lines contained directives (things that tell the assembler what to do) and a label. An assembly instruction, by contrast, gets translated directly into a machine code instruction.

We're about to call the puts function to output a string. The puts function takes one argument - the address of a string. On x86_64 Linux (the System V AMD64 calling convention), the first few arguments are passed via registers, and the first argument goes in the rdi register. Therefore we need to put the address of the string into the rdi register.

The mov instruction will move a value from one place to another, e.g. from memory to a register. In this case we move a constant value (the address of the string) into a register (rdi). Note that the destination comes first, followed by the source. rdi and string are the operands to the instruction, and are separated by a comma. mov is the instruction itself. It's sometimes called a mnemonic, as it's short for the full name of the instruction ("move").

string is another label. We define it later, just before we put the bytes of the string into the program.

mov rax,puts

Here we get the address of the function puts and store it in the register rax. We'll use this address to call the function.

call rax

This calls the function at the address in rax - i.e. the puts function. It will take its argument from the rdi register, where we placed the address of the string.

xor rax,rax

The return value of a function is placed in the rax register before it exits. This instruction sets the value of rax to 0 by xoring it with itself, so main returns 0.

ret

This instruction will return us from the main function.
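To see argument passing, call and ret working together without any library involved, here is a minimal sketch. The add_two function is a hypothetical helper, not part of the original program. The sketch also assumes the System V AMD64 convention used on Linux, where the second integer argument is passed in rsi.

```nasm
bits 64
section .text
global main

; add_two(a, b): hypothetical helper that returns a + b
add_two:
    mov    rax,rdi        ; first argument arrives in rdi
    add    rax,rsi        ; second argument arrives in rsi (assumed convention)
    ret                   ; the result is left in rax for the caller

main:
    mov    rdi,2          ; first argument
    mov    rsi,3          ; second argument
    call   add_two        ; on return, rax = 2 + 3
    ret                   ; main returns rax, which becomes the exit status
```

If you assemble and link this the same way as hello.s, running the program and then typing echo $? in the shell should print 5, because the value main leaves in rax is used as the exit status.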

string:

This label comes before the string we want to print out.

db "hello",0

db tells the assembler to output bytes directly. So it will output the bytes of the string literal, and then the zero byte to terminate the string.
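As a small illustration, the two definitions below should produce identical bytes. The same_bytes label is a name made up for this example. The numbers are the ASCII codes of the letters h, e, l, l, o, followed by the terminating zero.

```nasm
string:
    db    "hello",0                         ; string literal, then the zero terminator

same_bytes:                                 ; hypothetical label, for comparison only
    db    0x68,0x65,0x6C,0x6C,0x6F,0x00     ; the same six bytes written as numbers
```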

Generated machine code

The machine code that is generated from our assembly file looks like this:

48 BF 00 00 00 00 00 00 00 00           mov    rdi,0
48 B8 00 00 00 00 00 00 00 00           mov    rax,0
FF D0                                   call   rax
48 31 C0                                xor    rax,rax
C3                                      ret
68 65 6C 6C 6F 00                       "hello"

Wait a minute! What happened to the first two instructions? Why are the values set to 0?

This is because the listing shows the code before linking, when the final addresses are not yet known, and the executable may be placed anywhere in memory when it is run. The linker, and the dynamic loader at run time, fill in these values to reflect where the addresses actually are in memory.
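A common way to avoid storing absolute addresses inside instructions is to address things relative to the instruction pointer. The following is only a sketch of that alternative, not the program this tutorial was tested with. It relies on NASM's default rel directive and the ELF-specific wrt ..plt operator. It also adjusts the stack pointer around the call, on the assumption that the calling convention expects a 16-byte aligned stack at that point.

```nasm
bits 64
default rel                    ; make [label] operands RIP-relative
section .text
global main
extern puts

main:
    sub    rsp,8               ; assumption: keep the stack 16-byte aligned for the call
    lea    rdi,[string]        ; compute the address of the string relative to rip
    call   puts wrt ..plt      ; call puts through the procedure linkage table
    add    rsp,8               ; undo the stack adjustment
    xor    eax,eax             ; return 0 (writing eax also clears the upper half of rax)
    ret

string:
    db    "hello",0
```

With this variant the instructions hold offsets rather than full 64-bit addresses, so the code itself does not need to be patched when the program is loaded at a different location.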

Registers

See x86 Registers for more information.