Introduction to x86_64 programming
The steps in this tutorial have been tested on a 64-bit Arch Linux installation. The tutorial makes many simplifications to keep things easy to follow. It assumes a good understanding of C and a basic understanding of processor architecture.
A more gentle introduction can be found here: Tutorial/x64 Assembly
Setting up
First, we need to install an assembler. The choice of assembler matters, because there are two main syntaxes for x86(_64) assembly and not every assembler supports both.
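For example, to move the value in register rbx to register rax:
- Intel: mov rax, rbx
- AT&T: movq %rbx, %rax
The two syntaxes write the operands in opposite orders. AT&T syntax also prefixes register names with %, and here adds a size suffix to the mnemonic (q, for a 64-bit quadword).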
I personally prefer Intel syntax, so that's what this tutorial will use.
Hence, the assembler we will use is NASM, because it:
- Uses Intel syntax
- Is free (BSD license)
- Has a powerful macro preprocessor
To install it on Arch Linux, run as root:
```sh
pacman -S nasm
```
We will also need gcc and binutils (which provides the linker, ld) to do the linking for us, so run:
```sh
pacman -S gcc binutils
```
Hello, world!
To test the assembler, first create the following file, called hello.s:
```nasm
bits 64

section .text
global main
extern puts

main:
    mov rdi,string
    mov rax,puts
    call rax
    xor rax,rax
    ret

string:
    db "hello",0
```
You can then assemble, link and run it with:
```sh
nasm -felf64 hello.s -o hello.o
gcc hello.o -o hello
chmod +x hello
./hello
```
And you should see something like:
```
$ ./hello
hello
```
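For comparison, the C code below is roughly equivalent to hello.s. This is a sketch, not a listing of what gcc actually generates. In hello.s the body lives in main. Here it is wrapped in a function with the hypothetical name say_hello, so it can be called from a test program.

```c
#include <stdio.h>

/* Roughly what hello.s does, written in C. */
int say_hello(void) {
    puts("hello");   /* hello.s: mov rdi,string / mov rax,puts / call rax */
    return 0;        /* hello.s: xor rax,rax / ret */
}
```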
How it works
Let's go through the hello world program line by line to see what's happening.
bits 64
This is a directive telling the assembler that our program is 64-bit, i.e. that it uses x86_64. The assembler will therefore output x86_64 machine code when it assembles our program.
section .text
A program is split into sections, each holding a different kind of content. The .text section holds machine code. For simplicity, we'll also use it to store the hello world string, although read-only data such as strings would normally go in a separate section like .rodata.
global main
Assembly has functions just like any C program! This directive tells the assembler to make the main function we define global, which means its name is visible outside this file. When the program is linked, the C runtime startup code that gcc adds can find main and call it when our program is run.
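In C terms, global roughly corresponds to a function declared without static. Such a function has external linkage, so the compiler emits it as a global symbol. Running nm on the object file lists global symbols with an uppercase letter such as T. Adding static keeps the symbol local to the file, like a NASM label that is never declared global. This is a minimal sketch, and the function names are hypothetical.

```c
/* No `static`: external linkage, emitted as a global symbol
   (what `global main` does for main in hello.s). */
int visible_to_linker(void) { return 1; }

/* `static`: the symbol stays local to this object file, like a NASM
   label that was never declared global. */
static int local_only(void) { return 2; }

/* Hypothetical wrapper so that the static function is used. */
int call_local(void) { return local_only(); }
```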
extern puts
To print a message to the screen, we need the C standard library function puts. This directive tells the assembler that puts is not defined in our program. Instead, the linker has to find it in a library, in this case the C library.
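C has the same concept. Normally #include <stdio.h> supplies the declaration of puts. In the sketch below, puts is declared by hand to mirror extern puts. The declaration promises only that the symbol exists somewhere else, and the linker resolves it against the C library. The wrapper name print_via_extern is hypothetical.

```c
/* Like NASM's `extern puts`: declare the symbol without defining it. */
extern int puts(const char *s);

/* Hypothetical wrapper that calls the externally defined function. */
int print_via_extern(const char *s) {
    return puts(s);  /* puts returns a non-negative value on success */
}
```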
main:
This is a label. It assigns a name ("main") to an address in memory: the address where the machine code following the label will be found when the program runs. This is what lets a label act like a function, because we have named the starting address of a block of machine code.
mov rdi,string
This is the first instruction in our program. All the previous lines were directives, which tell the assembler what to do. An instruction, by contrast, is translated directly into a machine code instruction.
We're about to call the puts function to output a string. puts takes one argument: the address of a zero-terminated string. On x86_64 Linux, arguments are passed in registers, following the System V AMD64 calling convention. The first integer or pointer argument goes in the rdi register, and the following ones go in rsi, rdx, rcx, r8 and r9. Therefore we need to put the address of the string into the rdi register.
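The register order described above can be written down as a small lookup table. This sketch covers only the integer and pointer arguments of the System V AMD64 ABI. Floating-point arguments use different registers, and any arguments beyond the sixth go on the stack. The function names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* System V AMD64 ABI (x86_64 Linux): registers for the first six
   integer or pointer arguments, in order. */
static const char *const sysv_int_arg_regs[] = {"rdi", "rsi", "rdx", "rcx", "r8", "r9"};

#define SYSV_INT_ARG_REG_COUNT (sizeof sysv_int_arg_regs / sizeof sysv_int_arg_regs[0])

/* Register carrying integer/pointer argument `index` (0-based),
   or NULL if that argument is passed on the stack instead. */
const char *sysv_arg_register(size_t index) {
    return index < SYSV_INT_ARG_REG_COUNT ? sysv_int_arg_regs[index] : NULL;
}

/* Reverse lookup: which argument (0-based) a register carries, or -1. */
int sysv_arg_index(const char *reg) {
    for (size_t i = 0; i < SYSV_INT_ARG_REG_COUNT; i++)
        if (strcmp(reg, sysv_int_arg_regs[i]) == 0)
            return (int)i;
    return -1;
}
```

For puts(string), the only argument is argument 0, so sysv_arg_register(0) gives "rdi". That matches the register hello.s fills.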
The mov instruction copies a value from one place to another, e.g. from memory to a register. Despite the name, the source keeps its value. In this case we copy a constant value (the address of the string) into a register (rdi). Note that in Intel syntax the destination comes first, followed by the source. rdi and string are the operands of the instruction, separated by a comma. mov is the instruction itself. It is sometimes called a mnemonic, as it's short for the full name of the instruction ("move").
string is another label. We define it later, just before the bytes of the string are placed in the program.
mov rax,puts
Here we get the address of the function puts and store it in the register rax. We'll use this address to call the function.
call rax
This calls the function whose address is in rax, i.e. puts. call pushes the return address onto the stack, so that puts can return to us when it finishes. puts takes its argument from the rdi register, where we placed the address of the string.
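The two-step pattern of loading a function's address and then calling through it has a direct C counterpart in a function pointer. This is an illustrative sketch with a hypothetical function name. An optimizing compiler may still turn it into a direct call.

```c
#include <stdio.h>

/* Mirrors `mov rax,puts` followed by `call rax`. */
int call_puts_indirectly(const char *s) {
    int (*fn)(const char *) = puts;  /* like mov rax,puts: keep the address */
    return fn(s);                    /* like call rax; s is passed in rdi */
}
```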
xor rax,rax
A function places its return value in the rax register before it returns. This instruction sets rax to 0 by XORing it with itself, so main returns 0, which means success.
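XOR zeroes a register because each bit is XORed with itself, and b ^ b is 0 for both b = 0 and b = 1. The check below confirms this for a few values. In the machine code listing later in this tutorial, xor rax,rax takes 3 bytes (48 31 C0), while the 64-bit immediate mov used for mov rax,puts takes 10. The helper name is hypothetical.

```c
#include <stdint.h>

/* x ^ x == 0 for every x: the reason `xor rax,rax` clears rax
   regardless of its previous contents. */
uint64_t xor_with_itself(uint64_t x) {
    return x ^ x;
}
```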
ret
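This instruction returns from main to its caller, the C runtime startup code that gcc links in. That code then exits the process, using main's return value as the exit status.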
string:
This label comes before the string we want to print out.
db "hello",0
db tells the assembler to output bytes directly. Here it outputs the bytes of the string literal, followed by a zero byte that terminates the string, as C functions like puts expect.
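The bytes produced by db "hello",0 have the same layout as a C string literal, because the compiler adds the terminating zero byte to the literal automatically. The sketch below compares the two. The array and function names are hypothetical.

```c
#include <string.h>

/* The six bytes emitted for `db "hello",0`: ASCII h, e, l, l, o, then 0. */
static const unsigned char hello_db[] = {0x68, 0x65, 0x6C, 0x6C, 0x6F, 0x00};

/* "hello" in C also occupies 6 bytes, including its terminator.
   Returns 1 if the two byte sequences are identical. */
int db_matches_c_literal(void) {
    return sizeof hello_db == sizeof "hello" &&
           memcmp(hello_db, "hello", sizeof hello_db) == 0;
}
```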
Generated machine code
The machine code that is generated from our assembly file looks like this:
```
48 BF 00 00 00 00 00 00 00 00    mov rdi,0
48 B8 00 00 00 00 00 00 00 00    mov rax,0
FF D0                            call rax
48 31 C0                         xor rax,rax
C3                               ret
68 65 6C 6C 6F 00                "hello"
```
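The instruction bytes in this listing can be rebuilt by hand, which shows how each encoding is put together. The sketch below follows the standard x86_64 encodings for these specific forms. It handles only registers 0 to 7 (rax to rdi) and is not a general assembler. The function and constant names are hypothetical, and the two addresses are passed as 0 to match the listing.

```c
#include <stddef.h>
#include <stdint.h>

/* Register numbers in x86_64 encodings (only the first eight handled). */
enum { X86_REG_RAX = 0, X86_REG_RDI = 7 };

/* mov r64, imm64: REX.W prefix 0x48, opcode 0xB8 + register number,
   then the 64-bit immediate, least significant byte first. */
size_t encode_mov_imm64(uint8_t *out, int reg, uint64_t imm) {
    out[0] = 0x48;
    out[1] = (uint8_t)(0xB8 + (reg & 7));
    for (int i = 0; i < 8; i++)
        out[2 + i] = (uint8_t)(imm >> (8 * i));
    return 10;
}

/* call r64: opcode 0xFF, ModRM 11 010 rrr (mod=3, /2 selects CALL). */
size_t encode_call_reg(uint8_t *out, int reg) {
    out[0] = 0xFF;
    out[1] = (uint8_t)(0xD0 | (reg & 7));
    return 2;
}

/* xor r64, r64 on a single register: REX.W, opcode 0x31, ModRM 11 rrr rrr. */
size_t encode_xor_self(uint8_t *out, int reg) {
    out[0] = 0x48;
    out[1] = 0x31;
    out[2] = (uint8_t)(0xC0 | ((reg & 7) << 3) | (reg & 7));
    return 3;
}

/* Rebuild main from hello.s; `out` must hold at least 26 bytes. */
size_t encode_hello_main(uint8_t *out) {
    size_t n = 0;
    n += encode_mov_imm64(out + n, X86_REG_RDI, 0); /* mov rdi,string */
    n += encode_mov_imm64(out + n, X86_REG_RAX, 0); /* mov rax,puts */
    n += encode_call_reg(out + n, X86_REG_RAX);     /* call rax */
    n += encode_xor_self(out + n, X86_REG_RAX);     /* xor rax,rax */
    out[n++] = 0xC3;                                /* ret */
    return n;
}
```

The code comes to 26 bytes. The string's bytes follow immediately after it in .text.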
Wait a minute! What happened to the first two instructions? Why are the values set to 0?
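This is because the assembler does not know the final addresses of string and puts. The program may be placed anywhere in memory when it is run, and puts lives in the C library. So the assembler leaves zero placeholders and records relocations in hello.o, which say where the real addresses must be written. When the program is linked and loaded, these placeholders are updated to the addresses' actual locations in memory. For a position-independent executable, which gcc on Arch Linux produces by default, the dynamic loader does this at run time.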
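Filling in a placeholder like this amounts to writing a computed address over the zero bytes. The sketch below assumes an absolute 64-bit relocation (R_X86_64_64 in x86_64 ELF), whose value is the symbol's address plus an addend. The helper name and the example addresses are hypothetical. The offset 2 follows from the listing, because the immediate of mov rdi,0 starts right after the bytes 48 BF.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: apply an absolute 64-bit relocation by writing
   symbol_address + addend, least significant byte first, over the
   8 placeholder bytes at `offset`. */
void apply_abs64_relocation(uint8_t *code, size_t offset,
                            uint64_t symbol_address, int64_t addend) {
    uint64_t value = symbol_address + (uint64_t)addend;
    for (int i = 0; i < 8; i++)
        code[offset + i] = (uint8_t)(value >> (8 * i));
}
```

In this example, .text is assumed to start at 0x401000. Because 26 (0x1A) bytes of code precede string, its address can be expressed as that base plus an addend of 0x1A. Patching offset 2 of the mov rdi instruction with these values yields the bytes 48 BF 1A 10 40 00 00 00 00 00.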
Registers
See x86 Registers for more information.