General Purpose Registers
The Main Ones (and their history)
The 8086 introduced four main registers, AX, BX, CX and DX, each of which held 16 bits.
Each of these is aliased with two 8-bit registers that give quick access to the high (-H) and low (-L) byte, e.g. AH and AL for AX.
It's generally a bad idea to use the -L and -H registers to store separate 8-bit values on modern processors: because of how register renaming works, writing one part of a register and then using another part can cause false dependencies or partial-register stalls.
When Intel expanded their processors to work with 32-bit numbers (starting with the 80386), these registers were "extended" to 32 bits, hence the E- prefix: EAX, EBX, ECX and EDX.
Again, when AMD created AMD64 processors, these registers were extended to hold 64-bit numbers, this time with an R- prefix: RAX, RBX, RCX and RDX.
Here is an overview of what the aliasing looks like for a single register.
```
-------------------------------------------------
|                 RAX (64-bit)                  |
-------------------------------------------------
|                       |     EAX (32-bit)      |
-------------------------------------------------
|                       |           |  AX (16)  |
-------------------------------------------------
|                       |           | AH  | AL  |
-------------------------------------------------
```
For example, the following code:

```asm
mov eax,0xABCDEF12
mov ax,0x1234
mov al,0xAB
```

will result in EAX holding 0xABCD12AB.
HOWEVER, there is an important exception to these rules in 64-bit mode. Whenever the lower 32 bits of a 64-bit register are written through its 32-bit name (EAX, R8D, etc.), the top 32 bits are cleared to zero. Writes through the 16-bit and 8-bit names leave the upper bits alone.
For example, the following code:

```asm
mov rax,0x1111111111111111
mov eax,0x22222222
```

will result in RAX holding 0x0000000022222222.
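The write rules above can be modelled in a few lines. This is a minimal Python sketch, not code from any real emulator: it tracks one 64-bit register as an integer and applies a write through one of the alias names from the diagram, zero-extending only for 32-bit writes. The `write` and `read` helpers and the `ALIASES` table are hypothetical names chosen for illustration.

```python
# Toy model of one x86-64 general purpose register (RAX) and its aliases.
# Assumption: only the architectural write semantics are modelled.
MASK64 = (1 << 64) - 1

# alias name -> (bit offset, width in bits) within the 64-bit register
ALIASES = {
    "rax": (0, 64),
    "eax": (0, 32),
    "ax": (0, 16),
    "ah": (8, 8),
    "al": (0, 8),
}


def write(reg: int, name: str, value: int) -> int:
    """Return the register contents after `mov name, value`."""
    shift, width = ALIASES[name]
    value &= (1 << width) - 1
    if width == 32:
        # 32-bit writes zero the upper 32 bits of the full register.
        return value
    field = ((1 << width) - 1) << shift
    # 8-, 16- and 64-bit writes replace only their own bits.
    return (reg & ~field & MASK64) | (value << shift)


def read(reg: int, name: str) -> int:
    """Return the value seen through one alias name."""
    shift, width = ALIASES[name]
    return (reg >> shift) & ((1 << width) - 1)


# First example: EAX ends up as 0xABCD12AB.
r = 0
for name, value in [("eax", 0xABCDEF12), ("ax", 0x1234), ("al", 0xAB)]:
    r = write(r, name, value)
print(hex(read(r, "eax")))  # 0xabcd12ab

# Second example: the 32-bit write clears the top half of RAX.
r = write(0, "rax", 0x1111111111111111)
r = write(r, "eax", 0x22222222)
print(hex(r))  # 0x22222222
```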
Because of the severe lack of general purpose registers on x86, AMD introduced 8 new general purpose registers in AMD64, named R8 to R15.
These are all 64-bit.
You can access the lower 32 bits with the -D suffix (e.g. R8D), the lower 16 bits with the -W suffix (e.g. R9W), and the lower 8 bits with the -B suffix (e.g. R10B).
Additional "General Purpose" Registers
Although many of the registers were given special purposes on the original 8086, in 32-bit and 64-bit modes modern processors let you use most of them far more freely. The following can also serve as general purpose registers:
- SI (source index)
- DI (destination index)
- BP (base pointer)
You'll still find SI and DI hardwired to the string instructions (MOVS, LODS, STOS and friends), which modern processors still support, but apart from that these registers behave just like the other general purpose registers.
They too have larger versions:
- ESI, EDI, EBP (32-bit)
- RSI, RDI, RBP (64-bit)
And on AMD64, it's also possible to access the least significant byte.
- SIL, DIL, BPL (8-bit)
Segment Registers
There are 6 segment registers, CS, DS, ES, FS, GS and SS, all of which are 16 bits.
Their function depends on the processor mode.
Real mode (16-bit)
Intel wanted the 8086, a 16-bit CPU, to be able to address a megabyte of memory. This caused a bit of a problem, as a 16-bit address can normally only reach 64 KiB of memory. Therefore Intel decided that the programmer must pick a "segment register" to accompany every access to memory (although instruction fetches are locked to CS and stack operations are locked to SS).
The actual address of a memory operation is then computed using the following formula:
((segment << 4) + offset) & 0xFFFFF
This can be a tad confusing, because "segment" is something of a misnomer here, but alas, it's not really that complicated.
Here's an example:

```asm
mov ax,0x1000
mov ds,ax
mov bx,0xABCD
mov byte [ds:bx],10   ; Writes to address 0x1ABCD

mov ax,0x1
mov ds,ax
mov bx,0x10
mov byte [ds:bx],20   ; Writes to address 0x20

mov ax,0xFFFF
mov ds,ax
mov bx,0x10
mov byte [ds:bx],30   ; Writes to address 0
```
In the last example, we can see an unfortunate wrap-around property of this system. Some software came to rely on it, so once later processors could address more than a megabyte of memory, a compatibility workaround was needed: the A20 line, named after address line 20. When the A20 line is disabled, bit 20 of every address is forced to zero, so addresses just past the first megabyte wrap around as they did on the 8086, and only even megabytes of memory are accessible. When the A20 line is enabled, memory can be accessed as normal.
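The real-mode formula and the A20 behaviour can be checked with a small Python sketch. `physical_address` is a hypothetical helper. With A20 disabled it forces bit 20 to zero. For real-mode inputs (at most 0xFFFF:0xFFFF) this gives the same result as the `& 0xFFFFF` in the formula above.

```python
def physical_address(segment: int, offset: int, a20_enabled: bool = False) -> int:
    """Translate a real-mode segment:offset pair into a physical address.

    segment and offset are 16-bit values. With the A20 line disabled,
    bit 20 of the result is forced to zero, so addresses just past 1 MiB
    wrap around to the bottom of memory as they did on the 8086.
    """
    linear = ((segment & 0xFFFF) << 4) + (offset & 0xFFFF)
    if not a20_enabled:
        linear &= ~(1 << 20)  # mask address line 20
    return linear


# The three writes from the example above
print(hex(physical_address(0x1000, 0xABCD)))  # 0x1abcd
print(hex(physical_address(0x0001, 0x0010)))  # 0x20
print(hex(physical_address(0xFFFF, 0x0010)))  # 0x0 (wrapped around)

# With A20 enabled, the last write lands just above 1 MiB instead
print(hex(physical_address(0xFFFF, 0x0010, a20_enabled=True)))  # 0x100000

# Many segment:offset pairs name the same byte
print(physical_address(0x1000, 0x0000) == physical_address(0x0FFF, 0x0010))  # True
```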
Protected mode (32-bit)
One model of virtual memory divides the address space into segments, whose indices are stored in special "segment" registers that can only be set up by the kernel. Programs then pick a segment for each memory access, and the processor performs permission checks, range validation and so on.
Intel, in their infinite wisdom, decided this would be the perfect model for their newest processors, so they made it the main mode of virtual memory. This led the segment registers to take on new meanings: rather than holding part of an address, they hold selectors, which are indices into a table (the GDT or LDT) describing each segment's base, limit and permissions.
In the end the segmentation model proved unpopular with programmers. Paging (introduced with the 80386) became the real mechanism for virtual memory, operating systems set up segments that simply cover the whole address space, and for userland programmers, segments were never a worry again.
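To make the "index into a table" idea concrete, here is a Python sketch that splits a selector into the fields the processor uses. Bits 3-15 pick a descriptor, bit 2 chooses between the global (GDT) and local (LDT) descriptor tables, and bits 0-1 hold the requested privilege level. `decode_selector` and `descriptor_offset` are hypothetical names and the selector values are arbitrary examples. The sketch also assumes the classic 8-byte protected-mode descriptors.

```python
from typing import NamedTuple


class Selector(NamedTuple):
    index: int  # which descriptor in the table
    table: str  # "GDT" or "LDT", chosen by the table-indicator bit
    rpl: int    # requested privilege level, 0 (most privileged) to 3


def decode_selector(value: int) -> Selector:
    """Split a 16-bit protected-mode segment selector into its fields."""
    value &= 0xFFFF
    return Selector(
        index=value >> 3,                         # bits 3-15
        table="LDT" if value & 0b100 else "GDT",  # bit 2
        rpl=value & 0b11,                         # bits 0-1
    )


def descriptor_offset(value: int) -> int:
    """Byte offset of the selected 8-byte descriptor within its table."""
    return decode_selector(value).index * 8


print(decode_selector(0x08))         # Selector(index=1, table='GDT', rpl=0)
print(decode_selector(0x23))         # Selector(index=4, table='GDT', rpl=3)
print(hex(descriptor_offset(0x23)))  # 0x20
```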
Long mode (64-bit)
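In long mode, segmentation is practically disabled: the bases of CS, DS, ES and SS are treated as zero and segment limits are not checked. However, for some reason, the segment registers are still there and must be valid, and FS and GS keep a usable base address, which operating systems commonly use for thread-local storage. But again, for userland programmers segmentation has otherwise become completely transparent.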
SP and IP
The stack on x86 processors is controlled by the use of the stack pointer register, SP.
- SP (16-bit)
- ESP (32-bit)
- RSP (64-bit)
It should always remain at the natural alignment of the processor's mode, i.e. in 16-bit mode it should always be 2-byte aligned, in 32-bit mode it should always be 4-byte aligned, and in 64-bit mode it should always be 8-byte aligned.
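Some ABIs also demand further alignment before a function call so that SIMD variables can be stored on the stack. The System V AMD64 ABI, for example, requires RSP to be 16-byte aligned at the point of the CALL instruction.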
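Here is a Python sketch of how the alignment works out around a call, assuming the System V AMD64 rule (RSP 16-byte aligned at the CALL). The starting RSP value and the helper names are hypothetical. `align_down` mirrors the common `and rsp, -16` idiom.

```python
ALIGN = 16  # System V AMD64: RSP % 16 == 0 at the CALL instruction


def align_down(rsp: int, alignment: int = ALIGN) -> int:
    """Round a stack pointer down to a multiple of `alignment`.

    Equivalent to `and rsp, -16` for alignment 16 (alignment must be a
    power of two). Rounding down is safe because the stack grows down.
    """
    return rsp & ~(alignment - 1)


def after_call(rsp: int) -> int:
    """CALL pushes an 8-byte return address, lowering RSP by 8."""
    return rsp - 8


def after_push(rsp: int) -> int:
    """PUSH of a 64-bit register lowers RSP by 8."""
    return rsp - 8


rsp = align_down(0x7FFF_FFFF_E00C)  # hypothetical start, rounded to ...E000
entry = after_call(rsp)             # inside the callee: RSP % 16 == 8
frame = after_push(entry)           # after `push rbp`: RSP % 16 == 0 again
print(hex(rsp), entry % ALIGN, frame % ALIGN)  # 0x7fffffffe000 8 0
```

This is why a typical 64-bit function prologue that pushes RBP ends up with a 16-byte aligned stack again, ready for aligned SIMD locals.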
On x86 processors the program counter is more commonly known as the instruction pointer, which is where the IP register gets its name.
- IP (16-bit)
- EIP (32-bit)
- RIP (64-bit)
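It can't be written with MOV, but its value can be set with the RET instruction, which pops the top of the stack into it:

```asm
mov eax,0x12345678
push eax            ; Put the target address on the stack
ret                 ; Pop it into EIP: execution continues at 0x12345678
```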
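And its value can be read with the CALL instruction, which pushes the address of the following instruction before jumping:

```asm
read_eip:
    mov eax,[esp]   ; Copy the return address that CALL pushed
    ret

...

    call read_eip   ; EAX contains the value of EIP for the next instruction.
```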
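The RET and CALL tricks above both rely on the return address living on the stack. This toy Python model tracks only EIP, ESP, EAX and a dictionary standing in for memory. `ToyCPU` is a hypothetical class, not a real emulator. The model assumes a 5-byte near CALL so the return address can be computed without decoding instructions, and the addresses are made up for illustration.

```python
class ToyCPU:
    """A 32-bit toy machine: EIP, ESP, EAX and a sparse memory of 32-bit words."""

    def __init__(self, eip: int, esp: int) -> None:
        self.eip = eip
        self.esp = esp
        self.eax = 0
        self.memory = {}  # address -> 32-bit value

    def push(self, value: int) -> None:
        self.esp -= 4
        self.memory[self.esp] = value & 0xFFFFFFFF

    def pop(self) -> int:
        value = self.memory[self.esp]
        self.esp += 4
        return value

    def call(self, target: int, length: int) -> None:
        # CALL pushes the address of the next instruction, then jumps.
        self.push(self.eip + length)
        self.eip = target

    def ret(self) -> None:
        # RET pops the return address straight into EIP.
        self.eip = self.pop()


# mov eax,0x12345678 / push eax / ret  ->  execution continues at 0x12345678
cpu = ToyCPU(eip=0x1000, esp=0x8000)
cpu.eax = 0x12345678
cpu.push(cpu.eax)
cpu.ret()
print(hex(cpu.eip))  # 0x12345678

# call read_eip from 0x2000 (assumed 5-byte CALL); read_eip does mov eax,[esp]
cpu = ToyCPU(eip=0x2000, esp=0x8000)
cpu.call(target=0x3000, length=5)
cpu.eax = cpu.memory[cpu.esp]  # mov eax,[esp]
cpu.ret()
print(hex(cpu.eax), hex(cpu.eip))  # 0x2005 0x2005
```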
FLAGS
The FLAGS register (EFLAGS in 32-bit mode, RFLAGS in 64-bit mode) contains the current flags for the processor.
It can be saved and restored through the stack with the PUSHF and POPF instructions, although not all of the bits can be modified in userland code. Some of the flags have dedicated instructions for their modification, such as CLI and STI for the interrupt flag, CLC and STC for the carry flag, and CLD and STD for the direction flag.
They are mostly useful for conditional branches and large integer arithmetic.
| Bit(s) | Name | Description |
|--------|------|-------------|
| 0 | CF | Carry flag: set when an unsigned operation carries out of (or borrows into) the most significant bit |
| 1 | | Reserved (always 1) |
| 2 | PF | Parity flag |
| 3 | | Reserved |
| 4 | AF | Auxiliary carry flag |
| 5 | | Reserved |
| 6 | ZF | Zero flag: set when an operation produces the number zero |
| 7 | SF | Sign flag: set when the result is negative (its most significant bit is set) |
| 8 | TF | Trap flag |
| 9 | IF | Interrupt enable flag: enables interrupts |
| 10 | DF | Direction flag: changes the direction of string operations |
| 11 | OF | Overflow flag: set when a signed result can't be represented in the destination |
| 12-13 | IOPL | I/O privilege level |
| 14 | NT | Nested task |
| 15 | | Reserved |
| 16 | RF | Resume flag |
| 17 | VM | Virtual 8086 mode |
| 18 | AC | Alignment check |
| 19 | VIF | Virtual interrupt flag |
| 20 | VIP | Virtual interrupt pending |
| 21 | ID | ID flag |
| 22-63 | | Reserved |
And no, I don't know what most of these do.
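The arithmetic flags in the table can be computed directly from the operands. This Python sketch approximates what ADD and ADC do, limited to CF, PF, AF, ZF, SF and OF. `add_flags` and `add128` are hypothetical helpers. `add128` shows the "large integer arithmetic" use: the carry out of the low half is fed into the high half, which is the job ADC does in real code.

```python
def add_flags(a: int, b: int, bits: int = 32, carry_in: int = 0) -> tuple:
    """Add two `bits`-wide values like ADD (or ADC with carry_in=1).

    Returns (result, flags) where flags maps flag names to 0 or 1.
    """
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    a &= mask
    b &= mask
    full = a + b + carry_in
    result = full & mask
    flags = {
        # Unsigned carry out of the most significant bit
        "CF": int(full > mask),
        # Set when the low byte of the result has an even number of 1 bits
        "PF": int(bin(result & 0xFF).count("1") % 2 == 0),
        # Carry out of bit 3 into bit 4
        "AF": int((a ^ b ^ result) & 0x10 != 0),
        "ZF": int(result == 0),
        # Copy of the result's most significant bit
        "SF": int(result & sign != 0),
        # Signed overflow: both inputs share a sign that the result lacks
        "OF": int((a ^ result) & (b ^ result) & sign != 0),
    }
    return result, flags


def add128(x: int, y: int) -> int:
    """Add two 128-bit numbers as 64-bit halves: ADD low, then ADC high."""
    lo, f = add_flags(x, y, bits=64)
    hi, _ = add_flags(x >> 64, y >> 64, bits=64, carry_in=f["CF"])
    return (hi << 64) | lo


result, f = add_flags(0x7FFFFFFF, 1)
print(hex(result), f["SF"], f["OF"], f["CF"])  # 0x80000000 1 1 0
print(hex(add128(0xFFFFFFFFFFFFFFFF, 1)))      # 0x10000000000000000
```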
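MXCSR
The MXCSR register is used to control the operation of SSE floating point, and it also records which floating-point exceptions have occurred.
A full list of bits is given in the processor manuals, but a fairly sane starting value is 0x1FC0, which masks all floating-point exceptions, rounds to nearest and enables denormals-are-zero (the power-on default, 0x1F80, is the same without denormals-are-zero). It can be set using the following code: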
```asm
init_mxcsr:                 ; Hypothetical label name for this example
    ldmxcsr [rel .init]     ; Load MXCSR from the 32-bit value below
    ret
.init:
    dd 0x00001FC0
```
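To see what 0x1FC0 actually turns on, here is a Python sketch that decodes an MXCSR value. The bit positions follow the usual MXCSR layout: exception flags in bits 0-5, DAZ in bit 6, exception masks in bits 7-12, rounding control in bits 13-14 and FTZ in bit 15. `decode_mxcsr` is a hypothetical helper, and the layout is worth double-checking against the processor manuals.

```python
# Bit positions in MXCSR, the SSE control/status register.
MXCSR_FIELDS = {
    # Sticky exception flags, set by the hardware when an exception occurs
    "IE": 0, "DE": 1, "ZE": 2, "OE": 3, "UE": 4, "PE": 5,
    "DAZ": 6,  # denormals are zero (treat denormal inputs as zero)
    # Exception masks: 1 means the exception is masked (no trap)
    "IM": 7, "DM": 8, "ZM": 9, "OM": 10, "UM": 11, "PM": 12,
    "FTZ": 15,  # flush denormal results to zero
}
ROUNDING = {0: "nearest", 1: "down", 2: "up", 3: "toward zero"}


def decode_mxcsr(value: int) -> dict:
    """Return each single-bit field as 0/1, plus the rounding mode."""
    fields = {name: (value >> bit) & 1 for name, bit in MXCSR_FIELDS.items()}
    fields["RC"] = ROUNDING[(value >> 13) & 0b11]  # bits 13-14
    return fields


state = decode_mxcsr(0x1FC0)
print(sorted(n for n, v in state.items() if v == 1))
# ['DAZ', 'DM', 'IM', 'OM', 'PM', 'UM', 'ZM']
print(state["RC"])  # nearest
```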