X-Ray Jam. June 9-15, 2025. See the results.

My main goal for this project was to understand my GPU's machine code (SASS) better, and I accomplished that. I did want to have more of a functioning disassembler and editor for the SASS code, but right now I'm only disassembling the IADD3 instruction. I'll keep working on that viewer/editor, so if anyone's interested in that, stay tuned. Here's a screenshot of the current state of the viewer/editor:

Screenshot 2025-06-15 143418.png

The numbers next to the IADD3 instructions are, in order:

  • destination register
  • 1st source register (optionally negated/inverted)
  • 2nd source register (optionally negated/inverted)
  • 3rd source register (optionally negated/inverted)
  • first carry out predicate
  • first carry in predicate (optionally negated)
  • second carry out predicate
  • second carry in predicate (optionally negated)

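All of these use negated True (!PT, i.e. !P7) for the second carry in. P0 is an ordinary predicate register rather than a constant False, so it would only behave the same while it happens to hold 0. I still have to test whether what's going on here really is 2 carry ins and 2 carry outs.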


I've got most of the bits of the IADD3 instruction figured out, but there are still a couple of things I don't know and can't find an answer to.

here's some output from cuobjdump (reformatted to take less horizontal space)

IADD3 R4, P0, R4, R4, RZ ;                    // 0x0000000404047210
                                              // 0x003fde0007f1e0ff
IADD3.X R5, P3, P6, R5, R5, RZ, P0, P5  ?PM3; // 0x0000000505057210
                                              // 0x003fdec00066a4ff
// I edited the bits of the IADD3.X instruction to see how the assembly changes
// I don't know if the ?PM3 would actually do anything or appear in normal code
// also I don't know if the second predicate input/output are actually used
// a more detailed breakdown of the above is below in the commented dump from DocumentSASS
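To make the carry chain concrete, here's a minimal Python sketch of what the unedited pair presumably computes: a 64-bit add of R5:R4 with itself, split into a low 32-bit add that writes its carry to P0 and a high 32-bit add that reads it back. The `iadd3` and `double64` helpers are made up for illustration, the unedited IADD3.X operands in the comment are my assumption, and only the first carry in/out is modelled since the second pair is still unverified.

```python
MASK32 = 0xFFFFFFFF
RZ = 0  # the hardcoded zero register


def iadd3(a, b, c, carry_in=0):
    """Hypothetical model of a 32-bit three-input add.

    Returns (result, carry_out). With c == RZ the carry is at most 1,
    so a single carry bit is enough for this sketch.
    """
    total = a + b + c + carry_in
    return total & MASK32, (total >> 32) & 1


def double64(lo, hi):
    """Add the 64-bit value hi:lo to itself using two 32-bit adds."""
    # IADD3   R4, P0, R4, R4, RZ        (carry out goes to P0)
    lo_out, p0 = iadd3(lo, lo, RZ)
    # IADD3.X R5, R5, R5, RZ, P0, !PT   (assumed unedited form: carry in from P0)
    hi_out, _ = iadd3(hi, hi, RZ, carry_in=p0)
    return lo_out, hi_out


if __name__ == "__main__":
    lo, hi = double64(0x80000000, 0x00000001)
    print(f"hi={hi:#010x} lo={lo:#010x}")
```

The low add overflows in this example, so the carry through P0 is what makes the high word come out as 3 instead of 2.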

here's a helpful graphic to reference for the following, from this pdf: ampere_encoding.png

here's a dump from sm_86_instructions.txt generated by DocumentSASS with some comments added by me:

OPCODES
        IADD3int_pipe =  0b1000010000; // dunno what int_pipe means
        IADD3 =  0b1000010000; // this is the 0x210 in Opcode saying it's IADD3
ENCODING
!iadd3_noimm__RRR_RRR_unused; // dunno what this is
// for the following, the first number is the number of bits
// the rest of the numbers are pairs of bit indices
// for an example, see the comments on the Opcode field below
// also note that these bit indices differ from the above graphic, but if you flip the top and bottom labels in the graphic then they match up
BITS_3_14_12_Pg = Pg;             // Predicate Guard - if the predicate at the index specified by this field is false, don't run this instruction (7 is the hardcoded True predicate PT, similar to how 255 is the hardcoded zero register RZ; 0-6 are the ordinary predicates P0-P6)
BITS_1_15_15_Pg_not = Pg@not;     // negate the predicate specified by Predicate Guard
BITS_13_91_91_11_0_opcode=Opcode; // 13 total bits - 12 are at 0-11, a 13th is bit 91
BITS_1_74_74_Sc_absolute=0;       // this bit determines if it's an IADD3.X (aka extended IADD3?) which means use the carry in predicate(s)
BITS_8_23_16_Rd=Rd;               // destination register
BITS_3_83_81_Pu=Pu;               // index of a predicate to write the carry out to
BITS_3_86_84_cop=Pv;              // index of another predicate to write the carry out to? - still need to verify this theory
BITS_8_31_24_Ra=Ra;               // first source register
BITS_1_72_72_e=Ra@negate;         // negate the first source register if this bit is 1, also if Sc_absolute is 1 then this turns into a bitwise NOT instead of negate for whatever reason (the same is true for the other source register negate bits)
BITS_8_39_32_Rb=Rb;               // second source register
BITS_1_63_63_Sc_negate=Rb@negate; // negate second source register if this bit is 1
BITS_8_71_64_Rc=Rc;               // third source register (I'm guessing the reason it's called IADD3 is because there's 3 source registers)
BITS_1_75_75_Sc_negate=Rc@negate; // negate third source register if this bit is 1
BITS_3_89_87_Pp =* 7;             // index of a predicate to read a carry in from
BITS_1_90_90_input_reg_sz_32_dist =*1; // negate the Pp predicate if this bit is 1
BITS_3_79_77_Pq =* 7;             // index of another predicate to read a carry in from? - still need to verify this theory
BITS_1_80_80_ftz =*1;             // negate the Pq predicate if this bit is 1
BITS_6_121_116_req_bit_set=req_bit_set; // barrier mask
BITS_3_115_113_src_rel_sb=*7;           // read barrier
BITS_3_112_110_dst_wr_sb=*7;            // write barrier
BITS_2_103_102_pm_pred=pm_pred;         // don't know what this is - setting it to a value other than 0 causes cuobjdump to put a ?PM[value] at the end of the assembly - would like to find out what this actually means
BITS_8_124_122_109_105_opex=TABLES_opex_3(batch_t,usched_info,reuse_src_a,reuse_src_b,reuse_src_c); // this seems to merge the reuse, yield and stall bits and uses a lookup table for something
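Here's a rough Python sketch of how the `BITS_<count>_<hi>_<lo>...` names can be used to pull fields out of the two 64-bit words that cuobjdump prints. The field names are copied from the dump above; the short keys, the helper functions, and the assumption that the first hi/lo pair holds the most significant bits are mine.

```python
# Short key (my own label) -> field name copied from the DocumentSASS dump.
FIELDS = {
    "Pg": "BITS_3_14_12_Pg",
    "Pg_not": "BITS_1_15_15_Pg_not",
    "opcode": "BITS_13_91_91_11_0_opcode",
    "X": "BITS_1_74_74_Sc_absolute",
    "Rd": "BITS_8_23_16_Rd",
    "Pu": "BITS_3_83_81_Pu",
    "Pv": "BITS_3_86_84_cop",
    "Ra": "BITS_8_31_24_Ra",
    "Ra_neg": "BITS_1_72_72_e",
    "Rb": "BITS_8_39_32_Rb",
    "Rb_neg": "BITS_1_63_63_Sc_negate",
    "Rc": "BITS_8_71_64_Rc",
    "Rc_neg": "BITS_1_75_75_Sc_negate",
    "Pp": "BITS_3_89_87_Pp",
    "Pp_not": "BITS_1_90_90_input_reg_sz_32_dist",
    "Pq": "BITS_3_79_77_Pq",
    "Pq_not": "BITS_1_80_80_ftz",
    "pm_pred": "BITS_2_103_102_pm_pred",
}


def parse_bits_name(name):
    """Split 'BITS_<count>_<hi>_<lo>[_<hi>_<lo>...]_<label>' into (count, [(hi, lo), ...])."""
    parts = name.split("_")[1:]  # drop the leading 'BITS'
    count = int(parts[0])
    numbers = []
    for part in parts[1:]:
        if not part.isdigit():
            break  # the label starts here
        numbers.append(int(part))
    ranges = list(zip(numbers[0::2], numbers[1::2]))
    assert sum(hi - lo + 1 for hi, lo in ranges) == count, name
    return count, ranges


def extract_field(insn, name):
    """Concatenate the bit ranges of one field (first range = most significant, by assumption)."""
    _, ranges = parse_bits_name(name)
    value = 0
    for hi, lo in ranges:
        n = hi - lo + 1
        value = (value << n) | ((insn >> lo) & ((1 << n) - 1))
    return value


def decode(lo_word, hi_word):
    """Decode one 128-bit instruction given as the two words cuobjdump prints."""
    insn = (hi_word << 64) | lo_word
    return {key: extract_field(insn, name) for key, name in FIELDS.items()}


first = decode(0x0000000404047210, 0x003FDE0007F1E0FF)   # IADD3
second = decode(0x0000000505057210, 0x003FDEC00066A4FF)  # edited IADD3.X

if __name__ == "__main__":
    for key, value in second.items():
        print(f"{key:8} = {value}")
```

Decoding the edited IADD3.X this way gives Pu=3, Pv=6, Pp=0, Pq=5 and pm_pred=3, which lines up with the `P3, P6, ..., P0, P5 ?PM3` that cuobjdump printed.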

also here's a more in depth explanation of what the control bits mean


the default control bits output by ptxas on some random ptx I made seem to be 0x003fde (why I think this is default is in the table below):

| field               | #bits | val | why I think this is default |
| ------------------- | ----- | --- | --------------------------- |
| reuse               |   4   |  0  | can still reuse regs, we just don't give the hint to the processor that we ARE reusing them |
| wait barrier mask   |   6   |  3  | probably could also be 0, since the barriers for all prev instructions are 7 (aka unset) |
| read barrier index  |   3   |  7  | unset/not making a barrier  |
| write barrier index |   3   |  7  | unset/not making a barrier  |
| yield flag          |   1   |  0  | don't yield (in combination with a high stall this seems to kinda say "take as long as it takes") |
| stall cycles        |   4   |  F  | take as long as it takes    |
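As a quick check of the table, here's a small Python sketch that splits 0x003fde into those fields. It assumes the 24-bit value sits at bits 104-127 of the instruction (so its bit 0 is instruction bit 104, which is skipped) and takes the field widths from the table, including the 4-bit reuse field; `CONTROL_LAYOUT` and `decode_control` are names I made up.

```python
# (name, shift within the 24-bit control value, width), lowest field first.
# Shifts are relative to instruction bit 104, which is an assumption.
CONTROL_LAYOUT = [
    ("stall", 1, 4),          # instruction bits 105-108
    ("yield", 5, 1),          # bit 109
    ("write_barrier", 6, 3),  # bits 110-112
    ("read_barrier", 9, 3),   # bits 113-115
    ("wait_mask", 12, 6),     # bits 116-121
    ("reuse", 18, 4),         # bits 122-125 (width taken from the table)
]


def decode_control(ctrl):
    """Split the 24-bit control value into named fields."""
    return {
        name: (ctrl >> shift) & ((1 << width) - 1)
        for name, shift, width in CONTROL_LAYOUT
    }


default_ctrl = decode_control(0x003FDE)

if __name__ == "__main__":
    for name, value in default_ctrl.items():
        print(f"{name:14} = {value:#x}")
```

With this layout the value decodes to stall 0xF, yield 0, both barrier indices 7, wait mask 3 and reuse 0, the same values as in the table.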

most helpful graphic I've found for understanding the encoding, from this pdf: ampere_encoding.png
