I've got most of the bits of the IADD3 instruction figured out, but there are still a couple of things I don't know and can't find an answer to.
Here's some output from cuobjdump (reformatted to take less horizontal space):
IADD3 R4, P0, R4, R4, RZ ;
// 0x0000000404047210
// 0x003fde0007f1e0ff
IADD3.X R5, P3, P6, R5, R5, RZ, P0, P5 ?PM3;
// 0x0000000505057210
// 0x003fdec00066a4ff
// I edited the bits of the IADD3.X instruction to see how the assembly changes
// I don't know if the ?PM3 would actually do anything or appear in normal code
// also I don't know if the second predicate input/output are actually used
// a more detailed breakdown of the above is below in the commented dump from DocumentSASS
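For anyone following along, here is a rough Python sketch of what I understand an IADD3 / IADD3.X pair like the one above to compute: a 64-bit add done as two 32-bit adds, with the carry passed through a predicate. The names `iadd3` and `double_u64` are made up for illustration. The sketch models the carry as a single bit, which is enough here because the third operand is RZ, and it ignores the second carry predicate entirely.

```python
MASK32 = 0xFFFFFFFF

def iadd3(a, b, c, carry_in=0):
    """Add three 32-bit values plus a carry-in; return (low 32 bits, carry-out)."""
    total = (a & MASK32) + (b & MASK32) + (c & MASK32) + carry_in
    # With c == 0 and carry_in <= 1 the overflow fits in one bit (assumption of this sketch).
    return total & MASK32, total >> 32

def double_u64(value):
    """Mimic R5:R4 = R5:R4 + R5:R4 using a low add and a high add with carry."""
    r4 = value & MASK32            # low half lives in R4
    r5 = (value >> 32) & MASK32    # high half lives in R5
    r4, p0 = iadd3(r4, r4, 0)      # low half: IADD3 R4, P0, R4, R4, RZ (carry-out to P0)
    r5, _ = iadd3(r5, r5, 0, p0)   # high half: IADD3.X with carry-in taken from P0
    return (r5 << 32) | r4

print(hex(double_u64(0x0000000180000000)))
```

The low add overflows, P0 becomes 1, and the high add picks that up, so the result equals the input doubled modulo 2^64.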
here's a helpful graphic to reference for the following from this pdf:
here's a dump from sm_86_instructions.txt generated by DocumentSASS with some comments added by me:
OPCODES
IADD3int_pipe = 0b1000010000; // dunno what int_pipe means
IADD3 = 0b1000010000; // this is the 0x210 in Opcode saying it's IADD3

ENCODING
!iadd3_noimm__RRR_RRR_unused; // dunno what this is
// for the following, the first number is the number of bits
// the rest of the numbers are pairs of bit indices
// for an example, see the comments on the Opcode field below
// also note that these bit indices differ from the above graphic, but if you flip the top and bottom labels in the graphic then they match up
BITS_3_14_12_Pg = Pg; // Predicate Guard - if the predicate at the index specified by this field is false, don't run this instruction (7 is the hardcoded True predicate PT - similar to how 255 is the hardcoded zero register RZ)
BITS_1_15_15_Pg_not = Pg@not; // negate the predicate specified by Predicate Guard
BITS_13_91_91_11_0_opcode=Opcode; // 13 total bits - 12 are at 0-11, a 13th is bit 91
BITS_1_74_74_Sc_absolute=0; // this bit determines if it's an IADD3.X (aka extended IADD3?) which means use the carry in predicate(s)
BITS_8_23_16_Rd=Rd; // destination register
BITS_3_83_81_Pu=Pu; // index of a predicate to write the carry out to
BITS_3_86_84_cop=Pv; // index of another predicate to write the carry out to? - still need to verify this theory
BITS_8_31_24_Ra=Ra; // first source register
BITS_1_72_72_e=Ra@negate; // negate the first source register if this bit is 1, also if Sc_absolute is 1 then this turns into a bitwise NOT instead of negate for whatever reason (the same is true for the other source register negate bits)
BITS_8_39_32_Rb=Rb; // second source register
BITS_1_63_63_Sc_negate=Rb@negate; // negate second source register if this bit is 1
BITS_8_71_64_Rc=Rc; // third source register (I'm guessing the reason it's called IADD3 is because there's 3 source registers)
BITS_1_75_75_Sc_negate=Rc@negate; // negate third source register if this bit is 1
BITS_3_89_87_Pp =* 7; // index of a predicate to read a carry in from
BITS_1_90_90_input_reg_sz_32_dist =*1; // negate the Pp predicate if this bit is 1
BITS_3_79_77_Pq =* 7; // index of another predicate to read a carry in from? - still need to verify this theory
BITS_1_80_80_ftz =*1; // negate the Pq predicate if this bit is 1
BITS_6_121_116_req_bit_set=req_bit_set; // barrier mask
BITS_3_115_113_src_rel_sb=*7; // read barrier
BITS_3_112_110_dst_wr_sb=*7; // write barrier
BITS_2_103_102_pm_pred=pm_pred; // don't know what this is - setting it to a value other than 0 causes cuobjdump to put a ?PM[value] at the end of the assembly - would like to find out what this actually means
BITS_8_124_122_109_105_opex=TABLES_opex_3(batch_t,usched_info,reuse_src_a,reuse_src_b,reuse_src_c); // this seems to merge the reuse, yield and stall bits and uses a lookup table for something
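To sanity-check the field layout above, here is a small Python sketch that parses the `BITS_<width>_<hi>_<lo>...` names and pulls the corresponding bits out of the two 64-bit words cuobjdump prints. The helper names (`parse_field`, `extract`, `decode`) are my own, not anything from DocumentSASS. It assumes the first hex word holds bits 0-63 and the second holds bits 64-127, and that when a field has several ranges the first one listed is the most significant (that matches the Opcode comment, but I have not verified it for opex).

```python
from itertools import takewhile

def parse_field(name):
    """'BITS_13_91_91_11_0_opcode' -> [(91, 91), (11, 0)] as (high bit, low bit) pairs."""
    tokens = name.split("_")[1:]                       # drop the leading "BITS"
    nums = [int(t) for t in takewhile(str.isdigit, tokens)]
    width, bounds = nums[0], nums[1:]
    ranges = list(zip(bounds[0::2], bounds[1::2]))
    if sum(hi - lo + 1 for hi, lo in ranges) != width:
        raise ValueError(f"bit ranges do not add up to the width in {name}")
    return ranges

def extract(word, name):
    """Concatenate the field's bit ranges, first range most significant (assumption)."""
    value = 0
    for hi, lo in parse_field(name):
        n = hi - lo + 1
        value = (value << n) | ((word >> lo) & ((1 << n) - 1))
    return value

def decode(low, high, names):
    """low = first hex word from cuobjdump, high = second (assumed bit order)."""
    word = (high << 64) | low
    return {name: extract(word, name) for name in names}

FIELDS = [
    "BITS_3_14_12_Pg", "BITS_1_15_15_Pg_not", "BITS_13_91_91_11_0_opcode",
    "BITS_1_74_74_Sc_absolute", "BITS_8_23_16_Rd", "BITS_3_83_81_Pu",
    "BITS_3_86_84_cop", "BITS_8_31_24_Ra", "BITS_8_39_32_Rb", "BITS_8_71_64_Rc",
    "BITS_3_89_87_Pp", "BITS_1_90_90_input_reg_sz_32_dist",
    "BITS_3_79_77_Pq", "BITS_1_80_80_ftz", "BITS_2_103_102_pm_pred",
]

for name, value in decode(0x0000000505057210, 0x003fdec00066a4ff, FIELDS).items():
    print(f"{name:40s} {value}")
```

For the edited IADD3.X encoding this gives Rd=5, Pu=3, cop=6, Pp=0, Pq=5 and pm_pred=3, which lines up with `IADD3.X R5, P3, P6, R5, R5, RZ, P0, P5 ?PM3` in the disassembly.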
also here's a more in-depth explanation of what the control bits mean