I've figured out most of the bits of the IADD3 instruction, but there are still a couple of things I don't know and can't find an answer to.
Here's some output from cuobjdump (reformatted to take less horizontal space):
IADD3 R4, P0, R4, R4, RZ ; // 0x0000000404047210
// 0x003fde0007f1e0ff
IADD3.X R5, P3, P6, R5, R5, RZ, P0, P5 ?PM3; // 0x0000000505057210
// 0x003fdec00066a4ff
// I edited the bits of the IADD3.X instruction to see how the assembly changes
// I don't know if the ?PM3 would actually do anything or appear in normal code
// also I don't know if the second predicate input/output are actually used
// a more detailed breakdown of the above is below in the commented dump from DocumentSASS
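To make the two hex words in each comment easier to poke at, here is a minimal Python sketch. It assumes the first word cuobjdump prints holds bits 0-63 and the second holds bits 64-127 (that assumption is consistent with the register numbers in the disassembly). The helper names `combine` and `bits` are made up for illustration, and the bit positions are the ones from the DocumentSASS dump in this post.

```python
def combine(lo_word: int, hi_word: int) -> int:
    """Join the two 64-bit words into one 128-bit integer.

    Assumption: the first word printed is bits 0-63, the second is bits 64-127.
    """
    return (hi_word << 64) | lo_word


def bits(insn: int, hi: int, lo: int) -> int:
    """Return bits hi..lo (inclusive) of insn as an unsigned integer."""
    return (insn >> lo) & ((1 << (hi - lo + 1)) - 1)


iadd3 = combine(0x0000000404047210, 0x003FDE0007F1E0FF)
iadd3_x = combine(0x0000000505057210, 0x003FDEC00066A4FF)

for name, insn in (("IADD3", iadd3), ("IADD3.X", iadd3_x)):
    print(
        name,
        "opcode=0x%03x" % bits(insn, 11, 0),   # low 12 bits of the opcode field
        "Pg=%d" % bits(insn, 14, 12),          # predicate guard index
        "Rd=R%d" % bits(insn, 23, 16),
        "Ra=R%d" % bits(insn, 31, 24),
        "Rb=R%d" % bits(insn, 39, 32),
        "Rc=%d" % bits(insn, 71, 64),          # 255 is RZ
        "X=%d" % bits(insn, 74, 74),           # the bit that turns IADD3 into IADD3.X
    )
```

Both encodings come out with opcode 0x210 and Rc=255 (RZ), and only the second one has bit 74 set, which lines up with the `.X` suffix in the disassembly.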
Here's a helpful graphic to reference for the following, from this pdf:
Here's a dump from sm_86_instructions.txt, generated by DocumentSASS, with some comments added by me:
OPCODES
IADD3int_pipe = 0b1000010000; // dunno what int_pipe means
IADD3 = 0b1000010000; // this is the 0x210 in Opcode saying it's IADD3
ENCODING
!iadd3_noimm__RRR_RRR_unused; // dunno what this is
// for the following, the first number is the number of bits
// the rest of the numbers are pairs of bit indices
// for an example, see the comments on the Opcode field below
// also note that these bit indices differ from the above graphic, but if you flip the top and bottom labels in the graphic then they match up
BITS_3_14_12_Pg = Pg; // Predicate Guard - if the predicate at the index specified by this field is false, don't run this instruction (7 is PT, a hardcoded True predicate - similar to how 255 is RZ, a hardcoded 0 register; indices 0-6 are the ordinary predicates P0-P6, e.g. P0 is written as a carry out in the cuobjdump output above)
BITS_1_15_15_Pg_not = Pg@not; // negate the predicate specified by Predicate Guard
BITS_13_91_91_11_0_opcode=Opcode; // 13 total bits - 12 are at 0-11, a 13th is bit 91
BITS_1_74_74_Sc_absolute=0; // this bit determines if it's an IADD3.X (aka extended IADD3?), which means use the carry in predicate(s)
BITS_8_23_16_Rd=Rd; // destination register
BITS_3_83_81_Pu=Pu; // index of a predicate to write the carry out to
BITS_3_86_84_cop=Pv; // index of another predicate to write the carry out to? - still need to verify this theory
BITS_8_31_24_Ra=Ra; // first source register
BITS_1_72_72_e=Ra@negate; // negate the first source register if this bit is 1; if Sc_absolute (bit 74, the .X bit) is 1, this turns into a bitwise NOT instead of a negate for whatever reason (the same is true for the other source registers' negate bits)
BITS_8_39_32_Rb=Rb; // second source register
BITS_1_63_63_Sc_negate=Rb@negate; // negate second source register if this bit is 1
BITS_8_71_64_Rc=Rc; // third source register (I'm guessing the reason it's called IADD3 is that there are 3 source registers)
BITS_1_75_75_Sc_negate=Rc@negate; // negate third source register if this bit is 1
BITS_3_89_87_Pp =* 7; // index of a predicate to read a carry in from
BITS_1_90_90_input_reg_sz_32_dist =*1; // negate the Pp predicate if this bit is 1
BITS_3_79_77_Pq =* 7; // index of another predicate to read a carry in from? - still need to verify this theory
BITS_1_80_80_ftz =*1; // negate the Pq predicate if this bit is 1
BITS_6_121_116_req_bit_set=req_bit_set; // barrier mask
BITS_3_115_113_src_rel_sb=*7; // read barrier
BITS_3_112_110_dst_wr_sb=*7; // write barrier
BITS_2_103_102_pm_pred=pm_pred; // don't know what this is - setting it to a value other than 0 causes cuobjdump to put a ?PM[value] at the end of the assembly - would like to find out what this actually means
BITS_8_124_122_109_105_opex=TABLES_opex_3(batch_t,usched_info,reuse_src_a,reuse_src_b,reuse_src_c); // this seems to merge the reuse, yield and stall bits and uses a lookup table for something
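The field-name convention described in the comments above (bit count first, then pairs of bit indices) can be applied mechanically. The following is a rough Python sketch, not anything DocumentSASS itself provides: `parse_field`, `extract` and `pred` are hypothetical helpers, and it assumes that when a field is split across several ranges, the ranges are listed most significant first (so bit 91 is the top bit of the opcode).

```python
def parse_field(name: str):
    """Split e.g. 'BITS_13_91_91_11_0_opcode' into (width, [(hi, lo), ...], label)."""
    parts = name.split("_")
    if parts[0] != "BITS":
        raise ValueError("not a BITS_ field name: " + name)
    width = int(parts[1])
    ranges, covered, i = [], 0, 2
    # Consume (hi, lo) pairs only until the declared width is covered, so that
    # numbers inside the label (e.g. 'input_reg_sz_32_dist') are left alone.
    while covered < width:
        hi, lo = int(parts[i]), int(parts[i + 1])
        ranges.append((hi, lo))
        covered += hi - lo + 1
        i += 2
    return width, ranges, "_".join(parts[i:])


def extract(insn: int, name: str) -> int:
    """Read the field described by `name` out of a 128-bit instruction."""
    _, ranges, _ = parse_field(name)
    value = 0
    for hi, lo in ranges:  # assumption: ranges are listed most significant first
        n = hi - lo + 1
        value = (value << n) | ((insn >> lo) & ((1 << n) - 1))
    return value


def pred(index: int, negate: int = 0) -> str:
    """Render a predicate index the way the disassembly does (7 is PT)."""
    return ("!" if negate else "") + ("PT" if index == 7 else "P%d" % index)


# Assumption: the first word printed by cuobjdump is bits 0-63, the second is bits 64-127.
plain = (0x003FDE0007F1E0FF << 64) | 0x0000000404047210
edited_x = (0x003FDEC00066A4FF << 64) | 0x0000000505057210

for label, insn in (("IADD3", plain), ("IADD3.X", edited_x)):
    print(
        label,
        "opcode=0x%x" % extract(insn, "BITS_13_91_91_11_0_opcode"),
        "Pu=" + pred(extract(insn, "BITS_3_83_81_Pu")),
        "Pv=" + pred(extract(insn, "BITS_3_86_84_cop")),
        "Pp=" + pred(extract(insn, "BITS_3_89_87_Pp"),
                     extract(insn, "BITS_1_90_90_input_reg_sz_32_dist")),
        "Pq=" + pred(extract(insn, "BITS_3_79_77_Pq"),
                     extract(insn, "BITS_1_80_80_ftz")),
        "pm_pred=%d" % extract(insn, "BITS_2_103_102_pm_pred"),
    )
```

For the edited IADD3.X encoding this gives Pu=P3, Pv=P6, Pp=P0, Pq=P5 and pm_pred=3, which matches the `P3, P6, ..., P0, P5 ?PM3` operands cuobjdump printed. For the plain IADD3 it gives Pu=P0, Pv=PT and both carry inputs as !PT.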
Also, here's a more in-depth explanation of what the control bits mean: