
Wrap Up / Conclusion

iixnoxidexii June 15, 2025

My main goal for this project was to understand my GPU's machine code (SASS) better, and I accomplished that. I did want to have more of a functioning disassembler and editor for the SASS code; right now I'm just disassembling the IADD3 instruction. I'll still be working on that viewer/editor, so if anyone's interested in that, stay tuned.

Here's a screenshot of the current state of the viewer/editor:

[image: Screenshot 2025-06-15 143418.png]

The numbers next to the IADD3 instructions are, in order:

  • destination register
  • 1st source register (optionally negated/inverted)
  • 2nd source register (optionally negated/inverted)
  • 3rd source register (optionally negated/inverted)
  • first carry out predicate
  • first carry in predicate (optionally negated)
  • second carry out predicate
  • second carry in predicate (optionally negated)

all of these use negated True (!P7) for the secon


IADD3 Binary Encoding Breakdown

iixnoxidexii June 15, 2025

I've got most of the bits of the IADD3 instruction figured out, but there are still a couple of things I don't know and can't find an answer to.

here's some output from cuobjdump (reformatted to take less horizontal space)

IADD3 R4, P0, R4, R4, RZ ;                    // 0x0000000404047210
                                              // 0x003fde0007f1e0ff
IADD3.X R5, P3, P6, R5, R5, RZ, P0, P5  ?PM3; // 0x0000000505057210
                                              // 0x003fdec00066a4ff
// I edited the bits of the IADD3.X instruction to see how the assembly changes
// I don't know if the ?PM3 would actually do anything or appear in normal code
// also I don't know if the second predicate input/output are actually used
// a more detailed breakdown of the above is below in the commented dump from DocumentSASS
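
As a rough illustration of how the two 64-bit words printed by cuobjdump map to the operands, here is a minimal Python sketch. The bit positions are assumptions inferred by matching the two instructions above (they are not taken from official documentation), and the names `decode_iadd3`, `reg` and `pred` are hypothetical. Register negate/invert bits and the `?PM3` field are left out.

```python
def reg(n):
    """Name an 8-bit register index; 255 is assumed to be RZ."""
    return "RZ" if n == 0xFF else f"R{n}"


def pred(n):
    """Name a predicate field: low 3 bits = index (7 = PT), bit 3 = negate."""
    name = "PT" if (n & 7) == 7 else f"P{n & 7}"
    return ("!" if n & 8 else "") + name


def decode_iadd3(lo, hi):
    """Split an IADD3 encoding into fields.

    lo is the first word cuobjdump prints (bits 0-63), hi the second
    (bits 64-127). Every shift below is a guess that happens to
    reproduce the two example instructions.
    """
    return {
        "opcode": lo & 0xFFF,
        "guard": pred((lo >> 12) & 0xF),
        "dst": reg((lo >> 16) & 0xFF),
        "src1": reg((lo >> 24) & 0xFF),
        "src2": reg((lo >> 32) & 0xFF),
        "src3": reg(hi & 0xFF),
        "x": bool((hi >> 10) & 1),             # the .X suffix
        "carry_out1": pred((hi >> 17) & 0x7),
        "carry_out2": pred((hi >> 20) & 0x7),
        "carry_in1": pred((hi >> 23) & 0xF),   # 4 bits: negate + index
        "carry_in2": pred((hi >> 13) & 0xF),
    }


plain = decode_iadd3(0x0000000404047210, 0x003FDE0007F1E0FF)
extended = decode_iadd3(0x0000000505057210, 0x003FDEC00066A4FF)
print(plain)
print(extended)
```

Under these assumptions the plain IADD3 decodes to a first carry out of P0 with the other predicates at PT / !PT, and the edited IADD3.X decodes to P3, P6, P0 and P5, which matches the disassembly text.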

here's a helpful graphic to reference for the following from this pdf: ![ampere_encoding.png](https://asset


Some other useful stuff and some guesses on what the default control bits are and why

iixnoxidexii June 14, 2025

The default control bits that ptxas emits for some random PTX I wrote seem to be 0x003fde (the table below gives my reasoning for each field):

| field               | #bits | value (hex) | why I think this is the default |
| ------------------- | ----- | ----------- | ------------------------------- |
| reuse               |   4   |  0  | registers can still be reused; we just don't give the processor the hint that we ARE reusing them |
| wait barrier mask   |   6   |  3  | could probably also be 0, since the barriers for all previous instructions are 7 (aka unset) |
| read barrier index  |   3   |  7  | unset/not making a barrier |
| write barrier index |   3   |  7  | unset/not making a barrier |
| yield flag          |   1   |  0  | don't yield (in combination with a high stall this seems to kinda say "take as long as it takes") |
| stall cycles        |   4   |  F  | take as long as it takes |
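
To make the table concrete, here is a small Python sketch that unpacks those six fields. It assumes the 21 control bits occupy bits 41-61 of the second 64-bit word that cuobjdump prints, with stall cycles in the lowest bits and reuse in the highest (the table read bottom to top). That placement is an assumption, and `decode_control` is a hypothetical helper name.

```python
def decode_control(hi):
    """Unpack the control fields from the second (upper) 64-bit instruction word."""
    # Assumed position: 21 bits starting at bit 41 of the upper word.
    ctrl = (hi >> 41) & 0x1FFFFF
    return {
        "stall": ctrl & 0xF,                # 4 bits
        "yield": (ctrl >> 4) & 0x1,         # 1 bit
        "write_barrier": (ctrl >> 5) & 0x7, # 3 bits, 7 = unset
        "read_barrier": (ctrl >> 8) & 0x7,  # 3 bits, 7 = unset
        "wait_mask": (ctrl >> 11) & 0x3F,   # 6 bits
        "reuse": (ctrl >> 17) & 0xF,        # 4 bits
    }


# Upper word of an IADD3 whose leading hex digits are 0x003fde.
fields = decode_control(0x003FDE0007F1E0FF)
print(fields)
```

With this layout the word decodes to stall 15 (0xF), yield 0, both barrier indices 7, wait mask 3 and reuse 0, the same values as in the table.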

most helpful graphic I've found for understanding the encoding from [this pdf](https://www.cse.ust.hk/~weiwa


Link Dump

iixnoxidexii June 14, 2025

First post for this project - I'll just dump some links that I've found helpful for learning about the Ampere architecture and instruction encoding

  • website with SM90a ISA
  • SASS control code viewer
  • SASS control code explanation
  • Turing/Ampere assembler
  • Volta architecture pdf
  • Ampere architecture pdf
  • Ampere architecture talk
  • [above talk slides pdf (no signin required) ]( https://cfvod.kaltura.com/api_v3/index.php/service/attachment_attachmentAsset/action/serve/attachmentAssetId/1_ezjj6bp9/ks/djJ8MjkzNTc3MXzsRYu5cyOvZOPrEK0zuiSWMgkhZRcJ8ZojnzQVpbbToju32