Perf Gains With Resizeable BAR Support e.g. AMD Smart Access Memory

Blu342

#23796

January 5, 2021

AMD Smart Access Memory claims increased performance of up to 11% in some games. I am a bit new to graphics and am unsure where this increased performance comes from. The way I understand it, data like vertex buffers, textures, etc. is usually uploaded to the GPU via a DMA staging buffer of some sort, whereas Smart Access Memory lets us access all of VRAM directly as opposed to 256MB chunks. Is the 11% boost just due to increased streaming speeds, e.g. being able to edit part of a buffer resident in VRAM as opposed to resending the entire thing over the PCIe connection with a staging buffer? Sorry if this made little sense, I am not the most knowledgeable on GPU architecture.

I guess fundamentally the question is: where does the CPU benefit from having more than 256MB of directly addressable VRAM? At the moment the only thing I can think of is if you want a dynamic vertex buffer or something along those lines.

Edited by Blu342 on January 5, 2021, 7:39am

Mārtiņš Možeiko

#23799

January 6, 2021

My understanding is that currently Windows maps VRAM only in 256MB chunks. So if you need to access more memory, you need to do extra work (write stuff to registers) to map a different part of memory. If you operate on a lot of memory, then you're spending time adjusting this base address. Mapping larger chunks, or even the whole memory, would eliminate this overhead. This is called "resizable BAR". AMD calls it Smart Access Memory, which is just a marketing name. Using it requires support from the GPU, BIOS, OS and GPU driver.
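As a rough illustration of the remapping overhead described above, here is a toy Python model. It is only a sketch of the idea, not real driver code: `count_remaps`, the access pattern and the 8GB "whole VRAM" aperture are all made up for illustration, and the model simply counts how often a single window base would have to move.

```python
MIB = 1024 * 1024

def count_remaps(offsets, window_size):
    """Count how often a hypothetical aperture base must move to serve the accesses.

    offsets: VRAM byte offsets the CPU wants to touch, in order.
    window_size: size in bytes of the CPU-visible window into VRAM.
    """
    remaps = 0
    base = None  # no window mapped yet
    for off in offsets:
        # Window bases are assumed to be aligned to the window size.
        aligned = (off // window_size) * window_size
        if aligned != base:
            base = aligned
            remaps += 1
    return remaps

# Hypothetical access pattern that bounces between distant VRAM regions.
accesses = [0, 300 * MIB, 10 * MIB, 700 * MIB, 20 * MIB, 900 * MIB]

small_window = count_remaps(accesses, 256 * MIB)   # 256MB aperture
full_window = count_remaps(accesses, 8192 * MIB)   # assumed 8GB aperture

print(small_window, full_window)
```

With the small window every access in this pattern lands in a different 256MB slot than the previous one, so the base moves each time; with the large window it is set once and never changes.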

This overhead is global: it applies to the whole system, not one application. That probably means less time is spent on remapping when your game needs to upload new uniforms for the next frame while, at the same time, Windows has switched the mapping away for whatever it needs to do (update framebuffer/compositor/etc). If you don't stream a lot of new data, then you probably won't see much improvement. From what I've googled, a lot of benchmarks show single-digit % improvement, not the 11% which is the AMD marketing material number.

It seems that the PCI spec has had this since ~2008, so it is not a new technology, it just needed to be enabled in GPU/OS/drivers: https://pcisig.com/specifications...ble%20bar&&&&order=title&sort=asc
Here's more info: https://docs.microsoft.com/en-us/...ers/display/resizable-bar-support

And from the looks of it, Linux has already supported this feature for a while, see: https://www.phoronix.com/forums/f...-rx-6000-series/page4#post1215694

Edited by Mārtiņš Možeiko on January 6, 2021, 1:37am

Blu342

#23800

January 6, 2021

I guess my confusion lies in why you would want to map VRAM directly into the virtual address space at all, as opposed to writing into a staging buffer in system RAM and then scheduling some sort of DMA transfer to VRAM asynchronously.

Like say you are updating a uniform in VRAM. I understand that using memory that is visible to both the GPU and CPU would be a direct write from CPU to GPU, vs. a write to an intermediary buffer followed by a DMA transfer (which clearly is more IO overall). However, it seems to me that the CPU, since it is doing the write directly into VRAM, has to incur a blocking fixed IO cost for a write over PCIe, vs. a comparatively much cheaper IO cost to write to a system RAM staging buffer, which can then be transferred to the GPU asynchronously, in a non-blocking fashion, via DMA. Basically, it seems to me that although more IO is being done with a staging buffer, it is much less costly for the CPU to actually do, since it only ever needs to write to system RAM. I guess even aside from Resizable BAR, I don't really see why you would want to use this memory model over some sort of staging buffer scheme unless you are heavily GPU bound or something.
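The accounting in the paragraph above can be written down as a small Python sketch. It only counts bytes per hop for the two paths as described in this post; `upload_traffic`, the hop names and the 256-byte uniform are hypothetical, and the model says nothing about latency or about which path is faster in practice.

```python
def upload_traffic(update_bytes, path):
    """Return bytes moved per hop for one update, under the simplified model above."""
    if path == "staging":
        # CPU writes to system RAM, then a DMA engine copies it across PCIe.
        return {
            "cpu_to_system_ram": update_bytes,
            "cpu_to_vram_over_pcie": 0,
            "dma_over_pcie": update_bytes,
        }
    if path == "direct":
        # CPU writes straight into CPU-visible VRAM across PCIe.
        return {
            "cpu_to_system_ram": 0,
            "cpu_to_vram_over_pcie": update_bytes,
            "dma_over_pcie": 0,
        }
    raise ValueError(f"unknown path: {path}")

uniform_bytes = 256  # hypothetical uniform block size

staging = upload_traffic(uniform_bytes, "staging")
direct = upload_traffic(uniform_bytes, "direct")

total_staging = sum(staging.values())
total_direct = sum(direct.values())

print("staging:", staging, "total:", total_staging)
print("direct: ", direct, "total:", total_direct)
```

In this model the staging path moves twice as many bytes in total, but the CPU itself never writes over PCIe, which is exactly the trade-off being asked about.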

I probably am missing some aspect of how this all works, so would appreciate any insight / corrections to the above interpretation.

Edited by Blu342 on January 6, 2021, 11:16am