I have a pet project with support for DLL-based plugins. I'm currently adding support for managed plugins through .NET, and I'm in the process of sanity checking the interop performance for these plugins. I'm trying to understand two things: 1) why I'm seeing a significant performance difference between x86 and x64, and 2) what a realistic lower bound on per-call interop cost looks like.
For issue 1, when I call an empty managed function from C++ 100,000,000 times in a loop from an x86 release build, it averages out to 8ns per call. That appears reasonable on the surface. On my 3.5 GHz machine that's about 28 cycles. Microsoft's P/Invoke documentation says managed-to-unmanaged interop takes about 30 x86 instructions. C#-to-C++ interop in x86 Unity3D builds takes 8.2ns. And I've seen Hans Passant mention 7ns as the usual interop cost on Stack Overflow. So I seem to be in the right ballpark.
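For context, the measurement is just a tight loop around the exported call, roughly like this (a simplified sketch; the DLL and function names are placeholders for the real exports):

    // Simplified timing harness. The typedef's calling convention should match
    // whatever the export actually uses (__cdecl here by default).
    #include <windows.h>
    #include <chrono>
    #include <cstdio>

    typedef void (*EmptyFn)();

    int main()
    {
        HMODULE plugin = LoadLibraryA("ManagedPlugin.dll");
        if (!plugin)
            return 1;

        EmptyFn emptyCall = reinterpret_cast<EmptyFn>(GetProcAddress(plugin, "EmptyManagedCall"));
        if (!emptyCall)
            return 1;

        constexpr long long kIterations = 100000000;  // 100,000,000 calls

        auto start = std::chrono::high_resolution_clock::now();
        for (long long i = 0; i < kIterations; ++i)
            emptyCall();
        auto stop = std::chrono::high_resolution_clock::now();

        double totalNs = std::chrono::duration<double, std::nano>(stop - start).count();
        std::printf("%.2f ns per call\n", totalNs / kIterations);

        FreeLibrary(plugin);
        return 0;
    }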
However, it gets weird when I build for x64 instead. That number jumps to 17ns. I haven't been able to find any information about interop cost differences between x86 and x64. A naive test of x64 interop in Unity3D shows about 5.9ns. I'm testing an empty function with no parameters or return value, so presumably there's no marshaling or slowdown from processing wider data. Playing with the calling convention and optimization settings isn't making a significant difference.
I notice that simply calling from native C++ into the native side of a C++/CLI DLL takes 1.3ns in both x86 and x64. But calling directly into managed code, or calling into unmanaged code and then into managed, both show the doubling in cost on x64.
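To make those three paths concrete, here's roughly the shape of the bridge DLL I'm measuring against (a simplified sketch; function names are placeholders, and ManagedEmpty() stands in for the forward into the .NET plugin):

    // Sketch of the C++/CLI bridge DLL (compiled with /clr).

    // Empty managed function (compiled as managed by default under /clr).
    void ManagedEmpty()
    {
    }

    // 1) Export that stays entirely on the native side of the DLL (~1.3ns/call).
    #pragma unmanaged
    extern "C" __declspec(dllexport) void EmptyNativeCall()
    {
    }

    // 2) Export that lands in native code first, then crosses into managed; the
    //    compiler inserts the unmanaged-to-managed thunk at this call site.
    extern "C" __declspec(dllexport) void EmptyUnmanagedToManagedCall()
    {
        ManagedEmpty();
    }

    // 3) Export compiled as managed, so the transition happens on entry instead.
    #pragma managed
    extern "C" __declspec(dllexport) void EmptyManagedCall()
    {
        ManagedEmpty();
    }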
I've asked this on Stack Overflow and included the code I'm testing with, though I'm not expecting a concrete answer there.
Any ideas why interop takes twice as long on x64 compared to x86?
For question 2, is 8ns about right for interop? Or are there fairly straightforward things I could be doing to get that lower?