handmade.network » Forums » Pure C functions vs Class/Struct Member functions
khofez
9 posts
#10248 Pure C functions vs Class/Struct Member functions
7 months, 2 weeks ago

I was wondering if there are any differences in terms of performance between pure C functions:

void myFunc()
{
}


and Class/Struct Member functions:

void myClass::myFunc()
{
}


mmozeiko
Mārtiņš Možeiko
1437 posts
1 project
#10249 Pure C functions vs Class/Struct Member functions
7 months, 2 weeks ago Edited by Mārtiņš Možeiko on Jan. 6, 2017, 11:20 p.m.

There are absolutely no performance differences between non-virtual class functions and global functions. The code to call them is identical. See here: https://godbolt.org/g/DObEWW
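As a minimal sketch of why the call code comes out the same: a non-virtual member function receives the object as a hidden `this` argument, so it is essentially a free function that takes a pointer. The `Counter` type and both function names below are hypothetical, made up for illustration and not taken from the linked example.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical type for illustration only.
struct Counter {
    std::int32_t value;

    // Non-virtual member function: the object arrives as the hidden "this" pointer.
    void add(std::int32_t amount) { value += amount; }
};

// Free function doing the same work with an explicit pointer parameter.
void counter_add(Counter *c, std::int32_t amount) { c->value += amount; }

// Runs both versions on identical input and checks they agree.
void check_equivalence() {
    Counter a{1};
    Counter b{1};
    a.add(41);            // roughly: Counter::add(&a, 41)
    counter_add(&b, 41);  // explicit pointer, same work
    assert(a.value == b.value);
}
```

Pasting both functions into a compiler explorer with optimizations enabled is an easy way to compare the generated instructions for your own compiler and target.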

Virtual functions may cost you some performance (depending on the situation where they are used and the architecture you are running the code on).
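For contrast, here is a small sketch of a virtual call. On mainstream implementations it typically goes through the object's vtable as an indirect call, which the compiler usually cannot inline unless it can prove the dynamic type. The `Shape`, `Triangle`, `Square` and `total_sides` names are hypothetical and only serve the illustration.

```cpp
#include <cassert>

// Hypothetical class hierarchy for illustration only.
struct Shape {
    virtual ~Shape() = default;
    virtual int sides() const = 0;  // resolved at run time when the dynamic type is unknown
};

struct Triangle final : Shape {
    int sides() const override { return 3; }
};

struct Square final : Shape {
    int sides() const override { return 4; }
};

// The compiler generally cannot tell which override runs for each element,
// so each iteration typically performs an indirect call through the vtable.
int total_sides(const Shape *const *shapes, int count) {
    int total = 0;
    for (int i = 0; i < count; ++i) {
        total += shapes[i]->sides();
    }
    return total;
}
```

Whether this cost is measurable depends on how hot the call site is and how well the indirect branch predicts, so it is worth profiling before restructuring anything.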
khofez
9 posts
#10251 Pure C functions vs Class/Struct Member functions
7 months, 2 weeks ago

Thanks!
Jesse
34 posts
#10255 Pure C functions vs Class/Struct Member functions
7 months, 2 weeks ago Edited by Jesse on Jan. 7, 2017, 5:33 p.m.

Here is an example of a simple C(++) byte stream parser (for a subset of the OpenGEX format) showcasing how the compiler will generate different instructions based on whether your functions are members of classes / structs or not, using Clang. EDIT: Actually not, read mmozeiko's response. Comparing implementations is hard!

The first link has functions as members of structs (C++), and the other is more C-like, following Casey's methodology. It's a bit difficult to compare the output differences, admittedly.

I was surprised the C version's output is 12.5% smaller. At -O3, the C style generates 40% fewer instructions! WAT!

A simple performance comparison of 100000 runs at -O2 shows the C version finishing in 8.2 seconds, while the C++ version ran in 8.8 seconds on my MacBook Air. Real world example FTW!

This goes to show, at least in passing, that making the compiler's job as simple as possible has its payoffs.

https://godbolt.org/#z:OYLghAFBqd...wEKwas68a6K3ahK/a60iueiEAUKoAA%3D

https://godbolt.org/#z:OYLghAFBqd...1fVQVK14Va1ZpIc9EIAPlQAA%3D%3D%3D
mmozeiko
Mārtiņš Možeiko
1437 posts
1 project
#10257 Pure C functions vs Class/Struct Member functions
7 months, 2 weeks ago Edited by Mārtiņš Možeiko on Jan. 7, 2017, 11:29 a.m.

You are comparing slightly different implementations. This is not a fair comparison.

For example, in the "C" version you are passing the At pointer by address, then dereferencing it and assigning it to a local variable. The compiler now knows that the At pointer cannot change: it's in a local variable, which cannot be changed by anybody else outside of the function.

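In the "C++" version you are using this->At all the time. The compiler doesn't know that the "this" object is not visible elsewhere (for example aliased by another pointer, or shared with another thread). So it must keep the At pointer in memory up to date, storing or reloading it every time it is accessed. Obviously it will be slower.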

You need to assign it to a local variable, same as in the "C" version (or change the "C" version to do a double dereference everywhere), if you want a fair comparison.

This is very easy to see here:
fragment from "C" version:
.LBB1_23:                               # =>This Inner Loop Header: Depth=1
        movzx   r10d, byte ptr [rsi + 2]
        inc     rsi
        mov     eax, r10d
        add     al, -48
        cmp     al, 10
        jb      .LBB1_23

rsi is used as a pointer.

fragment from "C++" version:
.LBB1_2:                                # =>This Inner Loop Header: Depth=1
        mov     qword ptr [rdi + 8], rdx
        movzx   ecx, byte ptr [rdx]
        mov     eax, ecx
        add     al, -48
        dec     rdx
        cmp     al, 10
        jae     .LBB1_2

See how the pointer in rdx goes through memory on every iteration of the inner loop: it is written back to [rdi + 8] each time instead of staying only in a register!

As you can see, this has nothing to do with how you declare functions, member or not. This is only about how you write code that deals with pointers, and you wrote it differently in these two versions.
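A rough sketch of the two styles side by side, assuming a parser struct with a byte pointer member called At as in the discussion. The `Tokenizer` type, the two function names and `LocalAt` are hypothetical here; this is not the code from the linked examples.

```cpp
#include <cassert>

typedef unsigned char u8;

// Hypothetical parser state for illustration only.
struct Tokenizer {
    const u8 *At;

    // Works on this->At directly. Because a byte read through At could alias
    // the bytes of the At member itself, the compiler may have to keep the
    // member in memory up to date on every iteration.
    void skip_digits_member() {
        while (*At >= '0' && *At <= '9') {
            ++At;
        }
    }

    // Copies At into a local first. The local's address never escapes, so it
    // can stay in a register for the whole loop and is written back once.
    void skip_digits_local() {
        const u8 *LocalAt = At;
        while (*LocalAt >= '0' && *LocalAt <= '9') {
            ++LocalAt;
        }
        At = LocalAt;
    }
};
```

Both functions leave At in the same place for the same input; the difference, if any, is only in how often the generated code touches memory, so compare the assembly for your own compiler before drawing conclusions.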
Jesse
34 posts
#10264 Pure C functions vs Class/Struct Member functions
7 months, 2 weeks ago Edited by Jesse on Jan. 7, 2017, 7:30 p.m.

Thanks mmozeiko! Thank you for the correction and explanation.

I made the simple change of copying At into a local u8 *LocalAt variable in the respective functions, and the performance of the two is now indistinguishable. Good demonstration of memory access performance.

This goes to show that being more directly exposed to the pointers makes the code a little easier to reason about.