Pure C functions vs Class/Struct Member functions

khofez

#10248

January 6, 2017

I was wondering if there are any differences in terms of performance between pure C functions:

void myFunc()
{
}

and Class/Struct Member functions

void myClass::myFunc()
{
}

Mārtiņš Možeiko

#10249

January 6, 2017

There are absolutely no performance differences between non-virtual class functions and global functions. The code to call them is identical. See here: https://godbolt.org/g/DObEWW
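A minimal sketch of what that comparison looks like in source form. The `Counter` type and `counter_add` function are hypothetical names made up for illustration, not taken from the linked example. A non-virtual member function receives `this` as a hidden first argument, so on common ABIs both versions below should compile to essentially the same instructions:

```cpp
#include <cstdint>

// Hypothetical type, used only to illustrate the point above.
struct Counter {
    std::uint32_t value;

    // Non-virtual member function: `this` is passed as a hidden first argument.
    void add(std::uint32_t amount) { value += amount; }
};

// Free function doing the same work with an explicit pointer argument.
void counter_add(Counter *counter, std::uint32_t amount) {
    counter->value += amount;
}
```

Calling `c.add(2)` and `counter_add(&c, 2)` on a `Counter c` does the same work; pasting both into a compiler explorer is an easy way to compare the generated code for a given compiler and flags.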

Virtual functions may cost you some performance (depending on situation where they are used and architecture you are running code on).
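For contrast, a small sketch of where a virtual call differs. `Shape`, `Triangle` and the two helper functions are hypothetical names for illustration. A call through a base reference normally goes through the vtable (an indirect call), unless the compiler can prove the dynamic type and devirtualize it:

```cpp
// Hypothetical hierarchy, used only to illustrate virtual dispatch.
struct Shape {
    virtual ~Shape() = default;
    virtual int sides() const = 0;  // dispatched through the vtable
};

struct Triangle final : Shape {
    int sides() const override { return 3; }
};

// Dynamic type unknown here: typically an indirect call through the vtable.
int count_sides(const Shape &shape) { return shape.sides(); }

// Triangle is final, so the compiler is allowed to call Triangle::sides directly.
int count_sides_direct(const Triangle &triangle) { return triangle.sides(); }
```

Whether the indirect call is measurably slower depends on the call site and the hardware, as noted above.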

Edited by Mārtiņš Možeiko on January 6, 2017, 11:20pm

khofez

#10251

January 7, 2017

Thanks!

Jesse

#10255

January 7, 2017

Here is an example of a simple C(++) byte stream parser (for a subset of the OpenGEX format) showcasing how the compiler will generate different instructions based on whether your functions are members of classes / structs or not, using Clang. EDIT: Actually not, read mmozeiko's response. Comparing implementations is hard!

The first link has functions as members of structs (C++), and the other is more C-like, following Casey's methodology. It's a bit difficult to compare the output differences, admittedly.

I was surprised the C version's output is 12.5% smaller. At -O3, the C style generates 40% fewer instructions! WAT!

A simple performance comparison of 100000 runs at -O2 shows the C version finishing in 8.2 seconds while the C++ version ran in 8.8 seconds on my MacBook Air. Real world example FTW!

This goes to show, at least in passing, that making the compiler's job as simple as possible has its payoffs.

https://godbolt.org/#z:OYLghAFBqd...wEKwas68a6K3ahK/a60iueiEAUKoAA%3D

https://godbolt.org/#z:OYLghAFBqd...1fVQVK14Va1ZpIc9EIAPlQAA%3D%3D%3D

Edited by Jesse on January 7, 2017, 5:33pm

Mārtiņš Možeiko

#10257

January 7, 2017

You are comparing two slightly different implementations. This is not a fair comparison.

For example, in the "C" version you are passing the At pointer by address, then dereferencing it and assigning it to a local variable. The compiler now knows that the At pointer cannot change: it's in a local variable which cannot be changed by anybody else outside of the function.

In the "C++" version you are using this->At all the time. The compiler cannot prove that nothing else can see or alias the "this" object (another pointer, or another thread). So it must keep the At pointer in memory in sync every time it is accessed. Obviously it will be slower.

You need to assign it to a local variable, same as in the "C" version (or change the "C" version to do a double dereference everywhere), if you want a fair comparison.

This is very easy to see here:
fragment from "C" version:

.LBB1_23:                               # =>This Inner Loop Header: Depth=1
        movzx   r10d, byte ptr [rsi + 2]
        inc     rsi
        mov     eax, r10d
        add     al, -48
        cmp     al, 10
        jb      .LBB1_23

rsi is used as a pointer.

fragment from "C++" version:

.LBB1_2:                                # =>This Inner Loop Header: Depth=1
        mov     qword ptr [rdi + 8], rdx
        movzx   ecx, byte ptr [rdx]
        mov     eax, ecx
        add     al, -48
        dec     rdx
        cmp     al, 10
        jae     .LBB1_2

See how the pointer in rdx goes through memory ([rdi + 8]) on every iteration of the inner loop!

As you see, this has nothing to do with how you are declaring functions, member or not. This is only about how you write code that deals with pointers. And you wrote it differently in these two versions.
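A rough sketch of the two styles being compared. The `Parser` struct and both function names are hypothetical stand-ins, not the code from the godbolt links. It assumes the input is a zero-terminated byte buffer. Because `u8` is a character type, a read through `At` may alias the `At` member itself, so in the first version the compiler may have to update `this->At` in memory on every iteration:

```cpp
#include <cstdint>

typedef std::uint8_t u8;

// Hypothetical stand-in for the parser state discussed above.
struct Parser {
    u8 *At;  // current read position in a zero-terminated buffer

    // Works on this->At directly: each advance is a store through `this`,
    // which the compiler may be unable to keep purely in a register.
    void SkipDigitsMember() {
        while ((u8)(*At - '0') < 10) {
            ++At;
        }
    }

    // Copies At into a local, loops on the local, and writes back once.
    // Nothing outside this function can observe or alias LocalAt itself.
    void SkipDigitsLocal() {
        u8 *LocalAt = At;
        while ((u8)(*LocalAt - '0') < 10) {
            ++LocalAt;
        }
        At = LocalAt;
    }
};
```

Both functions leave `At` at the first non-digit byte; only the second gives the optimizer a pointer it can freely keep in a register. The same local-copy trick applies equally to a free function taking a `Parser *`.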

Edited by Mārtiņš Možeiko on January 7, 2017, 11:29am

Jesse

#10264

January 7, 2017

Thanks mmozeiko! Thank you for the correction and explanation.

I made the simple change of copying At into a local u8 *LocalAt variable in the respective functions, and the performance of the two is now indistinguishable. Good demonstration of memory access performance.

This goes to show that being more directly exposed to the pointers makes the code a little easier to reason about.

Edited by Jesse on January 7, 2017, 7:30pm