Why is the Rust version of this program much faster than the C version? And how can I improve the C version?

Italo

#20654

March 6, 2019

So, I have two programs that do the same thing. I tried parallelizing them, using pthreads for the C version and a Rust library for the Rust one.

Here are the links to both versions:
C version: https://gist.github.com/nyeecola/daf6a1d41c1aa8273403cd7e0459aac9
Rust version: https://gist.github.com/nyeecola/8f8c65c60e79d9866b4f559a515a5659

These are the results that I got from it:

Running on 1 thread:
- C version: 80-90 fps
- Rust version: 110-130 fps

Running on 4 threads:
- C version: 370-420 fps
- Rust version: 1400-1600 fps

What is causing this drastic difference? My guess is that it's because I'm not using a thread pool in the C version, so the threads keep getting created and joined. I tried implementing a thread pool but had some trouble; I'll keep trying, though.

Is there anything else I'm missing? Any tips or ideas?

Edited by Italo on March 6, 2019, 10:49pm Reason: Initial post

Mārtiņš Možeiko

#20656

March 6, 2019

Yes, that could be the difference. Creating a thread is very expensive, so you should not do it all the time. I'm not sure what the Rust implementation does (I don't know Rust), but in C you should create the threads only once, at startup, and make them wait on a condition variable or something similar. Then prepare your work, signal the condition, and wait for the result using another condition variable or a similar synchronization primitive. On the next iteration, repeat the data preparation, signaling and waiting, but with no thread creation or destruction.
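A minimal sketch of the "create once, then signal every frame" scheme, using only POSIX mutexes and condition variables. THREAD_NUM, do_work and run_frames are hypothetical names, and do_work is a stand-in rather than the gist's actual per-frame work. In the real program the frame loop inside run_frames would be the render loop, and each worker would update its own slice of the points.

```c
#include <assert.h>
#include <pthread.h>

#define THREAD_NUM 4  /* worker threads, created once */

/* Shared pool state; the counters are only read/written with `mutex` held. */
typedef struct {
    pthread_mutex_t mutex;
    pthread_cond_t work_ready;  /* main -> workers: a new frame was posted */
    pthread_cond_t work_done;   /* workers -> main: the last worker finished */
    unsigned generation;        /* frame number, bumped once per frame */
    int pending;                /* workers still busy on the current frame */
    int quit;                   /* set once, at shutdown */
} Pool;

typedef struct {
    Pool *pool;
    int index;
    long result;  /* written by the worker, read by main after the frame */
} Worker;

/* Hypothetical stand-in for the real per-frame work (e.g. updating a slice of points). */
static void do_work(Worker *w, unsigned frame) {
    w->result = (long)frame * 100 + w->index;
}

static void *worker_main(void *arg) {
    Worker *w = arg;
    Pool *p = w->pool;
    unsigned seen = 0;  /* last generation this worker processed */
    for (;;) {
        pthread_mutex_lock(&p->mutex);
        while (p->generation == seen && !p->quit)
            pthread_cond_wait(&p->work_ready, &p->mutex);  /* sleeps, no busy-wait */
        if (p->quit) {
            pthread_mutex_unlock(&p->mutex);
            return NULL;
        }
        seen = p->generation;
        pthread_mutex_unlock(&p->mutex);

        do_work(w, seen);  /* the actual work runs without holding the lock */

        pthread_mutex_lock(&p->mutex);
        if (--p->pending == 0)
            pthread_cond_signal(&p->work_done);
        pthread_mutex_unlock(&p->mutex);
    }
}

/* Creates the threads once, runs `frames` frames on them, then shuts them down.
   sums[f] receives the sum of all worker results for frame f + 1. */
static void run_frames(unsigned frames, long *sums) {
    Pool pool = { .generation = 0, .pending = 0, .quit = 0 };
    pthread_mutex_init(&pool.mutex, NULL);
    pthread_cond_init(&pool.work_ready, NULL);
    pthread_cond_init(&pool.work_done, NULL);

    pthread_t threads[THREAD_NUM];
    Worker workers[THREAD_NUM];
    for (int i = 0; i < THREAD_NUM; i++) {
        workers[i] = (Worker){ .pool = &pool, .index = i, .result = 0 };
        int rc = pthread_create(&threads[i], NULL, worker_main, &workers[i]);
        assert(rc == 0);
        (void)rc;
    }

    for (unsigned frame = 1; frame <= frames; frame++) {
        /* "Prepare your work" would happen here, before posting the frame. */
        pthread_mutex_lock(&pool.mutex);
        pool.pending = THREAD_NUM;
        pool.generation = frame;  /* post the frame, then */
        pthread_cond_broadcast(&pool.work_ready);
        while (pool.pending > 0)  /* wait until every worker is done */
            pthread_cond_wait(&pool.work_done, &pool.mutex);
        pthread_mutex_unlock(&pool.mutex);

        long sum = 0;
        for (int i = 0; i < THREAD_NUM; i++) sum += workers[i].result;
        sums[frame - 1] = sum;
    }

    pthread_mutex_lock(&pool.mutex);
    pool.quit = 1;
    pthread_cond_broadcast(&pool.work_ready);
    pthread_mutex_unlock(&pool.mutex);
    for (int i = 0; i < THREAD_NUM; i++) pthread_join(threads[i], NULL);

    pthread_cond_destroy(&pool.work_done);
    pthread_cond_destroy(&pool.work_ready);
    pthread_mutex_destroy(&pool.mutex);
}
```

Calling `run_frames(3, sums)` from main creates the four threads once, runs three frames on them and joins them only at the end, so each frame costs one broadcast and one wait instead of four pthread_create/pthread_join pairs. A generation counter is used rather than a plain "work available" flag so that a worker can neither run the same frame twice nor skip one. Compile with `-pthread`.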

Also, your threads all use one global seed. I would use a separate seed for each thread, as a local variable. Reading and writing the same global variable a lot can slow threads down, because the cores have to keep their caches in sync (the cache line holding the seed keeps bouncing between them). It seems that in Rust you are using a thread-local random generator.
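A sketch of the per-thread seed suggestion, assuming the C version currently calls rand_r on one shared global seed. Job, fill_random and run_jobs are hypothetical names. Each worker copies its seed into a local variable, so the hot loop never touches shared memory. The threads are spawned in run_jobs only to keep the example short; in practice fill_random would be what the persistent pool workers run.

```c
#define _POSIX_C_SOURCE 200809L  /* exposes rand_r in <stdlib.h> */
#include <pthread.h>
#include <stdlib.h>

/* One unit of work per thread (hypothetical layout). */
typedef struct {
    unsigned int seed;  /* distinct starting seed for each thread */
    float *out;         /* this thread's own slice of the output array */
    int count;
} Job;

static void *fill_random(void *arg) {
    Job *job = arg;
    /* Local copy: the loop below only touches this thread's stack/registers,
       so the seed's cache line is never shared with other threads. */
    unsigned int seed = job->seed;
    for (int i = 0; i < job->count; i++)
        job->out[i] = (float)rand_r(&seed) / (float)RAND_MAX;  /* in [0, 1] */
    job->seed = seed;  /* write back once so the next frame continues the sequence */
    return NULL;
}

/* Gives each job a different seed and runs it on its own thread.
   (Spawning here keeps the sketch short; real code would reuse pool threads.) */
static int run_jobs(Job *jobs, int n, unsigned int base_seed) {
    pthread_t threads[64];
    if (n > 64) return -1;
    int created = 0;
    for (int i = 0; i < n; i++) {
        /* hypothetical seeding scheme: it only needs to differ per thread */
        jobs[i].seed = base_seed + 7919u * (unsigned int)i;
        if (pthread_create(&threads[i], NULL, fill_random, &jobs[i]) != 0) break;
        created++;
    }
    for (int i = 0; i < created; i++) pthread_join(threads[i], NULL);
    return created == n ? 0 : -1;
}
```

Note that rand_r only keeps an `unsigned int` of state, so distinct seeds do not guarantee that two threads' streams never overlap; a better generator would also help with that.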

Edited by Mārtiņš Možeiko on March 6, 2019, 11:18pm

Miles

#20661

March 7, 2019

For the single-threaded code, it's very likely that the different RNG functions are the main thing determining program speed. The optimization level you compile each program at could also make a significant difference.

What I'm much more puzzled by is how the Rust version managed to increase performance by more than 10x using only 4x as many threads. That measurement seems very suspicious.

ratchetfreak

#20662

March 7, 2019

notnullnotvoid wrote:

"What I'm much more puzzled by is how the Rust version managed to increase performance by more than 10x using only 4x as many threads. That measurement seems very suspicious."

For one, the C version isn't using a thread pool.

When there is such a huge difference, there is a very good chance that the two programs aren't doing the same thing.

Mārtiņš Možeiko

#20664

March 7, 2019

I increased POINTS_LEN to 10000000 (100x more), then ran both programs with THREAD_NUM = 1 (a single thread).

On my laptop, the C version gives me 5.3 to 5.5 "FPS" and the Rust version gives me 5.0 to 5.4 "FPS", so pretty much the same speed.

This means that with the smaller point count, thread creation is the main overhead in the C code: its cost per frame is fixed, so once each frame does 100x more work it becomes negligible.
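To see that fixed overhead directly, one could time a create/join pair for a thread that does nothing. A minimal sketch (create_join_cost_us and noop are hypothetical names); the absolute figure depends heavily on the OS and hardware, so no numbers are claimed here. Multiplying the result by THREAD_NUM gives a rough per-frame overhead that can be compared against the frame time (1/fps).

```c
#define _POSIX_C_SOURCE 200809L  /* exposes clock_gettime / CLOCK_MONOTONIC */
#include <pthread.h>
#include <time.h>

static void *noop(void *arg) { return arg; }  /* the thread does no work at all */

/* Average wall-clock cost, in microseconds, of one pthread_create + pthread_join
   pair. Returns -1.0 if a thread could not be created. */
static double create_join_cost_us(int iterations) {
    struct timespec start, end;
    clock_gettime(CLOCK_MONOTONIC, &start);
    for (int i = 0; i < iterations; i++) {
        pthread_t t;
        if (pthread_create(&t, NULL, noop, NULL) != 0) return -1.0;
        pthread_join(t, NULL);
    }
    clock_gettime(CLOCK_MONOTONIC, &end);
    double elapsed_us = (double)(end.tv_sec - start.tv_sec) * 1e6
                      + (double)(end.tv_nsec - start.tv_nsec) / 1e3;
    return elapsed_us / iterations;
}
```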

Btw, what you are really benchmarking is the rand_r implementation vs. the rand crate implementation (whatever it uses internally). I'm not sure what the point of this is. I'm pretty sure you can get a much faster random number generator in C that would also be of better quality than rand_r.
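As one example of a faster, better-quality generator, here is a sketch of xorshift64* (from Sebastiano Vigna's xorshift work). This is a swapped-in technique, not what rand_r or the rand crate use; Rng, rng_seed, rng_next, rng_float01 and rng_bool are hypothetical names. It also shows producing floats and booleans without an integer-to-float division.

```c
#include <stdbool.h>
#include <stdint.h>

/* xorshift64*: 64 bits of state, three shift/xor steps and one multiply.
   The state must never be zero. */
typedef struct { uint64_t state; } Rng;

static Rng rng_seed(uint64_t seed) {
    Rng r = { seed ? seed : 0x9E3779B97F4A7C15ull };  /* avoid the all-zero state */
    return r;
}

static uint64_t rng_next(Rng *r) {
    uint64_t x = r->state;
    x ^= x >> 12;
    x ^= x << 25;
    x ^= x >> 27;
    r->state = x;
    return x * 0x2545F4914F6CDD1Dull;
}

/* Uniform float in [0, 1): the top 24 bits (a float's significand width),
   scaled by 2^-24. A multiply by a power-of-two constant, no division. */
static float rng_float01(Rng *r) {
    return (float)(rng_next(r) >> 40) * 0x1.0p-24f;
}

/* Fair coin flip from the top bit: pure integer work, no float conversion. */
static bool rng_bool(Rng *r) {
    return (rng_next(r) >> 63) != 0;
}
```

Each thread would own one Rng (for example seeded with its thread index plus one), which also covers the per-thread-seed advice. Both helpers use the upper bits because the lowest bits of xorshift64* are its weakest.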

Also, there is a huge difference in how you evaluate the condition. In the Rust code you generate a single random bit (a boolean), which is probably done with integer operations. In the C code you generate an integer, cast it to float, then perform a float division and a comparison with a double value (which is probably optimized to a float comparison, since 0.5 is exactly representable as a float). So it's apples to oranges... And one more thing: to generate the random x/y floats in Rust you generate a floating-point value directly (not sure how that's done), but in C you generate an integer and do a float division. There could be significant differences there too.
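If you want to keep rand_r, the coin-flip condition alone can at least be made integer-only. A hedged sketch: flip_via_float is a guess at the shape of the C version's test (the gist's exact expression isn't reproduced here), and flip_via_int is the replacement; both names are hypothetical. Each consumes one rand_r value, and they agree except for a handful of values right at the 0.5 boundary, where float rounding of large rand_r results can tip the comparison.

```c
#define _POSIX_C_SOURCE 200809L  /* exposes rand_r in <stdlib.h> */
#include <stdbool.h>
#include <stdlib.h>

/* Roughly the test described for the C version:
   int -> float conversion, float division, then a comparison against 0.5. */
static bool flip_via_float(unsigned int *seed) {
    return (float)rand_r(seed) / RAND_MAX < 0.5;
}

/* The same 50/50 decision as a single integer comparison. rand_r returns
   values in [0, RAND_MAX]; when RAND_MAX is odd (e.g. 32767 or 2147483647),
   exactly half of those values are <= RAND_MAX / 2. */
static bool flip_via_int(unsigned int *seed) {
    return rand_r(seed) <= RAND_MAX / 2;
}
```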

Edited by Mārtiņš Možeiko on March 7, 2019, 9:54am