A discussion of many aspects of parallelism and concurrency in programming, and the pros and cons of different programming methodologies.
This is a fishbowl: a panel conversation held on the Handmade Network Discord where a select few participants discuss a topic in depth. We host them on a regular basis, so if you want to catch the next one, join the Discord!
So the topic is "Parallel programming": what is that, anyway? There is this distinction of Parallelism vs Concurrency, popularized maybe also by Rob Pike: https://www.youtube.com/watch?v=oV9rvDllKEg
04:17
But I can never remember which is which, and it seems like not everybody makes that precise distinction
I'm having trouble understanding that. The way I remember it is that one is about authoring code as independent "trains of thought", the other is actually running code on multiple physical cores. But not quite sure about that
WaywardNov 19, 2020 04:24 AM
responsiveness versus throughput? @jfs
04:26
do we have anyone in the fishbowl who's been playing around with parallel programming, or are we just a big group of beginners?
If you look at "the three columns of concurrency" that I linked above, I'm mostly interested in the responsiveness pillar right now. But to make that not a pain to develop (e.g. by creating a separate OS thread for each task), shared data locks (mutexes) are definitely needed
04:31
So, assuming that we're talking about both concurrency and parallelism here. Because one isn't very meaningful without the other, right?
WaywardNov 19, 2020 04:32 AM
i thought parallelism was more of the umbrella term given how computer hardware has developed over the last 25 years
A realization that led me into being interested in this topic recently has been how even a single machine is a concurrent and multithreaded thing. Not only do machines have multiple CPU cores, they also have a separate graphics processing unit, and all sorts of other things that (for example) communicate through interrupts. It gets more pronounced of course when adding networking...
04:34
I can't really do a plain fgets() or fread() anymore, it seems so wrong to me when I'm doing e.g. GUI on the same thread(edited)
04:35
The human interacting with the GUI is like a completely separate thread as well!
WaywardNov 19, 2020 04:37 AM
i'm more curious about the massively parallel part of the discussion, maximizing throughput and thereby optimizing for speed (running a single big task on multiple cpu threads to optimize for time)
so @bumbread is right; concurrency is basically the go/node idea where you can have lots of "things" (threads, green threads, coroutines, goroutines, ...) in-flight at the same time, but you can still do it all on a single core if that's all you have. and parallelism is more when things run truly in parallel (on different cores).
When does this distinction matter in practice? Performance aside, does it matter much how many cores (1 or many) my N concurrent threads are running on?
In programming:
Concurrency is the composition of independently executing processes.
Parallelism is the simultaneous execution of (possibly related) computations.
@jfs I think you meant @NWDD, but I can answer too. First of all, I think thread safety can be many different things. You mentioned preemption; that's one part of it -- if you only have 1 CPU and you fully control when the next thread executes, then you can be a lot more relaxed about a lot of things, since you know no other CPU will be able to access your variables concurrently. In other words, you don't need locking at all.
I am using the term process in the general sense here to encompass the general idea.
e.g. types of processes: core, thread, os process, fibre, green-thread, etc
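Since Go comes up later in the discussion, here is a minimal Go sketch of that distinction (the function names are mine, purely illustrative): the goroutines below are composed concurrently, and pinning the runtime to a single core keeps them from ever running in parallel, yet the program still works because concurrency does not require parallelism.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// sumRange adds the integers in [lo, hi).
func sumRange(lo, hi int) int {
	s := 0
	for i := lo; i < hi; i++ {
		s += i
	}
	return s
}

func main() {
	// Concurrency: four independently executing "processes" (goroutines)
	// composed together. Limiting the scheduler to one core means they
	// can interleave but never run simultaneously -- the program is
	// concurrent but not parallel.
	runtime.GOMAXPROCS(1)

	var wg sync.WaitGroup
	results := make([]int, 4)
	for w := 0; w < 4; w++ {
		wg.Add(1)
		go func(w int) {
			defer wg.Done()
			results[w] = sumRange(w*250, (w+1)*250)
		}(w)
	}
	wg.Wait()

	total := 0
	for _, r := range results {
		total += r
	}
	fmt.Println(total) // sum of 0..999 = 499500
}
```

Raising GOMAXPROCS (or just deleting that line) makes the same concurrent program parallel without changing its logic, which is the crux of the distinction.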
Now that the definitions are done and over with, let's get down to the dirty stuff
NWDDNov 19, 2020 05:01 AM
so for example: imagine you're developing a network layer which is something that is highly asynchronous by definition.
there are a lot of little details you have to handle: has the connection timed out? what data have you received? was there a network error? did the user put a nintendo switch in the microwave so the network connection broke?
with concurrency you can actually ditch all means of synchronization and still have it appear, from a perceived point of view, to happen "in parallel"(edited)
Unless I'm misunderstanding what you mean, I disagree. You often need synchronization with concurrency. Many webapps have synchronization bugs even though they are entirely single-threaded.
NWDDNov 19, 2020 05:07 AM
ah, yeah, i meant specifically cooperative concurrency (for example fibers or lua coroutines running in a single-thread or cooperative threads pinned to one core)
WaywardNov 19, 2020 05:10 AM
@AsafGartner that sounds like a problem with the single-threaded event-driven model rather than with a simple Monte Carlo-style setup where you have one thread that creates jobs and takes back the results afterwards, with no interaction in between the job threads
Synchronization is for shared state. If you don't have shared state then it's no problem, but if you do you still need to enforce synchronization even in non-parallel systems.
VegardNov 19, 2020 05:11 AM
@AsafGartner web apps are sometimes spawned in parallel by the web server, so even though the program itself is single-threaded, there can still be multiple instances of it running in parallel. not sure if this is what you're talking about
I would like to state that synchronization is needed whenever two processes (in the sense of @gingerBill 's definition) communicate. Where communication means any kind of information sharing
A more concrete example, which I've encountered in several apps, is you hit play in a music app, it starts buffering the song, you hit pause, and when it's done buffering it plays anyway.
NWDDNov 19, 2020 05:13 AM
that can happen even in a 100% single-thread application where you flip some global bools and don't check them properly though
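The play/pause bug above can be sketched even in a 100% single-threaded, event-loop style program. This is a hypothetical reconstruction (all the names are mine): the fix shown is a generation counter that lets a late completion detect that the user toggled state in the meantime.

```go
package main

import "fmt"

// player models the music-app state from the example above.
type player struct {
	generation int  // bumped on every pause, to invalidate stale events
	playing    bool
}

// requestPlay kicks off "buffering" and returns the completion
// callback the event loop will run later, capturing the generation
// at request time.
func (p *player) requestPlay() func() {
	p.playing = true
	gen := p.generation
	return func() {
		// The buggy version would just do: p.playing = true.
		// Checking the generation drops stale completions instead.
		if gen != p.generation {
			return // user toggled in the meantime
		}
		p.playing = true
	}
}

func (p *player) pause() {
	p.generation++
	p.playing = false
}

func main() {
	var p player
	onBuffered := p.requestPlay() // user hits play
	p.pause()                     // user hits pause while buffering
	onBuffered()                  // buffering completes later
	fmt.Println(p.playing)        // false: the stale completion was ignored
}
```

No threads, no locks, yet the synchronization bug (and its fix) are exactly the kind discussed here: shared state plus interleaved events.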
05:14
so for example, today a user reported that in Spelunky 2, if you freeze an enemy that grabs the player at a certain point in time
05:14
and then approach the enemy, it grabs you while frozen
i'm not sure but i also believe it's non-preemptive?
WaywardNov 19, 2020 05:23 AM
classic javascript i believe is non-preemptive, yes; however google chrome attempts to make it preemptive to increase speed, and its engine was later reused in node.js
well it added worker threads AFAIK, so that changed maybe. But fundamentally it's how I understand a non-preemptive runtime
05:25
a generally non-preemptive framework would maybe also allow a process to choose what other threads can be scheduled while this process is waiting for an I/O request to complete(edited)
NWDDNov 19, 2020 05:25 AM
I've seen infinite loops hang the whole execution (of a web-server)(edited)
NWDDNov 19, 2020 05:26 AM
but either way, I don't think non-parallel preemptive systems are interesting or useful xD
yes, this is an important point @AsafGartner . There are certain frameworks (such as actors, channels) that make parallelism much easier(edited)
05:26
they encapsulate the needed synchronization, and so dramatically reduce manual work
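One way to see that encapsulation, sketched in Go (the worker-pool shape here is my own illustrative example, not anyone's production code): the channels carry all of the synchronization, so no mutex appears anywhere even though three workers run at once.

```go
package main

import "fmt"

// square is the per-job work.
func square(n int) int { return n * n }

func main() {
	jobs := make(chan int)
	results := make(chan int)

	// Three workers pull from the same jobs channel. The channel
	// operations encapsulate the needed synchronization.
	for w := 0; w < 3; w++ {
		go func() {
			for n := range jobs {
				results <- square(n)
			}
		}()
	}

	// Producer: send the work, then close to let the workers exit.
	go func() {
		for i := 1; i <= 5; i++ {
			jobs <- i
		}
		close(jobs)
	}()

	sum := 0
	for i := 0; i < 5; i++ {
		sum += <-results
	}
	fmt.Println(sum) // 1+4+9+16+25 = 55
}
```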
VegardNov 19, 2020 05:29 AM
@NWDD are you saying you don't think coroutines or green threads are ever interesting, or that they're only interesting if you can (also) schedule them in parallel?
NWDDNov 19, 2020 05:30 AM
that "preemptive non-parallel concurrency" is not interesting in general
05:30
coroutines are great (obviously mandatory everything I say is my opinion and not a fact blablablablabla)(edited)
In practice I would agree. In theory I think there's almost always a limit to the unfairness you're willing to have in a system.
VegardNov 19, 2020 05:37 AM
I'm not sure if the fairness is really a big issue in practice. you can kinda see when you're going to have a lot of work to do and yield manually from time to time
it's definitely an issue when you're doing some number crunching
05:37
such as decoding a JPG
05:38
software is in some respects easier to architect when you just assume preemption
05:38
and combine pieces that you know will be run in parallel
WaywardNov 19, 2020 05:39 AM
the worst thing is when it's a surprise
NWDDNov 19, 2020 05:39 AM
if you assume preemption and combine pieces that you know that will be run in parallel, then it's hard to replace it with concurrency, yes, that sounds about right
the idea of coroutines might be that we want to write linear code (top-to-bottom) while still interacting with other threads in a totally synchronized fashion
05:42
what you describe sounds more like asynchrony to me @Wayward
WaywardNov 19, 2020 05:43 AM
for reading from disk, i think that's a great thing
NWDDNov 19, 2020 05:43 AM
it's kind of easier to describe a transition like "change opacity during X time, then change scene, then change opacity during X time" that way than with pretty much any other alternative
05:44
they are really hard to follow and debug and everything if you use them a lot in a dogma-like way, yes
VegardNov 19, 2020 05:46 AM
@jfs coroutines are very often used to implement that asynchronicity
VegardNov 19, 2020 05:55 AM
there are so many concepts and nuances to this whole parallel/concurrent thing; it would be cool to have an infographic or something relating all the terms to each other
Yes. I didn't really understand what you meant @NWDD . And what is it anyway, that is "acting like a separate process" (like @gingerBill says)?
05:56
In my mind, coroutines are something that the OS is unaware of
05:56
basically I'm thinking of fibers
05:57
Fibers btw. are seen as a failed experiment(edited)
05:57
POSIX removed their implementation, and Microsoft deprecated them long ago
05:58
because with them, we not only have process-global and thread-global storage, but also fiber-global storage(edited)
05:58
and all the related confusion that comes with it
05:58
also when scheduling a single fiber on multiple OS threads
05:58
and so on
VegardNov 19, 2020 05:58 AM
glibc still has setcontext/swapcontext/etc. I think the reason it was deprecated in POSIX was because it had a bunch of requirements WRT signals, which meant that you need 2+ syscalls per fiber switch, which is slow.
05:59
so it's still fully possible to define your own semantics and get good performance out of them. just not with an implementation following the POSIX specification
@NWDD People rarely know what they want before trying stuff out.
VegardNov 19, 2020 06:10 AM
aren't fibers and green threads synonyms?
NWDDNov 19, 2020 06:11 AM
fibers are always cooperative and something that is controlled by the programmer (so the programmer has to decide when does the fiber start, where does it execute, at which point it is stopped, if/how is it scheduled...)(edited)
VegardNov 19, 2020 06:14 AM
well, okay. so you can use fibers to implement green threads
that only falls flat in tight loops where all the allocation has been optimized out
06:18
I don't think there is another way of doing scheduling in userland, or is there? How would that UMS work, anyway?
VegardNov 19, 2020 06:20 AM
on Linux at least you can register a timer that will send a signal, then you can context switch from the signal handler. but this does go through the kernel as well, so it's not that great for performance
NWDDNov 19, 2020 06:20 AM
well, in the end everything is semantic, that depends on what you consider "doing scheduling"
The compiler can then place that "process" in an arbitrary memory location. It could be allocated as a "value type".(edited)
06:26
Well yeah, I'm not sure, I guess it's still implemented using rsp
VegardNov 19, 2020 06:47 AM
since it's quiet -- I'm just curious if anybody here is doing game dev and if so, what kind of parallel code do you see in practice in whatever games/engines you're working on?
06:48
I assume most people have a separate audio thread at the very least. what else? do you scale number of threads with number of CPUs?
I'm doing a GUI for industrial automation. I also have a little jigsaw puzzle game that I intend to network (match making, cooperative gaming)
06:50
I have background processes as well, and I want to be able to cancel them quickly
06:50
The background processes do file and networking I/O
06:51
so I've been experimenting with layering them on top of async IO
06:51
and trying to find a way to write them in a linear top-to-bottom fashion - basically in a conventional blocking style, with the exception that they should react to custom cancellation signals(edited)
06:52
and trying to make that code not be a mess
NWDDNov 19, 2020 06:54 AM
since it's quiet -- I'm just curious if anybody here is doing game dev and if so, what kind of parallel code do you see in practice in whatever games/engines you're working on?
@Vegard In indie games it's quite rare to use threads for anything other than loading / audio and sometimes input
And when done it's usually not worth it
WaywardNov 19, 2020 06:55 AM
@NWDD what do you mean with "when done"? remaking the whole game after release to implement threading?
NWDDNov 19, 2020 06:56 AM
Meaning that on games that I've worked on, removing multi-threading has often provided better performance
06:57
I'm not sure how much detail I can go into? but for example, in a game we worked on that used a parallelized scripting language with a reference-counting gc
06:58
removing all forms of threading and replacing them with concurrency provided better performance
VegardNov 19, 2020 06:58 AM
@Wayward I think it was meant as "when people do write their games to make use of multithreading it doesn't give you any wins in terms of performance"
I have noticed that a lot of beginners parallelize things (and I am by no means an expert either) because they think it will make things faster, when in reality their problem could be made extremely fast on a single thread, and multiple threads would slow things down.
WaywardNov 19, 2020 07:10 AM
your point being multithreading primarily having benefits assuming the code is already optimized?
If you want to use multiple processes by default, you will require some sort of system that can schedule things efficiently across the numerous units of work, e.g. M:N threading (as in Go)
@gingerBill but even if you have efficient scheduling you still need to lay things out in a way that doesn't thrash, then either use algorithms that are thread-safe and slower, or pay for the extra cost of locking (and either annoy the scheduler and get kicked off earlier, or pay for the context switch)... And in the end you end up with something that makes worse use of the system if that system is shared with other programs
yes, I mean theoretically you could have a game fit in a single cache line(edited)
07:24
but not be parallel-friendly
VegardNov 19, 2020 07:25 AM
a 64-byte game?
NWDDNov 19, 2020 07:25 AM
it's a theoretical exercise xD
07:25
not daring anyone to do so
VegardNov 19, 2020 07:26 AM
it's not a bad dare.
NWDDNov 19, 2020 07:26 AM
just saying that not being cache friendly (when doing multithreaded stuff) isn't related to using a huge amount of memory or optimizing(edited)
WaywardNov 19, 2020 07:27 AM
@NWDD can we just jump straight to the long version?
NWDDNov 19, 2020 07:27 AM
what long version?
07:27
oh that's right, the game
07:29
regarding that game I was talking about,
we're now changing it so that instead of running separate systems on separate threads
everything will end up running in the same thread in order
this allows us to free cores so that we can actually run a separate game simulation on each core
which solves a harder problem regarding online multiplayer
point being: using multithreading for things that do not require it might actually prevent the program from reaching its full potential
WaywardNov 19, 2020 07:33 AM
single-threaded applications are recommended until no longer viable; what about the latter case?
NWDDNov 19, 2020 07:34 AM
I can imagine a lot of cases where you actually want multi-threading eh? I mean for example: a compiler, a baking process, a simulation that can go faster than realtime...
It just feels a bit sad to have a thread switched out because someone decided that browsers need to have an insane level of parallelism and the gpu driver decided to spawn 30 threads in your game.(edited)
One thing some people in the audience are wanting to be discussed is the relation between parallelism and concurrency, and how that relates to different "models".
07:47
The relation is with regards to how the separate processes/threads communicate together.
07:48
Be it either communication through shared memory or communication through message passing.
07:49
Shared memory communication (usually) requires a form of locking (semaphores, mutexes, monitors, etc) to coordinate the processes/threads.
07:49
Whilst message passing based communication does so by exchanging "messages" between the different processes/threads.
07:50
What is interesting is how these "messages" interact, whether its asynchronously or synchronously.
A good example of this would be having "a queue + a monitor", where the monitor is used to synchronize the queue. In certain models, this has a special name depending on how it is handled.
In a way, these two families can be seen as duals of one another
07:56
The actor model works very well in object-oriented languages because of the "object" nature of the "actors". Each actor is opaque to every other actor.
07:57
When these "actors" send messages, it is completely asynchronous, meaning the sender never blocks.
07:58
So for the "actors", the "mailbox" (the thing holding the "messages", i.e. a queue) might need to hold potentially loads of messages, which makes it a lot harder to reason about.
07:58
CSP (a form of Process Calculus, of which Go uses a slight variant) is fully synchronous in nature.
07:58
The unit of communication is flipped from the message to the channel.
07:59
A channel is effectively a message queue with a monitor applied to it.
07:59
The channel is the thing that is referenced: messages are sent to it and received from it.
08:00
Most channels will block until a receiver reads, and are thus synchronous.
08:00
This has the advantage that, because of the blocking mechanism, the channel only needs to "hold" one message.
08:01
Go extends this to allow for buffered channels (of fixed capacity), so that you can have something similar to the Actor model, but the channel is the focus, rather than the sender/receiver (i.e. the actor).
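The two flavors just described can be shown side by side in a few lines of Go (a minimal sketch; the string values are arbitrary):

```go
package main

import "fmt"

func main() {
	// A buffered channel of capacity 2 accepts sends without a waiting
	// receiver, up to its capacity -- closer to an actor's mailbox.
	buffered := make(chan string, 2)
	buffered <- "first"
	buffered <- "second" // neither send blocks

	fmt.Println(<-buffered, <-buffered) // first second

	// An unbuffered channel is the synchronous CSP case: a send blocks
	// until a receiver is ready, so it needs a separate goroutine.
	unbuffered := make(chan string)
	go func() { unbuffered <- "rendezvous" }()
	fmt.Println(<-unbuffered) // rendezvous
}
```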
08:04
So in the Process-Calculus family of models, the processes communicate (and synchronize) through channels, whilst in the Actor model the actors are themselves the processes.
08:05
One interesting thing is that the actor model does have the advantage of being "modelable" across multiple machines, e.g. each machine is an actor which then communicates over a network.
08:05
So because the models are effectively duals of each other, they can be applied at different levels for different problems.
08:06
But because these are generalized models, they will not be good for all problems (obviously).
08:06
But one thing which seems to be a given across all forms of concurrency is the concept of a queue.
08:06
QUEUES EVERYWHERE!(edited)
NWDDNov 19, 2020 08:07 AM
well, concurrency through shared memory may disagree
My point above was probably that there is a tradeoff - we can replace explicit synchronization on shared memory by mediating accesses through a data-structure-managing actor (process)
08:35
at the cost of more strict serialization
NWDDNov 19, 2020 08:35 AM
this is what I understood:
1 requester to 1 source == trivial -> (straight-forward single-thread code)
M requesters to 1 source == shared memory -> (for example: multiple threads accessing a hash-table)
1 requester to N sources == message -> (for example: a thread that receives network packets from multiple sockets and puts information in a hashtable)
M requesters to N sources == hybrid -> (for example: multiple threads that receive network packets from multiple sockets and put information in a hashtable)
I ask because I know that there's tons of bug potential once you start doing things concurrently. I had a good time a while ago going through this thing: https://deadlockempire.github.io/
For the audience (me), could you define the difference?
NWDDNov 19, 2020 10:01 AM
What's the worst concurrency bug you've all encountered?
A race condition that happened in a non-parallel situation in a so-called "safe" programming language (C#), in a game that was more or less ECS-based, where a system removed the "rigidbody" component in a very specific (user-triggered) situation.
There was a separate system which worked like a finite-state-machine that was being updated every few frames.
If the rigidbody got removed during a transition between two states, the thread that was running the logic crashed, and the C# runtime for the target platform would silently swallow the exception, freezing the game (without providing any crash dump or useful information)(edited)
NWDDNov 19, 2020 10:12 AM
Here's a question, is there actually ever a need for a semaphore instead of a condition variable?
Have been thinking for a while, but I don't think I've ever felt the need or seen an advantage to using a semaphore
10:13
I do recall at some point using them because it was provided in a library example?
because the most recommended scenario for a semaphore is a pool of resources
10:26
however, when you have a pool of resources you then also want a handle to the specific resource you got
10:27
which needs an additional thread-safe way to hand it out
10:27
and that thing to dole out the resource handles can handle blocking the thread just as well as the semaphore
NWDDNov 19, 2020 10:29 AM
I can see them being useful for example to request a system to run once (for example run a physics update)?
But because more often than not you need to know when it has finished and wait for it, there are other options?
Plus it feels like semaphores aren't really well known which means that it may confuse other people working in the project
In the end I feel that, because we're used to doing so, it's easier to reason about critical regions of code that should not execute at the same time, and the other options are just not as popular?(edited)
they are still taught as a synchronization primitive though
10:31
and for the record:
a semaphore is a counter where, if you try to decrement below 0, it blocks until another thread increments it above 0
a condition variable is an object that lets you wait on a signal from another thread while temporarily releasing a mutex
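Both primitives can be sketched in Go (a sketch, not a recommendation of one over the other; in Go a counting semaphore is conventionally built from a buffered channel, since the standard library does not ship one):

```go
package main

import (
	"fmt"
	"sync"
)

// A counting semaphore as a buffered channel: acquire = send,
// release = receive, and the channel capacity is the counter.
type semaphore chan struct{}

func (s semaphore) acquire() { s <- struct{}{} }
func (s semaphore) release() { <-s }

func main() {
	// Semaphore guarding a pool of 2 "resources".
	sem := make(semaphore, 2)
	sem.acquire()
	sem.acquire()
	// A third acquire would block here until someone releases.
	sem.release()
	sem.release()

	// A condition variable: wait on a signal while temporarily
	// releasing the associated mutex.
	var mu sync.Mutex
	ready := false
	cond := sync.NewCond(&mu)

	go func() {
		mu.Lock()
		ready = true
		mu.Unlock()
		cond.Signal()
	}()

	mu.Lock()
	for !ready { // always re-check the predicate in a loop
		cond.Wait() // atomically unlocks mu and sleeps
	}
	mu.Unlock()
	fmt.Println("ready:", ready) // ready: true
}
```

Note the predicate loop around `cond.Wait()`: condition variables permit spurious or early wakeups, which is exactly the bookkeeping a semaphore's counter handles for you in the pool-of-resources case.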
NWDDNov 19, 2020 10:32 AM
yes, people are taught about synchronization in general but I don't think that knowledge persists in time after tests
For those who have worked with parallel programming a fair amount, doing parallel programming properly such that close-to-optimal code is produced given any parallel problem seems extremely difficult to master, and extremely difficult to do in general. How much of this do you attribute to tools? Are there tools you wish you had when learning---or doing---this kind of programming?
What's the worst concurrency bug you've all encountered?
@bvisness I've run into a weird situation that involved some sort of false sharing. I had a writer queue that would normally transfer around 20MB / sec, but in that situation the throughput dropped to about 400KB / sec
11:34
forgot to mention explicitly that the bottleneck (400KB/s) here was a memcpy() ;-), whereas the original 20MB/s was simply limited by the hardware source
11:01
@ryanfleury I'm not sure I'd attribute anything to tools. What we've missed out on in the discussion so far is focusing more on what actually happens in the hardware, HM style.(edited)
It for sure is important to understand the basics, like Caches, MESI, and synchronization primitives
11:14
the Linux kernel gets quite fancy with synchronization on shared data structures, so maybe that is a recommendation. Though on the other hand, this approach is error-prone and not beautiful from an architectural perspective.
11:16
I'm sure that static analyzers, fuzzers, and similar tools can help a lot with debugging shared data structures. So that would be an answer to Ryan's question. I can't say anything about the quality of the existing tools, not really having used any
VegardNov 19, 2020 11:17 AM
I feel like I ought to have some tools to recommend, but the truth is that I don't really. usually the challenge when doing something fancy (i.e. more than the straightforward spinlock/mutex) is getting things to work correctly at all, and then the best thing you can do is to stress-test on as many CPUs as you can. so in a way maybe a machine with lots of CPUs is the best way to find bugs..?
With my relatively humble stuff, I found it surprising how quickly some of my concurrency bugs were found by simple stress testing @Vegard
11:20
though that is somewhat of a tautology, it doesn't take a lot of testing until you hit no bugs anymore (i.e. just fewer ;-])
VegardNov 19, 2020 11:23 AM
it can be really tough to hit some bugs, so I don't advocate testing as a way to validate your design
11:25
for sanity checking or bug finding it's okay
WaywardNov 19, 2020 01:51 PM
@ryanfleury on a sidenote, both the current topic and the next few could really use some resource recommendations listed on the handmade network resource page afterwards
I'm not sure I see the distinction you're going for. Are you saying the important part of CSP is the communication, and the important part of the actor model is the actors (e.g. processes)?
I'm not really getting the distinction either. Also not all of CSP is channel focused or communication focused. If I recall the first half of the book doesn't even mention channels. I recall CSP more as an algebra over sequences of events
14:19
but for the channel part, I'm not sure what about it is relevant or insightful. Would very much like to know
14:20
I gather that CSP channels are understood as sending messages synchronously (i.e. the sender waits until recipient "dequeues" it), as opposed to asynchronous sending with Actor model, but I'm also not sure that this is super relevant, nor would I actually want to waste the time waiting
There are in any case, different variants of all the theories - for example I believe CSP started out as a static graph of processes connected by channels, but it might be that this constraint was lifted in later publications (don't quote me, I haven't really checked back)
14:24
I know that the actor model is definitely always understood as sending completely asynchronously @bvisness . At least these days
14:25
but if I'm thinking of modelling my program after the actor model, I'll probably consider allocating messages statically in the sender, so allocation isn't a performance bottleneck, and the number of existing messages is naturally bounded
14:26
that would also mean each sender has to wait until it gets some of its messages returned back before it can send more, which would again lead to some form of "blocking". Not really while sending but still
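That message-recycling scheme can be sketched in Go with two channels (my own illustrative construction, not anyone's actual design): all messages are preallocated up front, and a sender that has exhausted the free list blocks until the receiver hands a message back, which bounds the messages in flight without a general-purpose allocator.

```go
package main

import "fmt"

// message is a statically preallocated unit of communication.
type message struct {
	payload int
}

func main() {
	const poolSize = 4
	free := make(chan *message, poolSize)  // recycled messages
	inbox := make(chan *message, poolSize) // the receiver's "mailbox"

	// Preallocate every message the system will ever use.
	msgs := make([]message, poolSize)
	for i := range msgs {
		free <- &msgs[i]
	}

	// Receiver: process a message, then return it to the pool.
	done := make(chan int)
	go func() {
		sum := 0
		for i := 0; i < 8; i++ {
			m := <-inbox
			sum += m.payload
			free <- m // hand the slot back to the sender
		}
		done <- sum
	}()

	// Sender: blocks on <-free once all 4 messages are in flight,
	// which is the natural backpressure described above.
	for i := 1; i <= 8; i++ {
		m := <-free
		m.payload = i
		inbox <- m
	}
	fmt.Println(<-done) // 1+2+...+8 = 36
}
```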
that makes sense, and would lead to a model like I'm used to from Go
14:31
(what you describe is basically the behavior of a buffered channel)
14:32
I actually very rarely see any of Go's concurrency features used heavily outside the standard library, and I'm not sure why. They certainly feel well-integrated into the language, but nobody seems to use them.
14:33
Maybe it's just laziness or people not knowing how to use them.
I certainly do agree that the actor model / CSP is a good conceptual fit for an object-oriented language (I continue to think they are essentially the same)
14:36
I guess I'm curious, though, if it's an essentially object-oriented concept. Concurrent execution seems to me to imply that you have concurrent "processes", which will certainly have their own state. Whether that's an "object" vs. a coroutine, goroutine, thread, etc. seems unrelated.
14:37
After all, you can apply actor model or CSP designs inside a program, or across network boundaries, and the concepts feel the same to me
So does the actor model not say anything about synchronization? I wasn't under the impression that e.g. a smalltalk system would make the communication between two objects asynchronous.
Pasting some messages from #fishbowl-audience about the differences between CSP and the actor model...
14:54
Some definitions from the internet:
An actor is a computational entity that, in response to a message it receives, can concurrently:
send a finite number of messages to other actors;
create a finite number of new actors;
designate the behavior to be used for the next message it receives.
Recipients of messages are identified by address, sometimes called "mailing address". Thus an actor can only communicate with actors whose addresses it has. It can obtain those from a message it receives, or if the address is for an actor it has itself created.
The actor model is characterized by inherent concurrency of computation within and among actors, dynamic creation of actors, inclusion of actor addresses in messages, and interaction only through direct asynchronous message passing with no restriction on message arrival order.
But what is the essential difference between sending a message "via a channel" or "to an address"?
My overall take
I guess it's true that in a CSP system you don't directly reference the other actors. But I'm still not convinced that meaningfully changes the model
you could conceive of channels as "actors" that hand messages back and forth, and now you've achieved basically the same thing?
it doesn't feel to me like a fundamental difference
So @gingerBill are you saying the difference is that in CSP you control the edges, while in the actor model you control the vertices? Or something like that?
I guess I still don't think they are all that different in practice, since channels feel in that context like extra restrictions on actor communication.
and channels really being a certain standard implementation with a fixed comm interface(edited)
15:05
I also happen to think that it is extremely important to use standard interfaces for concurrently modelled programs
15:06
from my experience, setting up all these graphs and actors and channels and what not, and then using the comm ifaces right, is extremely painful, so one wants to maintain and use as few primitives as possible(edited)
honestly it just sounds like nitpicking to distinguish between them; you can turn the actor model into CSP by making a bunch of channel actors that forward messages to everyone who subscribed
Scala's Akka is the most famous actor implementation that I'm aware of. Erlang is a popular and successful language that implements the actor model. Never really used any of these(edited)