Approaches to parallel programming
A discussion of many aspects of parallelism and concurrency in programming, and the pros and cons of different programming methodologies.
This is a fishbowl: a panel conversation held on the Handmade Network Discord where a select few participants discuss a topic in depth. We host them on a regular basis, so if you want to catch the next one, join the Discord!
Topic: Approaches to parallel programming
But I can never remember which is which, and it seems like not everybody makes that precise distinction
according to Pike, concurrency is handling lots of things at once, whereas parallelism is doing a lot of things at once
I'm having trouble understanding that. The way I remember it is that one is about authoring code as independent "trains of thought", the other is actually running code on multiple physical cores. But not quite sure about that
responsiveness versus throughput? @jfs
do we have anyone in the fishbowl who's been playing around with parallel programming, or are we just a big group of beginners?
concurrency == breaking the program into tasks that are independent from each other (threads basically)
parallelism == physically running multiple tasks on different CPU cores
If you look at "the three columns of concurrency" that I linked above, I'm mostly interested in the responsiveness pillar right now. But to make that not a pain to develop (e.g. by creating a separate OS thread for each task), shared data locks (mutexes) are definitely needed
So, assuming that we're talking about both concurrency and parallelism here. Because one isn't very meaningful without the other, right?
i thought parallelism was more of the umbrella term given how computer hardware has developed during the last 25 years
let's go with that then
A realization that led me to be interested in this topic recently has been how even a single machine is a concurrent and multithreaded thing. Not only do machines have multiple CPU cores, they also have a separate graphics processing unit, and all sorts of other things that (for example) communicate through interrupts. It gets more pronounced of course when adding networking...
I can't really do a plain fgets() or fread() anymore, it seems so wrong to me when I'm doing e.g. GUI on the same thread
The human interacting with the GUI is like a completely separate thread as well!
i'm more curious about the massively parallel part of the discussion, maximizing throughput and thereby optimizing for speed (running a single big task on multiple cpu threads to optimize for time)
the "embarrassingly parallel" thing
yeah, monte carlo is a good example of that
@Vegard @NWDD the fishbowl has started
hello everyone (and thanks for the notification)
do we have anyone who's played around with threads and parallelism here?
pinging you as well @gingerBill @AsafGartner
Hello
Distinguishing between Parallelism and Concurrency is extremely important to do. They are orthogonal concepts.
Rob Pike's distinction is a good one.
so @bumbread is right; concurrency is basically the go/node idea where you can have lots of "things" (threads, green threads, coroutines, goroutines, ...) in-flight at the same time, but you can still do it all on a single core if that's all you have. and parallelism is more when things run truly in parallel (on different cores).
well maybe let's not worry about definitions for now
Concurrency is about dealing with lots of things at once.
Parallelism is about doing loads of things at once.
When does this distinction matter in practice? Except for performance, is it super important how many cores (1 or many) I'm running my N concurrent threads on?
it's hard to say what dealing means
or is it really happening at once?
concurrency doesn't need thread-safety (to start with a difference that matters in practice)
In programming:
Concurrency is the composition of independently executing processes.
Parallelism is the simultaneous execution of (possibly related) computations.
Purely parallel problems could be classed as idempotent. @jfs's link to "Embarrassingly Parallel" is the essence of this.
why does concurrency not need thread-safety @Vegard? What is that thread-safety here, precisely?
Are you thinking about needing less synchronization, when you have truly independent tasks? Or things like (non-) preemption?
it's generally independent tasks
Idempotent in this case means independent.
Processes do not affect each other.
@jfs I think you meant @NWDD but I can answer too
first of all, I think thread safety can be many different things. You mentioned preemption, that's one part of it -- if you only have 1 CPU and you fully control when the next thread executes, then you can be a lot more relaxed about a lot of things, since you know no other CPU will be able to access your variables concurrently. in other words, you don't need locking at all.
Now that we have the definitions done and over with, let's get down to the dirty stuff
so for example: imagine you're developing a network layer which is something that is highly asynchronous by definition.
there are a lot of little details you have to handle: has the connection timed out? what data have you received? was there a network error? did the user put a nintendo switch in the microwave so the network connection broke?
with concurrency you can actually ditch all means of synchronization and have it still happen, from a perceived point of view, "in parallel"
Unless I'm misunderstanding what you mean, I disagree. You often need synchronization with concurrency. Many webapps have synchronization bugs even though they are entirely single-threaded.
ah, yeah, i meant specifically cooperative concurrency (for example fibers or lua coroutines running in a single-thread or cooperative threads pinned to one core)
@AsafGartner that sounds like a problem with the single threaded event driven process rather than a simple form of monte carlo where you have one thread that creates jobs and takes back the result afterwards with no interaction in-between each job thread
@AsafGartner can you give an example of that?
Synchronization is for shared state. If you don't have shared state then it's no problem, but if you do you still need to enforce synchronization even in non-parallel systems.
@AsafGartner web apps are sometimes spawned in parallel by the web server, so even though the program itself is single-threaded, there can still be multiple instances of it running in parallel. not sure if this is what you're talking about
The common bug is check state -> make HTTP request -> perform action without checking if the state changed and assuming it didn't.
I would like to state that synchronization is needed whenever two processes (in the sense of @gingerBill 's definition) communicate. Where communication means any kind of information sharing
A more concrete example, which I've encountered in several apps, is you hit play in a music app, it starts buffering the song, you hit pause, and when it's done buffering it plays anyway.
that can happen even in a 100% single-thread application where you flip some global bools and don't check them properly though
so for example, today a user reported that on Spelunky 2, if you freeze an enemy that grabs the player at a certain point in time
and then approach the enemy, it grabs you while frozen
Exactly. That's why synchronization is not just for parallel operations.
the general idea is that "you don't need as much work to avoid things going bad with concurrency compared to full parallelism"
If we're assuming preemption, I'm still not sure what's a situation where it matters if a concurrent application is being run with parallelism or not.
Non-preemption (where each process controls when it gives up control of the CPU) could be seen as just like preemption with a global mutex
except it's efficient, yes
isn't that more of a specific node.js issue as it's a preemptive event driven framework?
never went down the javascript rabbithole personally
Javascript is non-preemptive
i'm not sure but i also believe it's non-preemptive?
classic javascript i believe is non-preemptive yes, however google chrome attempts to make it preemptive to increase speed, which was later forked into node.js
well it added worker threads AFAIK, so that changed maybe. But fundamentally it's how I understand a non-preemptive runtime
a generally non-preemptive framework would maybe also allow a process to choose what other threads can be scheduled while this process is waiting for an I/O request to complete
I've seen infinite loops hang the whole execution (of a web-server)
but either way, I don't think non-parallel preemptive systems are interesting or useful xD
Worker threads run in a separate context and you have to communicate with them over a message channel, so you don't have parallelism issues there.
yes, this is an important point @AsafGartner . There are certain frameworks (such as actors, channels) that make parallelism much easier
they encapsulate the needed synchronization, and so dramatically reduce manual work
@NWDD are you saying you don't think coroutines or green threads are ever interesting, or that they're only interesting if you can (also) schedule them in parallel?
that "preemptive non-parallel concurrency" is not interesting in general
coroutines are great (obviously mandatory everything I say is my opinion and not a fact blablablablabla)
But coroutines aren't always fair.
Like when your older brother hogs the controller and you have to get your mom to do some preemption.
but fairness is not always a desirable feature, right?
In practice I would agree. In theory I think there's almost always a limit to the unfairness you're willing to have in a system.
I'm not sure if the fairness is really a big issue in practice. you can kinda see when you're going to have a lot of work to do and yield manually from time to time
it's definitely an issue when you're doing some number crunching
software is in some respects easier to architect when you just assume preemption
and combine pieces that you know will be run in parallel
the worst thing is when it's a surprise
if you assume preemption and combine pieces that you know that will be run in parallel, then it's hard to replace it with concurrency, yes, that sounds about right
I dislike coroutines in the traditional sense of them. Because they are hard to reason about. They don't act like a separate "process".
i like the idea of them: you can say "i need that later, but it's big, so please start reading and caching it from disk now so it'll be ready later"
although i haven't toyed around with them so i'll have to see in practice when i get around to that
the idea of coroutines might be that we want to write linear code (top-to-bottom) while still interacting with other threads in a totally synchronized fashion
what you describe sounds more like asynchrony to me @Wayward
for reading from disk, i think that's a great thing
it's kind of easier to describe a transition like "change opacity over X time, then change scene, then change opacity over X time" that way than with pretty much any other alternative
they are really hard to follow and debug and everything if you use them a lot in a dogma-like way, yes
@jfs coroutines are very often used to implement that asynchronicity
there are so many concepts and nuances to this whole parallel/concurrent thing
would be cool to have an infographic or something relating all the terms to each other
Yes. I didn't really understand what you meant @NWDD . And what is it anyway, that is "acting like a separate process" (like @gingerBill says)?
In my mind, coroutines are something that the OS is unaware of
basically I'm thinking of fibers
Fibers btw. are seen as a failed experiment
POSIX removed their implementation, and Microsoft deprecated them long ago
because with them, we not only have process-global and thread-global storage, but also fiber-global storage
and all the related confusion that comes with it
also when scheduling a single fiber on multiple OS threads
glibc still has setcontext/swapcontext/etc. I think the reason it was deprecated in POSIX was because it had a bunch of requirements WRT signals, which meant that you need 2+ syscalls per fiber switch, which is slow.
so it's still fully possible to define your own semantics and get good performance out of them. just not with an implementation following the POSIX specification
fibers can also block other fibers that are run on the same thread
that's the point of having fibers (blocking other fibers that are run on the same thread)
right, that was actually my initial point
Though, another intended use case for them was to be more resource friendly and to involve the kernel less
I'm curious to know if UMS is actually useful.
Apparently it was made specifically for the MSSQL team.
Fibres were a failed experiment because they were not what people actually wanted.
Effectively, people usually (not always) wanted a form of green thread be it N:1 or M:N.
That I can agree, but now again, people usually (not always) wanted OOP and we know how that went :)
@NWDD People rarely know what they want before trying stuff out.
aren't fibers and green threads synonyms?
fibers are always cooperative and something that is controlled by the programmer (so the programmer has to decide when does the fiber start, where does it execute, at which point it is stopped, if/how is it scheduled...)
well, okay. so you can use fibers to implement green threads
green threads being "user scheduled threads"?
I looked up how Haskell does its runtime - they have sort of M:N green threads as well
basically they can do it because they have a lot of allocation going on
so each allocation is potentially a fiber switching point
that only falls flat in tight loops where all the allocation has been optimized out
I don't think there is another way of doing scheduling in userland, or is there? How would that UMS work, anyway?
on Linux at least you can register a timer that will send a signal, then you can context switch from the signal handler. but this does go through the kernel as well, so it's not that great for performance
well, in the end everything is semantic, that depends on what you consider "doing scheduling"
I mean (pseudo-) preemptively scheduling a fiber in place of another fiber
to get what OS threads provide, but implemented at a lower cost (with fibers)
yeah, but like, you can implement coroutines or continuations without fibers
and if it's just swapping rsp and a couple of things, can it be considered scheduling?
wouldn't coroutines always involve switching out the call stack?
there are also stackless coroutines, right?
Switching out the call stack is basically how I understand "Fibers". Though, the term is also associated with the Win32 Fiber API
stackless coroutines is something else
and that's why it depends on semantics :)
also you could like, code in continuation passing style without a stack itself
Stackless coroutines can be done when the compiler can compute the max stack size that a particular function call (hierarchy) needs
The compiler can then place that "process" in an arbitrary memory location. It could be allocated as a "value type".
Well yeah, I'm not sure, I guess it's still implemented using rsp
since it's quiet -- I'm just curious if anybody here is doing game dev and if so, what kind of parallel code do you see in practice in whatever games/engines you're working on?
I assume most people have a separate audio thread at the very least. what else? do you scale number of threads with number of CPUs?
what kind of synchronisation do you use?
I'm doing a GUI for industrial automation. I also have a little jigsaw puzzle game that I intend to network (match making, cooperative gaming)
I have background processes as well, and I want to be able to cancel them quickly
The background processes do file and networking I/O
so I've been experimenting with layering them on top of async IO
and trying to find a way to write them in a linear top-to-bottom fashion - basically in a conventional blocking style, with the exception that they should react to custom cancellation signals
and trying to make that code not be a mess
> since it's quiet -- I'm just curious if anybody here is doing game dev and if so, what kind of parallel code do you see in practice in whatever games/engines you're working on?
@Vegard In indie games it's quite rare for something other than loading / audio and sometimes input
And when done it's usually not worth it
@NWDD what do you mean with "when done"? remaking the whole game after release to implement threading?
Meaning that on games that I've worked on, removing multi-threading has often provided better performance
I'm not sure how much into detail I can go? but for example in a game we worked on that used a parallelized scripting language with a reference counting gc
removing all forms of threading and replacing it with concurrency, provided better performance
@Wayward I think it was meant as "when people do write their games to make use of multithreading it doesn't give you any wins in terms of performance"
which I'm sure is not true for AAA where they do have e.g. separate render threads
but AAAs are usually not CPU-bound by stuff that is often associated with CPU (AI, pathfinding, networking, world generation...)
to me it sounds like the bottleneck is the kernel switching/synchronization when working with threads or am i wrong?
that's being bound by the cpu cache?
> removing all forms of threading and replacing it with concurrency, provided better performance
That doesn't mean anything.
If your problem was parallelizable and also concurrent, you can do both.
But how much of both is a complicated question
indie games are more often than not bound by cache/memory access
yes... it's complicated, most parallelizable tasks end up taking less than what the context switch may take.
Big things that are parallelizable, like the zombies in They Are Billions or collisions in physics engines, are OK
I have noticed a lot of beginners to parallelizing things (and I am by no means an expert either), do it because they think it will make things faster, when in reality their problem could be made extremely fast on a single thread and multiple threads would slow things down.
your point being multithreading primarily having benefits assuming the code is already optimized?
For "beginners", at least.
Sometimes naïve multithreading can help a lot too.
e.g. multithreading a parser
A basic parser for a language can have a parsing queue for new files to parse and then workers to parse each file.
The only synchronization point is when one worker adds a new file to parse to the queue.
multithreading always has a higher cost than running in single-thread
so it's better to use it when needed rather than by default
As with other non-multithreaded problems, it's a lot about reducing the overhead by doing more work per cost
i.e. batch the shit out of it
If you want to use multiple processes by default, you will require some sort of system that can schedule things efficiently for the large number of workers, e.g. M:N threading (e.g. Go)
what's the pros and cons of multiple processes in the task manager (google chrome) vs a single process with multiple threads?
To be clear, I am using the term "process" in a very general sense.
The benefit is process isolation
@gingerBill but even if you have efficient scheduling you still need to lay things out in a way that doesn't thrash, then use algorithms that are thread-safe and slower, or pay the extra cost of locking (and either annoy the scheduler and get kicked earlier, or pay the context switch)... And in the end you end up with something that makes worse use of the system if that system is shared with other programs
Exactly!
A generic algorithm cannot know how to optimize your specific problem.
so basically, for it to be useful, you either need a huge cache or really optimized memory usage?
no, well, or really separate tasks
for example, in a game we recently released we were running liquids simulation and particles in threads separate from the main loop
Optimized memory usage is a different problem.
That still exists in single threading problems.
And not really a parallelism specific problem
yes, I mean you could have theoretically a game fit in a single cache-line
but not be parallel-friendly
a 64-byte game?
it's a theoretical exercise xD
not daring anyone to do so
just saying that not being cache friendly (when doing multithreaded stuff) isn't related to using a huge amount of memory or optimizing
@NWDD can we just jump straight to the long version?
oh that's right, the game
regarding that game I was talking about,
we're now changing it so that instead of running separate systems on separate threads
everything will end up running in the same thread in order
this allows us to free cores so that we can actually run a separate game simulation on each core
which solves a harder problem regarding online multiplayer
point being: using multithreading for things that do not require it might actually prevent the program from reaching its full potential
single threaded applications are recommended until no longer viable, what about the latter case?
I can imagine a lot of cases where you actually want multi-threading eh? I mean for example: a compiler, a baking process, a simulation that can go faster than realtime...
It just feels a bit sad to have a thread switched out because someone decided that browsers need to have an insane level of parallelism and the gpu driver decided to spawn 30 threads in your game.
as long as those 30 threads are all pinned to the same core that's fine, no?
maybe, I think so... It didn't happen on any of our machines so we couldn't investigate it enough
One thing some people in the audience are wanting to be discussed is the relation between parallelism and concurrency, and how that relates to different "models".
The relation is with regards to how the separate processes/threads communicate together.
Be it either communication through shared memory or communication through message passing.
Shared memory communication (usually) requires a form of locking (semaphores, mutexes, monitors, etc) to coordinate the processes/threads.
Whilst message passing based communication does so by exchanging "messages" between the different processes/threads.
What is interesting is how these "messages" interact, whether its asynchronously or synchronously.
let's say that message passing does not require explicit locking
I'd argue that the serialization (copy) required to exchange "messages" is a form of locking where you prepay instead of pay-by-use
it might still be implemented using locks under the hood
especially synchronous message passing
@jfs I was about to say that the message-based models may still require "locks" of some sort, but they are not explicitly handled by the user.
So the model difference is if the locking is implicit or explicit.
there are similar mechanisms for shared data
I believe Java's synchronized might do such a thing, although I've never used it
you could just mark any object as "must-be-locked"
but shared data synchronization tends to need locking hierarchies, which are pretty complex, I believe
while message passing is always an interaction with a single mailbox - that's simple
And many of these mechanisms/patterns will arise in the different models too.
A good example of this would be having "a queue + a monitor", where the monitor is used to synchronize the queue. In certain models, this has a special name depending on how it's handled.
In a way, these two families can be seen as duals of one another
The actor model works very well in object-oriented languages because of the "object" nature of the "actors". Each actor is opaque to every other actor.
When these "actors" send messages, it is completely asynchronous, which means that there is no blocking on the receiving end.
So for the "actors", the "mailbox" (the thing holding the "messages", i.e. a queue) might need to hold potentially loads of messages, so it's a lot harder to reason about.
CSP (a form of Process Calculus, of which Go uses a slight variant) is fully synchronous in nature.
The unit of communication is flipped from the message to the channel.
A channel is effectively a message queue with a monitor applied to it.
The channel is the thing that is referenced; messages can be sent to it or received from it.
Most channels will block, and are thus synchronous, until a channel receiver reads.
This has the advantage that the channel only needs to "hold" one message.
Go extends this to allow for buffered channels (of fixed capacity), so that you can have stuff similar to the Actor model, but the channel is the focus, rather than the sender/receiver (i.e. the actor).
So in the Process-Calculus family of models, the processes communicate (and synchronize) through channels, whilst the Actors are themselves the processes.
One interesting thing is that the actor model does have the advantage of being "modelable" across multiple machines e.g. each machine an actor which then communicates over a network.
So because the models are effectively duals of each other, they can be applied at different levels for different problems.
But because these are generalized models, they will not be good for all problems (obviously).
But one thing which seems to be a given across all forms of concurrency is the concept of a queue.
QUEUES EVERYWHERE!
well, concurrency through shared memory may disagree
I mean, you can do concurrency through memory without using queues
Of course, but many basic concurrency things will have a queue-like thing to them.
yes, yes, just was pointing out that "all forms of concurrency" is just an expression
but I do think there is a merit in ditching queues to improve overall efficiency, even if it makes everything much harder to reason about
But if you look at non-computing realms, e.g. humans, the queue is the way to organize things.
@NWDD And that's the trade-off to be made: between pure performance and the ability to reason about the problem.
maybe some models will come in the future to help mitigate this?
I'd assume that queues don't need to be slow. In fact they can achieve a speed-up because they reduce synchronization costs
also, messages don't necessarily imply a lot of copying. You can just enqueue pointers
it's basically ownership transfer
Queues are just a way of organization, and there is nothing inherently slow about them.
Random distribution into different queues might not be the best choice compared to selective distribution.
e.g. if each message contains information about how much work it is, the scheduler can help decide where to place it.
And that's the next aspect to add onto the queue idea, a scheduler.
I think we really need to distinguish access to global concurrently mutated data structures (databases) from cooperating computation
the former might tend to gnarly explicit synchronization, and RCU type data structures
the latter tends to be easily coordinated using queues
1 requester to 1 source == trivial
M requesters to 1 source == shared memory
1 requester to N sources == message
M requesters to N sources == hybrid
what would be an example of M to N
the linux kernel has a recent example of that with asynchronous IO via
io_uring? (I think it was called something like that)
yes and no, it's "a queue with pointers"
you have this global data structure which is shared between user and kernel
and it's by writing on it that you communicate and request to avoid having to pay for the switch
so while it's true that it uses a queue to synchronize it also uses a shared memory location
right, in the end everything is a shared memory location
queues are only an abstraction
queues provide serialization of concurrent requests / responses
There is a reason I brought up the "models". Some models are more useful than others in different situations
with M to N, did you allude to e.g. N threads accessing a shared memory search tree of M nodes that can be concurrently modified?
I was meaning:
M threads (requesters/accessors) accessing N sources of shared memory
so what's a source of memory here?
(I am being very vague on purpose)
My point above was probably that there is a tradeoff - we can replace explicit synchronization on shared memory by mediating accesses through a data-structure-managing actor (process)
at the cost of more strict serialization
this is what I understood:
1 requester to 1 source == trivial -> (straight-forward single-thread code)
M requesters to 1 source == shared memory -> (for example: multiple threads accessing a hash-table)
1 requester to N sources == message -> (for example: a thread that receives network packets from multiple sockets and puts information in a hashtable)
M requesters to N sources == hybrid -> (for example: multiple threads that receive network packets from multiple sockets and put information in a hashtable)
What I'm missing is for example, how are those N sources related?
If there are no consistency constraints for the N things, the problem isn't any harder
Well, the last one is technically M->N->1
yes, couldn't think of a practical example without adding ->1
(in the last example I was thinking about a matchmaking server)
> If there are no consistency constraints for the N things, the problem isn't any harder
in the M->N->1 example I was thinking of, the reason why M->N is different from 1->N is because 1->N couldn't provide enough throughput
A simple example of M->N would be this:
Each (M) thread receives a network packet, does some work on it, and then passes to another server (N)
What's the worst concurrency bug you've all encountered?
Locked up everything on all threads, because I was recursively sending and receiving messages (semaphore) from the same thing.
e.g. all threads were blocked and stuck in an infinite loop at the same time, effectively.
I ask because I know that there's tons of bug potential once you start doing things concurrently. I had a good time a while ago going through this thing: https://deadlockempire.github.io/
> Slay dragons, learn concurrency! Play the cunning Scheduler, exploit flawed programs and defeat the armies of the Parallel Wizard.
Here's a question, is there actually ever a need for a semaphore instead of a condition variable?
(I know they are not equivalent)
For the audience (me), could you define the difference?
> What's the worst concurrency bug you've all encountered?
A race condition that happened in a non-parallel situation of a so-called "safe" programming language (C#) in a game that was more or less ECS-based, where a system removed the "rigidbody" component in a very specific (user-triggered) situation.
There was a separate system which worked like a finite-state-machine that was being updated every few frames.
If during a transition between two states the rigidbody got removed, the thread that was running the logic crashed and the C# runtime for the target platform would silently swallow the exception freezing the game (without providing any crash-dump or useful information)
> Here's a question, is there actually ever a need for a semaphore instead of a condition variable?
Have been thinking for a while, but I don't think I've ever felt the need or seen an advantage to using a semaphore
I do recall at some point using them because it was provided in a library example?
So what makes semaphores a poor choice?
because the most recommended scenario for a semaphore is a pool of resources
however, when you have a pool of resources, you then also want a handle to the specific resource you got
which needs an additional thread-safe way to be handed out
and that thing doling out the resource handles can handle blocking the thread just as well as the semaphore
I can see them being useful for example to request a system to run once (for example run a physics update)?
But because more often than not you need to know when it has finished and wait for it, there are other options?
Plus it feels like semaphores aren't really well known which means that it may confuse other people working in the project
In the end I feel that because we're used to doing it that way, it's easier to reason about critical regions of code that should not execute at the same time, and the other options are just not as popular?
(edited)
they are still taught as a synchronization primitive though
and for the record:
a semaphore is a counter where trying to decrement it below 0 blocks until another thread increments it above 0
a condition variable is an object that lets you wait on a signal from another thread while temporarily releasing a mutex
yes, people are taught about synchronization in general but I don't think that knowledge persists in time after tests
For those who have worked with parallel programming a fair amount, doing parallel programming properly such that close-to-optimal code is produced given any parallel problem seems extremely difficult to master, and extremely difficult to do in general. How much of this do you attribute to tools? Are there tools you wish you had when learning---or doing---this kind of programming?
What's the worst concurrency bug you've all encountered?
@bvisness I've run into a weird situation that involved some sort of false sharing. I had a writer queue that would normally transfer around 20MB / sec, but in that situation the throughput dropped to about 400KB / sec
forgot to mention explicitly that the bottleneck (400KB/s) here was a memcpy() ;-), whereas the original 20MB/s was simply the rate of the hardware source
@ryanfleury I'm not sure I'd attribute anything to tools. What we've missed in the discussion so far is focusing more on what actually happens in the hardware, HM style.
(edited)
In the end we're on highly parallel machines and if there weren't possible benefits, much of the parallelism wouldn't exist
(edited)
adding onto @ryanfleury 's question, are there any resources you'd recommend for getting into parallel programming?
I like the parallel stacks feature of visual studio, but It's not life-changing
It for sure is important to understand the basics, like Caches, MESI, and synchronization primitives
the Linux kernel gets quite fancy with synchronization on shared data structures, so maybe that is a recommendation. Though on the other hand, that approach is error-prone and not beautiful from an architectural perspective.
I'm sure that static analyzers, fuzzers, and similar tools can help a lot with debugging shared data structures. So that would be an answer to Ryan's question. I can't say anything about the quality of the existing tools, not really having used any
I feel like I ought to have some tools to recommend, but the truth is that I don't really. usually the challenge when doing something fancy (i.e. more than the straightforward spinlock/mutex) is getting things to work correctly at all, and then the best thing you can do is to stress-test on as many CPUs as you can. so in a way maybe a machine with lots of CPUs is the best way to find bugs..?
With my relatively humble stuff, I found it surprising how quickly some of my concurrency bugs were found by simple stress testing @Vegard
though that is somewhat of a tautology; it doesn't take a lot of testing before you stop hitting bugs (i.e. you just hit fewer of them ;-])
it can be really tough to hit some bugs, so I don't advocate testing as a way to validate your design
for sanity checking or bug finding it's okay
@ryanfleury on a sidenote, both the current topic and the next few could really use some resource recommendations listed on the handmade network resource page afterwards
I'm back, female canines!
3
so what stuff do you have for us
The Actor Model is only popular because it's the OOP of concurrency models and fits very well within OOP languages.
Interesting. Didn't you say earlier that it was possibly a good fit for networked applications?
I mean I guess that wouldn't be related to popularity, per se
I didn't say good, I said it scaled easily to networked applications.
Because each machine is treated as its own actor in the network/graph
Would you say that CSP is basically the actor model too, or are there some differences?
CSP is the anti-actor model.
Ok that's surprising to me, because both models have separate stateful "processes" that communicate with each other, don't they?
Think about the focus of each model
I'm not sure I see the distinction you're going for. Are you saying the important part of CSP is the communication, and the important part of the actor model is the actors (e.g. processes)?
Or is my "e.g. processes" the issue here
I'm not really getting the distinction either. Also not all of CSP is channel focused or communication focused. If I recall the first half of the book doesn't even mention channels. I recall CSP more as an algebra over sequences of events
but for the channel part, not sure why that is relevant or insightful. Would very much like to know
I gather that CSP channels are understood as sending messages synchronously (i.e. the sender waits until recipient "dequeues" it), as opposed to asynchronous sending with Actor model, but I'm also not sure that this is super relevant, nor would I actually want to waste the time waiting
well I'm also not sure that the actor model specifies synchronous vs. asynchronous communication?
There are in any case, different variants of all the theories - for example I believe CSP started out as a static graph of processes connected by channels, but it might be that this constraint was lifted in later publications (don't quote me, I haven't really checked back)
I know that actor model is definitely always understood as sending completely asynchronously @bvisness . At least in these days
but if I'm thinking of modelling my program after the actor model, I'll probably consider allocating messages statically in the sender, so allocation isn't a performance bottleneck, and the number of existing messages is naturally bounded
that would also mean each sender has to wait until it gets some of its messages returned back before it can send more, which would again lead to some form of "blocking". Not really while sending but still
that makes sense, and would lead to a model like I'm used to from Go
(what you describe is basically the behavior of a buffered channel)
I actually very rarely see any of Go's concurrency features used heavily outside the standard library, and I'm not sure why. They certainly feel well-integrated into the language, but nobody seems to use them.
Maybe it's just laziness or people not knowing how to use them.
more column b than column a IMO
then again go is targeted to backend webdev
I certainly do agree that the actor model / CSP is a good conceptual fit for an object-oriented language (I continue to think they are essentially the same)
I guess I'm curious, though, if it's an essentially object-oriented concept. Concurrent execution seems to me to imply that you have concurrent "processes", which will certainly have their own state. Whether that's an "object" vs. a coroutine, goroutine, thread, etc. seems unrelated.
After all, you can apply actor model or CSP designs inside a program, or across network boundaries, and the concepts feel the same to me
If you extend CSP to allow for infinite capacity channels, then they are dualistic models.
Both are modelling interactions as a graph of messages.
Actors = Nodes/Vertices
Channels = Edges
So does the actor model not say anything about synchronization? I wasn't under the impression that e.g. a smalltalk system would make the communication between two objects asynchronous.
But as soon as you make the channels fixed-capacity, it becomes synchronized (up to a limit)
The actor model is pure asynchronous
So that's why I said they were dual models.
I'll say that one should strive to keep the number of messages in flight bounded in any program
Not sure that it's particularly relevant if it's the channel that caps the number of messages, or if it's the senders
(edited)
Another fun fact, CSP is effectively UNIX Pipes applied to the program level
1
Pasting some messages from #fishbowl-audience about the differences between CSP and the actor model...
Some definitions from the internet:
An actor is a computational entity that, in response to a message it receives, can concurrently:
send a finite number of messages to other actors;
create a finite number of new actors;
designate the behavior to be used for the next message it receives.
Recipients of messages are identified by address, sometimes called "mailing address". Thus an actor can only communicate with actors whose addresses it has. It can obtain those from a message it receives, or if the address is for an actor it has itself created.
The actor model is characterized by inherent concurrency of computation within and among actors, dynamic creation of actors, inclusion of actor addresses in messages, and interaction only through direct asynchronous message passing with no restriction on message arrival order.
@gingerBill that is true for TCP as well, or for file I/O, or for... any communication with flow control really
I mean specific pipes because of the ability to select
select is just demultiplexing
you have a number of possible inputs (from concurrently connected actors) and want to be able to take the next one that signals anything
select is needed as a system call because the OS encapsulates I/O
but in general, demultiplexing is implemented using synchronization primitives
or in hardware, using interrupts or regular polling
I know it's demultiplexing, but I meant that CSP was effectively the formalization of UNIX's pipes.
(edited)
jfs's question:
But what is the essential difference between sending a message "via a channel" or "to an address"?
My overall take
I guess it's true that in a CSP system you don't directly reference the other actors. But I'm still not convinced that meaningfully changes the model
you could conceive of channels as "actors" that hand messages back and forth, and now you've achieved basically the same thing?
it doesn't feel to me like a fundamental difference
and bill:
There's a reason I keep saying dual models.
They are effectively opposites for measuring graphs of messages
k continue
So @gingerBill are you saying the difference is that in CSP you control the edges, while in the actor model you control the vertices? Or something like that?
ok, I see what you mean by "dual" then
I guess I still don't think they are all that different in practice, since channels feel in that context like extra restrictions on actor communication.
My idea is that in most realistic programs you would just use a mixture of both, as you see fit
it only is a meaningful difference if you imagine actors being real physical threads (or at least OS threads)
(edited)
and channels really being a certain standard implementation with a fixed comm interface
(edited)
I also happen to think that it is extremely important to use standard interfaces for concurrently modelled programs
from my experience, setting up all these graphs and actors and channels and whatnot, and then using the comm interfaces correctly, is extremely painful, so one wants to maintain and use as few primitives as possible
(edited)
So where are actor models used in practice for concurrency? (Like, what languages or tools are used for this)
apparently Go is not an example, by @gingerBill 's definition
I'm just not very aware of what's in the space
Maybe we should just wrap it up anyway. Last call for questions?
honestly it just sounds like nitpicking to distinguish between them; you can turn the actor model into CSP by making a bunch of channel actors that forward messages to everyone who subscribed
Scala's Akka is the most famous actor implementation that I'm aware of. Erlang is a popular and successful language that implements the actor model. Never really used any of these
(edited)
Alright closing this off now. Thanks everyone for participating!