Negative cost abstractions

oddisyes

#23786

January 1, 2021

I've had my side-eye on this community for a while because you guys seem to have a lot of the same problems with modern software development that I, and people I hang out with have: massive code bases for software that terrible speed and terrible correctness. The difference is I'm not coming from embedded/systems/game development, but instead functional programming. I think that there's this notion that functional programmers are the epitome of ivory tower abstraction BS, don't care about performance, and are can't sort in place. While I'm sure that's true for some of us, that is not all of us.

Anyway the main point:
there is a fantastic talk that I'm sure many of you have seen called There Are No Zero-cost Abstractions. Here Carruth demonstrates the biggest drawback that most abstractions: they leak. Most people talk about the problem of leaky abstractions from the perspective of the fact that they obscure implementation details and therefore cause mistakes, and while that is certainly one problem with abstraction leaks, the other one is that because they provide the compiler with less information at compile time the compiler is not able to perform the same aggressive optimizations that it can perform with its handwritten counterpart. While this is certainly an issue for some kinds of abstractions, many of the abstractions that functional programmers are interested in don't remove compile-time information but add it allowing for aggressive optimizations that aren't possible in analogous C code. Still dubious of my so-called "negative cost abstractions," it turns out C already has some of these backed in. First of all: Types. Types aren't just a way of ensuring correctness, but also allow for compiler optimizations. A language which one can find documented instances of it outperforming C due to having more compile-time information provided is Zig. In Jonathan Blow's language being able to write code and have it switch between working with AOS to SOA representations just by changing some type, signatures is another form of these kinds of "negative cost abstractions." One big place where functional programming research in my opinion needs to be folded into mainstream high-performance languages is type-level enforced purity. Purity isn't just nice from a user perspective but also allows for aggressive optimizations that aren't possible otherwise.Futhark is a great example of the work being done in the functional programming world to leverage these types of optimizations. Another opportunity for optimization from functional programming that has started to make its way into languages like rust is "pipe style programming" where instead of calling a bunch of functions sequentially you construct some chain of operations that the compiler can optimize more aggressively. this article talk about that briefly. Generally, I like to divide abstractions into 2 categories: ones that obscure information and ones that add information. The latter category not only is often not only free but faster. This is also before we get to parallelism, but that deserves a post of its own.

Unfortunately, a lot of this stuff is far away from being usable in mainstream performance-critical contexts, so as of today C and Zig are still probably better than their functional counterparts as of now, but I think there should be a greater alliance between our two disparate factions of computer science to improve software quality.

quick addendum:
People here are some of the few that understand the flaws of RAII. Personally, I'm not satisfied with settling with unsafety, I want to have my cake and eat it too. What I see as the future is linear types. If we can have initialized memory and uninitialized memory as separate types where one gets consumed on initialization that is ideal. ATS has this, Idris has this, and Haskell is adding this, but Haskell and Idris are not particularly high-performance languages (although in my opinion, Idris has the potential to be if more effort is put into it), and no one uses ATS.

Edited by oddisyes on January 1, 2021, 9:20pm

Miles

#23787

January 2, 2021

One thing I will note about the SoA/AoS case (other than that, as far as I know, it is actually no longer part of the type system and is now achieved through metaprogramming instead) is that it doesn't give the optimizer any extra information - rather, it gives the programmer more control over the generated code (which is already applied before it reaches the optimizer). The "pipe programming" example you give is similar - the point is that you have control over how operations on the data are grouped into different passes. It's not about giving the optimizer more information.

oddisyes

#23788

January 2, 2021

The AOS/SOA metaprogramming is an example of what I am talking about. You are providing the information that this code will run faster with SOA. The pipe operation also gives the compiler extra information. By programming in a "pipe style" instead of performing a series of function calls you limit the way you can interact with the data. This means that the compiler knows that it can perform certain operations without breaking the semantics of your program. Basically, compilers have to be conservative unless you give them some sort of guarantee that certain optimizations will not break the semantics of your code. By limiting your capabilities when those capabilities are unnecessary you allow for further optimization. This is how Rust can sometimes beat C in contrived microbenchmarks. This is also why languages that expand the capabilities of normal functions (for example with exceptions) end up paying for those features even when they don't use them.

One more thing that might interest you guys although I know GCs are considered sacrilege around these parts is this pauseless GC that a guy I know is working on that leverages extra type information. (The actual codebase can be found on his account).

Edited by oddisyes on January 2, 2021, 3:33am

Miles

#23789

January 2, 2021

I think you are misunderstanding how these features are implemented by the compiler. In the SoA/AoS case, for example, the compiler runs a metaprogram which translates the structure of a particular array, and any accesses to that array, into a different form. However this happens before codegen, meaning that the optimizer has no idea whatsoever that this was the result of a metaprogram. It gets no extra information. As far as it can tell, the results of a refactor via metaprogramming and the results of an equivalent refactor done by hand are indistinguishable. The SoA-enabling language features don't help the optimizer at all, they are purely a convenience for the programmer to be able to quickly and easily carry out a refactor which would be very tedious and error-prone to do by hand. The reason it is good for performance is because making performance-oriented refactors a lot easier will make programmers do them a lot more often.

The pipe programming style features you mentioned (or at least, any sane implementation of them) are similar. The optimizer doesn't know about them, they are just a way to make a common refactor - combining and splitting loops - more convenient in some cases.