Really interesting blog post. I definitely find that the "most obvious thing" approach quickly starts to break down in higher-level languages. You do a really good job in the article of highlighting how languages can conceal both algorithmic complexity (particularly bad in functional languages, as shown in your F# examples) and implementation-level efficiency (as demonstrated with your SIMD examples).
At the risk of being pedantic, I think it's worth considering the difference between "obvious" and "idiomatic" code. For example, in C++, the idiomatic way of passing an object into a function (by const ref) is completely reasonable from an efficiency standpoint.
| void my_function(std::string const &s) {
// ...
}
|
However, this could hardly be considered the most obvious code, which would probably be something like this instead.
| void my_function(std::string s) {
// ...
}
|
This of course introduces an unnecessary deep copy. In this sense, the idiomatic C++ code has become idiomatic as a way of counteracting the inherent inefficiency of the obvious code.
A similar problem manifests in Rust. As an example, imagine we've implemented a Mat4x4 struct. To multiply two Mat4x4s (m1 and m2), the most obvious code to write would probably be something like this.
| let m: Mat4x4 = m1 * m2;
|
This looks like it simply passes m1 and m2 to the multiplication function, but it actually moves them, transferring ownership and meaning that m1 and m2 can no longer be used afterwards. If this isn't the desired behaviour, we could make our Mat4x4 copyable, which would solve that problem but would make the above code copy 32 floats into our multiplication function, which is almost certainly less efficient than passing by reference. To solve this, we could let the multiplication function borrow m1 and m2, but that would require code like this.
| let m: Mat4x4 = &m1 * &m2;
|
To me, this code is less obvious than the previous version, but of course is more efficient.
To return to C/C++, the same problem of excess copies manifests in the obvious implementation of various mathematical operators. For example, consider the typical implementation of operator+(), which returns a new instance of an object (idiomatic C++ is to "do as the ints do", and operator+ on an int returns a new instance). This means that in long and complex expressions, lots of temporary objects are created, copied, and destroyed when the memory could be reused. For example, this code
| std::string s = s1 + s2 + s3;
|
is equivalent to
| std::string temp1 = s1 + s2;
std::string s = temp1 + s3;
|
This problem has led to the unspeakable horror that is C++ expression templates. Whilst these do indeed "solve" the efficiency problem, they are about as far from obvious as it's possible to be, and are so complex that they are not even considered idiomatic within the C++ community, which has otherwise become fairly tolerant of high-complexity solutions.
Of course, an added complexity in all of this is the fact that when we're discussing higher-level languages, it's very hard to separate issues of inherent inefficiency from specific implementations. As you highlighted in your section on SIMD, you're rather at the mercy of your (JIT) compiler. As you made clear, there is no reason C# couldn't apply automatic vectorisation, and there is nothing requiring C++ to do so. This leads to the interesting situation where instead of application programmers being able to find the most efficient solution, you have implementers examining idiomatic code and ensuring their implementations optimize for that usage. A classic example of this would be tail call optimizations in functional languages.
As for examples of where languages can lure you down the wrong path, I had a fun experience with Eiffel once. In Eiffel, the preferred way of managing concurrency is through a methodology/technology called SCOOP. Without getting into too much detail, the basic idea is that each object explicitly marked as "separate" runs in its own virtual thread and communicates via messages, much like processes in Erlang. However, unlike in Erlang, it turns out that the EiffelStudio implementation spins up an OS-level thread for every object you mark as separate. You can imagine what happened when I instantiated a few hundred of these things without realising... I think that's an example where the idiomatic way of programming was, if not inherently inefficient, then inefficient for a large proportion of cases, and in fact required a good understanding of the implementation to use safely. Leaky abstractions strike again!