What is "optimization"? When is it "premature"?

Hi all,

To better understand optimization in programming, I'd like to ask some general questions:

  • What does it actually mean to optimize a program?

  • How do you know when it is time to make optimizations?

  • What does the process of optimization look like?

  • I often see people repeating Knuth's "premature optimization is the root of all evil" quote. Would you agree that this quote is usually taken out of context, or possibly even used completely wrong? What does the quote actually refer to?

  • For a specific example of the above: I often allocate a large chunk of memory up front, then use that chunk as a basic arena for further "allocations". (This is very similar to what Casey does in Handmade Hero.) From the start of a project, I care about how memory is used, how things are put into memory, and how the CPU accesses memory.

    A friend of mine tells me that I don't need to care about things like this until much later, if at all. He then inevitably repeats the "premature optimization" quote and tells me things like "just use 'malloc' or 'new' everywhere because it's easier". How would you respond to this? Where do you draw the line between caring about memory up front (a rough sketch of what I mean is below) and "premature optimization"?
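
    For concreteness, here is a rough sketch of the kind of arena I mean; the names and sizes are just for illustration, not from any particular codebase:

        #include <assert.h>
        #include <stddef.h>
        #include <stdint.h>
        #include <stdlib.h>

        // One big block reserved up front; "allocations" just bump a cursor.
        typedef struct {
            uint8_t *base;   // start of the reserved block
            size_t   size;   // total bytes reserved
            size_t   used;   // bytes handed out so far
        } Arena;

        static Arena arena_create(size_t size)
        {
            Arena a;
            a.base = malloc(size);   // the only call to the general allocator
            a.size = size;
            a.used = 0;
            return a;
        }

        static void *arena_push(Arena *a, size_t bytes)
        {
            // Align the cursor to 16 bytes so pushed structs are safely aligned.
            size_t aligned = (a->used + 15) & ~(size_t)15;
            assert(a->base && aligned + bytes <= a->size);
            void *result = a->base + aligned;
            a->used = aligned + bytes;
            return result;
        }

        static void arena_reset(Arena *a)
        {
            // Everything allocated from the arena is "freed" at once.
            a->used = 0;
        }

    Everything comes out of arena_push, and freeing is a single arena_reset, so there is no per-object bookkeeping.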

Thanks!

Edited by Jay Waggle on Reason: grammar
It's funny, because the premature optimization quote is part of a larger statement that actually explains when it's not premature (and criticizes treating a rule like that as dogma):

Knuth
"We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%. A good programmer will not be lulled into complacency by such reasoning, he will be wise to look carefully at the critical code; but only after that code has been identified."


In other words it's not premature when you have investigated and found that the code you are optimizing is actually code that is critical for performance.

Also, it's very likely that after you have gone and structured your code around malloc/free, a stack-based arena becomes very hard to slot in. A general allocator that is good enough is very hard to replace later with a non-general allocator that is optimal for a particular style of code.
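
As a rough illustration of why that retrofit hurts (hypothetical code, not from any real project): with per-object malloc/free the lifetime decisions are scattered across every call site, while with an arena they live in one place.

    #include <stddef.h>
    #include <stdlib.h>

    typedef struct { float x, y, dx, dy; } Particle;

    /* malloc/free style: every object owns its own lifetime, so the matching
       frees end up sprinkled through every system that ever holds a Particle. */
    Particle *spawn_particle(void)          { return calloc(1, sizeof(Particle)); }
    void      despawn_particle(Particle *p) { free(p); }   /* called from N places */

    /* Arena style: the lifetime belongs to the arena, not the object. */
    typedef struct { char *base; size_t size, used; } Arena;

    void *arena_push(Arena *a, size_t bytes)
    {
        if (a->used + bytes > a->size) return NULL;   /* out of space */
        void *p = a->base + a->used;
        a->used += bytes;
        return p;
    }

    Particle *spawn_particle_arena(Arena *frame) { return arena_push(frame, sizeof(Particle)); }
    void      arena_reset(Arena *a)              { a->used = 0; }   /* one reset per frame, no despawns */

Moving from the first style to the second means revisiting every one of those despawn call sites, which is exactly the refactor that never happens late in a project.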

Edited by ratchetfreak on
ratchetfreak
In other words it's not premature when you have investigated and found that the code you are optimizing is actually code that is critical for performance.

Knuth is 100% correct, and this advice is what all novice programmers should heed.

However, there is some advanced knowledge which should only be applied by the true masters of the field: the best and most effective optimisation is done first, before a single line of code is written, on the back of an envelope.
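
To make that concrete with made-up numbers: if the plan is to update 1,000,000 entities of 64 bytes each at 60 fps, that is 64 MB touched per frame, or roughly 3.8 GB/s of memory traffic before any actual work happens. A few lines of arithmetic like that, done before any code exists, tells you whether the data layout is even in the right ballpark, and no profiler is needed for it.
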
ZedZull
  • What does the process of optimization look like?

  • I haven't got to the stage of optimising C code yet, but did get a taster of optimising stuff while playing games like SpaceChem, TIS-100 and Factorio. In TIS-100 in particular – which is the closest of the three to programming in C – you can optimise for different things: cycle count, instruction count and node count. Of these, the cycle count is the one that's probably the closest to a type of optimisation we can do in C.

    One of the ways to speed up the solutions in TIS-100 is to unroll loops: so instead of writing a small set of instructions and then a jump instruction to the start of the set to make it repeat, you just write out that set of instructions as many times as will fit in the node before ending it with the jump. It's a super simple optimisation, but it does shave off precious cycles, and I think it's something that can be effective in C as well (or maybe only in asm, I'm not sure); there's a rough C sketch of unrolling after this post. Another way to optimise for speed in TIS-100 is to parallelise your stuff, using multiple nodes to do the same thing, which I think is roughly similar to doing SIMD operations in C.

    As well as optimising for speed in C, though, I believe we can optimise for things like CPU and memory usage. However, I'm super out of my depth here, so it's probably best left to someone better equipped – Jeff, dude‽ – to describe what optimisation actually looks like.
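
    To put the unrolling idea in C terms, a rough sketch might look like the following (the function and array names are made up, and in practice the compiler will often unroll for you, so it's worth measuring before bothering):

        #include <stddef.h>

        /* Straightforward loop: one add plus one loop-condition check per element. */
        void sum_simple(const float *a, const float *b, float *out, size_t n)
        {
            for (size_t i = 0; i < n; i++)
                out[i] = a[i] + b[i];
        }

        /* Unrolled by 4: the loop overhead is paid once per four elements instead
           of once per element, the same idea as repeating the instruction block
           inside a TIS-100 node. n is assumed to be a multiple of 4 to keep the
           sketch short. */
        void sum_unrolled(const float *a, const float *b, float *out, size_t n)
        {
            for (size_t i = 0; i < n; i += 4) {
                out[i + 0] = a[i + 0] + b[i + 0];
                out[i + 1] = a[i + 1] + b[i + 1];
                out[i + 2] = a[i + 2] + b[i + 2];
                out[i + 3] = a[i + 3] + b[i + 3];
            }
        }
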
    I find internet arguments about it tend to get tautological.

    That is, any early optimization that is not bad is not optimization, it is just proper design.
    Early optimizations are only those things that, when done early, are bad.

    Honestly, I think most of the people you see talking about it on twitter/reddit/university etc. are just parroting things they have heard. I'm not sure how many have actually been on projects that got derailed because someone threw a load of hand-tuned assembler into a project that nobody could debug. I have never been on a project derailed by any kind of early optimization, but maybe that is because my day job is in the web-dev universe and nobody optimizes there at all! Perhaps in the game-dev world this happens more often, because people are excited about writing fast code there.

    Most of the optimizing I do in the web dev world is just deleting code or fixing database design.

    In the 'making a game for fun in Unity3D' world, it is running into things that make the frame rate hitch and fixing them via what people usually imagine optimization to be: tight C code for things that need to be fast, unsafe blocks in C#, bit twiddling, SIMD, etc.

    In both cases, though, I continually learn the lesson that you need to profile your code. When you see something is slow, you will often think you know what is causing it, and spend time messing with some code, only to find out later that wasn't the problem. In one of the games I worked on I was tweaking tons of code around generating solar systems to speed it up, but the whole time it was just the random number generator taking 99% of the time. I wasted time by not profiling it in the first place. It was a super easy fix once I saw it, as opposed to the exotic things I was starting to attempt.

    On a recent web dev project, I profiled it and found that the function taking up the most CPU time was an innocent line of code that takes a discriminated union (similar to an enum) and turns it into a string via reflection. I never would have guessed it, given the millions of different things the server does, some of them quite complex. But it was a function that got called as part of caching database requests, so it got called a lot, and reflection is slow. Again, a super easy fix! With this and the other tweaks I made from profiling, we were able to pull ~10 ms out of the web app's average page load time AND move to a cheaper cloud service to host it, saving us money.


    So I suppose you could say that a sufficient condition for premature optimization is:
    If you are optimizing code already written, before proving that it is a performance problem via careful measurement, then you are doing premature optimization. Though it may not be the root of all evil, it does waste time, and time is precious.
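
    For what it's worth, even crude measurement beats guessing. Something like the sketch below (the solar system function is just a stand-in I made up) would have exposed the random number generator problem immediately, though a real profiler shows you the whole picture at once:

        #include <stdio.h>
        #include <stdlib.h>
        #include <time.h>

        /* Stand-in for the real work; the point is only the timing harness. */
        static double generate_solar_system(void)
        {
            double mass = 0.0;
            for (int i = 0; i < 100000; i++)
                mass += rand() % 1000;   /* the sneaky expensive part */
            return mass;
        }

        int main(void)
        {
            clock_t start = clock();
            double sink = 0.0;
            for (int i = 0; i < 100; i++)
                sink += generate_solar_system();
            double seconds = (double)(clock() - start) / CLOCKS_PER_SEC;
            printf("100 generations: %.3f s total, %.3f ms each (checksum %.0f)\n",
                   seconds, seconds * 10.0, sink);
            return 0;
        }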



    Edited by Jack Mott on
    Here's a quote from the start of Michael Abrash's Graphics Programming Black Book. The first chapter more or less asks and answers all the questions in the OP, too.

    Understanding High Performance

    Before we can create high-performance code, we must understand what high performance is. The objective (not always attained) in creating high-performance software is to make the software able to carry out its appointed tasks so rapidly that it responds instantaneously, as far as the user is concerned. In other words, high-performance code should ideally run so fast that any further improvement in the code would be pointless.

    Notice that the above definition most emphatically does not say anything about making the software as fast as possible. It also does not say anything about using assembly language, or an optimizing compiler, or, for that matter, a compiler at all. It also doesn’t say anything about how the code was designed and written. What it does say is that high-performance code shouldn’t get in the user’s way—and that’s all.

    That’s an important distinction, because all too many programmers think that assembly language, or the right compiler, or a particular high-level language, or a certain design approach is the answer to creating high-performance code. They’re not, any more than choosing a certain set of tools is the key to building a house. You do indeed need tools to build a house, but any of many sets of tools will do. You also need a blueprint, an understanding of everything that goes into a house, and the ability to use the tools.

    Likewise, high-performance programming requires a clear understanding of the purpose of the software being built, an overall program design, algorithms for implementing particular tasks, an understanding of what the computer can do and of what all relevant software is doing—and solid programming skills, preferably using an optimizing compiler or assembly language. The optimization at the end is just the finishing touch, however.

    Without good design, good algorithms, and complete understanding of the program’s operation, your carefully optimized code will amount to one of mankind’s least fruitful creations—a fast slow program.

    “What’s a fast slow program?” you ask. That’s a good question, and a brief (true) story is perhaps the best answer.

    When Fast Isn’t Fast

    In the early 1970s, as the first hand-held calculators were hitting the market, I knew a fellow named Irwin. He was a good student, and was planning to be an engineer. Being an engineer back then meant knowing how to use a slide rule, and Irwin could jockey a slipstick with the best of them. In fact, he was so good that he challenged a fellow with a calculator to a duel—and won, becoming a local legend in the process.

    When you get right down to it, though, Irwin was spitting into the wind. In a few short years his hard-earned slipstick skills would be worthless, and the entire discipline would be essentially wiped from the face of the earth. What’s more, anyone with half a brain could see that changeover coming. Irwin had basically wasted the considerable effort and time he had spent optimizing his soon-to-be-obsolete skills.

    What does all this have to do with programming? Plenty. When you spend time optimizing poorly-designed assembly code, or when you count on an optimizing compiler to make your code fast, you’re wasting the optimization, much as Irwin did. Particularly in assembly, you’ll find that without proper up-front design and everything else that goes into high-performance design, you’ll waste considerable effort and time on making an inherently slow program as fast as possible—which is still slow—when you could easily have improved performance a great deal more with just a little thought. As we’ll see, handcrafted assembly language and optimizing compilers matter, but less than you might think, in the grand scheme of things—and they scarcely matter at all unless they’re used in the context of a good design and a thorough understanding of both the task at hand and the PC.

    You can find the rest of the book here.

    But my experience of optimisation more or less matches mandlebro's: find the one innocent-looking line of code that calls some library or runtime feature in a dreadful way and yank it out.

    Edited by graeme on
    I thought I'd add my 2 cents, considering I joined this forum to discuss these things specifically. I consider myself still very green ;)

    My experience has been, and still is, that colleagues who mention "early optimization" are usually lazy (to a fault). They use it as an excuse to write atrociously slow code, or simply to do bad work. One example is a project that has been in its "optimization" stage for a year now. The decisions made at the beginning (or the lack of decision-making) have ramifications throughout the project, which will never run fast enough. All hope has been abandoned.

    I personally ask myself, "Is this going to affect future refactoring/redesign/re-architecture?" and "Does this performance thing only apply to one platform?" If I've answered no to both of those questions, I do not consider it early optimization. This is especially important early on, while you are still experimenting with and prototyping how your final system will work.

    Btw, hello everyone :) I'm enjoying this forum quite a lot, and it has given me hope that I am not alone in my weird ways of thinking.
    Thank you a TON!

    Edited by socapex on
    These are really interesting questions, and I agree with a lot of what has been said already.

    I think one of the biggest problems I come across with this is that people often seem to misinterpret the premature optimisation quote, along with the 90-10/80-20 rule, to mean that thinking about performance is pointless until you've written your program and can profile it to find out where the "hotspots" are that need optimisation. The problem with this approach is that you can end up in a situation where you're nearing the end of a project and your game is running too slowly. So you profile, find the worst hotspot, optimise, and find your program is much faster but still too slow. After you've repeated these steps a few times, you quickly reach the point where you've dealt with all the hotspots and there are no more easy performance gains to be had, yet the overall program is still too slow. This "death by papercuts" is incredibly hard to solve, because there are no more really slow parts of your program to optimise. Instead, you are bleeding performance from a huge number of suboptimal decisions, and at a late stage in a project it's just not viable to rewrite all of the code in an optimal way.

    I think the lesson here is that the quote should never be taken to mean that you don't have to think about performance. What it means is that you shouldn't go out of your way to write complex, implementation-specific optimisations at an early stage, because you may end up throwing your work away or putting a lot of effort into something that turns out not to matter much. Just being performance-minded will never remove the need for a profiling and optimisation stage; there will always be surprise bottlenecks and the like. It's just that once you've dealt with your hotspots, you need to be left with a body of reasonable, if not perfectly optimal, code rather than a quagmire of performance bleed.