Throw more hardware at it

Hello all,

I have heard from colleagues and others that today we don't care about writing clean, efficient code that is light on memory, because we can just throw more hardware at it to make it run reasonably. I personally really dislike being told this. I agree that hardware is becoming cheaper and faster each year, but surely that should mean we can accomplish more with it and have better graphics, not just that Java can now run on it (really slowly). It really does feel like the wrong approach to a problem to just throw more hardware at it. Is this really the only solution? Or should we step back and perhaps write some better code? It feels to me that somewhere along the way the hardware started to solve the software's problems, and we aren't fully leveraging what is available.
I think for developers this is true. If something runs slowly (slow compiles, not enough memory, out of disk space), then just throw more hardware at it. It's cheaper than your time.

But if you want to ship your application or game to end users, then it's all about what hardware they have. If you want to support a larger market, then it's up to you to make sure your app/game runs smoothly on their hardware. Demanding that they upgrade will not solve anything. They simply won't use it.
Well, yes, I can agree that for developers it makes sense. But you get game makers or app builders these days, and the apps they make are bulky and clunky. Some web pages in use are terrible. Take SharePoint or Outlook online, for example: it's almost a 17 MB download just to load the page, and then it can still react slowly at times. So I guess it comes down to developer time vs. hardware cost?
There are a couple of ways to approach counterarguments to this. One which might resonate well with people in the tech industry is an economic argument. Something like this:

In general, you want to be as efficient with your resources (money) as possible. Hardware costs money. Programmer hours also cost money. Dissatisfied users may cost the most money.

Striking a balance requires having accurate and reliable estimates of how much each of these costs your organization (be it a company, academic research lab, free software group, or what-have-you).

If you are spending a lot on programmer salaries (relative to the benefit delivered in the product) compared to hardware costs, because your programmers have to spend a lot of time waiting for their code to compile, fighting sluggish servers, or trying to optimize code past the point of diminishing returns, then perhaps it is worthwhile to invest a bit more in hardware, if you will save money on your programmers' salaries. Of course, maybe you're getting an even bigger saving in the end, because you end up with a better product that runs better on more people's computers -- again, you need to have fairly good estimates of how much each axis costs you.

If you have given your programmers lots of expensive hardware, beefy laptops and heavy servers, and the code quality is bad enough that your product is sluggish on users' computers or difficult to scale (since you've already invested heavily in servers for a small deployment), the cost to you from the hardware and/or user complaints, lost sales, etc. may outweigh the cost of having your programmers work for longer on the product and deliver something better that runs on lighter machines.

This is just one approach to arguing the core point. Here's a more philosophical one.

When we write software, we're usually trying to enable people to do something they couldn't do before, and make that available to the largest number of people possible (as far as economically feasible -- we can't go on and make more things if we don't recoup some cost from it). By requiring more resources, your software shrinks the set of people who can run it. This wasn't always a big issue, since you could be assured that in a couple of years, hardware would have improved and more people would be able to run your software. However, that improvement is leveling out, and is no longer always reliable.

Personally, I fall into the camp of wanting to write, as you say, "clean, efficient code", because I like being assured that I know at a fairly granular level what my code is doing on the machine, using the smallest amount of computational resources possible with a reasonable amount of flexibility, and finding local optima within those constraints. But I'm a bit of a perfectionist, so this line of reasoning may not work for everyone.

Finally, it's worth mentioning that different aspects of computer hardware are improving at different rates.

CPU clock speed stopped increasing around 15 years ago, and has been level or even slightly declining for power/heat reasons.

# of cores per chip is increasing, but adding more cores has diminishing returns for individual pieces of software / algorithms.

Amount of RAM is increasing, but the cost of accessing RAM is increasing relative to processor clock speed.

GPU performance is increasing, but this only helps for heavily parallel tasks (like machine learning) and graphics -- it won't boost your Java VM performance.

Disk sizes are increasing, although the average person's disk size is hovering right now as everyone switches over to SSDs. Most consumer-grade laptops have either 1 TB HDDs or 120 GB SSDs -- quite a range.

Disk access speeds are increasing as people switch over to SSDs, but lifetime is decreasing because of limitations on the number of writes an SSD can perform.

Network speeds are increasing in some places and stagnating in others, depending on your ISP and competition in the area.

Personal computers are gradually giving way in popularity to mobile devices -- these make almost all of the above factors temporarily decrease if you look at averages. It also changes the way people use devices, and the kind of overhead you're looking at for simply performing basic system tasks. All Android apps ultimately have to run in or alongside a Java VM, for example.

So depending on your problem, it may not actually be possible to throw hardware at it indefinitely. CPU-bound single-threaded tasks, especially, are really not scalable any more -- pretty much your only option is to optimize, and you can only do that if you actually have some idea what your hardware is doing. That's part of why we established Handmade Network -- to make an explicit statement that these things matter and will continue to matter, and to provide resources for people to learn about the way computers work today, so that they can write better, cleaner code without investing much more time or effort in their future software efforts.
ChronalDragon
Amount of RAM is increasing, but the cost of accessing RAM is increasing relative to processor clock speed.

I don't think this is true. RAM has never been faster on average than it is today.


What changed is multiple execution units and deep out-of-order pipelines on the CPU. Before, the CPU was slow, so accessing memory seemed "fast". Nowadays the CPU can do a lot more calculations in the same amount of time, so accessing memory now seems "slow".

Tazmain
Well, yes, I can agree that for developers it makes sense. But you get game makers or app builders these days, and the apps they make are bulky and clunky. Some web pages in use are terrible. Take SharePoint or Outlook online, for example: it's almost a 17 MB download just to load the page, and then it can still react slowly at times. So I guess it comes down to developer time vs. hardware cost?


Speaking of game makers, the number of slow games that I come across on PS4 is depressing. The developer knows *exactly* what the hardware will be for a PS4, and they still don't make it work the way it should. Shameful!
mmozeiko
ChronalDragon
Amount of RAM is increasing, but the cost of accessing RAM is increasing relative to processor clock speed.

I don't think this is true. RAM has never been faster on average than it is today.


What changed is multiple execution units and deep out-of-order pipelines on the CPU. Before, the CPU was slow, so accessing memory seemed "fast". Nowadays the CPU can do a lot more calculations in the same amount of time, so accessing memory now seems "slow".


Right, notice I said *relative to processor clock speed*. So you are waiting around more *cycles* for memory fetches to complete, unless you can take advantage of cache properties.
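To make "take advantage of cache properties" a bit more concrete, here is a minimal illustrative sketch (my own example, with made-up names and sizes, not something from this thread): both functions sum the same 2D array, but the row-major walk touches memory sequentially and reuses every cache line it fetches, while the column-major walk jumps a whole row ahead on each access and misses far more often.

```c
// Illustrative sketch of cache-friendly vs cache-hostile traversal of the
// same data. The array name and dimensions are invented for the example.
#include <stddef.h>

#define DIM 2048
static float grid[DIM][DIM];

// Row-major: consecutive accesses are adjacent in memory, so each cache line
// fetched from RAM is fully used before the next one is needed.
float sum_row_major(void)
{
    float sum = 0.0f;
    for (size_t y = 0; y < DIM; ++y)
        for (size_t x = 0; x < DIM; ++x)
            sum += grid[y][x];
    return sum;
}

// Column-major: each access jumps DIM*sizeof(float) bytes ahead, so almost
// every access pulls in a new cache line and most of that line is wasted.
float sum_col_major(void)
{
    float sum = 0.0f;
    for (size_t x = 0; x < DIM; ++x)
        for (size_t y = 0; y < DIM; ++y)
            sum += grid[y][x];
    return sum;
}
```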
Thanks for the insight. I do feel that a lot of programmers lose touch with how the hardware works. Or they simply aren't bothered to learn it because they are using a "higher" level language. I am trying to work my way down and understand more of what is happening. One thing that amazes me is how the guys at CppCon know roughly how many CPU cycles they waste. I still want to find out how to measure that. So with the # of cores increasing, and only AMD increasing clock speed, should we perhaps get different algorithms that can run on multiple cores? I know some things really are single-threaded only. There is no way around it. It seems that quad cores are the sweet spot for mobile phones as well. In the mobile industry you want to get something done as soon as you can so the CPU can go back to sleep.

So does the CPU wait longer now to access memory? In terms of true latency, that is how I understand it. So ideally you want things to be close together, so that you don't have to go out to memory often, but rather what you need for the current task is already in the CPU cache?
Tazmain
I do feel that a lot of programmers lose touch with how the hardware works. Or they simply aren't bothered to learn it because they are using a "higher" level language. I am trying to work my way down and understand more of what is happening.

You might find my recent podcast with Andrew valuable, I hope: https://soundcloud.com/user-63451...ation-ep-1-handmade-dev-show-2017
On the show I dive into Handmade's current efforts to address your very concerns. Cheers.
Tazmain
Thanks for the insight. I do feel that a lot of programmers lose touch with how the hardware works. Or they simply aren't bothered to learn it because they are using a "higher" level language. I am trying to work my way down and understand more of what is happening. One thing that amazes me is how the guys at CppCon know roughly how many CPU cycles they waste. I still want to find out how to measure that. So with the # of cores increasing, and only AMD increasing clock speed, should we perhaps get different algorithms that can run on multiple cores? I know some things really are single-threaded only. There is no way around it. It seems that quad cores are the sweet spot for mobile phones as well. In the mobile industry you want to get something done as soon as you can so the CPU can go back to sleep.

So does the CPU wait longer now to access memory? In terms of true latency, that is how I understand it. So ideally you want things to be close together, so that you don't have to go out to memory often, but rather what you need for the current task is already in the CPU cache?


Most of the cycle counting comes from ballparking, doing !!science!! on micro-optimizations, and actual insider knowledge of the CPUs they are working with.

For example, going to main memory takes about 200 cycles, while having the data in cache reduces that by an order of magnitude.

But then instruction-level parallelism kicks in, and instructions that don't depend on the data from memory can still execute while the memory fetch is in progress. This includes other memory fetches. So getting things truly optimal with that many moving parts is very tricky.
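As for how you actually measure it: one common starting point is the CPU's timestamp counter. Here is a minimal sketch, assuming an x86-64 CPU and GCC or Clang (MSVC exposes the same __rdtsc() via <intrin.h>); the array name, size, and loop are invented for the example.

```c
// Minimal sketch: ballpark cycle counts using the x86 timestamp counter.
// Assumes x86-64 and GCC/Clang (<x86intrin.h>); on MSVC use <intrin.h>.
#include <stdio.h>
#include <stdint.h>
#include <x86intrin.h>

enum { COUNT = 1 << 20 };
static int data[COUNT];

int main(void)
{
    // Touch the data once so the measurement isn't dominated by page faults.
    for (int i = 0; i < COUNT; ++i) data[i] = i;

    uint64_t start = __rdtsc();

    long long sum = 0;
    for (int i = 0; i < COUNT; ++i) sum += data[i];

    uint64_t end = __rdtsc();

    // This is a ballpark figure, not a precise one: the TSC ticks at a fixed
    // reference frequency, the OS can interrupt you mid-measurement, and the
    // CPU reorders work around the timed region. For real comparisons, repeat
    // the measurement many times and look at the minimum or the median.
    printf("sum = %lld, ~%.2f reference cycles per element\n",
           sum, (double)(end - start) / COUNT);
    return 0;
}
```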

One argument that can apply in server scenarios, is that spending some extra time on the code, once, can save you a ton of work in system administration and deployment later.

Imagine for instance a web site. Even using languages like Java or C#, if you have a sound architecture and tight code, you should be able to serve a *huge* volume of traffic with a *single* cheap webserver, and a *single* cheap database.

If you are sloppy with the code and need more than one webserver, now your deployment process has to be able to deploy to all of them in an intelligent way. You also need a load balancer to split your traffic across the webservers. Once you need two databases due to inefficient queries or design, they have to be kept synchronized, which is difficult and adds complexity that has to be dealt with forever, as well as extra cost.

And remind people that writing efficient code doesn't normally mean counting cycles or writing in C or assembly, but just getting the clowns out of the car. Understand your language and its core libraries. Don't use trivially convenient abstractions if they cause allocations (like LINQ in C#). Avoid features that are incredibly slow (reflection, and string concatenation in most languages). Profile the code and fix the low-hanging fruit. Give tight loops extra thought. Think about caching things rather than recomputing them or re-requesting them from a database. Use arrays or array-backed list collections instead of linked lists or trees when possible.
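As one small illustration of that last point (my own sketch, with invented names, in C rather than C#): both functions below compute the same sum, but the array version walks contiguous, prefetch-friendly memory, while the linked-list version chases a pointer to wherever each node happened to be allocated.

```c
// Illustrative sketch: the same sum over a contiguous array vs. a linked
// list of individually heap-allocated nodes. All names here are made up.
#include <stddef.h>

typedef struct Node {
    int          value;
    struct Node *next;
} Node;

// Contiguous array: sequential, prefetch-friendly, one allocation in total.
long long sum_array(const int *values, size_t count)
{
    long long sum = 0;
    for (size_t i = 0; i < count; ++i) sum += values[i];
    return sum;
}

// Linked list: every step is a dependent load to wherever the allocator put
// the node, so the CPU spends most of its time waiting on memory.
long long sum_list(const Node *head)
{
    long long sum = 0;
    for (const Node *n = head; n; n = n->next) sum += n->value;
    return sum;
}
```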