Why is the DOM so slow?

I don't understand why the DOM is so slow. It's well known that it is, of course - that's why React keeps its "virtual DOM" and does diffs against it before performing real DOM operations. But why? Why should it be so slow to add elements and so slow to access them? Why couldn't React do its diffs against the real DOM? Etc.

I made a little benchmark to test simple DOM inserts. It demonstrates a few interesting things. First, no matter what I do in my tests, getting data into the DOM (not doing a layout and render!) is about 3x slower overall than pushing into a JS array (which I take as the control). Is that reasonable overhead? I'm not convinced it is for just the data-manipulation part. And in Firefox, performing updates on the actual DOM instead of using a DocumentFragment or innerHTML is dramatically slower.
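
Roughly, the cases look like this (a simplified sketch, not the exact benchmark code; N, the labels, and the strings are placeholders):

const N = 10000;

function bench(label, fn) {
  const t0 = performance.now();
  fn();
  console.log(`${label}: ${(performance.now() - t0).toFixed(2)} ms`);
}

// Control: push N strings into a plain JS array.
bench('array push (control)', () => {
  const results = [];
  for (let i = 0; i < N; i++) results.push(`element ${i}`);
});

// Insert N divs into the live DOM, one appendChild at a time.
// Nothing reads layout here, so no synchronous layout/render is forced.
bench('appendChild into live DOM', () => {
  const root = document.body.appendChild(document.createElement('div'));
  for (let i = 0; i < N; i++) {
    const div = document.createElement('div');
    div.textContent = `element ${i}`;
    root.appendChild(div);
  }
});

// Build everything in a detached DocumentFragment, then insert once.
bench('DocumentFragment', () => {
  const frag = document.createDocumentFragment();
  for (let i = 0; i < N; i++) {
    const div = document.createElement('div');
    div.textContent = `element ${i}`;
    frag.appendChild(div);
  }
  document.body.appendChild(frag);
});

// Build one big string and let the HTML parser do the inserts.
bench('innerHTML', () => {
  let html = '';
  for (let i = 0; i < N; i++) html += `<div>element ${i}</div>`;
  document.body.appendChild(document.createElement('div')).innerHTML = html;
});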

Firefox results on my PC:

[Screenshot: Firefox benchmark results]

And here are the Chrome results. Chrome seems to have optimized runs of inserts into the actual DOM:

[Screenshot: Chrome benchmark results]

I would like to know what DOM implementations are actually doing during simple data manipulation. Where is that overhead over plain JS coming from? Closing that gap would make a big difference, and could help us avoid pitfalls in future retained-mode UI systems.


Edited by Ben Visness on

It's probably not easy to do, but isn't the source code of Firefox and Chrome available?

Bruce Dawson's blog sometimes covers performance issues in Chrome; maybe you could contact him to find out where to start (I believe he works on Chrome at Google)?

https://twitter.com/BruceDawson0xB/

One word: cruft.

The DOM evolved in the (early?) 90s as a structure+API for static HTML page layout, with JS dynamic modification as an afterthought, and it never got a proper overhaul before becoming a de facto standard. I think the API functions have to update a bunch of variables on the C++ side to keep everything consistent; it's basically spaghetti. Someone ought to develop a simple replacement for the WWW so we can nuke it... it's just terrible.

I have a neat little JS GUI in my game that I'm interested in spinning off. While I can't say it's efficient as of now, it has the bare essentials of CSS and HTML in pure JS (with a C render backend) totalling ~3 kloc. Could be useful for making lightweight native applications with many of the advantages of browser/electron apps.

btw, I'm new here - is there some place I should say hello?


Edited by synthnostate on

I don't know if this helps or not, but I remember trying to figure out why the DOM was slow myself while optimizing a "big" list of items (~1k+) in React, and I wound up having to implement a VirtualList. One of the things I discovered is that you can create an HTML page of ~5000 empty divs, load that page directly, and it opens pretty much instantly.

Example:

// Run this in the DevTools console: copy() is a console helper that puts
// the string on the clipboard. Paste the result into a file and save it
// as test.html.
copy('<div>Hello</div>' + Array.from({ length: 5000 }).map(() => '<div></div>').join('\n') + '<div>World</div>');

Note that on my machine, if I try to "preallocate" ~10k divs, my browser page literally hangs for quite some time.

Now that I think of it, I can't remember whether document.createElement('div') or .appendChild() was actually the slow part of the DOM API.

So one possible workaround is to load a page with a bunch of "preallocated" divs. I don't know whether the browser will start to slow down again once you try to actually use them, though.
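
If you want to try the reuse side of this, something like the following sketch should work (untested; it assumes the test.html generated above is loaded):

// Grab the preallocated divs once, then write into them instead of
// creating new nodes.
const pool = document.querySelectorAll('div');
let next = 0;

function addItem(text) {
  if (next >= pool.length) return false; // pool exhausted
  pool[next++].textContent = text; // write, don't create
  return true;
}

for (let i = 0; i < 1000; i++) addItem(`item ${i}`);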


Edited by nickav on

Hello,

I tried to find the code for innerHTML and appendChild in Chromium, in the hopes that the explanation would be something simple.

I think I'm completely outmatched even trying to read this code... but this thread has been silent for a long time, so I'm posting what I find here as I go.

I would like to compile this code and step through it, but right now I don't think I have a computer that can pull that off. And I don't want to destroy the machine I use to follow HMH on.

  1. The code for innerHTML is in this file. You can search for setInnerHTML. https://source.chromium.org/chromium/chromium/src/+/main:third_party/blink/renderer/core/dom/element.cc

  2. AppendChild is defined in this file. https://source.chromium.org/chromium/chromium/src/+/main:third_party/blink/renderer/core/dom/node.cc;l=730?q=append&sq=&ss=chromium%2Fchromium%2Fsrc:third_party%2Fblink%2Frenderer%2Fcore%2Fdom%2F

Here is the code for appendChild I'm seeing: [Screenshot: appendChild implementation]

which then calls the following function (I think): [Screenshot: the function appendChild calls]

which is in this file https://source.chromium.org/chromium/chromium/src/+/main:third_party/blink/renderer/core/dom/container_node.cc

So we see there are a bunch of checks going on here.

And this is the code for innerHTML:

[Screenshot: innerHTML implementation], which should just call setTextContent.

[Screenshot: setTextContent implementation]

And we see here that it eventually just calls the AppendChild function. So I am stumped! To me this seems to indicate that they should take about the same time... so is it possible that it's actually the JavaScript loop that slows things down? If that is the case... I can't think of a way to write a test that does several appends without executing JavaScript. But if I'm right that the JavaScript is what's slowing it down, then just doing it without JavaScript should make it fast. Maybe NaCl in Chrome has a way to check this?

If I could compile Chromium, I could maybe hack in a function that calls appendChild in a C++ loop and expose it as a separate function on the DOM itself. That would constitute proof that it's the JavaScript loop doing the appends that is the issue.
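
Actually, one standard API that gets close to this without recompiling anything is ParentNode.append(), which accepts any number of nodes in a single call, so the JS-to-C++ crossing happens once instead of N times. A sketch (my own, untested) that could hint at the answer: if the JS loop around appendChild is the bottleneck, this version should be much faster; if the DOM-side checks dominate, it won't be.

// Build the nodes first, then hand them all to the engine in one call.
const root = document.createElement('div');
const nodes = [];
for (let i = 0; i < 10000; i++) {
  const div = document.createElement('div');
  div.textContent = `element ${i}`;
  nodes.push(div);
}
const t0 = performance.now();
root.append(...nodes); // one native call, many inserts
console.log(performance.now() - t0);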

On the question of why pushing to a JavaScript array is faster: as far as I know, JavaScript arrays are created and handled by V8, while the DOM and its operations are handled by Blink, as noted above. I have no clue how to figure out whether those run on two different threads or somehow on the same one. But given that the two pieces of code are so vastly separated in the Chromium codebase, I suppose it isn't a fair comparison to accept an array push as some kind of metric for how fast the DOM could go. Although I don't really understand the code above, I think it is completely different from the code that handles arrays, which I have seen in V8's codebase in the past but am not screenshotting here. And I don't know how we can just expect them to have the same max speed.

Looking at all these things that go on just to make a div, and having done web programming for 7 years using frameworks like React and Vue... we are just trying to render stuff. We should be rendering straight to the canvas. However, if we do that, our code will still be going through JS or wasm, and then we will probably end up being slow anyway. My only hope is the SIMD in the WebAssembly spec now. Only yesterday I reached episode 112 of HMH, which is the beginning of SIMD, and I hope there is some magic in there that will let me write canvas-based UIs without using the DOM. I pray I don't get stuck with an unsolvable problem... because I really, really want to write C, ship it on the web, and have it be as fast as or faster than React and similar frameworks.


Afterthought: if I were writing a JavaScript interpreter, I would at least write something that can optimize simple loops. And I know that V8 has gone, as Casey Muratori would say, "full banana cakes" with trying to optimize JavaScript on the fly. So I would not be surprised if we can't even write array pushes in a loop that don't end up getting optimized "within an inch of their lives" by V8. I'm not that smart, but the simplest optimization I can think of when looking at a JavaScript loop like this:

for (let i = 0; i < NUM_ELEMENTS_TO_INSERT; i++) {
   results.push(`Case ${caseNumber}, element ${i}`)
}

would be to simply hoist it into pure C and throw -O2 at it. Loops like this should be pretty easy to convert to plain C if I had written the whole interpreter myself... and I'd just do that.


Edited by Gaurav Gautam on

So I am stumped! To me this seems to indicate that they should take about the same time...

Not sure why you are stumped. They do run in approximately the same time. If you look at bvisness's benchmark for Chrome, appendChild runs in ~7.37 ms and innerHTML in ~7.48 ms, which is almost the same.


Edited by Mārtiņš Možeiko on
Replying to gautam1168 (#27042)

Ah, of course! I was thinking about the rant on the Jeff and Casey show about creating the website using innerHTML, and I totally spaced on the fact that the two look equivalent in this benchmark. If there is another stream soon, can someone ask if he was using Firefox? And also whether we can get some more details about what kind of queries he was running?


Edited by Gaurav Gautam on
Replying to mmozeiko (#27043)

Hey, I ran this benchmark on Firefox, and it seems like Firefox now does the same with innerHTML and appendChild.

[Screenshot: Firefox benchmark results]

I have not seen Firefox's code, so I can't say what is happening internally.

I also went and looked at the code for array pushes in Chrome's V8. It's in this file: https://github.com/frida/v8/blob/main/src/builtins/builtins-array.cc

[Screenshot: V8's array push builtin]

So just by looking at the code, without running it, I think an array push is going to be faster because it's doing fewer things. I will dig deeper into the functions these functions call.
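
To make the contrast concrete, here is a toy model (my own sketch in JS terms, not V8's actual code; all the names are made up) of what an amortized push has to do: one capacity check, an occasional grow, one slot write - no tree walks, no spec-mandated validity checks.

function makeVec() {
  return { elements: new Array(4), capacity: 4, length: 0 };
}

function grow(vec) {
  vec.capacity *= 2; // double, so the copies amortize out
  const bigger = new Array(vec.capacity);
  for (let i = 0; i < vec.length; i++) bigger[i] = vec.elements[i];
  vec.elements = bigger;
}

function push(vec, value) {
  if (vec.length === vec.capacity) grow(vec); // rare O(n) copy
  vec.elements[vec.length++] = value; // usual case: O(1)
}

const v = makeVec();
for (let i = 0; i < 100; i++) push(v, i);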

I think the problem with the DOM is that when we are just rendering stuff, we want to think of DOM elements as much less than what they actually are. Every DOM element carries much more heavy-duty machinery than is required to just put something on the screen. I'm not saying it's all useful, just that there is a lot it has to do. And it's not really in the vendors' hands to optimize the DOM API too much, because they are implementing a spec (the WHATWG DOM standard) that specifically tells them what to do. The spec for the method that inserts a node into the DOM is documented here: https://dom.spec.whatwg.org/#concept-node-pre-insert

[Screenshot: the pre-insert steps from the DOM spec]

Note that this spec probably means the browser vendors are not really able to change the data structure that makes up the DOM even if they wanted to, since it seems to explicitly state what the tree should look like. I guess the issue is that this spec was never meant for writing applications in the first place? Wasn't it designed for writing linkable documents? Aren't we stretching its applicability too far by making it do applications?

Digging a bit deeper, the step that asks to ensure the validity of the insertion leads to this, where point 2 looks like it will be bad: [Screenshot: the ensure-pre-insertion-validity steps]

I'm guessing that to do an appendChild, one would have to search the entire set of ancestors of the target to find out whether the DOM node being inserted is even valid there. Let me see if I can find the implementation of that step in Chrome.


Yep! I think I see a loop in the implementation that will depend on both the depth of the node you are appending and the depth of the target node you are appending it to. Among the several checks it has to do is the one that prevents cycles in the graph:

[Screenshot: the cycle-prevention check in Chromium]

And the ContainsIncludingHostElements function has the loop: [Screenshot: ContainsIncludingHostElements]

So, if you try to append B into A, it will ascend all the way from A to the root and check that B is not anywhere on that path. I don't want to think about what this means for siblings, because that hurts my brain... but the spec definitely has stuff about siblings too.
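
In other words, the check described above boils down to something like this reconstruction (mine, not Blink's actual code):

// Before inserting `node` under `parent`, walk from `parent` up to the
// root and make sure `node` is not already on that path. The cost grows
// with the depth of the insertion point.
function wouldCreateCycle(parent, node) {
  for (let n = parent; n !== null; n = n.parentNode) {
    if (n === node) return true; // node would become its own ancestor
  }
  return false;
}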

So really, at this point I think I can say that DOM operations are WAY more expensive than pushing to an array. There is a loop in there that has to walk potentially huge trees to do appends. I suppose there are fast paths somewhere that can skip these checks when you are doing innerHTML, since freshly parsed nodes obviously cannot be ancestors of the parent. That would explain why innerHTML is still a bit faster than appendChild, even though innerHTML calls AppendChild, at least in Chrome.

I suspect this is an efficiency problem, not a performance problem (it could still be a performance problem too; I have no idea how to check that, as I haven't seen the SIMD episodes yet). The web specs seem to allow array pushes to stay amortized constant time per element, whereas inserting into a DOM tree only gets worse as the tree gets deeper. These are algorithmically different kinds of operations and probably shouldn't be compared.
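
If anyone wants to sanity-check the depth claim, a rough probe (my own, untested; just the shape of the experiment) could time appends into parents at increasing depth:

// Time `count` appends into a parent that sits `depth` levels deep, so
// any ancestor-walking check has `depth` nodes to visit per append.
function timeAppendsAtDepth(depth, count = 10000) {
  let parent = document.createElement('div');
  for (let i = 0; i < depth; i++) {
    const child = document.createElement('div');
    parent.appendChild(child);
    parent = child; // descend one level
  }
  const t0 = performance.now();
  for (let i = 0; i < count; i++) {
    parent.appendChild(document.createElement('span'));
  }
  return performance.now() - t0;
}

for (const d of [1, 100, 10000]) {
  console.log(`depth ${d}: ${timeAppendsAtDepth(d).toFixed(2)} ms`);
}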


Edited by Gaurav Gautam on

I imagine Casey Muratori would have kept going much longer beyond this point... and someday I would like to go as far as he does into things. But right now I have some crunch time at work (yes, yes, it's a web company, but I'm also not a hardcore programmer, so the two things cancel out). Maybe I'll revisit this in 6 months and try to go through the whole spec and check exactly why all those restrictions and checks are needed, and why we can't have a more lightweight DOM spec that removes the need for a virtual DOM... like a "strict mode", but for the DOM this time instead of JavaScript. But really, I hope I never have to come back to this. I want to get away from it. I'd rather learn the OpenGL stuff from HMH. I feel that's more powerful and will make me a much better programmer than learning this stuff will.