Ideas on tree memory allocation

Jayme

#13269

October 15, 2017

Hello, I'm new to this forum, so let me know if there's a better place for this post.

I'm working on writing a compiler from scratch, following (to some extent) the course at https://lagunita.stanford.edu/cou...gineering/Compilers/Fall2014/info.
I've gotten to the part where I'm constructing an abstract syntax tree and I'm having dificulty figuring out how to construct the data structure in a HMH way. Each node of the tree is a Token that can have any number of Children nodes.

I've decided on one of 2 options for the child pointer:
1. A pointer to an array of Token pointers
2. A pointer to a linked list of Tokens

My question is how to allocate memory for these scenarios:
1. In this case the only solution I can think of is to call realloc() every time a node gets added (or to do it less fequently by mainting a capacity and doubling when needed.) The problem is most nodes will only have 1 or 2 children, but some can have 100s or more. Perhaps the most HMH way to do this would be to write my own version of realloc?

2. This way seems pretty straight forward to allocate memory for: An array of Tokens and pointers address the children and siblings. I didn't particularly like this option because seeking through the linked list might not be ideal performance-wise.

What are the benefits or drawbacks of these ideas? Is there other ways that I haven't thought of?

And if there's a HMH episode where he does something similar, or any other resources, post a link (I'm still not very far in the series).

ratchetfreak

#13270

October 15, 2017

Profile and see what the common use cases are. Then you can target the allocation sizes accordingly.

Starting with 16 and doubling every time you exceed the capacity is a common general strategy.

Of course if your most common case is 1 or 2 and then it jumps to large arrays immediately after then start with 2 and go straight to 16 once a third gets added.

Also consider some kind of short buffer optimization. Where you have a `Node* shortBuffer[2];` inside Node to store just the first 2 children and then a Node** array to store the array of pointers for the rest once you need it.

Dumitru Frunza

#13278

October 16, 2017

I prefer the second option - a linked list of Tokens:
1) It's the simpler solution.
2) Easy to implement, which frees up more time to think about the main problem, i.e. the building of the AST.
3) If the performance turns out to be a problem, then the "pre-allocated children per Token" solution could be tried. Or another solution alltogether. At this point, the AST should be figured out, so the usage code has been written, and that can be used as a guide to the optimal solution.

I'm also writing a compiler and was faced with the same dilemma. I've opted for the linked-list solution, for the reasons above.

Jayme

#13292

October 19, 2017

Thanks for your replies!
I've decided to implement the Array/pointer method and use realloc() for now. I may go back and try to implement my own allocator later.
I found this article while looking for ideas: http://g.oswego.edu/dl/html/malloc.html.
It's an interesting read so far.