Hi all,
This is my first post about a personal project - I hope to join in and contribute to others' projects where I can too :)
I'm working on a handmade voxel editor (in C/C++ and OpenGL) and would appreciate some exchange of ideas and thoughts around memory management.
The Octree Data Structure
The voxels are represented as leaf nodes in an octree data structure. Currently the octree is one contiguous block of pre-allocated memory, so there are no pointers to child nodes (this avoids the cache misses that come with pointer chasing). Instead, nodes are addressed by their x,y,z position and depth, which maps directly to an index into the octree memory. This is possible because I know up front how many nodes and what depth I need.
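The post doesn't give the exact addressing formula, but one common scheme for a fully allocated octree is to store the levels back to back: level d holds 8^d nodes, laid out row-major, and starts at offset (8^d - 1)/7 (the geometric sum 1 + 8 + 64 + ...). A sketch, with all names my own:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Offset of the first node at a given depth: the total node count of all
// shallower levels, i.e. (8^depth - 1) / 7.
static size_t level_offset(unsigned depth) {
    return ((size_t(1) << (3 * depth)) - 1) / 7;
}

// Flat index of the node at (x, y, z) on a given level. At depth d the
// level is a cube of 2^d nodes per axis, stored row-major.
static size_t node_index(unsigned x, unsigned y, unsigned z, unsigned depth) {
    size_t side = size_t(1) << depth;   // nodes per axis at this depth
    assert(x < side && y < side && z < side);
    return level_offset(depth) + x + y * side + z * side * side;
}
```

A Morton (z-order) layout within each level is a popular alternative, since it keeps spatially neighbouring voxels closer together in memory.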
Each voxel has a colour index per face. Currently this is 6 bits per face, since I want to use a limited colour-indexed palette of up to 63 colours - a colour index of 0x00 is reserved to represent an empty voxel.
This means each leaf node takes 36 bits in total to represent the six faces with colour indexing. This encoding obviously doesn't fit the fundamental data types well, so I'm using a 64-bit integer to store it, which leaves 28 bits of padding.
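Concretely, the six 6-bit fields can be packed into the low 36 bits of a `uint64_t` with shifts and masks. The field order below is my own assumption - the post only fixes "6 bits per face":

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical face order; any fixed order works as long as it's consistent.
enum Face { FACE_NEG_X, FACE_POS_X, FACE_NEG_Y, FACE_POS_Y, FACE_NEG_Z, FACE_POS_Z };

// Write a 6-bit colour index into one face's field of the packed voxel.
static uint64_t voxel_set_face(uint64_t voxel, Face face, uint8_t colour) {
    unsigned shift = 6u * face;
    voxel &= ~(uint64_t(0x3F) << shift);         // clear the 6-bit field
    voxel |= uint64_t(colour & 0x3F) << shift;   // write the new colour index
    return voxel;
}

// Read a face's 6-bit colour index back out (0x00 means empty).
static uint8_t voxel_get_face(uint64_t voxel, Face face) {
    return uint8_t((voxel >> (6u * face)) & 0x3F);
}
```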
Question 1. Is there anything I could do to improve this representation? Alignment etc.? Maybe it'd be better to use just a ?
Triangulation & Rendering
The way rendering works is that, given a list of nodes, the voxels are iterated over and triangulated. The output triangle list is then optimized - co-planar triangles are merged/removed, and nodes which are entirely enclosed are skipped altogether. Batches of these triangles are uploaded to the GPU as VBOs (Vertex Buffer Objects) via OpenGL (or another graphics API), ready for rendering.
Unfortunately, before iterating over the voxels and building the triangle lists, I have no idea how many triangles I'll end up with, so it feels like I'll need to allocate memory dynamically. That sounds like a bad idea to me from a performance and memory-fragmentation perspective.
My first thought is to write my own custom memory allocator - pre-allocating one large contiguous block as soon as the program loads and treating it like a stack which I push to and pop from.
Question 2. Does the idea of a custom memory allocator based on a stack access pattern sound reasonable? Are there better approaches I could consider?
Question 3. Should I maintain a separate stack for each data type? Presumably that would make iterating over those types faster, since the memory accesses would be contiguous?
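For what it's worth, the stack-allocator idea above is often called an arena or bump allocator. A minimal sketch (the names and the 16-byte alignment choice are my own assumptions, not from the post):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// One large up-front block; allocation is a bump of `top`, and freeing is
// popping `top` back to a previously saved marker - no per-allocation frees.
struct StackArena {
    uint8_t *base;
    size_t   capacity;
    size_t   top;        // current high-water offset into the block
};

static bool arena_init(StackArena *a, size_t capacity) {
    a->base     = static_cast<uint8_t *>(std::malloc(capacity));
    a->capacity = capacity;
    a->top      = 0;
    return a->base != nullptr;
}

// Push `size` bytes, aligned to 16; returns nullptr when the arena is full.
static void *arena_push(StackArena *a, size_t size) {
    size_t aligned = (a->top + 15) & ~size_t(15);
    if (aligned + size > a->capacity) return nullptr;
    a->top = aligned + size;
    return a->base + aligned;
}

// Save a marker before a batch of pushes, then pop back to it to release
// everything allocated since - e.g. one marker per frame's triangle lists.
static size_t arena_mark(const StackArena *a)        { return a->top; }
static void   arena_pop (StackArena *a, size_t mark) { a->top = mark; }
```

This suits the triangulation case well: mark before building a batch of triangle lists, push freely while triangulating, upload the VBO, then pop the whole batch in one step.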
Phew. Many thanks for getting this far, I hope to share my work once it is slightly further along!
Any feedback and discussion would be most appreciated!