Update #7 - Metaprogramming tool redesign

This past month I've been working almost exclusively on improving my tools for metaprogramming. Generating code at compile time has been an important part of Nirion's engine since almost the beginning. It generates the data necessary for reflection at runtime, which is then used to implement serialization of assets and entities as well as property editing in the editor. The way I was going about this before was really hacked together and rushed. The tool would basically go over every file in the project and look for special tokens in the code. These tokens were just macros that expanded to nothing. For example:

TYPE()
struct S
{
    PROPERTY()
    f32 x;
    PROPERTY()
    f32 y;
};

This struct would be recognized as a type that needs reflection data generated for it. The parser would see 'TYPE', read in the parts of the struct definition it needed, and skip the rest. 'PROPERTY' was used to signify that the member should be included in the reflection data, so that some members can be excluded. This also allowed me to specify extra metadata for members:

PROPERTY(Category="transform") v2 position;
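As a minimal sketch of how markers like these can "expand to nothing", assuming variadic macro definitions (the exact definitions are not shown in this post, and the 'scratch' member is made up for illustration):

```cpp
#include <cassert>
#include <cstddef>

// Assumed definitions: variadic macros that swallow their arguments, so the
// compiler sees plain C++ while an external tool can scan for the names.
#define TYPE(...)
#define PROPERTY(...)

typedef float f32;

TYPE()
struct S
{
    PROPERTY()
    f32 x;
    PROPERTY(Category="transform")
    f32 y;
    f32 scratch; // hypothetical member without PROPERTY(): a scanner would skip it
};

// The markers leave no trace in the compiled layout.
static_assert(sizeof(S) == 3 * sizeof(f32), "markers must not change the struct");
```

The compiler never sees the metadata, which is why the tool has to find and parse the markers itself.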

Besides the parser itself being a mess, this worked okay most of the time. I did run into some cases where definition order was an issue. For example, the 'Entity' type in Nirion has a game data part that consists of all the game-specific entity data:

struct Entity
{
    // Lots of engine data that entities need
    ...

    // Overlapped data. Data that is never shared between entity types.
    union
    {
        Player player;
        Door door;
    };
    // Shared data. Data that any entity type can access.
    struct
    {
        HealthComponent health;
    };
};

To avoid having to update the Entity struct every time I add or remove a game entity data struct, my solution was to generate this part of the Entity struct using metaprogramming. Because Entity is always defined above game entity code (so the entity type can actually be used), any types defined there could not be embedded in the Entity struct as shown above. So I ended up with this situation:

ENTITY_DATA
(
 struct AnimatorEntity
 {
     u32 animation;
     i32 loopCount;
     
     EntityId attach;
 };
 );

'ENTITY_DATA' would actually expand to nothing, but the metaprogramming tool would copy its contents and place them in a generated file that gets included before the Entity struct. This means that some types that should be visible to 'AnimatorEntity' were actually not. Ignoring the fact that structs like this now have to be wrapped in macros (which is just ugly), it leads to some weird issues that I would rather not have to think about.

This is just one example of the (relatively minor) problems I had with this approach. Overall the biggest deciding factor for me redoing this was just that the code was rushed and pretty bad. I'd constantly run into issues where the tool would just crash because it encountered something unexpected.

Redesign

So after looking at how other people had approached metaprogramming, I decided to ditch using custom markup in the code itself and just parse a completely different format that roughly resembles C++. This was mostly inspired by Ryan Fleury's Data Desk project.

The big advantage to this is being able to add more complex and readable markup than is possible with macros. Also, the resulting C++ code can be output in an order that's completely unrelated to where it's defined in this custom format. It also makes parsing a little easier, since you don't have to deal with skipping unsupported code.

So I called this new tool CG (just for 'code generation'). All code that needs reflection information or other metaprogramming functionality must be defined in a '.cg' file. The code is then parsed and transformed into an AST. The metaprogram then does some processing with this AST and outputs C++ code. Here is an example struct definition:

@Introspect @EntityDataOverlapped
struct WorldAreaEntity
{
    v2 size;
    i32 priority;
    f32 layerParallax;
    b32 fadeAreasAbove;
    f32 distanceBetweenLayers;
    b32 dontTintLowerLayers;
    b32 isSaveRoom;
    
    @HideInEditor
        i32 dependenciesCount;
    @AC_Mem=dependenciesCount @PersistentRef
        u32 dependencies[8];
    
    @PickableAsset=Asset_SoundQueue
        u32 ambientQueue;
    
    
    b32 clearFog;
    GameAreaType areaType;
};

Initially I started with parsing data definitions and adding some custom functionality that I knew I needed such as 'tags'. Tags just allow you to annotate different parts of the code. Tags are in the form @'key'='value' as can be seen above. A tag doesn't need a value, and a value can be a list of values or other tags by enclosing them in '{}' and separating each entry with a ','.
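To make the tag format concrete, here is a minimal sketch of reading '@key' and '@key=value' tags from a line of text. It is not CG's actual parser: 'Tag', 'IsWordChar' and 'ParseTags' are hypothetical names, and '{}' value lists are left out to keep it short.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

struct Tag
{
    std::string key;
    std::string value; // empty when the tag has no '=value' part
};

// Characters accepted in a key or a simple value in this sketch.
static bool IsWordChar(char c)
{
    return std::isalnum(static_cast<unsigned char>(c)) || c == '_' || c == '.';
}

// Collects every '@key' or '@key=value' found in the text, in order.
static std::vector<Tag> ParseTags(const std::string& text)
{
    std::vector<Tag> tags;
    std::size_t i = 0;
    while (i < text.size())
    {
        if (text[i] != '@') { ++i; continue; }
        ++i; // skip '@'
        Tag tag;
        while (i < text.size() && IsWordChar(text[i])) tag.key += text[i++];
        if (i < text.size() && text[i] == '=')
        {
            ++i; // skip '='
            while (i < text.size() && IsWordChar(text[i])) tag.value += text[i++];
        }
        if (!tag.key.empty()) tags.push_back(tag);
    }
    return tags;
}
```

Calling ParseTags("@AC_Mem=dependenciesCount @PersistentRef") yields two tags, the second with an empty value.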

I then went on to add things like functions, loops, if/else, union, switch, return, etc. It's important to note that this is all just being parsed, transformed by the metaprogram, and then output to C++. So most of what is involved here is just parsing. No part of this 'language' is actually evaluated in any way, so it's probably not as much work as it may sound like. Parsing all these features took me only a day or two.

My initial motivation to go beyond just type definitions was for defining immutable array counters. For example:

@Introspect
struct Tilemap
{
    @NoIntrospect
        b32 invalidatedResource;
    @NoIntrospect
        void* rendererResource;
    @MetaData
        TilemapInfo info;
    
    @AC_Func=Tilemap_GetLayerCount @BulkDataPtr
        TilemapLayer* layers;
    @AC_Func=Tilemap_GetTotalTileCount @BulkDataPtr
        Tile* tiles;
    @AC_Func=Tilemap_GetTileCount @BulkDataPtr
        GlobalTileData* globalLayer;
    @AC_Mem=info.collisionCount @BulkDataPtr
        TilemapCollisionInfo* collisionInfo;
};

The tag 'AC_Mem' can be used to define a memory address that points to the i32 array counter. The editor can use this to add or remove array members automatically (when the user presses the '+' or '-' buttons on an array property). This works well for a lot of things, but to define an array counter based on transient state, such as the tile count in a 2D tilemap, a memory address isn't going to work (unless you're willing to duplicate data). So I decided to parse functions in order to define immutable array counters. The 'AC_Func' tag specifies a function that returns the array count. In this case, 'Tilemap_GetTileCount' looks like this:

proc i32 Tilemap_GetTileCount(Tilemap* tilemap)
{
    return tilemap->info.width * tilemap->info.height;
}

This is mostly just used for serialization. Since serialization is done at runtime, the serialization function can use this functionality to know how many tiles it should serialize without needing to serialize additional data. This was important to me, since one of the big goals with this redesign was to make serialization more automatic and robust.
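A minimal sketch of how a runtime array counter could cover both cases, assuming a simplified layout (the 'ArrayCounter' struct, 'ResolveArrayCount' and the type-erased wrapper are hypothetical and not the engine's actual CGArrayCounter):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

typedef int32_t i32;

struct TilemapInfo { i32 width; i32 height; i32 collisionCount; };
struct Tilemap { TilemapInfo info; };

// Same logic as the CG function above, written as plain C++.
static i32 Tilemap_GetTileCount(Tilemap* tilemap)
{
    return tilemap->info.width * tilemap->info.height;
}

// Assumed type-erased wrapper so every counter function shares one signature.
static i32 Tilemap_GetTileCount_Erased(void* object)
{
    return Tilemap_GetTileCount(static_cast<Tilemap*>(object));
}

typedef i32 CountFunction(void* object);

// Hypothetical runtime form of an array counter.
struct ArrayCounter
{
    bool useFunction;
    std::size_t memberOffset; // AC_Mem: offset of the i32 counter inside the object
    CountFunction* function;  // AC_Func: computes the count from the object
};

static i32 ResolveArrayCount(const ArrayCounter& counter, void* object)
{
    if (counter.useFunction) return counter.function(object);
    i32 count;
    std::memcpy(&count, static_cast<char*>(object) + counter.memberOffset, sizeof(count));
    return count;
}
```

With a layout like this, a serializer can ask for the element count of any array property the same way, whether it comes from a stored member or is computed.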

Eventually I got back to where I was with my previous metaprogramming tool in terms of functionality. After this I started to look for ways to improve what I had.

Templates

While transitioning from the old system to this one, I found that I had a need for C++ template-like functionality. Around this point in the game's development I changed how I was designing container data structures. For things like linked lists and hash tables I had designed their interfaces around holding void pointers to make them generic. I found this interface pretty clunky and so decided to redesign these data structures as macros. These macros would take the types as inputs and expand to specific versions of the container.

I like this approach a lot better and needed a way to replicate it in CG so that the containers could be introspected (among other things). My solution was to recreate macros at the AST level using a 'template' block like so:

@OutputAsMacro=_DEFINE_STATIC_ARRAY(Name, type, size)
template StaticArray(Name, type, size)
{
    @Introspect
        struct $Name
    {
        @HideInEditor
            i32 count;
        @AC_Mem=count
            $type data[$size];
        
        proc $type& operator[](i32 index)
        {
            return data[index];
        }
    };
};

An instance of the template can be generated like this:

generate StaticArray(Array_i32, i32, 32)

When a 'generate' command is encountered, all the code in the 'template' block will be copied and any identifier starting with '$' will be replaced with the arguments passed. The template itself is never output to C++, but the '@OutputAsMacro' tag tells the metaprogram that the template should be output as a C macro. I can now use this in CG and C++ in the same way. Tags such as '@Introspect' are also copied when generating, so the template instance can still be introspected.
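For illustration, the C macro produced by '@OutputAsMacro' might look roughly like the sketch below. The exact generated text is an assumption; only the macro name and parameters come from the tag.

```cpp
#include <cassert>
#include <cstdint>

typedef int32_t i32;

// Assumed shape of the generated macro. Note that names starting with an
// underscore and a capital letter are reserved in C++; it is kept here only
// to match the tag above.
#define _DEFINE_STATIC_ARRAY(Name, type, size) \
    struct Name \
    { \
        i32 count; \
        type data[size]; \
        type& operator[](i32 index) \
        { \
            return data[index]; \
        } \
    };

// C++ side equivalent of 'generate StaticArray(Array_i32, i32, 32)'.
_DEFINE_STATIC_ARRAY(Array_i32, i32, 32)
```

The CG tags have no C++ equivalent, so a macro like this would only carry the plain struct definition.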

An example of my dynamic array type:

@OutputAsMacro=_DEFINE_ARRAY(type)
template Array(type)
{
    struct Array_##$type
    {
        i32 count;
        i32 allocatedCount;
        @AC_Mem=allocatedCount
            $type* data;
        
        @NoIntrospect
            Allocator allocator;
    };
    proc void Clear(Array_##$type* list) 
    { 
        if(list->data) 
        { 
            Assert(Allocator_CanFree(list->allocator)); 
            Allocator_Free(list->allocator, list->data); 
        } 
        list->allocatedCount = 0; 
        list->count = 0; 
        list->data = 0; 
    } 
}

As you can see here, I implemented concatenation using '##' just like C macros. As long as the '$' parameter expands to an identifier, it will concatenate. I find this really useful when writing this type of code. 'template' blocks can also include functions, which work in exactly the same way.
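The substitution and concatenation rules can be sketched at the text level. CG does this on the AST, so the string-based version below is a deliberate simplification, and 'ExpandTemplate' is a hypothetical name:

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <map>
#include <string>

static bool IsIdentChar(char c)
{
    return std::isalnum(static_cast<unsigned char>(c)) || c == '_';
}

// Replaces each '$name' with its argument, then removes '##' so that the
// text on both sides joins into a single identifier.
static std::string ExpandTemplate(const std::string& body,
                                  const std::map<std::string, std::string>& args)
{
    std::string substituted;
    std::size_t i = 0;
    while (i < body.size())
    {
        if (body[i] == '$')
        {
            std::size_t start = ++i;
            while (i < body.size() && IsIdentChar(body[i])) ++i;
            std::string name = body.substr(start, i - start);
            std::map<std::string, std::string>::const_iterator found = args.find(name);
            // Unknown parameters are left untouched in this sketch.
            substituted += (found != args.end()) ? found->second : "$" + name;
        }
        else
        {
            substituted += body[i++];
        }
    }

    std::string result;
    for (i = 0; i < substituted.size();)
    {
        if (substituted.compare(i, 2, "##") == 0) i += 2; // paste: drop the marker
        else result += substituted[i++];
    }
    return result;
}
```

Expanding "struct Array_##$type" with type set to i32 gives "struct Array_i32".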

After this I figured out I could extend this feature to allow me to generate more complex code directly without needing to rely on the metaprogram as much. For example, in Nirion there is an

enum EntityType{...}

which contains an enum entry for every type of entity in the game. This was automatically generated by the metaprogram based on what entity types exist after parsing and didn't exist in CG directly. I added the concept of "inline templates". If a template uses the 'template_inline' keyword instead of the 'template' keyword, when an instance is generated, the generated code gets inserted at the template definition rather than at the 'generate' command. This means we can do this for EntityType:

enum EntityType
{
    ET_None,
    template_inline EntityType_Entry(entry)
    {
        $entry,
    }
    ET_Count,
};

...

// Somewhere else in the code

generate EntityType_Entry(ET_Player);
generate EntityType_Entry(ET_Door);

This results in EntityType looking like this in C++:

enum EntityType
{
    ET_None,
    ET_Player,
    ET_Door,
    ET_Count,
};
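The key point is that the 'generate' calls are collected first and their expansions are emitted at the template's position. A minimal sketch of that output step, assuming the entries have already been gathered into a list ('EmitEntityTypeEnum' is a hypothetical helper, not part of CG):

```cpp
#include <cassert>
#include <string>
#include <vector>

// Emits the enum source: fixed entries first, then one line per collected
// 'generate' call at the spot where the template_inline block was defined.
static std::string EmitEntityTypeEnum(const std::vector<std::string>& generatedEntries)
{
    std::string out = "enum EntityType\n{\n    ET_None,\n";
    for (const std::string& entry : generatedEntries)
    {
        out += "    " + entry + ",\n";
    }
    out += "    ET_Count,\n};\n";
    return out;
}
```

Because the insertion point is fixed by the definition, ET_Count stays last no matter where the 'generate' calls appear in the code.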

We can also use this to define a function that maps an entity type name hash to its enum value pretty easily ('template_hash' just tells the metaprogram to hash the template parameter):

proc EntityType EntityTypeHashToEnum(u32 hash)
{
    switch(hash)
    {
        template_inline EntityTypeHashToEnum_Entry(template_hash case, value)
        {
            case $case: return $value;
        }
    }
    return ET_None;
}

...

// Somewhere else in the code
generate EntityTypeHashToEnum_Entry("Player", ET_Player);
generate EntityTypeHashToEnum_Entry("Door", ET_Door);
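For reference, a switch like this is meant to end up mapping each name hash to its enum value in C++. The sketch below shows that shape with an assumed hash function: the post does not say which hash the metaprogram uses, so 32-bit FNV-1a stands in, and 'HashString' is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>

typedef uint32_t u32;

enum EntityType { ET_None, ET_Player, ET_Door, ET_Count };

// Assumed hash: 32-bit FNV-1a, evaluated at compile time.
constexpr u32 HashString(const char* text, u32 hash = 2166136261u)
{
    return (*text == '\0')
        ? hash
        : HashString(text + 1, (hash ^ static_cast<unsigned char>(*text)) * 16777619u);
}

EntityType EntityTypeHashToEnum(u32 hash)
{
    switch (hash)
    {
        // The metaprogram would write the precomputed numbers here; the
        // constexpr calls keep this sketch readable.
        case HashString("Player"): return ET_Player;
        case HashString("Door"):   return ET_Door;
    }
    return ET_None;
}
```

Any hash that yields no match falls through to ET_None.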

We can also use a template to package these together:

template DefineEntity(template_hash hash, type)
{
    generate EntityType_Entry($type);
    generate EntityTypeHashToEnum_Entry($hash, $type);
}

And now entities can be generated in one call with:

generate DefineEntity("Player", ET_Player);
generate DefineEntity("Door", ET_Door);

This needs no custom metaprogram code at all. It's also a lot clearer that the enum and function exist and where they are defined. I think this has some drawbacks, though. It can be a bit harder to know what the final result will look like compared to just reading C++ code that directly generates it. Some things are also much easier to do in C++, so I don't do everything like this. I was on the fence about using templates this way, but ultimately decided that it made relatively simple cases like this easier to write and clearer.

Introspection

When a type has the "@Introspect" tag, reflection data is generated for that type. The data structure that represents a type/property at runtime is:

struct CGIntroNode
{
    CGIntroNodeType type;
    
    String name;
    u32 nameHash;
    
    CGIntroNode* tags;
    i32 tagCount;
    
    memindex runtimeSize;
    
    union
    {
        // Property
        struct
        {
            memindex memoryOffset;
            CGValueType valueType;
            CGIntroNode* valueDefinition;
            
            i32 staticArrayMax;
            CGIntroReferenceType referenceType;
            
            CGArrayCounter arrayCounter;
            
            b32 hasNoSerializeTag;
            b32 hasBlockSerializeTag;
            b32 noCopyFromPreviousVersion;
            
            i32 enumValue;
            
            b32 isIndirect;
            i32 category;
        };
        
        // Type
        struct
        {
            CGIntroDefType defType;
            
            CGIntroNode* properties;
            i32 propertyCount;
            
            i32 typeId;
            
            CGIntroNode* previousVersion;
            UpdateDataFunction* updateDataFunction;
            u32 version;
        };
        
        // Tag
        struct
        {
            CGValue tagValue;
        };
    };
};

The information for a 'CGIntroNode' is generated as global variables by the metaprogram. An example:

// EyeTurretType
CGIntroNode EyeTurretType_Tags[] = {TagNode("Introspect", (u32)951444544), };
CGIntroNode EyeTurretType_CGProperties[] = {
EnumEntryNode("EyeTurret_RadialFire", (u32)-997209664, 0, 0, EyeTurret_RadialFire, false, 37), 
EnumEntryNode("EyeTurret_StraightFire", (u32)1971805262, 0, 0, EyeTurret_StraightFire, false, 37), 
};
CGIntroNode EyeTurretType_CGType = TypeNode("EyeTurretType", (u32)972719426, EyeTurretType_CGProperties, ArrayCount(EyeTurretType_CGProperties), sizeof(EyeTurretType), EyeTurretType_Tags, ArrayCount(EyeTurretType_Tags), CGIntroDef_Enum, CGTypeID_EyeTurretType, 0, 0, 0);

As I mentioned before, serialization is done at runtime. The "CGIntroNode" type is used to traverse the data structure recursively. A version number and a previousVersion pointer are stored on types. If a version number does not match, a conversion function (defined in CG) is called that converts from the stored version to the next version, all the way up to the current version.
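The version chain can be sketched as a recursive walk down previousVersion until the stored version is found, followed by applying each conversion on the way back up. Everything here is a simplified assumption: 'VersionedType' is a reduced stand-in for the 'Type' part of CGIntroNode, the update function signature is guessed, and the Point types are made up for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

typedef uint32_t u32;

// Assumed signature: converts data from the previous version's layout.
typedef void UpdateDataFunction(const void* oldData, void* newData);

struct VersionedType
{
    u32 version;
    std::size_t runtimeSize;
    const VersionedType* previousVersion;
    UpdateDataFunction* updateDataFunction; // previousVersion -> this version
};

// Hypothetical example type with three versions.
struct PointV1 { float x; };
struct PointV2 { float x; float y; };
struct PointV3 { float x; float y; float z; };

static void UpdatePointV1ToV2(const void* oldData, void* newData)
{
    const PointV1* from = static_cast<const PointV1*>(oldData);
    PointV2* to = static_cast<PointV2*>(newData);
    to->x = from->x;
    to->y = 0.0f; // new members get a default value
}

static void UpdatePointV2ToV3(const void* oldData, void* newData)
{
    const PointV2* from = static_cast<const PointV2*>(oldData);
    PointV3* to = static_cast<PointV3*>(newData);
    to->x = from->x;
    to->y = from->y;
    to->z = 0.0f;
}

static const VersionedType PointV1_Type = {1, sizeof(PointV1), nullptr, nullptr};
static const VersionedType PointV2_Type = {2, sizeof(PointV2), &PointV1_Type, UpdatePointV1ToV2};
static const VersionedType PointV3_Type = {3, sizeof(PointV3), &PointV2_Type, UpdatePointV2ToV3};

// Fills 'out' with data in the layout of 'type', converting step by step
// from 'storedVersion'. Returns false if that version is not in the chain.
static bool UpgradeToVersion(const VersionedType* type, u32 storedVersion,
                             const void* storedData, void* out)
{
    if (type->version == storedVersion)
    {
        std::memcpy(out, storedData, type->runtimeSize);
        return true;
    }
    if (!type->previousVersion) return false;

    std::vector<unsigned char> previous(type->previousVersion->runtimeSize);
    if (!UpgradeToVersion(type->previousVersion, storedVersion, storedData, previous.data()))
    {
        return false;
    }
    type->updateDataFunction(previous.data(), out);
    return true;
}
```

Each conversion only has to know about the version directly before it, so adding a new version means writing one new function.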

Conclusion

I think that covers most of what I worked on in at least some detail. Overall I think this system is working a lot better and is much more robust. It took a very long time to change everything, but it seems like it was worth it. I still have a lot more improvements to do on the engine like this, so I'm not sure when I'll be back to developing the game's content.

Thanks for reading, let me know if you have any questions!
