Compile-time introspection and metaprogramming | Handmade Network

Thanks to new languages like Zig and Jai, compile-time execution and metaprogramming are a popular topic of discussion in the community. This fishbowl explores metaprogramming in more detail, and discusses to what extent it is actually necessary, or just a waste of time.

07:31

what's good fishbowl peeps

07:34

Topic: Compile-Time Introspection and Metaprogramming

I have one comment to start things off:

duck typing is horrible

for compile-time introspection?

yeah

I think that's probably more of a tools problem than an intrinsic one

no it's related to templates and all that jazz

for example, with an intelligent editor you could replace all instances of the duck type parameter with the instanced type

07:36

when you're introspecting

07:36

(and ofc compile-time introspection is not limited to just debugging)

07:37

so I think this is more of a where-we're-at-right-now problem that's solvable with a better toolchain, though I agree that for what we have available at the moment to work with it's terrible

but with ducktyping I mean that template params are checked by the api they expose matching some pattern

07:38

which results in horrible error messages when hit somewhere down the instantiation stack and kicked upstream

I see what you mean, yeah

07:39

so you don't realize that you used an invalid type until something 32 calls down the stack tries to subscript it or whatever

Odin tries to address this problem with type constraints on procedure declarations

07:41

so for instance

foo :: proc($T: typeid) -> bool where intrinsics.is_type_int(T) { ... }

07:41

which would error right on the definition if you passed say a float
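For comparison, a rough C11 analogue of that fail-early constraint can be put together from `_Generic` and `static_assert`. This is only a sketch: the names `IS_INT_TYPE` and `IS_EVEN` are made up for illustration, and it is not how Odin implements `where` clauses. The idea is that a call with a non-integer argument is rejected at the call site with a readable message, instead of failing somewhere deeper in the expansion.

```c
#include <assert.h>  /* static_assert (C11) */

/* Evaluates to 1 if the expression has one of the listed integer types,
   and to 0 for anything else. */
#define IS_INT_TYPE(x) _Generic((x),                         \
    char: 1, signed char: 1, unsigned char: 1,               \
    short: 1, unsigned short: 1, int: 1, unsigned int: 1,    \
    long: 1, unsigned long: 1,                               \
    long long: 1, unsigned long long: 1,                     \
    default: 0)

/* Hypothetical "generic" macro. The static_assert sits inside a throwaway
   struct so the constraint is checked at compile time, before the body
   ((x) % 2 == 0) is ever considered. */
#define IS_EVEN(x)                                                       \
    ((void)sizeof(struct {                                               \
         static_assert(IS_INT_TYPE(x), "IS_EVEN requires an integer");   \
         int unused;                                                     \
     }),                                                                 \
     (x) % 2 == 0)
```

With this, `IS_EVEN(4)` and `IS_EVEN(7L)` compile, while `IS_EVEN(1.5)` stops the build with the "requires an integer" message.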

Hello

07:46

Duck typing is bad because they have wings and not fingers.

07:46

But in all seriousness...

one of the solutions I've seen in the wild regarding that is traits that let implementers add in fail-early checks

07:47

but that is only half the solution because the template can use things not specified in the check

07:47

so there is no way to know whether you covered all your bases

In general, you are complaining about the problems of structural typing. The two broad categories are nominal and structural.

there are two pretty decent solutions as far as I can tell, one of which @gingerBill implements in Odin and could elaborate on (early-out type constraints)

07:50

the other solution would be static analysis

07:50

which, of course, has its associated costs

07:51

but it would allow you to e.g. recursively gather all possible usage types of the ducktyped variable, and use that at the instantiation site to evaluate whether or not the usage is valid

or define a set of traits that the user can fill in which the compiler in turn can substitute in the template

but that will very quickly explode in complexity

I'm not sure how different traits would be from type constraints in this capacity, could you elaborate?

Odin's solution is where clauses. It's not unique but exists in many languages in different forms. It's kind of like a contract between the caller and the polymorphic signature.

But in general, I think highly generic things with these forms of constraints are only useful for certain things like mathematical procedures.

07:53

Usually you want a specific kind of type

@gingerBill that's what I meant with early fail

07:54

with where-clauses you fail early enough that another overload can still be picked (this is what sfinae actually is)

Correct. It also allows for two overloads with the same signature but with different clauses

07:55

If you have overloads that is

which if you have templates you will need on some level

In C++, template specialization is usually this method.

07:58

The question I usually ask myself is what times do I truly need something this generic?

simple data structure and mathy types

Data structures like an array, hash table, etc, sure.

07:58

Exactly

07:58

Everything else is pretty much very not generic

the other ways I have seen templates used is optimizing out a function pointer

This is the problem of being overly generic a lot of the time.

08:01

Which leads nicely into the main topic.

last I looked into it llvm had a lot of templates

08:03

most template parameters only ever got 1 specific type as input most of the time

It's a common affliction people have. I have done it before. Trying to be more generic thinking about possible problems I might have rather than problems I actually have

Meta-programming also allows you to chase down a "generic" solution, which is why it's both powerful and a possibly dangerous tool to get comfortable with, I think.

being flexible is important when exploring a solution, but you can do that without adding compile-time metaprogramming into it

Well whether you can or not depends on the features you've already got in the language, and the set of hard constraints you're trying to explore.

09:14

If your language doesn't give you the modeling tool to iterate quickly on the structure of your solution without - for instance - prohibitively expensive extra run time work, then metaprogramming starts to look like a more attractive way to get flexibility.

I agree, Allen. I have found that the less I use metaprogramming, the better of a solution I produce. Metaprogramming is playing with fire. Sometimes I just need scissors.

leave the footguns at home

And to make my thinking clearer too, I usually separate out the categories of "metaprogramming": runtime type information; parametric polymorphism (generics, templates, etc.); compile-time execution; code introspection; code generation.

I think the interesting part of the topic is trying to come up with a useful method or criteria for when to make the switch to doing something "meta".

And that's the thing I have been trying to figure out myself lately.

I don't have an extremely well defined line for this myself and have crossed it and been burned on plenty of occasions, but other times it has absolutely paid off.

I do prefer not resorting to anything like code introspection or code generation when I can.

09:24

Metaprogramming still feels "wrong" to me a lot of the time, and 80% of the time when people think they need it, they don't.

09:25

When I was purely in C, my metaprogramming tools were there to fix/add the features lacking in the language rather than solve other problems per se.

I'm not sure I'm quite as put off by it - but I will say that I have found over time my preference is to generate tables rather than actual executable code. For instance instead of printing 20 "log entity" cases, print 20 "entity metadata" tables and write a single log entity loop that is driven by the table.

09:27

In a way it's more like making up for lack of runtime introspection in C in a lot of places.

Yeah. That's usually my main case for code generation: producing lookup tables.
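A minimal C sketch of the "generate tables, not code" idea follows. Everything in it is hypothetical (the `Player` struct, the `Field_Info` layout, the `log_struct` name); the point is only that a generator would emit the `player_fields` table, while the logging loop is written once by hand and works for any struct that has such a table.

```c
#include <assert.h>  /* assert */
#include <stddef.h>  /* offsetof, size_t */
#include <stdio.h>   /* snprintf */
#include <string.h>  /* memcpy, strlen */

/* Hypothetical entity type, used only for this illustration. */
typedef struct {
    int   id;
    float x, y;
    int   health;
} Player;

typedef enum { FIELD_INT, FIELD_FLOAT } Field_Kind;

/* One row of metadata per struct member. */
typedef struct {
    const char *name;
    Field_Kind  kind;
    size_t      offset;
} Field_Info;

/* The kind of table a generator would emit (written by hand here). */
static const Field_Info player_fields[] = {
    { "id",     FIELD_INT,   offsetof(Player, id)     },
    { "x",      FIELD_FLOAT, offsetof(Player, x)      },
    { "y",      FIELD_FLOAT, offsetof(Player, y)      },
    { "health", FIELD_INT,   offsetof(Player, health) },
};
#define PLAYER_FIELD_COUNT (sizeof player_fields / sizeof player_fields[0])

/* The single, table-driven logger. Writes "name=value " pairs into out and
   returns the length of the resulting string; stops early if out is full. */
static size_t log_struct(char *out, size_t cap, const void *data,
                         const Field_Info *fields, size_t count)
{
    size_t used = 0;
    assert(out != NULL && cap > 0);
    out[0] = '\0';
    for (size_t i = 0; i < count; i++) {
        const char *member = (const char *)data + fields[i].offset;
        int n = 0;
        switch (fields[i].kind) {
        case FIELD_INT: {
            int v;
            memcpy(&v, member, sizeof v);
            n = snprintf(out + used, cap - used, "%s=%d ", fields[i].name, v);
        } break;
        case FIELD_FLOAT: {
            float v;
            memcpy(&v, member, sizeof v);
            n = snprintf(out + used, cap - used, "%s=%.2f ", fields[i].name, v);
        } break;
        }
        if (n < 0 || (size_t)n >= cap - used) break;  /* error or truncated */
        used += (size_t)n;
    }
    return strlen(out);
}
```

Adding a second entity type under this scheme means emitting another `Field_Info` table, not another logging function.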

What features are worthwhile to support data generation? 90% of my needs are covered by using the X-Macros trick in C. It's not nice but I don't know any other tool that would be a better tradeoff (maybe Lisp?)

09:36

The other approach I use in rare situations is just making a Python program. Though that requires introspection into (or generation of) the data structures used in the program

There are two forms of code generation: data types and lookup tables. The former is usually solved by having more fundamental data structures built into the language itself, by parametric polymorphism in the language, or by just trying to reuse things. The latter is sadly not always doable with X-Macros, especially when you need to generate lookup tables from other source data.

What are concrete examples you're thinking of where X-Macros are not sufficient for data generation / lookup tables?

09:43

Is it for example preparing a hashmap or something of that ilk? Or did you think of something else?

09:44

I believe a very powerful approach to data generation is to prepare the final representation/indexes at runtime

The thing is, if you are designing your own language, you don't need to use X-Macros as a crutch to solve the lack of decent data types in a language and the lack of RTTI.

09:47

So all my main cases for X-Macros in C disappear in a saner language

So I think doing data generation at runtime does make sense in some cases, but there are absolutely cases of data I want packaged into my program that I wouldn't want to initially represent as X-Macros in the first place. So even if I was going to say let's build my hashmaps at runtime from the X-Macros, I would still have precompilation data generation to do.

09:55

For instance serializing a set of structs is one case, because if I did them as X-Macros I'd have to define my data in a very alien looking way - maybe this is just an example of taste I suppose.

09:56

Another example is any data type that forms a tree or graph which doesn't come up as often but when it has I am not sure I would even know how to represent it as X-Macros without a lot of duplication of the coupling of things.

That's another good point you bring up. The difference between compile time and run time, and the costs of both of them. Sometimes I just don't mind generating the lookup tables at runtime because it's easier to write, easier to manage, and doesn't require a more complex build system. Sometimes I really need to do it at compile time, but usually only once and never every time. And this is the complaint I have about languages with compile time execution: it does make people recompute a lot of things every time you rebuild. There is no hard rule for this that I can think of, just general trends. From a language designer standpoint, I don't want to enable/encourage silly things but I also don't want to prevent people from getting the job done, but I don't want to enable/encourage a certain style by default either.

This is something we're looking at in Dion as we think about expanding the scope of how code is written beyond just the compiler. I have been thinking for a while that the fairly fixed ideas of "compile-time" and "run-time" are not super helpful. I mean things like AAA game engines already embrace this by having offline asset baking processes that don't run every build for example. It does raise the question of when are you no longer talking about metaprogramming and just talking about data processing, but I feel like these things aren't that distinct. I think they form some kind of a spectrum and we've ended up with a fairly coarse-grained look at that spectrum by the fact that all our systems really formalize are "pre-compilation-time" (i.e. un-formalized, you manage the whole thing and make it up yourself), "compile-time", and "run-time".

10:15

And even with just "compile-time" and "run-time" a language like C does a very poor job of letting you glide your computations around that spectrum freely.

10:16

For instance you'd probably want to debug compile time code in a more sane system.

JIT has been another factor to blur the line between compilation time and runtime

10:17

ugh that's another thing that annoys me with compile time: no debugging, or very hard debugging

This is one reason why I like the option of having literally another C executable that can introspect on an AST and generate information in a straightforward way (via fprintf or whatever), and output tables or code. This allows you to debug things a little bit easier. This isn't the best option ever I suspect but it seems like a good idea while we're stuck with C at least.

10:20

That's definitely preferable, debuggability-wise, to something that's opaquely implemented in the compiler, like C++ templates or what have you.

10:23

Also the distinction between compile-time and run-time you guys brought up is super interesting. I really do think the distinction is purely a result of the structure of the tools we're used to, and it's made harder to break out of because of the fact that to do any kind of code analysis you need a parse of your code which, even if not super difficult (in the case of C), is a nontrivial problem, or in the case of C++, a complete nightmare. Maybe people have made good use of Clang here?

clang is a huge mess

10:41

there have been a couple attempts to make Odin bindings generators for C with it

10:41

it does not seem like it was ever geared toward enabling new tooling to be built around a common denominator

10:42

unless anyone has anything to add re: clang, now that ryan is here, maybe we should move on to future-forward ideas around the topic since we've identified several problems that exist in the current toolchains

10:43

I'll start off with a basic question: what are some metaprogramming or introspection features that people want, but that don't exist in any reasonable form in current mainstream languages?

I think I can summarize a lot of what I want by just being able to treat code as data, in the form of the semantic structure of the code (as the abstract syntax tree that the compiler backend would see). It should come as no surprise that I think this, though... I also think compilers ignore the idea of "tuning the knob of unstructuredness". In text languages, this is sort of accomplished by having arbitrary text (completely unstructured), and rigid language constructs (very structured). When really, you want the ability to go anywhere in between also, and put that data anywhere.

10:47

Data being less structured is in fact what's useful about X-Macros, because it's a much looser structuring than a struct for example.

10:47

But it's still some structure.

applying your transformations at the right level of structure changes a lot

10:48

like your example of adding parentheses to an expression

10:48

first flatten out to tokens, then add the parens, then reparse

10:48

that's much easier than doing an AST shuffle
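The flatten / add parens / reparse sequence can be sketched in a few lines of C under heavy simplifying assumptions: every token is a single character (digits, `+`, `*`, parentheses), so the flat token stream is just a string, and the "reparse" evaluates the expression directly instead of rebuilding a tree. The names `wrap_in_parens` and `evaluate` are made up for this illustration and are not taken from any tool discussed here.

```c
#include <assert.h>  /* assert */
#include <string.h>  /* strlen, memmove */

/* Step 2: insert '(' before token index first and ')' after token index
   last. buf must have room for two extra characters. */
static void wrap_in_parens(char *buf, size_t first, size_t last)
{
    size_t len = strlen(buf);
    assert(first <= last && last < len);
    /* Shift the tail (including the terminator) right by two slots. */
    memmove(buf + last + 3, buf + last + 1, len - last);
    /* Shift the wrapped span right by one slot. */
    memmove(buf + first + 1, buf + first, last - first + 1);
    buf[first]    = '(';
    buf[last + 2] = ')';
}

/* Step 3: a tiny recursive-descent reparse over the flat token stream.
   Grammar: expr := term ('+' term)*, term := factor ('*' factor)*,
            factor := digit | '(' expr ')'. */
typedef struct { const char *p; } Parser;

static int parse_expr(Parser *ps);

static int parse_factor(Parser *ps)
{
    if (*ps->p == '(') {
        ps->p++;                         /* consume '(' */
        int v = parse_expr(ps);
        if (*ps->p == ')') ps->p++;      /* consume ')' if present */
        return v;
    }
    if (*ps->p >= '0' && *ps->p <= '9') return *ps->p++ - '0';
    return 0;  /* unknown token: treated as 0 in this sketch */
}

static int parse_term(Parser *ps)
{
    int v = parse_factor(ps);
    while (*ps->p == '*') { ps->p++; v *= parse_factor(ps); }
    return v;
}

static int parse_expr(Parser *ps)
{
    int v = parse_term(ps);
    while (*ps->p == '+') { ps->p++; v += parse_term(ps); }
    return v;
}

static int evaluate(const char *tokens)
{
    Parser ps = { tokens };
    return parse_expr(&ps);
}
```

Starting from the flat stream `1+2*3`, wrapping tokens 0 through 2 yields `(1+2)*3`, and the reparse picks up the new grouping without any tree surgery.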

That's an extremely interesting connection, I hadn't really thought of it that way but you're so right. Yeah this is a great example—it was more convenient for us to write that code in terms of manipulating what we call a "linearized render"

10:49

It isn't quite the same as tokens but it's similar in principle

10:50

The "linearized render" is a buffer of "symbols". Some of those symbols have no direct equivalent in the AST—for example, a paren or a selection marker—but they can also be AST nodes (which can be full sub-trees or a leaf).

10:50

And then we use a forgiving parse to reconstruct the AST.

10:50

(The parse is also dramatically simplified when compared to arbitrary text)

but then when we look at current meta programming features they are usually stuck on a single level of structure

10:53

which means that the things you want to do end up being awkward to do

Right. In Dion we've been playing with the idea of "unstructured sets" for this exact reason.

10:54

Basically giving a path to do graph-like structure, without also implying some kind of operation or semantic information.

10:54

(And of course we have strings in nodes so you can also have arbitrary text, but the usefulness of that goes down when you have things like unstructured sets)

So another point about this level of structure thing is the idea of code tags. I first saw this idea in JAI and then got to play with it in DataDesk, and what I started to realize is that what they really offer are key-value pairs of information. @ParticularTagIsAKey(Parameters, Are, The, Value). This isn't as unstructured as say a comment, but it's a very loose form of structure in the sense that I have a really hard time thinking of anything you couldn't sort of represent with it, if you're willing to accept a lot of slop.

10:57

Once you can encode data at that level of structuredness into your code, if you can then introspect and do work with it, there are a whole lot of new things you can do.

10:58

I don't know if this is something that is present or not in mainstream languages - but I definitely miss it in C.

Yeah. I guess another way of saying this is C gives us a very coarse view of structuredness, and we really want something more granular.

re code tags: They've been an important part of some of the metaprogramming stuff in lua that I've been trying out. I've set up a path to generate a C enum definition from:

enum("Grid_Rotation", "u8",
    enum_val(0, "NONE"),
    enum_val(1, "R90"),
    enum_val(2, "R180"),
    enum_val(3, "R270")
),

With code tags, I can add new nodes to the AST after the enum values have been defined such that it will emit the enum as well as an aux count variable and an array of strings:

enum("Grid_Rotation", "u8",
    create_struct(),
    enum_has_pretty_names(),
    enum_has_count(),
    enum_val(0, "NONE", pretty_name("None")),
    enum_val(1, "R90", pretty_name("90")),
    enum_val(2, "R180", pretty_name("180")),
    enum_val(3, "R270", pretty_name("270"))
),

11:29

The heavier workload that I'm using this for is a versioned entity system which generates a lot of boilerplate I/O and version upgrading code from a statement in the form of:

struct("Exit",
    version(1),
    is_entity(),
    serializable(),
    using_entity_base(),
    -- bunch of fields
    field("u8", "exit_flags"),
    to_next_version([[
        if(from->exit_flags & (1 << 0))
        {
            to->base.flags |= OBJECT_IS_UNATTACHED;
        }
    ]])
),

@Justas I love this. I want to see more!

Well another interesting feature of this is that you can create functions which return new definitions. One of the things I use this for is a very simple component system: Entities that have visual interpolation will call the following function in their body:

local function component_visual_interpolation()
    return field("Visual_Interpolation", "visual_interpolation",
        is_component(),
        dont_serialize()
    )
end

The is_component marker is then used in a late tree pass which generates giant switch statement functions in the form of:

static force_inline Visual_Interpolation * try_get_visual_interpolation( Ent_Base * ent )
{
    switch(ent->id.type)
    {
        case Ent_Type::ground:
        {
            return &(reinterpret_cast<Ent_Ground_3*>(ent))->visual_interpolation;
        }
        case Ent_Type::grid_bucket:
        {
            return 0;
        }
        etc ...
    }
}

12:15

At the core of this all is just a duck typed tree which lets me add nodes with arbitrary data and then operate on them.

12:15

Past this it's mostly lua code for performing the rewriting and emitting the final code (definitions, editor integration, i/o etc)

java's annotations are essentially like that

11:00

but then that's java

a fatal flaw

11:00

java's biggest mistake: being java

also the way I've seen them used sometimes I just wonder, why they didn't just use an interface to expose those annotated functions instead

11:03

the one place where I agree with their use is when I was making an xml parser where I could link each member with which attribute/child element they matched up to

11:03

that kind of introspection you cannot do with just an interface

This ties into some earlier comments made about genericism—the most generic things are less structured. One problem with many generic systems is that while they specify less structured data, or data with missing information, the missing information is filled in by predefined paths, whereas there's a lot of power in specifying some high level pieces of information, then being able to specify how those get mapped to more specific things

11:23

I think it's useful in constructing sets of high-level data that you specify and tweak over-time, without having to do a lot of the plumbing work—it makes it easier to design things as a programmer

11:24

X-Macros are the simplest example of this, I think: Here's a list of associated data, I'll #define the macro and generate this list to generate some code, and then in the future all I need to do is change this list
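For readers who have not seen the trick, here is a minimal X-Macro sketch in C. It borrows the Grid_Rotation values that appear earlier in this log; the identifiers `GRID_ROTATION_LIST`, `grid_rotation_pretty_names` and `grid_rotation_from_pretty_name` are invented for the example. It also assumes the enum values are dense and start at 0, since the name table is indexed by value.

```c
#include <assert.h>  /* static_assert (C11) */
#include <string.h>  /* strcmp */

/* The single list to edit: X(identifier, value, pretty name). */
#define GRID_ROTATION_LIST(X) \
    X(NONE, 0, "None")        \
    X(R90,  1, "90")          \
    X(R180, 2, "180")         \
    X(R270, 3, "270")

/* Expansion 1: the enum itself. */
typedef enum {
#define X(name, value, pretty) GRID_ROTATION_##name = value,
    GRID_ROTATION_LIST(X)
#undef X
} Grid_Rotation;

/* Expansion 2: the entry count (each entry contributes "+ 1"). */
enum {
#define X(name, value, pretty) + 1
    GRID_ROTATION_COUNT = 0 GRID_ROTATION_LIST(X)
#undef X
};

/* Expansion 3: pretty names, indexed by enum value. */
static const char *const grid_rotation_pretty_names[GRID_ROTATION_COUNT] = {
#define X(name, value, pretty) [value] = pretty,
    GRID_ROTATION_LIST(X)
#undef X
};

/* Sanity check for this example; update it when the list changes. */
static_assert(GRID_ROTATION_COUNT == 4, "list and count disagree");

/* Reverse lookup: returns the value for a pretty name, or -1 if unknown. */
static int grid_rotation_from_pretty_name(const char *s)
{
    for (int i = 0; i < GRID_ROTATION_COUNT; i++) {
        if (strcmp(grid_rotation_pretty_names[i], s) == 0) return i;
    }
    return -1;
}
```

Adding a rotation is then a one-line change to `GRID_ROTATION_LIST`; the enum, the count and the name table all follow from it.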

Genericism is a good term for it. Many people suffer from it; I bet there's a pill which can help people suffering from it.

I think genericism can be very useful, but it depends on what you're being generic about

If you're being generic about generic stuff, sure.

11:27

But generic things rarely exist

11:27

And generalities are usually "truths"

I think it really depends on the thing you're trying to generalize for

11:28

I think it's very important to acknowledge problem constraints and the reality of your problem, though I've been finding recently, for example, that programming things in terms of more generic "lego bricks" that may be able to represent a wide variety of data makes the "plumbing" of a system much easier

11:28

There are things that are just not useful to try to generalize for

11:29

e.g. "What if my machine had X bits per byte instead of Y?!"

11:29

But I think saying something like "I just want to consider this kind of data generically, even though it might be in different forms, and maybe I'm being a little bit wasteful but that's fine, because it makes all of my code uniform" can be a very great choice

So I think a big problem here in this conversation is going to be that to say that something is generic isn't exactly a well defined thing.

This is a very old problem

Is a C function generic? What about a function in a language with parametric types? To say you're making something generic doesn't really tell us at all where we are.

And well known to philosophers

C functions generalize over architectures. Parametric functions generalize over a set of similarly structured functions in the same language with a particular rule about how those functions are allowed to differ from each other, etc.

11:37

To say generic things don't exist isn't really helpful until we point out what things we're talking about generalizing over.

Yeah I don't know if this is a technically correct conception, but I think of "making something generic" as choosing a variable to let vary.

This is why I'd put forward that the language of "compression" works better than "genericism". If you look at your problem space, identify a bunch of repeated patterns and then have a way to compress them, hopefully losslessly, that's roughly what we mean by being generic.

generic/specific distinction is an old problem and navigating between the two is hard

Going towards one means you lose something

The reason I like this language is it forces you to be more literal about what you're compressing, and if you're really getting good compression with reasonable lossiness.

Whereas just talking about "abstractions" or "genericness" doesn't really come with those intuitions built in.

11:42

Language features just define a compression algorithm over software. The programs that will tend to come from that language are the programs that compress well with its features. Metaprogramming then is what happens when the predictions the language made about what is likely in a program don't quite match your entire scope, and so you have to define your own compression.

I think one interesting property of genericism/compression is that it, by virtue of being generic/compressed, has many possible paths—the most generic thing is just a parent to many ideas, so you have a tree of "genericism" formed, with the leaves being the least generic thing, and the root being the most generic thing.

11:42

Also yeah this is a good point. I've been learning to like the "compression" language quite a bit more.

Or just not compress - which is often the right call - but now there's no confusion about what your costs and benefits are to each option (in my mind at least).

I think that language does help a lot. Because you're not just making something more generic, you are picking something to compress about something.

11:47

The term "generic" is a... well, generic term that includes something like compression.

Related: https://plato.stanford.edu/entries/generics/ (Warning: this site is a rabbit hole)

gulp... clicks

I guess the more a language gives me, the less I need something like metaprogramming. I'm not sure how to define it.

if you get the batteries there is no need to build a generator

Yes. That.

13:32

I don't want to build the battery from its raw components.

13:38

Even if I enjoy that process, it's not practical

There's value in keeping the core reality of a tool small and simple, though, right?

14:05

A language surely shouldn't expose all possible models you might want to specify data in.

14:05

But should those models be excluded?

no because sometimes those models or a new model that gets invented are useful

Right, so I guess in that sense I think metaprogramming is necessary

14:08

You always want a backdoor that allows someone to get around a limitation of the models that the core language provides.

Simplicity is complicated

well, this is where I think non-textual code representations really have an advantage

14:22

to shill dion for a moment

14:23

because textual languages can only have one representation

14:23

and it's best for that representation to be consistent and thorough with a specific set of design principles which are prioritized above others

14:24

e.g. statically typed systems programming, versus garbage-collected dynamic functional scientific computing

14:24

if you decouple the primary representation from the final frontend, and you keep that representation simple but flexible, you can have multiple frames of reference within which to operate

14:25

e.g. dion could have (and will by all indications) a statically typed, low-level systems programming frontend, and also the dynamically typed and garbage collected functional frontend

14:26

which could both be complete, tightly designed packages with a reasonable set of constraints for the problem space

14:27

but they would be interoperable, and there would still be a "metaspace" outside of those frontend representations in which you could perform manipulations over them that defy their constraints, and even define your own frontend with its own well-thought-out constraints

14:28

and while I'm shilling dion here because everyone here is familiar with it, and I'm excited for the project, this applies to any future-forward toolchain that's designed with this consideration in mind

14:29

jai also exposes something like this to a much lesser degree

14:29

where Jon has shown for example the ability to write a compile-time MISRA compliance verifier in native Jai code

14:30

and also the ability to write a compile-time lisp parser in native Jai code that outputs the resulting lisp program through the Jai backend

14:32

(of course, to be clear here, because there has been quite a bit of confusion over "non-textual" as it relates to dion, the key difference is that the simple AST representation is the primary code representation in dion rather than text, which means it's very easy to construct custom toolchains around with first-class support)

I'd like to back up to something @ryanfleury was saying and point out that you don't need metaprogramming even in C, then swing back to this question of mixing models.

14:59

Any programming language that is complete, basically in the "Turing Complete" sense, can get you to a piece of software that computes any computable function.

15:00

That's not the same as saying that all possible programs are covered by the language. There are many more programs that could exist that C will never generate, than vice versa, and the same is true for any language that doesn't have an escape hatch for going back to the lower level.

the main real problem that metaprogramming solves imo is where your turing complete program, in the language-native representation, would take an unreasonably massive amount of code and in particular repetitive boilerplate to accomplish

The whole point is that if you bring in the constraint of maintenance, the fixed budget of what you'll be able to deal with over time as you try to polish and include additional features, you start having to ask yourself how you're going to fit more in.

the turing tarpit problem

right

15:02

another thing that macros frequently solve is where you want to define multiple things with one single consistent interface

15:02

e.g. a struct and an associated union tag

15:03

things that, with enough code, could become absolute maintenance or revision nightmares if you didn't use metaprogramming techniques
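As an illustration of the "struct and associated union tag" case, here is a small C sketch where one X-Macro list produces the tag enum, the union members, a constructor per variant and a tag-to-name function, so the pieces cannot drift apart. All names (`Circle`, `Rect`, `SHAPE_LIST`, `Shape`) are hypothetical and chosen only for the example.

```c
#include <assert.h>  /* assert */

/* Hypothetical payload structs. */
typedef struct { float radius; } Circle;
typedef struct { float width, height; } Rect;

/* The single list: X(payload type, tag suffix, union member name). */
#define SHAPE_LIST(X)         \
    X(Circle, CIRCLE, circle) \
    X(Rect,   RECT,   rect)

/* Tag enum generated from the list. */
typedef enum {
#define X(Type, TAG, member) SHAPE_##TAG,
    SHAPE_LIST(X)
#undef X
    SHAPE_COUNT
} Shape_Tag;

/* Tagged union whose members are generated from the same list. */
typedef struct {
    Shape_Tag tag;
    union {
#define X(Type, TAG, member) Type member;
        SHAPE_LIST(X)
#undef X
    } as;
} Shape;

/* One constructor per entry, so tag and payload are always set together. */
#define X(Type, TAG, member)                     \
    static Shape shape_from_##member(Type value) \
    {                                            \
        Shape s;                                 \
        s.tag = SHAPE_##TAG;                     \
        s.as.member = value;                     \
        return s;                                \
    }
SHAPE_LIST(X)
#undef X

/* Tag-to-name mapping, also generated from the list. */
static const char *shape_tag_name(Shape_Tag tag)
{
    assert(tag < SHAPE_COUNT);
    switch (tag) {
#define X(Type, TAG, member) case SHAPE_##TAG: return #Type;
    SHAPE_LIST(X)
#undef X
    default: return "?";
    }
}
```

A new variant is added by defining its payload struct and appending one `X(...)` line; the tag, the union member, the constructor and the name all come along automatically.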

So if you look at the program you're trying to make and see that the modeling tools of language X already fit your target really well, you're in good shape.

15:07

You can use language X, and it's probably never going to generate the program you would have written by hand with an assembler, but you're willing to take that lossiness because writing the compressed thing and having the compiler decompress it means you can afford to maintain the program.

15:08

The issue that I tend to find, is that you never know exactly what models would fit well ahead of time when you're doing any sort of experimental or research type work.

15:09

Instead the language selection process is a lot more about if you happen to think - for probably no reason better than experience or observation - that a particular language has "really good" models that will surely be good enough for anything you might want.

15:11

Ultimately though, I think we're sort of stuck where this is never actually true. Some days the problem before you would be well suited to a model like a C/Odin/JAI/Rust whatever, but other times the task in front of you has a lot more in common with something else.

and sometimes you have a project that needs multiple things each needing a different model

From there I guess you have three choices: 1. Accept the current range each language is capable of covering and live with that limitation - perhaps that's the way that's best 2. Try to make new languages that cover wider ranges 3. Try to go meta so that users can adapt the modeling tools more specifically to their problems

At least those are the three I can see.

scripting languages are a common solution for that

yeah @demetrispanos has elaborated at times on his plastics vs metals metaphor

where programming in python for rapid prototyping, sketching out high-level design, etc is the flexible and cheap "plastic manufacturing" process

Different problems require different tools to solve them.

15:13

I am actually really happy with all the new languages coming about.

and then lowering performance- or robustness-critical sections of code to C is the rigid but durable metal process

15:14

yes but I think the problem that Allen is addressing is that the specific strengths and weaknesses of a given tool rarely fit the entire problem domain

Of the C/Odin/JAI/Rust list alone, I wouldn't use them all for the same things, even if they are of the imperative category.

they're usually ideal for certain pieces of your problem domain, and you choose the tool that covers the majority or largest minority of your problem domain

There is no silver bullet, even for a specific domain. This is the nature of the problem, the intersection between humans, hardware, and the problem at hand.

whereas if you could use multiple tools (programming languages in this case) as part of the same process without any friction, as well as being able to step outside of them and define meta-abstractions, you'd be in a much more advantageous position

Right. Usually if I split my problem down enough I can find a good tool for each problem, but we don't have a story that lets my one single actual problem be served by multiple tools at the same time.

15:16

Err I mean "I can find a good tool for each sub-problem"

15:17

Although sometimes I also feel I can't "find" one actually, and still have to make one.

yes, and the lack of a simple, robust and flexible common representation for all of these tools to target is a big problem

15:18

llvm is the closest we have, and it's not even close to meeting any of those criteria

15:18

and it's also far too low-level for what I'm talking about

Although sometimes I also feel I can't "find" one actually, and still have to make one.

And that's because we are still in the discovery phase of this discipline. No huge "truths" have been discovered and practiced for a long time yet. No traditions carrying wisdom. We are still having to discover the wheel ourselves all the time.

15:20

And this is the case for things like metaprogramming.

15:20

The concept is old.

15:20

But it's still one of those things which should be wielded carefully.

The fact that tools are usually pre-made for you (for practical reasons) means also that you don't have a lot of control over them. They're more black than white boxes. Which leads to a natural insulation - like in the previous example where binding a C library to Odin is not a smooth experience (edited)

15:22

Maybe it's worthwhile to identify little boxes that can be black

15:23

Not everybody is going to write a new compiler for each project

like in the previous example where binding a C library to Odin is not a smooth experience

It's smoother than most other languages. But it's not as if it was programming in C or it automatically generated the bindings from the source code.

15:23

And that's not going to be solved easily.

15:23

C and Odin are not actually that similar in terms of their type systems.

Yes - not meant as a criticism (I haven't tried myself)

I know.

15:26

There are many problems, and many not "ideal". There is the problem when using multiple languages (including meta-languages) to solve a collection of problems, since there is a high chance they will need to communicate with one another. And that communication can be very difficult. (edited)

So another reason I like to look at these things as compression is that it fits my intuition that there is never going to be a "huge truth" that covers all the problems you could ever end up researching. Yes the things we're researching today might be boiled down into a much better model, but the research will always be happening at the edges of what we know how to do.

Even if there was a "huge truth", you will never reach the top of the hill.

Instead of trying to make a compression scheme for code (a language) that is fundamentally great enough to handle the future, I see it as more practical to embrace the fact that new models need to be easy to iterate on, hence why I don't shy away from metaprogramming.

15:31

If we could rapidly build programming language extensions, or whole new programming languages (and solved the language mixing problem), then I might not see it as all that interesting to do "metaprogramming" as we do it today.

For me, the way to "experiment in metaprogramming" already exists, but is rarely done. You effectively just need to have a compiler for the language in your core library of that language. Coupled with the added benefit of RTTI in a language, this is the "goldmine".

with some tweaking you can do that with java, its classloading only looks for a binary blob and doesn't really care where it comes from

If a compiler was written as if it was to be used as a library, then we could do this really easily ANYWHERE. We'd only need the frontend (maybe up to the SSA bit if you need to do a bit more analysis)

15:37

But if you need simple introspection on the typed AST and other checker info tables, that is already damn useful.
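A minimal sketch of "compiler frontend as a library", using Python only because its standard library happens to ship its own parser as the `ast` module. The `SOURCE` string and the `list_signatures` helper are hypothetical names made up for illustration, and Python's AST is untyped (annotations are just syntax), so this is weaker than the typed AST and checker tables mentioned above.

```python
import ast

# Hypothetical target code that we want to introspect.
SOURCE = '''
def area(width: float, height: float) -> float:
    return width * height

def greet(name: str) -> str:
    return "hello " + name
'''

def list_signatures(source: str) -> dict:
    """Map each top-level function name to its (parameter, annotation) pairs."""
    tree = ast.parse(source)  # the language's own frontend, called as a library
    table = {}
    for node in tree.body:
        if isinstance(node, ast.FunctionDef):
            params = []
            for arg in node.args.args:
                # Annotations are syntax nodes; turn them back into text.
                annotation = ast.unparse(arg.annotation) if arg.annotation else "?"
                params.append((arg.arg, annotation))
            table[node.name] = params
    return table

signatures = list_signatures(SOURCE)
print(signatures)
```

Nothing here needs a second build step: the program that inspects the code and the parser it uses live in the same process.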

I'm not particularly convinced why RTTI is needed? I'm more like, we should have this RTI thing :-). I've tried for some time but I ended up thinking that types are not a good tool to "tag" things and do computation on them. Because using many types leads to incompatibilities and a lot of wrapping and unwrapping

RTTI is the bootstrapping stage.

I think it would be great to be able to produce that data (the RTI that I mean) by being able to treat types as values at compile time

I don't want to have to run a program, to generate the RTTI in order to run the metaprogrammer in order to compile the final program (edited)

15:39

RTTI has a lot of uses!!!!

15:41

Most of the cases I have seen many people use their metaprogramming tools for are to generate lookup tables containing type information for that program.

15:41

If that's part of the language already, it doesn't need to be a separate stage.
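A rough illustration of that point, in Python rather than Odin: when the language already carries type information at run time, a generic formatter needs no generated lookup table. The `Vec2` and `Entity` types and the `describe` function are hypothetical, and Python's `dataclasses.fields` stands in for whatever RTTI a given language provides.

```python
from dataclasses import dataclass, fields, is_dataclass

@dataclass
class Vec2:
    x: float
    y: float

@dataclass
class Entity:
    name: str
    position: Vec2
    health: int

def describe(value) -> str:
    """Format any dataclass instance using only its runtime type information."""
    if is_dataclass(value) and not isinstance(value, type):
        # fields() is the built-in "type table"; no separate generator stage is needed.
        parts = [f.name + " = " + describe(getattr(value, f.name)) for f in fields(value)]
        return type(value).__name__ + "{" + ", ".join(parts) + "}"
    return repr(value)

print(describe(Entity("player", Vec2(1.0, 2.0), 100)))
```

The same `describe` works for any dataclass defined later, which is the property that makes built-in type information attractive for printing and serialization.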

15:42

And I mean, I want to use RTTI in the metaprogrammer itself, not just the final program

Types aren't the only thing worth Introspecting on however.

But I'd argue you don't need to introspect more in your metaprogrammer itself.

I wouldn't agree with that. In fact, many editor features are literally introspection on much more than just types.

The editor IS the metaprogrammer, which is my point.

15:44

The metaprogrammer in my view is a compiler effectively, which you can modify the behaviour of.

15:44

So it can introspect the target code it is working on.

15:45

But the metaprogrammer itself probably does not need any more introspection beyond the RTTI of its internal types.

15:46

Is that clearer?

You mean to say that no metaprogram ever would need to introspect on more than its types?

15:47

(Probably)

15:47

Because I don't think I agree with that either, really.

I mean to say that you don't need a metaprogrammer to introspect your metaprogrammer.

15:47

Or at least I have not yet seen a case for this. (edited)

Why shouldn't the programmer make just the data that he needs and use that at runtime? Why must there be this RTTI system in the compiler/language which has a very fixed view what types are? There's literally a type called TypeInfo in many languages. And it might be a bad idea to depend on this type in the source code - this type is unlikely to contain the actually needed information, and it's not unlikely to change, and such changes are out of control for the user

I guess my point was to say that editors are the "original" metaprogram. And they have to worry about a lot more than types. (edited)

15:48

And editors themselves are applications and programs, and as such face the same problems that other programs do, thus requiring introspection in the same ways. (edited)

And I am not disagreeing with your point, Ryan. Editors need to introspect a lot more than just types.

Hmm okay I might be misunderstanding then.

15:49

Honestly metaprogrammer might be throwing me off because I've always used the term metaprogram :P

It probably is.

15:50

It might be a "Billism" because the terminology doesn't exist yet.

15:50

A "metaprogrammer" is a program which analyses another program and generates new code for it.
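One possible reading of that definition, sketched in Python: a small program that analyses the class definitions in some target source and emits new functions for it as text. `TARGET_SOURCE`, `generate_serializers`, and the generated `*_to_dict` names are all hypothetical, chosen only for illustration.

```python
import ast

# Hypothetical target program that the metaprogrammer analyses.
TARGET_SOURCE = '''
class Player:
    name: str
    health: int

class Item:
    label: str
    weight: float
'''

def generate_serializers(source: str) -> str:
    """Analyse class definitions in `source` and emit a to_dict function for each."""
    tree = ast.parse(source)
    chunks = []
    for node in tree.body:
        if not isinstance(node, ast.ClassDef):
            continue
        # Annotated names in the class body (e.g. `name: str`) are treated as fields.
        names = [stmt.target.id for stmt in node.body
                 if isinstance(stmt, ast.AnnAssign) and isinstance(stmt.target, ast.Name)]
        body = ", ".join(f"{n!r}: obj.{n}" for n in names)
        chunks.append(f"def {node.name.lower()}_to_dict(obj):\n    return {{{body}}}\n")
    return "\n".join(chunks)

generated = generate_serializers(TARGET_SOURCE)
print(generated)
```

The output is ordinary source text, so it can be written to a file and compiled in a later stage, or executed immediately in the same process.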

Gotcha. Right okay.

15:50

So, I think you know how Data Desk works, you're effectively saying that you wouldn't need introspection on a Data Desk custom layer?

In this case, the editor of Dion is a "metaprogrammer"

15:51

It introspects the language directly. It's a form of compiler.

15:51

So a "metaprogrammer" in my eyes is effectively a partial compiler.

I don't know—I mean a "metaprogrammer" is just a program.

It is!

And as such, can require introspection as well.

And that communication can be very difficult.

this is exactly the case for having a common, well-defined interop layer

But do you need a metaprogrammer for your metaprogrammer, dawg?

15:52

That's a serious question too

To put it simply yes, I think I do.

And my answer is, probably not.

Yeah I would agree with Allen. Compilers or partial compilers are very complicated programs and as such probably require the same things other complicated programs do.

15:53

Like I don't see a principled difference between compilers and other complicated programs in this respect, is what I'm saying.

But once the compiler is created (maybe with the aid of a metaprogrammer), do you still need a metaprogrammer to use it? (edited)

The reason I want meta-on-meta is that I want to glide my code freely along the spectrum of schedules when I might run it ("run-time" "compile-time" "pre-compile-time" "asset-pack" etc).

15:54

If there is ever a distinction between what a compiler, a metaprogrammer, and plain code are, then that can't happen. (edited)

I like to think of the whole spectrum of edit-build-run as one huge program.

15:55

The edit program is probably the programmer's brain

I think we are probably not actually disagreeing with one another here.

15:55

I think the problem here is terminology.

The way I'm conceptualizing this is basically this way: Sometimes I reach the limits of my language, and I prefer it that way because my language should be simple enough for me to understand well, and I should have the ability to extend the language with my own models if possible. There is no reason to assume that can't happen in a compiler (or a "metaprogrammer") also. (edited)

mp = metaprogrammer
1) mp#1 used to make a compiler
2) compiler is used in a mp#2
3) mp#2 is used to build program#1

15:57

(1) is two stages
(2) is one stage
(3) is two stages
but these are all separate paths.

15:57

(1) is the "asset-pack"
(2) is the "pre-compile-time"
(3) is the "compile time"

15:59

So I am not saying we cannot have a cyclic view of (1, 2, 3). In my example above, I have mp#1 and mp#2 but they are used to produce different things

15:59

For my normal project (3), I only use mp#2, and I would state that I don't use mp#1 in that process

16:00

@ryanfleury Am I clearer now?

Hmm. So I think I generally understand what you just said, but within this model what was your original point? That you wouldn't use an mp#3 to generate an mp#4?

No. That in (3), I don't require using multiple mps (or at least I have not required one yet) (edited)

You mean you wouldn't have two different model extensions?

Let me think for a second to see if I grok what you mean

Because that seems a bit abstract, I guess I'm not sure what you mean; the line seems blurry.

Yeah I'll go ahead and say that I have no idea where we've ended up here

16:05

If I may back it up, I believe this started when Ryan threw out the idea of introspecting on more than just types.

Yeah, introspection on more than types is good. AST, entities (named declarations), expressions, constants, etc. These are all really useful things.

16:06

I think I went straight to the conclusion of that.

16:06

If you are doing that introspection, you are effectively using a compiler.

Hmm okay. Yeah I was specifically saying that in response to RTTI.

I would like to interject quickly and focus on the issue with stages. Not sure if everybody is already agreeing on this, but the biggest issue with metaprogramming is when it is staged

16:08

I believe a practical build graph is not linear but DAG-shaped

I.e. tools must be able to consume language A and produce A again. (edited)

16:08

And it should be in a way that tools can be arbitrarily plugged into the build pipeline

I'll ask a question, have you used a language that has built-in RTTI support? (Rather than generating the RTTI yourself like in C(++))

There are HUGE advantages of me being able to do this:

fmt.println(foo);

and it just works on any machine with a simple command.

16:09

I don't require the user having to set up a second build stage to generate the RTTI, and it's well defined and part of the language itself.

16:10

This ability alone is extremely powerful, and has allowed me to create many powerful tools which in other instances require another build stage.

I don't know why you would assume that a hypothetical metaprogramming system requires setting up a second build stage

16:10

versus handling it itself

this seems to be a core disconnect in this conversation

Wait...

I was getting to that part

ah, okay, continue

Effectively, my vision is that your "metaprogrammer" is just a modified compiler. When it runs, it compiles the main project's code.

16:12

There is a separation between the main project's code and the metaprogrammer's code (i.e your modified compiler).

but isn't this a staged model again? I mean the separation (edited)

Yes. It's still a staged model.

So I think we should discuss picking apart several aspects of "stages".

I am assuming that the metaprogrammer is separate from the main program

Stages could mean I hit build button 1, then hit build button 2. If I don't change anything about [1] sometimes, I can skip it.

16:16

Another aspect of stages is that they are ordered: stage 1, stage 2, stage 3

16:17

And finally there's the idea that they create groups of code that either all run together or not. Sometimes I skip stage 1, but either I skip it or I don't, there is no in between.

16:17

So for instance if the idea of staging is to avoid the problem where all of everything gets recomputed every time, then I am in agreement that staging is important.

16:18

If we're talking about me, the programmer, specifying a build process with the order of stages, manually managing the scheduling for these stages, or manually defining the boundaries between them, then I think that's just not going as far as a metaprogramming system could go.

For me, a 1 stage build system is this:

odin build foo.odin

A two stage system is this:

odin build metaprogrammer.odin
metaprogrammer foo.odin

This is technically still two but simplified

odin run metaprogrammer.odin -- foo.odin

And this is a two/three stage compiler

odin build metaprogrammer.odin
metaprogrammer foo-custom.odin -out:foo.odin
odin build foo.odin

16:19

You could have a single stage for this, by having arbitrary compile time execution, but then the design of the language has to be very specific.

16:19

I was speaking about how ANY language could work.

So for instance if the idea of staging is to avoid the problem where all of everything gets recomputed every time, then I am in agreement that staging is important.

In general I think the work to re-build must be kept small. I believe this is not really related to staging per se

16:20

If you can linearize a build process (and I hope that's possible) then you can always save work on re-build

@jfs I agree, but would also say that just means you're redefining all your heavy work as not a "build" or you're just avoiding problems that require heavy pre-baking of anything.

And again, I am not disagreeing with anything you are stating.

I also think we are very much discussing within the model of text -> compiler -> exe, but that isn't necessarily the only model.

I was explicitly stating the current languages.

16:23

But let's say you have the Dion approach. AST -> Editor -> Exe

16:23

It's still the same idea.

16:23

A compiler is something that transforms from one language to another.

For example, you might have:

    AST
   / | \
  v  v  v
  C  C<---C
  |  |
 EXE /
  | /
 RUN

16:24

Look at me making ASCII diagrams...

And this is my point about terminology. build.bat is not necessarily a one stage build system, it's just a trivial to use build system.

16:25

Where C is compiler, right?

Ah yes correct, I forgot to specify that

16:26

I didn't have enough room to write Compiler

So something is a "stage" in my mind if it's possible to run it and not any other stage, and it's not possible to break down into smaller stages.

AST -> C -> EXE -> Run
 |--> C ----------^
 |    ^
 |--> C

To me staging means a REQUIRED linearization of the build process. A staged build is not only less flexible in terms of building the project. It also puts heavy constraints on how the software is structured

16:26

exactly @Allen4th

From that point of view "compile-time" is a stage "run-time" is a stage.

and that can be very gnarly to use

16:27

like, if you want to run some metaprogramming on that type you quickly defined in the middle of your file

For me, "run time" of the final thing is not a stage.

then you will have to move it out to a separate file to do that

There are separate questions such as what system/who is responsible for scheduling the runs of each stage.

build.bat is: the programmer schedules the run of all these at once with a single "button" and statically linearizes them.

AST -> C -> EXE -> RUN

For me that is one stage build system still. Because there is only one use of a "~C" (edited)

A two stage would be this:

AST#1 -> C -> EXE#1
AST#2 -> EXE#1 -> EXE#2

The thing that worries me about forcing the separation of stages is that when we write a metaprogram, we specify data, and in my experience data is rarely useful in only one place.

16:28

I think a lot of the time it can fall into buckets cleanly, so certainly not all data is shared everywhere, but there are some connections.

I can see why you might not want to include final run-time as a stage @gingerBill, but I think it's a useful way to phrase things. Going back to the much earlier talk about filling in data tables, and whether you choose to do that at compile-time, run-time, or some "pre-compile-time", it's clear that run-time is an option for solving problems.

16:30

So when analyzing where to solve a problem, i.e. what stage to put it in, I find it more useful to consider that on the table i.e. call it a stage.

The reason I don't include it is purely because of what people customarily think of it. The run stage is not building; it is already built

16:30

In the building stage(s), the purpose is to build [compile] the program.

Right and I'm arguing, specifically, that that rigid view isn't helpful

I personally don't view that as being at all rigid

16:31

But rather "that's how it works"

It's the idea that there's a build time that's special.

16:32

But anything you do there could be done at run time. And vice versa anything you do at run time could be done at compile time.

And vice versa anything you do at run time could be done at compile time.

And that's where I disagree.

16:32

If it could be done at "compile time", there would be no need for it to be done at "run time".

I mean it depends on your language I suppose, and if you count pre-compile-time as an extension of compile time.

I think we should use "build" here to mean "everything that needs to be done until the final result of the computation is produced". Like displaying an image on screen or whatever. Because ultimately that's what we're interested in

16:34

There are many orders of the individual steps that are needed to achieve producing the computation, but some data dependencies put constraints on the way that we can order things (edited)

The purpose of a program is to transform data into different forms of data. When creating a program, you use another "program" to create it. The code you give (be it text or AST or whatever) is your input data, you then pass it to a program (the compiler) to transform it into the output data. I don't see how any of this is "rigid". It's how all of this works

But when programmers create code to transform data, they are also creating data.

So a build system is effectively "a process that uses programs to convert input data into a built product"

16:35

@ryanfleury No disagreement there.

16:35

That's literally what I just wrote. (edited)

Well my point is that code is not inherently special and could be transformed at any stage after it is created.

It's rigid to assume that's how it has to work.

So your build system actually uses run-time as part of the build process! @gingerBill

@Allen4th How else could it work? That's literally what computers do.

16:37

Data -> program -> Other data

16:37

Code == data

16:37

Code -> program -> other code

Yes I agree with that principle.

16:38

That's not the same thing as saying compile-time != run-time

16:38

They are both -> program ->

For me:
Run Time == RUNNING OF PRODUCT
Compile Time == BUILDING OF PRODUCT

16:39

Both run

16:40

And to pre-empt your next point:

Sure, I agree with that as being the terminology we tend to use.

Why not have the compile time stage run code as if it was a run time stage? (edited)

16:40

Correct?

16:41

Or.... (edited)

I mean I agree with that, but it's not exactly the point I'm driving at.

Then could you explain further?

16:41

because this is where I am confused.

16:43

Because if your framing is that "compile time" is just another form of "run time" but for your code, sure.

Sure. What I'm interested in is looking at, given any computation you want to complete:
When can you do it? - You can't finish it until the input information is available
How often can you afford to do it?
How often do you change the inputs?
Can you (and how do you) maintain cached results?

16:44

Certain computations have to be at "run time" because they are downstream of data the user will be inputting.

Sure.

Besides that there's nothing different about any "stage"

Okay.

Hence, I see each stage as essentially a cache leading up to a final result in the final stage where the user is served a particular computation.
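A minimal sketch of "each stage is a cache", assuming stages are pure functions from text to text and that hashing the input is an acceptable cache key. `CachedStage`, `generate`, `compile_stage`, and `build` are hypothetical names, and the two transforms are trivial stand-ins for real code generation and compilation.

```python
import hashlib

class CachedStage:
    """Wrap a pure transform so it only re-runs when its input actually changes."""

    def __init__(self, transform):
        self.transform = transform
        self.runs = 0        # how many times the real work was done
        self._cache = {}     # input hash -> cached output

    def __call__(self, data: str) -> str:
        key = hashlib.sha256(data.encode("utf-8")).hexdigest()
        if key not in self._cache:
            self.runs += 1
            self._cache[key] = self.transform(data)
        return self._cache[key]

# Stand-in stages: a "metaprogram" pass followed by a "compile" pass.
generate = CachedStage(lambda src: src.upper())
compile_stage = CachedStage(lambda src: "<exe:%d bytes>" % len(src))

def build(source: str) -> str:
    return compile_stage(generate(source))
```

Calling `build` twice with the same source does the work once. If a source change leaves the output of `generate` unchanged, `compile_stage` is skipped as well, which is the sense in which the boundary between stages is only a caching decision.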

So your argument is that theoretically, everything could be done in one swoop at "compile time", but that is not useful, so we split up the "compile time" bit into different stages to aid us, where these different stages are run in particular orders with different frequencies.

Yeah basically.

Okay, that's what I've been saying then

16:48

Stupid English language

The only other thing I'm saying is that you could also do a lot of stuff at run time that we usually pull into the compile time.

16:48

For instance even if you don't have metaprogramming a modern language probably evaluates a bunch of constant expressions (does the computations ahead of time) rather than export the symbolic representation of the unfinished expressions for the run time.
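To make the constant-expression case concrete, here is a small sketch of a folder written against Python's `ast` module. It only handles `+`, `-` and `*` on numeric literals; `ConstantFolder` and `fold` are hypothetical names, and real compilers do considerably more than this.

```python
import ast
import operator

# Only these operators are folded in this sketch.
_FOLDABLE = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}

class ConstantFolder(ast.NodeTransformer):
    def visit_BinOp(self, node: ast.BinOp) -> ast.AST:
        self.generic_visit(node)  # fold the operands first (bottom-up)
        op = _FOLDABLE.get(type(node.op))
        if (op is not None
                and isinstance(node.left, ast.Constant)
                and isinstance(node.right, ast.Constant)
                and isinstance(node.left.value, (int, float))
                and isinstance(node.right.value, (int, float))):
            # Do the computation now instead of leaving it for run time.
            result = ast.Constant(op(node.left.value, node.right.value))
            return ast.copy_location(result, node)
        return node

def fold(expression: str) -> str:
    """Return `expression` with foldable constant sub-expressions evaluated."""
    tree = ast.parse(expression, mode="eval")
    folded = ast.fix_missing_locations(ConstantFolder().visit(tree))
    return ast.unparse(folded)

print(fold("(2 + 3) * 4 + x"))
```

The symbolic part (`x`) is left for run time, while everything whose inputs are already known is computed ahead of time.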

I would agree to that too. There are different costs to take into account, and sometimes having it at run time rather than compile time is a lot better, e.g. it's better for compile times (when developing) (edited)

Flip-side: we sometimes build tables at init that will be the same every time simply because it doesn't cost much and is easier to maintain.

16:49

Anyway that's it. In a nutshell run-time is about as much a stage as everything else.

All my point was that making something that does "metaprogramming" is effectively just writing a custom compiler, which you then use to be the "compile time" stuff.

16:50

And yes, I agree to that argument too.

Yeah, I'm not disagreeing with any of that. I'm expanding on my notion that a compile-time run-time distinction doesn't strike me as being as helpful as a lot of others seem to think.

I think it's useful to know where the costs of the decisions reside. (edited)

16:51

e.g. do you generate the lookup table whilst compiling or do you generate it on the fly (e.g. at the start) when the final program runs?

16:51

There are advantages and disadvantages to both of them. (edited)
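The two options can be put side by side in a short sketch. The bit-count table, `build_table`, `emit_table_source`, and `TABLE_AT_INIT` are hypothetical, picked only because the table is small and obviously the same on every run.

```python
def popcount_slow(n: int) -> int:
    """Count set bits the slow way; this is the work the table avoids."""
    return bin(n).count("1")

def build_table() -> list:
    """The computation itself, independent of when it is scheduled."""
    return [popcount_slow(i) for i in range(256)]

# Option A: run the computation while building and emit the result as source text.
def emit_table_source(name: str) -> str:
    return f"{name} = {build_table()!r}\n"

# Option B: run the same computation on the fly when the final program starts.
TABLE_AT_INIT = build_table()
```

Option A costs build time and needs a generation step, but ships a ready-made table. Option B costs a little startup time on every run and keeps the build trivial. Both call the same `build_table`, so the result is identical either way.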

but that's introducing sort of a staging, because you have to build the "metaprogrammer" first before you can build anything that depends only on small parts of the metaprogrammer. Maybe better phrased, a separate compiler introduces a very coarse-grained staging

16:52

of course you can opt to write multiple metaprogrammers for more fine-grained staging

100% agree. I don't mean to blur the line as if it doesn't matter where you put things, I just mean to say they ought to be conceptualized as both being "just computations" that need to happen, and that there may be more ways to organize things than just the basic "build" "run"

@jfs Correct. But if the metaprogrammer is "built" and never modified (i.e. "baked"), then it's just a new compiler.

but I don't find that very practical. That's hardly "metaprogramming" anymore to me

it's just making a new language maybe

Then I 100% agree already, Allen

That's metaprogramming, @jfs

16:53

A compiler is just something that transfers one language into another language, be it C into machine code or Java into bytecode

16:54

Generating new code from other code is effectively a new language.

16:54

It might not be something "new", but it's technically not the standard language; it cannot be used by the default compiler.

16:55

So maybe this is my general idea:
Writing a metaprogrammer is writing a custom compiler
Using a metaprogrammer is using a custom compiler
A metaprogrammer is a custom compiler

Okay yeah I don't actually see any disagreement :)

See

16:55

Told you terminology confusion

16:56

Because this is actually quite a new field, if you don't realise it.

I agree with Allen's point that there is more than just "build" and "run", though. The editor is running in one of these stages, which is "edit time"

16:57

I don't think we disagree there, just a comment.

And the "editor", in this case, is a form of "compiler".

16:57

Which is both "compile time" and "run time" depending on the reference frame.

Right—where the human is doing the compilation (edited)

Exactly.

16:58

It's quite an amazing thing to realize once you understand all this. It's really simple, but overlooked if not known. (edited)

Yet again, we are all compressing information in our brains that is itself about compression!

Yo dawg!

16:59

We have to go deeper!

We cannot deal with all of reality, we must use and build models of reality in order to interpret reality.

17:00

To interpret anything requires having a model of interpretation. (edited)

17:00

DUUUDE

17:01

And we need to realize what is relevant to the task at hand.

17:01

This is the "compression" aspect.

alright where did all the shrooms g-

17:01

BILL

The relevance realization.

not again

https://www.youtube.com/watch?v=G2jUhnCU9iA (edited)

17:03

Anyways, I am not sure if this is what you were suggesting earlier Bill, but basically the existing pipeline of programming environments requires us to "bucket" our code into these stages, and I think that is potentially harmful.

17:03

And I think this is what is often meant by letting code "flow" between compile time and run time.

I personally don't think it is harmful, necessarily, but important to note where things lie.

But I also think that compiling code shouldn't imply it cannot be used in other stages.

17:05

Like if I compile some code that runs for some entities in my game, maybe I can select those entities and see the code being rendered as it runs. (edited)

17:05

I mean in general this extends to debuggers in general.

Here's a simple example:

odin build foo.odin -out:res.exe
res.exe foo.odin

or

odin run foo.odin -- foo.odin

17:06

You are passing your original code into the program that built it.

17:06

Now that's meta

Haha, yeah I think that is what I mean, basically

17:08

So I just think that even if the pipeline looks like this:
AST -> Compiler -> EXE
It very well might actually look like this:

  _________________
 /                 v
AST -> Compiler -> EXE
          ^         ^
           \_______/

17:10

I've really got to stop it with these ASCII diagrams.

Not to mention that if your editor is pretty close to the AST and Compiler then you can add in:

AST <-> Human
 ^       ^
 |      /
 v     v
Compiler

17:12

I don't know, I could go for some more ASCII diagrams in my life personally.

Yeah... I don't know that I can justify making them given the fact that I have a drawing tablet on my desk also :/ (edited)

Ultimately a build process is something that you should be able to describe as a DAG, with a fixed and well understood number of build steps as vertices, @ryanfleury

So if there is recursion between AST and Compiler in your diagram, I think that means that you could split up AST into smaller partitions. I.e. AST is not a proper build step.

17:21

it absolutely should be a pipeline. Or well, a DAG. I guess that can be viewed as a pipeline too. Using topological sort

I guess the thing that throws me off is that it isn't really a "build process", it's just a "computation pipeline". What we describe as "building" in C is really just a certain subset of computations. (edited)

Right. I think of it as some ultra data-oriented parallel pipeline in a game engine or something, where you have some tasks with some dependencies, and you make a dependency graph to solve for what can be done in parallel
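As a sketch of that dependency-graph view, Python's standard `graphlib` module can take a set of build steps and report which ones are ready to run in parallel at each point. The step names in `BUILD_GRAPH` and the `parallel_waves` helper are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical build steps; each step maps to the set of steps it depends on.
BUILD_GRAPH = {
    "parse": set(),
    "typecheck": {"parse"},
    "gen_type_tables": {"typecheck"},
    "pack_assets": set(),
    "codegen": {"typecheck", "gen_type_tables"},
    "link": {"codegen", "pack_assets"},
}

def parallel_waves(graph: dict) -> list:
    """Group steps into waves; every step in a wave can run at the same time."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph is not a DAG
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # steps whose dependencies are all done
        waves.append(ready)
        sorter.done(*ready)
    return waves

for wave in parallel_waves(BUILD_GRAPH):
    print(wave)
```

Here `pack_assets` has no dependency on the compiler steps, so it lands in the first wave alongside `parse`. A purely linear list of stages would hide that.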

17:23

In principle it is no different, really; it's just that the classes of computations you're dealing with in compilers/metaprogramming are different from those you are dealing with in a game.

17:23

But it's still fundamentally a set of data transforms that depend on the output of some other data transforms.

Yes. As is all computing

My summary of sorts is that metaprogramming should be done in a way that doesn't put too many constraints on the ordering of the build steps, because that might require rearchitecting the software project to accommodate new metaprogramming

I'm not sure you'd really describe it as a DAG

there might be cycles, certainly

Everything is a DAG @.bmp . Your life is a DAG

haha I see what you mean though, you can decompose the "conceptual" graph into an acyclic one if you're strict about discrete steps

There might be a cycle in your code, but each time you execute the same instruction again, you're doing (hopefully) new work. That's the sense by which I mean "it should be possible to describe your build as a DAG" (edited)

17:27

so in a sense that is a truism and not very interesting

17:27

What I mean though is that it should be a DAG that is meaningful and insightful to humans

17:28

you should be able to understand precisely how the build goes (edited)

17:29

yep

17:35

Another way by which you can turn a perceived loop between 2 or more things into a linear sequence is by looking at them as a single unit. I.e. you can zoom in and also out.

End of Fishbowl Day 2, thanks everyone for participating! If you have an idea for a Fishbowl conversation, throw it in #network-meta or DM me!

If you want to read the whole conversation from the beginning, check the pinned messages!