Thanks for all the interesting thoughts on this.
Part of what makes type systems hard to design is deciding where to put the line between convenience and safety.
Yes, this is definitely an interesting problem. To go further on this, one thing I often find odd is the way that type systems are sometimes used to express properties of the values they contain, but in ways that feel kind of coincidental. For example, imagine you want to store a percentage as an integer from 0 to 100. Would you use an unsigned type? You could, and by doing so you would use the type system to indicate the lower bound of the valid values of that variable, but the type system doesn't give you a way of specifying the upper bound in the same way.
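To make the percentage example concrete, here is a minimal sketch of what you end up writing by hand when the type system can only express one of the two bounds. The name Percent and its make/value functions are made up for illustration; the unsigned storage covers the lower bound, and the upper bound has to be a runtime check.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical wrapper type: the name Percent and its API are illustrative only.
class Percent {
public:
    // Returns an empty optional when the value falls outside 0..100.
    static std::optional<Percent> make(int value) {
        if (value < 0 || value > 100) {
            return std::nullopt;
        }
        return Percent(static_cast<std::uint8_t>(value));
    }

    // The stored value is always within 0..100 because make() is the only way in.
    std::uint8_t value() const { return value_; }

private:
    explicit Percent(std::uint8_t v) : value_(v) {}
    std::uint8_t value_;
};
```

The point is that the 0 bound comes for free from std::uint8_t, while the 100 bound only exists because the constructor is private and make() checks it.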
That said it is impossible for any compiler to divine what the optimal implementation of a data structure would be for your use case.
Exactly. So my follow-up question would be: given that the user of the data structure will have to make the decision themselves, how is that best presented? To take the world of "dictionary"-like data structures as an example: in C++, you have to pick differently named containers for different implementations. So if you want a red-black tree, you choose std::map; if you want a hash table, you choose std::unordered_map; if you want a sorted array, maybe you choose boost::container::flat_map, etc. This feels like a slightly uncomfortable middle ground. The difference between these types is in the implementation, but the names don't exactly make this explicit, and instead try to emphasise differences in their semantics/behaviour.
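As one possible answer to my own question, here is a rough sketch of naming the implementation explicitly while keeping a single "dictionary" name. Everything here (Dictionary, DictionaryFor, and the OrderedTree and HashTable tags) is invented for illustration. Note also that the C++ standard only specifies the complexity of std::map, not that it is a red-black tree, although the common standard libraries implement it that way.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <type_traits>
#include <unordered_map>

// Hypothetical tag types naming the implementation the caller wants.
struct OrderedTree {};
struct HashTable {};

// Maps an implementation tag to a concrete standard container.
template <class Impl, class Key, class Value>
struct DictionaryFor;

template <class Key, class Value>
struct DictionaryFor<OrderedTree, Key, Value> {
    using type = std::map<Key, Value>;
};

template <class Key, class Value>
struct DictionaryFor<HashTable, Key, Value> {
    using type = std::unordered_map<Key, Value>;
};

// One name for the concept, with the implementation as an explicit parameter.
template <class Impl, class Key, class Value>
using Dictionary = typename DictionaryFor<Impl, Key, Value>::type;
```

With this, Dictionary<HashTable, std::string, int> and Dictionary<OrderedTree, std::string, int> read as the same kind of thing with a different implementation, which is closer to what the choice actually is.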
Personally, I love low level programming, so I just want a language that treats all variables like arrays of bits.
Like v for variable, and then a number for the number of bits it contains. So v32 gives me 32 bits. v7 gives me at least 7 bits. v87 at least 87 and so on. Then -1 is just the same as 511 in a v9.
I'm really interested in this perspective, and I'd like to ask some follow-up questions. First, how would you see this interacting with the other aspects of types, like their data format and the operations that can be performed on them? If all the compiler knows is the size of the data, then presumably something like
wouldn't work, because the compiler wouldn't even know what integer encoding you were using, or what endianness you wanted, right? So how would you communicate things like data format and operations to the compiler? Also how would you expect variables of different sizes to interact, say for example adding a v32 to a v9?
I'm also interested in the fact that you say "at least 7 bits", presumably suggesting you'd be happy for the compiler to pad that as necessary for more optimal alignment? Given that there would be that level of slop in the actual sizes of the variables, what is the advantage of such fine-grained control? For example, what would be the advantage of being able to ask for a v7, which on most modern computers would presumably be padded to a byte, over just specifying a Rust-style i8/u8, which you would still know was big enough?
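Just to check I've understood the "-1 is the same as 511 in a v9" idea, here is a rough sketch of how I imagine it behaving, written as a C++ template. The name V and its members are my own invention, and I've assumed the "at least N bits" storage is simply padded out to 64 bits, so this sketch cannot express a v87.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical "bag of bits" type: V<9> keeps only the low 9 bits of whatever
// it is given. Storage is padded to 64 bits, so widths above 64 are not covered.
template <unsigned Bits>
struct V {
    static_assert(Bits >= 1 && Bits <= 64, "this sketch only covers 1..64 bits");

    // Mask with the low Bits bits set, e.g. 0x1FF for Bits == 9.
    static constexpr std::uint64_t mask = ~std::uint64_t{0} >> (64 - Bits);

    std::uint64_t bits;

    // Anything above the low Bits bits is discarded on construction.
    constexpr explicit V(std::int64_t raw)
        : bits(static_cast<std::uint64_t>(raw) & mask) {}
};

// Two values are equal when their retained bit patterns are equal.
template <unsigned Bits>
constexpr bool operator==(V<Bits> a, V<Bits> b) {
    return a.bits == b.bits;
}
```

Under those assumptions V<9>(-1) and V<9>(511) hold the same nine bits, which matches your description, but notice that the type itself still says nothing about signedness or which operations are meaningful.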
Variables are basically a convenient abstraction over registers and the stack which is absolutely fundamental to programs being able to be cross-platform - types provide automation on top of that, and potentially safety guarantees beyond that
I think this is an excellent point, and it also raises a really interesting question. One of the funny things I find about variables is that they don't actually exist, right? In some ways, variables are a way of pretending you have infinite registers, but in fact a variable may be many things, and what it is may change during the lifetime of the program. It may be some space reserved on the heap, or on the stack, or be a register, or, if it can be deduced at compile time, it may even disappear entirely into an instruction. Whilst I think this abstraction works well with many types, like integers and floats, I think vector types have highlighted a rather interesting problem with it, in that the registers you want to use may no longer correspond to the way you want to think about your data in memory. For example, if I want to multiply all the floats in one array with all the floats in another, I probably want to load them four at a time into xmm0/1 and multiply them as a vector operation, but equally I probably don't want to think about those arrays of floats as arrays of vec4s, if each float is a separate thing in the array.
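To illustrate the mismatch I mean, here is a small sketch using the SSE intrinsics from xmmintrin.h. The function name multiply_arrays is made up, and I've added a scalar fallback so it still compiles on targets without SSE. The arrays stay plain arrays of float in memory; the four-wide view only exists inside the loop.

```cpp
#include <cassert>
#include <cstddef>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

// Hypothetical helper: out[i] = a[i] * b[i] for every i in [0, n).
void multiply_arrays(const float* a, const float* b, float* out, std::size_t n) {
    std::size_t i = 0;
#if defined(__SSE__)
    // Process four floats per iteration using 128-bit registers.
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);  // unaligned load of a[i..i+3]
        __m128 vb = _mm_loadu_ps(b + i);  // unaligned load of b[i..i+3]
        _mm_storeu_ps(out + i, _mm_mul_ps(va, vb));
    }
#endif
    // Scalar loop for the leftover elements (or everything, without SSE).
    for (; i < n; ++i) {
        out[i] = a[i] * b[i];
    }
}
```

The caller never sees a vec4 anywhere in the signature, which is roughly the separation between "how I think about the data" and "which registers get used" that I'd like the language to handle for me.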
If you provide the lowest common denominator you appeal to the largest target audience.
This is definitely a good philosophy to have, but depending on what your library is doing, there is only so far you can take this, right? If what you're writing is a container library, you kind of have to make some of these decisions. I definitely agree with the idea of making sure your library is as flexible as possible with regard to the things it is not implementing, though (e.g. memory allocation).
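As a small sketch of what I mean by staying flexible about memory allocation: the container below decides its own layout but takes the allocator as a template parameter, in the style of the standard containers. IntBuffer and CountingAllocator are hypothetical names, and the sketch assumes the allocator hands back plain int* pointers.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical container: owns a fixed-size block of ints, zero-initialised,
// but leaves the choice of allocator to the caller.
template <class Alloc = std::allocator<int>>
class IntBuffer {
    using Traits = std::allocator_traits<Alloc>;

public:
    explicit IntBuffer(std::size_t n, Alloc alloc = Alloc())
        : alloc_(alloc), size_(n), data_(Traits::allocate(alloc_, n)) {
        for (std::size_t i = 0; i < size_; ++i) {
            Traits::construct(alloc_, data_ + i, 0);
        }
    }

    ~IntBuffer() {
        for (std::size_t i = 0; i < size_; ++i) {
            Traits::destroy(alloc_, data_ + i);
        }
        Traits::deallocate(alloc_, data_, size_);
    }

    IntBuffer(const IntBuffer&) = delete;
    IntBuffer& operator=(const IntBuffer&) = delete;

    int& operator[](std::size_t i) { return data_[i]; }
    std::size_t size() const { return size_; }

private:
    Alloc alloc_;
    std::size_t size_;
    int* data_;  // assumes the allocator's pointer type is int*
};

// Hypothetical user-supplied allocator that counts allocations and otherwise
// forwards to std::allocator.
struct CountingAllocator {
    using value_type = int;
    std::size_t* allocations;  // caller-owned counter

    int* allocate(std::size_t n) {
        ++*allocations;
        return std::allocator<int>().allocate(n);
    }
    void deallocate(int* p, std::size_t n) {
        std::allocator<int>().deallocate(p, n);
    }
};
```

So IntBuffer<> uses the default allocator, while IntBuffer<CountingAllocator> lets the user route the same container through their own allocation scheme without the library having to know about it.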