When do libraries go sour?
The Handmade community is often opposed to using libraries. But let's get more specific about why that can be, and whether that's reasonable. What do we look for in a library? When do libraries go sour? How do we evaluate libraries before using them? How can the libraries we make avoid these problems?
This is a fishbowl: a panel conversation held on the Handmade Network Discord where a select few participants discuss a topic in depth. We host them on a regular basis, so if you want to catch the next one, join the Discord!
Welcome everyone to another fishbowl.
Our topic for today is responsible code reuse, specifically as it relates to library usage.
Some points we might cover today:
* When do you reach for an existing library? What do you look for in a library?
* When do libraries go sour? When does using a library become irresponsible? What are the telltale signs of issues with a library?
* How to evaluate a library before using it.
* When making a library for others, what do you do to ensure it can be used responsibly?
* The community's general stance regarding using libraries vs DIY.
* What does all of the above mean for package managers and today's culture around library usage?
Participants:
@NeGate
@demetrispanos
@raysan5
@ryanfleury
@AsafGartner
We'll start with a question from @ryanfleury:
Which libraries do we use in our day-to-day projects? Why? Could we replace them easily if we needed to?
Hi everyone! The reason I wanted to start with the concrete is so that we can focus on real cases where the traditional form of code reuse (using libraries) worked out well, because I've seen the abstract route of the discussion be a bit more unproductive. Instead we can see what has been both responsible and useful, and build up the more abstract rules from there.
I use raylib with most of my projects and raylib only uses single-file header-only libraries, most of them are the stb libraries.
One example I can give is pgx: https://pkg.go.dev/github.com/jackc/pgx/v4
It's a postgres library for Go that we use for the Handmade Network website. The main thing that it solves for us is a knowledge-acquisition problem. It talks to postgres over its binary protocol, which to the best of my knowledge is undocumented. So doing it ourselves would require reverse engineering libpq (the official C library).
For me personally, I've had the most success with "data-in, data-out" libraries - a few examples of which are well-known in the community. stb_image, meow_hash, stb_truetype
(They are much easier to evaluate when they are open source like this)
for Cuik, it's mostly just stbds (im phasing it out), then TB has luajit and tbbmalloc (which i also wanna phase out lol)
for game projects it's stb_image, FreeType, sokol, raylib, cgltf and a few others
for me, common ones are stb_image, zlib, sqlite, lua/luajit (though not sure if this should "count" even if embedded), and then various specialist scientific computing libraries e.g. fftw
AsafGartner
One example I can give is pgx: https://pkg.go.dev/github.com/jackc/pgx/v4
It's a postgres library for Go that we use for the Handmade Network website. The main thing that it solves for us is a knowledge-acquisition problem. It talks to postgres over its binary protocol, which to the best of my knowledge is undocumented. So doing it ourselves would require reverse engineering libpq (the official C library).
Interesting connection here, one important aspect of code is the fact that it captures knowledge. It's not explicit, direct knowledge - it's baked down quite a bit - but the knowledge is really the most useful part. So for stb_image, the useful part is not quite the API (you could imagine tweaking it), it's rather that decoding a PNG in a fairly good way is difficult enough that unpacking all the knowledge is fairly high-effort. Same with meow_hash or for that matter any other hash - producing a hash that has fairly good results for a given problem is a difficult process that requires expertise and care, and it's often quite orthogonal to the problem in question
ryanfleury
For me personally, I've had the most success with "data-in, data-out" libraries - a few examples of which are well-known in the community. stb_image, meow_hash, stb_truetype
You also refer to them as "leaf libraries", right?
AsafGartner
You also refer to them as "leaf libraries", right?
Yeah. I don't think libraries are only possibly useful as "leafs", but I think they have a higher chance of being useful when they are, if that makes sense.
Notably because as you get "closer" to the "knot" of a codebase, you have to start making more assumptions. Leaf libraries are in the privileged position of making very few assumptions - "here's the data transform for this problem. We studied it and know it well - pass in the source data, and we'll use our knowledge to compute the output data"
I'm not sure, actually. Maybe someone else here knows?
It's interesting that we came up with the same terminology.
To me leaf just means that it's replaceable in a bounded way without affecting the rest of the tree.
Right yeah. That is maybe a more succinct way of putting it :)
a related concept I use, similar to "leaf", is whether or not using the library forces me into using their type system
if the library just accepts arrays of primitives (as an example) then any library could do that
AsafGartner
To me leaf just means that it's replaceable in a bounded way without affecting the rest of the tree.
so a replacement would need to be compatible only in terms of API? it doesn't necessarily imply anything about the size correct?
demetrispanos
if the library just accepts arrays of primitives (as an example) then any library could do that
For sure. I think that this also becomes much easier as an API designer when it's a leaf-like scenario
I wouldn't even say that it needs to be compatible. It's just that it doesn't force deep changes in the entire codebase.
(Not counting find/replace of a typename or simple things like that)
demetrispanos
a related concept I use, similar to "leaf", is whether or not using the library forces me into using their type system
this is a weird spot in C because the type system is limited enough that everyone and their moms must redefine a slice or other common structures
basically library design is sorta held back by the limitations of the language
It's partly limitations of the language and it's partly the language not being opinionated about very common things. So for example, in Metadesk and the debug info stuff we did at work, the library defines a "string type" (just a ptr + size), and then to start parsing, you have to package up your primitives in that form. You can obviously define an API that just takes a char * or something of the sort, but then that function necessarily becomes non-composable with the rest of the library, which uses the length-based string type.
NeGate
this is a weird spot in C because the type system is limited enough that everyone and their moms must redefine a slice or other common structures
personally I rarely have a problem designing library interfaces that only rely on primitives, or on extremely shallow structs
if you do something DOD-like or relational-like to model your problem, you can work very close to the primitive type level
so usually i just avoid passing around slices in my libraries because it's weird, i'll just unravel it
C APIs are usually clunky in this way - e.g. the user must first know to call NeGate_MakeSlice(...) but I think the composition effects are worth it
Otherwise you get these weird "user-facing" APIs floating around, that you eventually want to reuse, that aren't using the composable types that the rest of the library knows to use
(And this is tied in with another API design aspect, which Casey has spoken about before, which is that an API consists of low-level, composable, granular APIs that are precise, and higher level APIs that bundle functionality together, but all of those APIs are user-facing - there aren't "internal" APIs, for the most part)
Anyways I suppose this is going down an API design rabbit hole, which is maybe a bit off-topic
ryanfleury
Anyways I suppose this is going down an API design rabbit hole, which is maybe a bit off-topic
yeah I agree we shouldn't dig too deep on library design as such, but I think it highlights some of the aspects that contribute to code reuse decisions
so Ryan, wrt "Could we replace them easily if we needed to?", i think we've reached a spot where most of our foundational libraries probably can't (sometimes won't) be replaced by us but there's also a lot to be said that we don't necessarily need all of stb_image to get work done
So I guess we could talk about when using a library goes sour, since that connects with @demetrispanos' point regarding a library forcing you to buy into a type system - I have personally had issues with libraries that expose an incomplete set of extracted information, and (sort of a subset of this problem) with libraries that provide only high-level APIs that let me get at only certain data.
A good example is a query-style API vs. a "report an entire batch" API. SearchPoint(...) vs. GrabArray(...). In many cases, you need the latter, but many libraries fit a high-level use case too early with the former, and it makes the library sometimes unusable for a given problem
Oh, we spun up two threads
NeGate
so Ryan, wrt "Could we replace them easily if we needed to?", i think we've reached a spot where most of our foundational libraries probably can't (sometimes won't) be replaced by us but there's also a lot to be said that we don't necessarily need all of stb_image to get work done
Hmmm so what do you mean by "foundational libraries"?
stuff like stb_image or stb_truetype really
like most people dont know or care about the details inside of stb_truetype, they just use it and avoid the garbage world of font rasterization
With stb_truetype it becomes inadequate pretty quickly, I think - I haven't had as many issues with stb_image I suppose (although for some cases it also is inadequate)
but rarely does anyone replace stb_truetype with something that isn't a bigger and weirder library
NeGate
but rarely does anyone replace stb_truetype with something that isn't a bigger and weirder library
like for all the text editor memeing we've had around here, it's rare for someone to make an stb_truetype replacement, they just use FreeType or something
stb_image is a low-cost, decent-gain library. Not sure about stb_truetype.
Loading images isn't usually a program's main thing.
NeGate
like for all the text editor memeing we've had around here, it's rare for someone to make an stb_truetype replacement, they just use FreeType or something
but FreeType2 is a very big library...
exactly on both of those statements
Text rendering is another issue where there's a big knowledge acquisition gap.
One point I wanted to mention that's related is the idea that code is always written given a set of assumptions (or constraints). Properly analyzing a library requires understanding those assumptions. So for example, stb_image assumes that you trust your input data. (Same with stb_truetype). They assume a few other things (which are related to our earlier discussion on API design), and basically libraries become more problematic when the diff between the library's assumptions and your assumptions grows large.
Furthermore I think this gets at one aspect of the modern code reuse culture - if you pull in a gigantic DAG of libraries just by trying to grab one, you've introduced a gigantic bundle of assumptions that you have not vetted, and basically cannot vet
Yes. And that diff tends to grow larger as your project grows.
it's ok to have assumptions, it's just that it's also ok not to pick a library because yours dont line up with theirs
ryanfleury
One point I wanted to mention that's related is the idea that code is always written given a set of assumptions (or constraints). Properly analyzing a library requires understanding those assumptions. So for example, stb_image assumes that you trust your input data. (Same with stb_truetype). They assume a few other things (which are related to our earlier discussion on API design), and basically libraries become more problematic when the diff between the library's assumptions and your assumptions grows large.
well, I'm afraid that's the price you pay for the library
raysan5
well, I'm afraid that's the price you pay for the library
sometimes you'll either compromise or pull a handmade or something
but yea this will always be true in a sense
NeGate
it's ok to have assumptions, it's just that it's also ok not to pick a library because yours dont line up with theirs
Yeah it's not only okay, it's required - you cannot write code given no constraints
On this subject & connecting with Asaf's original prompt, I think clearly documenting the assumptions you make for a given API is probably a good place to start, w.r.t. making a library easy to use responsibly. (And flipping it, when you're a user, trying to fish out the assumptions, and clearly documenting your project's)
ryanfleury
Furthermore I think this gets at one aspect of the modern code reuse culture - if you pull in a gigantic DAG of libraries just by trying to grab one, you've introduced a gigantic bundle of assumptions that you have not vetted, and basically cannot vet
yea this is a big problem when it comes to threading especially in the types of languages we work in, like in our lovely imperative languages you need to understand data flow and mutability to be able to actually schedule things to happen in parallel, but once you lug around 20 libraries it only takes one of them being nasty about data flow for you to lose a bunch of potential performance gains
if Foo can't be called while Bar is being used because they both initialize a table but not atomically or something... you can get fucked over that easily until you put a mutex over it and cope
NeGate
yea this is a big problem when it comes to threading especially in the types of languages we work in, like in our lovely imperative languages you need to understand data flow and mutability to be able to actually schedule things to happen in parallel, but once you lug around 20 libraries it only takes one of them being nasty about data flow for you to lose a bunch of potential performance gains
For sure
ryanfleury
On this subject & connecting with Asaf's original prompt, I think clearly documenting the assumptions you make for a given API is probably a good place to start, w.r.t. making a library easy to use responsibly. (And flipping it, when you're a user, trying to fish out the assumptions, and clearly documenting your project's)
i like the idea of making certain things the default:
* functions are pure (given the same input, it produces the same output)
* you must pass in a non-NULL pointer
* functions which take in isolated inputs are therefore thread-safe.
these are some simple ones but you can probably imagine more, and whenever things don't match this we must document it.
On "assumptions" - when I was iterating on The Melodist and pulling out things I wanted to reuse in later projects (not as libraries, but as tools within my codebase), I wanted to reuse my operating system abstraction layer. The problem is that my entire operating system abstraction layer was written with the assumption that I always opened one window, and initialized a graphics API. But that assumption makes the layer unusable for, e.g. a simple terminal application, or a multiwindow application. So that makes that "library" unusable for those problems, and if I had somehow hacked around it instead, that would've been less responsible in my mind.
So that makes that "library" unusable for those problems, and if I had somehow hacked around it instead, that would've been less responsible in my mind.
I find that forcing libraries to fit your code (or vice versa) tends to happen quite a lot later in the project.
It's something that people should pay attention to when vetting a library, but it's not something that's often talked about.
Oh, this touches on yet another thing - sorry, I'm sort of throwing all of my thoughts in and hoping one will stick - "code reuse" =/= "library that you cannot edit". There is another form of code reuse that arises, which is "I wrote a renderer in my last project, but I want to have a renderer in my new project - instead of trusting my old assumptions, or throwing the old renderer away, I can just duplicate it and mutate it as needed". This can be an invaluable time-saver, without the common drawbacks of overly-assumptive libraries.
ryanfleury
Oh, this touches on yet another thing - sorry, I'm sort of throwing all of my thoughts in and hoping one will stick - "code reuse" =/= "library that you cannot edit". There is another form of code reuse that arises, which is "I wrote a renderer in my last project, but I want to have a renderer in my new project - instead of trusting my old assumptions, or throwing the old renderer away, I can just duplicate it and mutate it as needed". This can be an invaluable time-saver, without the common drawbacks of overly-assumptive libraries.
the problem I see with external libraries that you need to edit for your project is maintenance
raysan5
the problem I see with external libraries that you need to edit for your project is maintenance
for this I think it's worth somehow distinguishing evolving vs stable libraries
there isn't good terminology for this, but there are libraries like stb_image where you can use a fixed version for years and it almost never matters
(rare exceptions if you're using untrusted inputs)
there are others (anything that touches the network, probably) where you want to keep tracking the latest version
demetrispanos
for this I think it's worth somehow distinguishing evolving vs stable libraries
libraries can always change or need to change for specific projects needs
I mean, stability of a library is very relative
Yeah, one third-party library I forgot to mention that I use and modify is stb_sprintf (another leaf!). I added my own format specifiers for my own string types. But that library almost never changes. If I did want to upgrade, it would indeed be slightly painful, but it wouldn't take very much actual time or resources - and it would be very low amortized across time (because the library changes so infrequently)
You can't know what will be added/fixed in the future, so you'd generally expect the library to be good enough as is, no?
AsafGartner
You can't know what will be added/fixed in the future, so you'd generally expect the library to be good enough as is, no?
this is not my experience with how most people use libraries
I think most people use libraries with the specific intent to benefit from rolling updates
"it keeps getting better, and I don't have to do anything"
And then there's no adaptability.
Or more accurately, there's explicit anti-adaptability.
demetrispanos
"it keeps getting better, and I don't have to do anything"
I imagine it depends on the type of library and the functionality provided, for example stb_image functionality is very focused, chances to have problems with it are low (with trusted input data)
I think that does depend on the style of library - with certain pure-functional data transforms on rock-solid, unchanging problems - e.g. parsing a well-established data format, doing a mathematical transform, producing a hash - the upgrades will be more tangential (performance, reliability, etc.)
raysan5
libraries can always change or need to change for specific projects needs
sometimes you end up making changes to the internals of a library, and since those aren't in the trunk you're kinda fucked even when just small changes come by, because modifying the inside of a library doesn't have the same stability guarantees as the API. this is where i'd just say "lmao dont update your tools" or "you might wanna consider just branching off into your own thing if the library is already doing what it needs to do"
NeGate
sometimes you end up making changes to the internals of a library, and since those aren't in the trunk you're kinda fucked even when just small changes come by, because modifying the inside of a library doesn't have the same stability guarantees as the API. this is where i'd just say "lmao dont update your tools" or "you might wanna consider just branching off into your own thing if the library is already doing what it needs to do"
you dont need to completely drop the library at this point - it's sour enough that you stop following their updates, but you still keep your modifications and the library... mostly
demetrispanos
I think most people use libraries with the specific intent to benefit from rolling updates
this is true but in practice that promise isn't always kept, sometimes things get deprecated and after a while they just get removed
Just for concrete reference, @bvisness and I had to modify a markdown library because of an issue when compiling it to WASM. It was literally not workable without the change.
I agree with @demetrispanos, though, that this is how most people use libraries, and I think that is one place where the "Handmade narrative" explicitly disagrees with the general programming culture, at least in my mind. If code inside of your project is swept out from under your feet, and you have not tested with it, then in my mind it's quite unethical to ship that to users. Even if it's ostensibly "an improvement", that is really just a prediction, and that prediction is often wrong.
So connecting with what @AsafGartner said w.r.t. "adaptability", I would posit that upgrades should always be explicit & directly adapted for, in a given project
yeah this is my point really, obviously I'm aware of the ways it can go wrong but this is by far the dominant mode of use
I'd suggest this is one reason why single-header libraries are so popular in the Handmade sphere: they fit this pattern of use - explicit upgrades
(And also why things like NPM get a bad rap, because they fit this anti-adaptability pattern)
I'd also suggest that explicit upgrading, and considering a library as relatively stable, does offer you some better options (e.g. modifying the library without worry)
single header libraries put a special kind of limitation on you: in practice they limit just how much code a sane person is willing to pack into one header, usually that's like 10 thousand lines of code on the higher end (im going to neglect the automatic stuff and the other messy stuff). but because of this it's far easier to drop the library, it's not that big
it also doesn't have the room to make as many overarching assumptions
because it's not that big it's probably something a person could in theory learn and maybe you dont need to drop it and we can approach with my based strat of forking stuff
there's not really a binary to keeping or dropping libs
it's just like any other piece of code
sometimes a library isn't even fully atomic, my experience for anything big is that it's really not, people just don't know how to slice it because it's hard
just chopping out the details you need and copying those into your project or rewriting them to accommodate feels like a relatively handmade-y narrative
single header just makes the chopping process simpler... usually
Yeah, I think sometimes this is necessary, especially when the "knowledge acquisition" aspect is important, but the offered API or other characteristics of a library fail to meet your constraints. I mean, obviously, always check the license on a library, but if it's MIT or something like that, then a very effective strategy is reusing & readapting the code that the library-author wrote for a given difficult/opaque data transform that deals with a difficult subject (reverse-engineering, undocumented formats, a subtle mathematical transform, etc.)
Sometimes documentation for a data format, for example, is very abstract or removed from the concrete problem, and there are popular projects that both parse or output that format correctly, and simultaneously have been partly responsible for the evolution of that data format over time
At that point, that codebase is where the knowledge lies, so "code reuse" is really just studying the problem in that case
But then you'd make your own?
AsafGartner
But then you'd make your own?
a new standard!
I mean your own implementation for parsing DWARF for instance.
ryanfleury
(need a DWARF emoji)
yea i was thinking you were talking about LLVM and Codeview but that works too lol
also all their parsers seem to have the most dogshit validation stuff
"error: success" is about the worst thing you wanna see on a broken file
this is a spot where they got their stuff out of the way
NeGate
this is a spot where they got their stuff out of the way
but it's not a library most people might be willing to accept, i'd be willing to call it sour
AsafGartner
I mean your own implementation for parsing DWARF for instance.
Right
Because while the knowledge from e.g. Clang is useful, it's baked in with a bunch of other assumptions/constraints that are incorrect - e.g. assumptions that make things very slow, heavyweight, or unreliable
Would you rather have that knowledge in the form of code that can be run/debugged, or in the form of a spec document?
AsafGartner
Would you rather have that knowledge in the form of code that can be run/debugged, or in the form of a spec document?
It's hard to say. I guess it depends on what kind of code
ryanfleury
Because while the knowledge from e.g. Clang is useful, it's baked in with a bunch of other assumptions/constraints that are incorrect - e.g. assumptions that make things very slow, heavyweight, or unreliable
Clang can also be called sour, but depending on how much you use it, it doesn't matter
I'm thinking of the kind that you'd end up replacing, but that's usually all there is.
My question is should people push for more knowledge sharing in the form of documentation?
Or is there value in concrete examples, even if you're just effectively reverse engineering them.
internals? probably not, file formats should always be documented imo
and if it's an open source project then the documentation of the file format should also be open
regardless of anything, someone's gonna need to manage it, im ok with that being internal docs like in the codebase not the API but it should still be there somewhere
I think if the code is reasonable and if it's possible to run/debug, that is indeed often preferable - documentation writing is a skill, and your documentation doesn't have a typechecker and you can't run/test it. That being said, I think the line between the two is more artificial than it could be - but that is perhaps a subject for another day
AsafGartner
My question is should people push for more knowledge sharing in the form of documentation?
I think some documentation is important and useful but it could be redundant and verbose, a simple API cheatsheet and some commented code examples could work better
ryanfleury
I think if the code is reasonable and if it's possible to run/debug, that is indeed often preferable - documentation writing is a skill, and your documentation doesn't have a typechecker and you can't run/test it. That being said, I think the line between the two is more artificial than it could be - but that is perhaps a subject for another day
maybe we should invest in proof assistants and formal spec langs, then the comments are actually type checked :P
i got a question for you guys, how sour is too sour?
most of us use subpar tools all the time, at what point should a rewrite happen?
like we might be inclined to say "when shit goes wrong" but like you probably dont wanna rabbithole on a year long project for a one day workaround
Well, there's an emotional component to it. Like, if I really hate using it.
AsafGartner
Well, there's an emotional component to it. Like, if I really hate using it.
then there's that
But primarily it's a question of how many bugs or other trouble is this going to introduce to my codebase over time.
At what point does the subpar-ness compromise the project, or my well-being, or my time? Like it is also sort of a constraint-solving problem. If I had the time, I'd want to rewrite a lot more than I do - this is surely partly because I do not have any commercial/business constraints on any of my side-projects
i kinda have the view of "i really dont think X should be THE library for Y job" as in i dont like it when the space gets uncompetitive
AsafGartner
But primarily it's a question of how many bugs or other trouble is this going to introduce to my codebase over time.
Agree, I had that experience with one raylib external library, many related issues/missing features all the time
Unfortunately, the cost of a custom implementation, plus its maintenance, is even higher...
raysan5
Agree, I had that experience with one raylib external library, many related issues/missing features all the time
that brings up a fun one: as much as i'd hate to admit it, rewriting massive mature projects is an easy way to reintroduce bugs that got resolved a long time ago in the older product. it also means you dont have the ecosystem to engage with for testing, and potentially can't even share some testing suites... brings up the evil side of the question: "when is a rewrite too sour to be done?"
A corollary question: How do you determine the limits of a library ahead of time?
AsafGartner
A corollary question: How do you determine the limits of a library ahead of time?
you simply dont have the same resources usually which is a good type of limitation to a degree, ideally it means you focus on your use case because once you're rewriting it it doesn't need to be a library anymore, at least it doesn't if you dont want it to be one
AsafGartner
A corollary question: How do you determine the limits of a library ahead of time?
It depends, I usually check for some "elements" that I consider useful, like no memory allocations (or a memory-allocation replacement option), and the same for external file access: an option to provide the data from memory...
So how far do you go when evaluating a library?
Looking at the docs is one thing, but do you also read much of the code?
AsafGartner
Looking at the docs is one thing, but do you also read much of the code?
I check the code, the exposed API and the internal API, personally I also check the coding conventions
I also check the provided examples, the organization and comments, and I try to run them, I think it's a good indicator of the care put into the library
physically recoiling at the idea of the code probably means you shouldn't rewrite, physically recoiling at the code is probably a reason to rewrite it, just out of being a moral person trying to make the world not suck as much
the former means it's probably not something you should invest your time into
NeGate
physically recoiling at the idea of the code probably means you shouldn't rewrite, physically recoiling at the code is probably a reason to rewrite it, just out of being a moral person trying to make the world not suck as much
There's also the idea of contributing to the library to make it better, but we can touch on that later.
that's a good point i was hoping Martins could pop by for but if not I mostly agree with him until i dont so i'll Steelman that side
And on the subject of vetting, what can you do to make your library more easily vettable?
One thing that I find is often missing from library docs is the mental model behind the library. But as @ryanfleury said, documentation is a skill, and writing up the mental model is probably one of the harder aspects of it.
AsafGartner
And on the subject of vetting, what can you do to make your library more easily vettable?
Well, no allocator customization or no in-memory data access is something I don't like
when they force you to do file I/O via their stuff :((((
I say this like I haven't done this... I'm the problem sometimes
this one and memory allocators are usually big ones
but allocation schemes can get messy so i understand why they might not in certain cases
because not all allocators are compatible with all structures
though there's something to be said about being flexible or just not allocating for the user in the first place
in which case you can avoid such issues
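One hedged sketch of the "just don't allocate for the user" approach: the two-call size-query idiom (familiar from Win32 and Vulkan APIs), where the library reports how many bytes it needs and the caller brings the memory. All names below are invented for illustration:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical API: call once with out == NULL to learn the required
 * size, allocate however you like, then call again to fill the buffer. */
static size_t widget_serialize(const char *name, void *out, size_t cap)
{
    size_t needed = strlen(name) + 1;   /* payload + NUL terminator  */
    if (out == NULL)
        return needed;                  /* size query                */
    if (cap < needed)
        return 0;                       /* caller's buffer too small */
    memcpy(out, name, needed);          /* fill caller-owned memory  */
    return needed;
}
```

The caller stays in charge of where the memory comes from (arena, stack, malloc), at the cost of an extra round-trip per call.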
NeGate
when they force you to do file I/O via their stuff :((((
that's horrible... but it depends on the type of library...
NeGate
I say this like I haven't done this... I'm the problem sometimes
me too
but not allocating for the user is also complex because every time you need extra memory you gotta go back and talk to them
NeGate
but not allocating for the user is also complex because every time you need extra memory you gotta go back and talk to them
either via some callbacks or our favorite... offbrand iterators
you keep calling a function until it says to stop, and every time it returns it just asks you for shit...
well it's an event loop but offbrand iterator makes it stand out more as a concept
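That pull-style shape, where the library returns to the caller on every event instead of taking callbacks, can be sketched like this (the names below are invented for illustration, not from a real library):

```c
#include <stddef.h>

typedef enum { EV_NEED_DATA, EV_ITEM, EV_DONE } event_kind;

typedef struct {
    const char *data;  /* input the caller handed over                */
    size_t      len;
    size_t      pos;   /* cursor into the input                       */
    char        item;  /* valid only when parser_next returns EV_ITEM */
} parser;

/* The caller keeps calling this until it returns EV_DONE; on
 * EV_NEED_DATA the caller is being asked to supply more input. */
static event_kind parser_next(parser *p)
{
    if (p->data == NULL)  return EV_NEED_DATA;
    if (p->pos >= p->len) return EV_DONE;
    p->item = p->data[p->pos++];
    return EV_ITEM;
}
```

Because control returns to the caller at every step, the library makes no assumptions about threading, I/O, or allocation; the price is exactly the chattiness being complained about here.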
the important thing to note is that callbacks or event loops or whatever else are usually indicative of assumptions you might not wanna make in your code
they tally up to "i really dont wanna deal with this bs"
it's fine if a library isn't completely perfect, none is, it's a time saver
you sacrifice a little bit of personal peace for the sake of getting some shit done or not needing to have the mental overhead of dealing with another person's crap
especially not a person or people who you can't directly contact
this is actually one of the biggest problems when it comes to libraries and part of the reason i might promote rewrites
the maintainers are simply not my friends, they're not my colleagues, if i need LLVM fixed i cry, if i need any huge library fixed i need to go learn their BS along with the original problem space
if i need a feature added, i also cry
Yeah, that seems to be a problem with big libraries and especially frameworks.
On the other hand if you can make a contribution it will have a larger impact.
library maintenance is a hard topic....
When you buy into a big library, you sell some personal responsibility away which can be good... until things go wrong
I think there's a point in every library where it stops making things easier and starts making things harder.
And the skill to master is anticipating where those points are, and if your projects fits in before you hit those points.
AsafGartner
I think there's a point in every library where it stops making things easier and starts making things harder.
From my experience, that happens when you try to give too much control to the user
A small group of experienced users will love it but many other users won't... I think it's difficult to find the right balance
as much as we might meme about weird dependency stacks i feel like this might be a "justified stack", the low level users have one library and the high level people can wrap over it... sometimes
Wouldn't different levels of granularity help with that?
NeGate
as much as we might meme about weird dependency stacks i feel like this might be a "justified stack", the low level users have one library and the high level people can wrap over it... sometimes
or in the reuse case, you factor out the BS to just handle your case
maybe the library lets you choose custom allocators but you don't care, so you have a function like JustGiveMeTheShtuffs(...)
this is neutral code reuse, it could be good, could be bad
it helps you which is the point but it also means that the library wasn't helping you
but since it's small it's fine
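That factoring-out move might look like the sketch below; `lib_load_with_allocators` is a stand-in for a hypothetical flexible-but-noisy library entry point, not a real API:

```c
#include <stdlib.h>
#include <stddef.h>

/* Stand-in for a library call with knobs this project never touches. */
static void *lib_load_with_allocators(const char *path,
                                      void *(*alloc_fn)(size_t),
                                      void  (*free_fn)(void *))
{
    (void)path; (void)free_fn;
    return alloc_fn(16);  /* pretend this loads something */
}

/* The whole project calls this instead, with the defaults baked in. */
static void *just_load(const char *path)
{
    return lib_load_with_allocators(path, malloc, free);
}
```

The wrapper is trivial, but it means the library's flexibility only has to be understood in one place.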
because it's small it's also the type of thing that might fit into a pull request
not all library mods have to be "selfish"
not all changes need to be rewrites
sometimes we can simply increment on the project but this gets really complicated to plan out
when do you guys think it's a good time to really just increment vs rewrite?
It depends on the size of the rewrite.
And on the number of unknowns.
A lot of people are also worried about introducing bugs.
NeGate
that brings up a fun one: as much as I'd hate to admit it, rewriting massive mature projects is an easy way to introduce bugs that got resolved a long time ago in the older product; it also means you don't have the ecosystem to engage with for testing, and you potentially can't even share some testing suites... which brings up the evil side of the question: "when is a rewrite too sour to be done?"
(edited)
i brought up the side of "oh the rewrite might introduce bugs" but now i think i'll try to bring over the "a rewrite can be a chance to fix bugs" side
And one thing that we haven't explicitly discussed is the aspect of saving time by using libraries.
AsafGartner
A lot of people are also worried about introducing bugs.
It's a legit worry, because most contributions come with no maintenance
raysan5
It's a legit worry, because most contributions come with no maintenance
I was thinking more in terms of using "battle tested" vs writing your own from scratch.
AsafGartner
And one thing that we haven't explicitly discussed is the aspect of saving time by using libraries.
you usually save time... if everything goes as expected...
ideally any decently sized library has a process by which you contribute; if it doesn't... don't contribute, it's probably a mess or they're not ready
stuff like a testing suite and docs on the contrib process
it's usually not worth trying to increment on these projects just yet
might be better off with a rewrite or a fork
I think people have the impression that any decently-used library has had its bugs worked out.
AsafGartner
I think people have the impression that any decently-used library has had its bugs worked out.
this is true in our perfect world but a lot of projects will grow in stars and popularity far faster than the project can actually """scale to enterprise"""
AsafGartner
I think people have the impression that any decently-used library has had its bugs worked out.
it's not that they're all worked out, it's that many of them have been encountered
whereas your rewrite has never been put under stress at all
But that depends on age and the volume of contributions.
I've had "popular" libraries break just a few days after I started using them.
AsafGartner
I think people have the impression that any decently-used library has had its bugs worked out.
Completely agree, far from reality
yeah, that's one of the weird spots: a lot of the people who contribute to the popularity of a project aren't necessarily contributing to it or seriously engaging with it
sometimes they just think it "will" become cool so they hang around and star it
so you might see 5k stars and the project is still buggy and in prerelease
many/most libraries are bad / have bugs, but that's not really the relevant comparison point
your library will have bugs too
yes, I'm aware; the point is that your bugs are bugs you can fix
you might not be willing to go dig into a million line project
but you might be willing to deal with 10k lines you factored out of said project
NeGate
but you might be willing to deal with 10k lines you factored out of said project
I agree but I think this is kind of moving the goalposts... if you can replace a 1M-line library with 10kloc you control, obviously that's good
but the main effect there is you reduced the code by 99% and so all code-related problems are also reduced by 99%
demetrispanos
I agree but I think this is kind of moving the goalposts... if you can replace a 1M-line library with 10kloc you control, obviously that's good
not quite, I'm not saying rewrite 1M lines into 10kloc, I'm saying take out the relevant details and just have that in your rewrite
(edited)
simply less surface area for bugs; this is why the main reason I recommend a rewrite is if you have domain-specific needs
So another point for library evaluation: "how debuggable does this look?"
AsafGartner
So another point for library evaluation: "how debuggable does this look?"
i'd just poke it with a debugger and if i cry then it's bad, it's really abstract for me
a debugger letting you step around the code means you can build an intuition for the control flow of a program given some input
AsafGartner
So another point for library evaluation: "how debuggable does this look?"
That's a good reason to take a look at code formatting and conventions before using it, just in case... following some codebases can be really complex
I want to touch on the time saving aspect a bit more, as this is a primary reason people go for libraries in the first place.
I've had cases where libraries obviously saved a lot of time. Both in terms of knowledge acquisition, and in many cases just code writing time.
I also had cases where libraries turned out to be a time sink.
there's the age-old metric of lines of code, but it's not accurate, and this stuff is particularly abstract... a lot of libraries have you learn their stuff rather than learn the problem space. This is fine, it's part of why we buy into libraries, but there can be a disconnect where you needed to learn more to do their stuff than you would have just learning the problem space. Ideally, as Handmade-y people, we have some idea of what's behind the scenes, and from there you might be willing to say: "while I've learned to do things, I don't understand the problem and I haven't gotten any closer to solving my personal problems, so I should probably re-evaluate my time"
(edited)
I think it depends on the scope of the library, for example, a file-format loading library has a narrower scope than a multiplatform window/graphics initialization library
AsafGartner
I want to touch on the time saving aspect a bit more, as this is a primary reason people go for libraries in the first place.
I think it's worth making the strongest possible argument for preferring existing libraries, so we can frame our criticisms or reactions
- save time coding it yourself, possibly a huge amount of time if it's a lot of code or if you'd need to learn a lot to do it
- get something higher quality than you would make yourself, because it is used in many different situations and has been subjected to selection pressure; also useful if you are not an expert in the problem domain but the library author is
- get a built-in knowledge community you can ask for help with problems, since other people are using the library
- increase transferability of skills across teams, since the same external library can be used in many teams
The community aspect can be big for certain kinds of libraries.
We're at the two-hour mark, so I'd like to switch gears and talk about the community's stance on library usage.
There's a perception that we are all about DIY-only, from scratch, etc., but that's obviously not the case.
generally we view rewrites as a sort of moral good, there's always a reason that reinventing the wheel is valid considering how many mediocre wheels exist
When it comes to reinventing the wheel, I'm not sure I would even call it a rewrite.
we sure do conflate those things, which is something I'm sorta getting to
we sometimes treat a rewrite as a reinvention, which is generally why things like the Wheel Reinvention Jam have to clarify that it's more than just "X but good", or why we have so many memes about "Y but good"
I think it's probably good to clarify some common beliefs and non-beliefs
for example, it's easy for newcomers to get the message "never use external libraries"
(partly because that's an easy read of some of what is said, and partly because that's what Handmade Hero did)
I think it's not about the rewrite or reinventing the wheel, that's just a very small percentage of the work. When you create something new it requires a lot of side work
I think, at the very least, Handmade explicitly rejects the idea that you should rely on a library without having a fairly strong understanding of what it is doing, and being capable of at least learning how to replace it if necessary. Handmade promotes the idea of self-reliance and responsibility over what you ship to users, so if you use a library, you are responsible for what the library does
So it's very much opposed to a popular school of thought, which is that you only need to understand the abstraction, and the details will be taken care of by someone else
ryanfleury
I think, at the very least, Handmade explicitly rejects the idea that you should rely on a library without having a fairly strong understanding of what it is doing, and being capable of at least learning how to replace it if necessary. Handmade promotes the idea of self-reliance and responsibility over what you ship to users, so if you use a library, you are responsible for what the library does
I'd agree with this but I think it's a process not really an end-state
you can't just decide to be fully self-reliant
not instantaneously anyway
and no one can be fully self-reliant across all subjects
it's more about the philosophy than an exact practice
it builds a stronger character to be willing to replace things
rather than always spending time replacing things
Yeah. That's what I was trying to get at with "being capable of learning how to replace it"
It's not that you have to go study the PNG spec before using stb_image. It's that, if stb_image had some critical flaw for your purposes, you'd be capable of ditching it, when push comes to shove
(edited)
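One way that "able to ditch it" posture looks in practice: hide stb_image behind your own type and loader, so the rest of the codebase never touches `stbi_*` directly and swapping the backend means editing one file. The `stbi_load` call matches stb_image's real signature; the `image` type and `USE_STB_IMAGE` toggle are our own invention:

```c
#include <stddef.h>

typedef struct {
    unsigned char *pixels;  /* RGBA, 4 bytes per pixel, or NULL on failure */
    int width, height;
} image;

#ifdef USE_STB_IMAGE
#include "stb_image.h"

static image image_load(const char *path)
{
    image img = {0};
    int channels;
    img.pixels = stbi_load(path, &img.width, &img.height, &channels, 4);
    return img;
}
#else
/* The slot a replacement decoder drops into if stb_image has to go. */
static image image_load(const char *path)
{
    (void)path;
    image img = {0};  /* stub: reports failure for every file */
    return img;
}
#endif
```

Nothing about this requires understanding the PNG spec up front; it just keeps the door open for the day you have to walk through it.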
ryanfleury
It's not that you have to go study the PNG spec before using stb_image. It's that, if stb_image had some critical flaw for your purposes, you'd be capable of ditching it, when push comes to shove
(edited)
well, I created my own single-file header-only PNG loader...
raysan5
well, I created my own single-file header-only PNG loader...
Yeah, and that may have made sense for your purposes - and you could do it!
Main purpose was mostly learning and understanding, curiosity about how something I've been using for years works on the inside
Actually I'm not using it, at least just yet
I think for example if you get off the ground with stb_image and then someday decide you need to really be robust against possibly-hostile inputs, at that point you should be able to drop down and replace it. Or, I mean, hire someone who does, for that matter
(So even if you aren't willing to invest the time to go figure out PNG yourself, you're familiar with the nuances of the issue and what you'd do to avoid certain flaws in a library)
Because I agree that fundamentally, you can't know everything - and certainly every time you use code you haven't fully vetted, you could be relying on someone else's bad decisions. That's sort of unavoidable. But I guess Handmade's "take" - if there is one - would be just pointing out this fact, and so minimizing the drawbacks wherever possible. Not just shipping random software that you naively trust to users.
(edited)
probably a good idea to poke a bit at the masters of code reuse
aka the package managers of the world
This feels like one of those definitional problems, where both extremes are clearly wrong - "no I must write my own PNG parser! But first, I must write my own GPU driver! But first, I must write my own compiler!", vs. "I will gladly import 1000+ dependencies into my project. They'll just get better, faster, and more secure/reliable over time, and I don't have to change anything!"
The problem, to me, is that such a large portion of the programming sphere is very much on the latter side. In some cases, it is explicitly practiced and taught.
(edited)
And so even though the former is not correct, the correct gradient is actually in the direction of the former
It's of course easy to overcorrect, and I think everyone should be mindful of that
It's not so much that there's a sweet spot between them, it's more that currently we're very much off to one side.
There's a big range of reasonability between them that doesn't imply NIH or NPM.
I also think, to some degree, you need people at various sweet spots in that range.
Like, the fact that a lot of people in the community have chosen to work on programming tools, when they originally started working on games, is not obviously a net loss
Many people might go "lol wow you started by trying to ship a game but then you went and wrote a useful library for programmers? lol loser"
But, I mean, that is just so ridiculous that it is not really worth addressing - having more people iterate on difficult, low-level parts of ecosystems and tooling is actually a net win
At the same time, you'll have people who do just want to make games, and they'll be a bit closer to the "NPM" side, and that's okay too
But I think Handmade's goal is just to act as some kind of counter-balancing force to the NPM-extreme, which seems pervasive across many areas. Or, at least, to provide a community for people who are not on board with that extreme.
(edited)
Computers should be programmable, and they should be as programmable as possible to as many people as possible.
And that can mean different things depending on where you are on that range.
AsafGartner
Computers should be programmable, and they should be as programmable as possible to as many people as possible.
the problem that comes up is that if you make certain things easier it doesn't mean people will do them less, it means people can do them faster and will do them more often
this is the NPM problem imo
this is what we sorta fight against
the weird thing is I'm not really sure how to fight that without some meme-y level of gatekeeping or having shitty tooling for installing dependencies; it's mostly a cultural thing, so maybe there's something to be said about that
I don't think, rhetorically, we're ready to say "guys, libraries good actually" as a general statement, because the culture is still in NPM's hands
once we do push things back we can start to sound "normal"
Well I think it's not necessarily about changing everyone's minds by persuasion. Personally, I am sort of more in favor of forcing people to change their minds by competition. If Handmade ideas cannot reasonably outcompete everyone in an area, then hey, maybe people have just decided that the crappy drawbacks they're subject to are worth it, and the benefits from higher quality are not yet big enough.
(edited)
But, point is, you need people iterating on the quality part, because at some point, those wins will be big enough, and that point, everyone coasting on their crappy tech inertia will have a very difficult time competing. And so, just by market pressure & people working passionately on what they want, you'll see improvement over time.
(edited)
ryanfleury
Well I think it's not necessarily about changing everyone's minds by persuasion. Personally, I am sort of more in favor of forcing people to change their minds by competition. If Handmade ideas cannot reasonably outcompete everyone in an area, then hey, maybe people have just decided that the crappy drawbacks they're subject to are worth it, and the benefits from higher quality are not yet big enough.
(edited)
yeah, this is why I really like seeing new projects pop up and why I think we should invest some time into incremental change; there's a lot that exists in the culture, and if you start to show people that you're right, they can actually move towards you
this is where we get into a problem since we've built a sort of island in a lot of ways
which is probably why newcomers have such ideas of us
Right. Persuade by doing & outcompeting, not by arguing with people.
This is, I suppose, sort of a tangent for this fishbowl, so sorry for getting off topic again :)
"we're some Assembly, C and C++ nerds who like rewrites :P" we don't look that good in a lot of ways but if you show someone remedybg you might turn some heads
I think this is why it's also important to advertise when doing things differently. Like how raylib mentions that it uses no external dependencies, and how it clearly improves the build process.
NeGate
"we're some Assembly, C and C++ nerds who like rewrites :P" we don't look that good in a lot of ways but if you show someone remedybg you might turn some heads
NPM looks cool because you can do anything with it if you type the right install command, same with Python and some imports... we don't have that sort of package-management appeal, and we really don't need it. The solution to package management is just not to centralize it, in my opinion; there doesn't need to be only one way to grab packages. Another thing we run into is that we've separated stuff like cloning a repo from installing a package, and now it's more promoted to just use what you're being fed rather than fork and make your own changes, or even really look into the codebase
I wonder about that stuff but I don't have any concrete views on the matter
(edited)
Ease of distribution is always a win, and package managers win on that.
Unfortunately that can lead to excessive distribution.
And to bring it back to responsibility, how do you take responsibility for all that code?
I think expertise in using a library, or libraries, is its own skill
and really, that's the dominant skill the market is selecting for right now in most application domains
so in the same way someone might be a competent C programmer (and you expect that means something about practices for memory safety) then someone can be a competent user of various important libraries
(and that implies knowing tradeoffs, common bugs, etc.)
demetrispanos
I think expertise in using a library, or libraries, is its own skill
being good at a library, especially a big and popular one, is really just the first step to understanding the status quo, and the status quo is where you learn the problem and how we can move forward... it's rare that a big project has such deep flaws that someone with a surface-level understanding can levy complaints against it
Well, we've gone for nearly 3 hours, so I think it's a good time to call it.
Thank you all for participating.
Thanks for organizing @AsafGartner, was a fun conversation!