GIF Exporter / API design quandries

I'm currently trying to polish up my realtime gif exporter into a high-quality single-header library, and because this is the first time I've been really serious about publishing a library with a stable API, I'd like to get some scrutiny of the interface and ask for opinions on some things. Here is the header in its current state:
https://github.com/notnullnotvoid...b1fcea8f9dcf5343e4a03dc/msf_gif.h
...and here are the questions I have:

  • Would it be worth it to allow the user to specify a pitch (in bytes) between pixel rows in the API? It would be trivial for me to implement, and could be useful for natively supporting more use cases or allowith the user to encode a subrect of a buffer without having to memcpy things around, but it would also be yet another parameter for library users to fill out, which maybe very few would actually use. It would also make the `upsideDown` parameter technically redundant (since the user could just pass in a pointer to the last pixel row and a negative pitch), in which case should I remove that flag, or keep it for convenience?
  • Right now, I have the user specify frame timing as an integer number of centiseconds, because that's what the GIF format supports. However, I've considered allowing them to pass in an integer number of milliseconds or a floating point number of seconds instead, and accumulating the timing error in the library - e.g. if they pass in 16 milliseconds, the library spits out a frame that lasts 1 centisecond, and saves the 6 leftover milliseconds to add to the next frame's timing, etc.
    This could make things a bit easier for library users, but I'm worried it might push them toward naively passing in measured frame timings directly, which would result in more jittery gifs - especially if the framerate is inconsistent, since passing in the time it took to render a frame is actually a subtle bug - the library expects to recieve the amount of time that frame should be displayed for, which is actually roughly equal to the time the next frame takes to render.
    The best solution, in my opinion, is to target a constant framerate that evenly divides into centiseconds, and save off frames whenever you surpass a new multiple of that target frame time. I think exposing the timing as centiseconds will make people more likely to implement this better option, especially if I include example code that uses this strategy, but of course it's also asking more work of the library user. Then again, if they can just copy the logic of the example code, maybe it's not asking much after all?
    So, is it a good idea to hide this quirk of the gif format, or should I force users to reckon with it?
  • The non-incremental API currently doesn't allow specifying per-frame timings. Given the aforementioned tradeoff between giving people maximum flexibility vs. encouraging use of constant frame times, should I change the API to have the user fill out an array of frame timings alongside the array of pixel buffers?


Those are all the things I need to decide on soon in order to finalize a first version of the API. However, there are some other things I've been thinking about that could come up later down the road:

  • Should I support custom memory allocators (to replace the library's use of malloc, realloc, and free)? If so, how? There are a couple that seem to severly limit the usefulness of custom allocators in this situations: First, the total amount of memory allocated over the course of encoding an entire gif is quite high (a bit more than 4*width*height*frames bytes), and there is at least one allocation (the quantized version of the previous frame, used for delta compression) that needs to persist between calls to msf_gif_frame(), so simple allocators (e.g. lifetime-based arena allocators) are unlikely to be useful. Second, the non-incremental API is multithreaded, so any allocator it uses would need to be thread-safe as well. At that point, whatever you're replacing malloc with is basically just another malloc implementation... so I'm not sure how useful that is.
  • I've also considered adding an asynchronous API that defers the work to a background thread. The current (synchronous) incremental API is fast, but could still be slow enough to cause problems. Of course library users can do their own multithreading to take care of this, but it's a common enough concern I figured it would be convenient to have the library handle it if possible. However, this would even further complicate allocator usage patterns. Because you'd no longer know when the framebuffer data you pass into the library is done being used, the library would either need to somehow communicate back to the main thread which buffers are ready to be freed, or just assume the buffers were allocated by malloc (or whatever custom allocator you're replacing it with) and free them for you.
  • Finally, would it be useful to provide a version of the API that writes to a block of memory instead of a file? I can think of a couple use cases where this might be nice, but in general I worry about adding too much Stuff™ to the API of a library that should ideally be dead-simple and really easy to use...
  • Am I massively overthinking everything?


Of course, any other comments or critiques urelated to the things I mentioned are welcome as well.

Edited by Miles on Reason: Initial post
notnullnotvoid

  • Right now, I have the user specify frame timing as an integer number of centiseconds, because that's what the GIF format supports. However, I've considered allowing them to pass in an integer number of milliseconds or a floating point number of seconds instead, and accumulating the timing error in the library - e.g. if they pass in 16 milliseconds, the library spits out a frame that lasts 1 centisecond, and saves the 6 leftover milliseconds to add to the next frame's timing, etc.
    This could make things a bit easier for library users, but I'm worried it might push them toward naively passing in measured frame timings directly, which would result in more jittery gifs - especially if the framerate is inconsistent, since passing in the time it took to render a frame is actually a subtle bug - the library expects to recieve the amount of time that frame should be displayed for, which is actually roughly equal to the time the next frame takes to render.
    The best solution, in my opinion, is to target a constant framerate that evenly divides into centiseconds, and save off frames whenever you surpass a new multiple of that target frame time. I think exposing the timing as centiseconds will make people more likely to implement this better option, especially if I include example code that uses this strategy, but of course it's also asking more work of the library user. Then again, if they can just copy the logic of the example code, maybe it's not asking much after all?
    So, is it a good idea to hide this quirk of the gif format, or should I force users to reckon with it?



Honesty is very important in both your own code and in API design. If you try to hide the quirks it would only lead to glitches and confusion.
You should explain why centiseconds are used in the documentation, and try to make it easier to work with centiseconds if necessary. For example, you could add a TimeAccumulator struct and a function that helps you work with that.

For the incremental API, maybe it would make sense to provide the time since last frame instead of the duration of the current frame? (You would then need to provide the duration of the final frame in msf_gif_end)

What happens if the user never calls msf_gif_end? (Maybe the program crashed) Would the file be usable?

Edited by Asaf Gartner on
I don't have any knowledge about API design.

About the pitch: If I can pass 0 when when I don't care about a parameter and the library uses the default value then I think there isn't much effort needed from the user and that works for me. For the upsideDown flag I'm not sure. If the documentation explain how to flip the image using the pitch it might be ok, but then you need to compute the right pointer (end + pitch, if pitch is negative) and it's seems annoying to do and easy to get wrong (but not hard to debug). stb_image uses a function call (stbi_set_flip_vertically_on_load) to set the upside down flag. Maybe that would work ?

About frame timing: I agree with AsaG, the documentation should explain why the library uses centiseconds and how to work with it. But I see two use cases: a developer wanting to make a gif of its game/application, in which case it's ok to to run a game at 100hz or 50hz without vsync; or a developer wanting to let player/user make gif while they are playing the game in which case you want the game to run at 60hz and the gif timing would be off (or use 1 frame every 3 frame, 20hz). But maybe in that case another external application would be better suited.

notnullnotvoid
The non-incremental API currently doesn't allow specifying per-frame timings. Given the aforementioned tradeoff between giving people maximum flexibility vs. encouraging use of constant frame times, should I change the API to have the user fill out an array of frame timings alongside the array of pixel buffers?


As for the pitch, if you can pass 0 and in that case use the same timing for all frame it works for me.


About memory allocators: if it's simple enough, having a way to specify custom allocators would be nice for people wanting to compile without the C Run Time library. Similarly outputting to memory instead of file would remove the need for stdio.h and the CRT (assuming you don't use it in the implementation), and let user use their file API of choice. I never actually compiled without the CRT so maybe it's something you would do in the futur if peoples ask for it ?

Maybe you could try to ask Sean Barrett and/or Casey to have a look at this thread ?
Structures in headers that might be expanded later always make me feel uneasy. If someone gets a new version of the library with additional data members and forgets to update the header, it can lead to crashes from not having allocated enough memory in the stack. I can easily see a thread-pool being added to the state after a few years. I would just hide the "incremental API" with its exposed types and go for the "all-at-once" API. I don't really see a use case for streaming live video in a gif image and most linear containers can easily be converted to a raw array. This would make it a lot easier for users to understand the API if you are willing to sacrifice a little redundant functionality for the sake of simplicity. It's much easier to add the incremental API later when someone actually wants it than to remove it after nobody used it.

"MSF_GIF_HPP" could be made a little longer in case of naming conflicts. It's also named "HPP" even though the file uses ".h" for C/C++ portability. Personally, I never cared for *.hpp because it just adds confusion.

More usage information and examples would make the documentation easier to understand top to bottom. If it only has technical details, it's like getting a PCB pattern for a computer part without knowing if it's a graphics card or motherboard. "msf_gif_begin" does not say that it is writing if someone's just skimming the calls without looking it up in this external header's documentation (from the caller's perspective).

"rbits, gbits, bbits" looks like bit-masks of undefined size for uint32_t pixels. Changing the size of int with a new compiler or target can change the effect of a bit-wise negation and cause problems in complex compression algorithms that you don't want to revisit on a bad day.

Use of "bool" can be a pain across different languages with different non-zero definitions of true. Just something to think about if you plan to release a future dynamic link library without changing the API. Defining it as an integer and saying that non-zero means true will be more beginner friendly for those who don't know that a bool is an integer under the hood with all its pitfalls. It can also serve for additional states without breaking backward compatibility. I lost count on how many boolean arguments that turned into different modes and changed an API at a much later time.

Any exposed types of undefined size would be a usability problem in a future dynamic link library, because the ABI might change depending on which compiler was used and the other language may have different names for the types. If you decide to never link dynamically, then it's not really a big deal because the types will have the same names within C/C++ with the same compiler.

"const char * path" will not work from folders with eastern symbols unless you specify that it takes multi-char UTF-8 encoding. UTF-32 is a lot easier to work with in algorithms by not having complex pattern matching to find a symbol, but you will still have basic combination characters.

Edited by Dawoodoz on
Dawoodoz
If someone gets a new version of the library with additional data members and forgets to update the header, it can lead to crashes from not having allocated enough memory in the stack.

Because the library uses the same header as the rest of the program, you'd get compile errors if the files get out of sync. It's especially not a problem in this case because the library has been made single-header, so the header and implementation are in the same file. Having the state struct in the header portion allows it to be allocated on the stack. That said, the library already makes some heap allocations, so one more isn't a big deal. It could be worth returning an opaque pointer from msf_gif_begin() to avoid having any implementation details in the header, as a matter of cleanliness/user-friendliness.

Dawoodoz
I would just hide the "incremental API" with its exposed types and go for the "all-at-once" API. I don't really see a use case for streaming live video in a gif image

The incremental API only requires keeping one frame in memory at a time, so its memory usage is bounded and relatively small. Keeping all frames in memory at once for a particularly high-resolution, high-framerate, and/or long gif can easily eat up gigabytes of memory. Also, the incremental API allows spreading the work over multiple frames, meaning you can easily encode the gif as it's being recorded in a single-threaded program, without taking a huge frame hitch when the recording ends. Finally, the incremental API is simpler, and the behavior of the all-at-once API can be emulated using the incremental one anyway, if needed. The ONLY advantage of the all-at-once API is that it can be more aggressively multi-threaded. In all other respects, it is strictly inferior.

Dawoodoz
"MSF_GIF_HPP" could be made a little longer in case of naming conflicts.

In what way would this help? Just making it arbitrarily longer isn't going to do anything. Uniqueness is what matters for name conflicts, not length. The only way there will realistically be a name collision is if there is another header also named "msf_gif.h" in the same codebase, which is extremely unlikely and if it happens, they can just rename the include guard.

Dawoodoz
Personally, I never cared for *.hpp because it just adds confusion.

.h is for C-compatible headers, .hpp is for C++ headers. What's confusing about that?

Dawoodoz
"msf_gif_begin" does not say that it is writing if someone's just skimming the calls without looking it up in this external header's documentation

If someone manages to introduce a bug in their code because they couldn't be bothered to read a few lines of documentation (which is right there inline with the function definitions they're referencing) and blindly assumed that a function which takes a file path string as a parameter, in a library whose sole functionality is writing files to disk, does not write anything to disk, that's on them. There's nothing I can do.

Dawoodoz
"rbits, gbits, bbits" looks like bit-masks of undefined size for uint32_t pixels. Changing the size of int with a new compiler or target can change the effect of a bit-wise negation and cause problems in complex compression algorithms that you don't want to revisit on a bad day.

In this case they're numbers of bits rather than bitmasks, and aren't used during compression.

Dawoodoz
Use of "bool" can be a pain across different languages with different non-zero definitions of true. Just something to think about if you plan to release a future dynamic link library without changing the API. Defining it as an integer and saying that non-zero means true will be more beginner friendly for those who don't know that a bool is an integer under the hood with all its pitfalls. It can also serve for additional states without breaking backward compatibility.

Any non-zero value will be handled correctly by the library, and I completely don't buy the idea that having an integer argument for what is conceptually a boolean would be somehow more beginner-friendly than using a boolean argument for what is conceptually a boolean. Also, the possibility of adding additional states is mutually exclusive with treating any non-zero value as true.

Dawoodoz
Any exposed types of undefined size would be a usability problem in a future dynamic link library, because the ABI might change depending on which compiler was used and the other language may have different names for the types.

Can you be more specific? In what circumstances would this be a problem?

Dawoodoz
"const char * path" will not work from folders with eastern symbols unless you specify that it takes multi-char UTF-8 encoding. UTF-32 is a lot easier to work with in algorithms by not having complex pattern matching to find a symbol, but you will still have basic combination characters.

Setting aside the questionableness of calling non-ASCII text "eastern symbols", that's a good point. On reasonable operating systems it's fine because the filepath gets passed straight to fopen() without modification, but on windows fopen() doesn't handle UTF8 text, so I'll probably need to add a platform-specific code path there to handle UTF8->UTF16 conversion. Using UTF32 here wouldn't make any sense.

Edited by Miles on