Tweet With C Blog

Andrew Chronister — 8 years, 6 months ago

[Continued from part 1]

If you've read my release post, you may remember that TWC heavily relies on code generation to allow for rapid updates in response to Twitter API changes. To refresh your memory, here's the basic idea:

1. A script runs over the Twitter API documentation website to gather a schema of all of the Twitter API endpoints. This schema includes the URI, name, and description of each endpoint, along with its parameters and their inferred types.

2. A set of templates ("pattern", not C++ template) is defined alongside the rest of the library framework code. These templates specify how a given type is serialized to a string or cURL parameter.

3. A program written in C parses the JSON schema and the template code to produce A) a function declaration for each endpoint in the API and B) an implementation of each of those functions that serializes all of its parameters and calls cURL with the correct URL and OAuth parameters.

So (dispensing with the contrived second-person pronouns), after several weeks on Mastodon I was musing about how I could adapt TWC to work with the Mastodon API. It didn't seem like it would be that difficult.

There were a few problems to overcome first, however:

1. I'd need to convert the Markdown specification of the API into a JSON schema.

2. The URLs would all need not only to have a different base, but also to be runtime-configurable, so that you could call different Mastodon instances without needing to recompile the library.

3. Mastodon uses OAuth2 instead of OAuth1, so the authorization was potentially going to need to change.

As it turned out, #1 and #3 were easy, but #2 required enough changes that I haven't yet figured out how I'm going to approach it.

For #1, I adapted my Twitterdoc script to read the Markdown file and parse everything into the format specified by the JSON meta-schema. This is a short-term and hopefully temporary solution, because I want to approach the maintainers of the Mastodon project about building a /schema.json endpoint or something similar into Mastodon itself. That would eliminate all my hacky text parsing and API-freshness issues and be a sustainable long-term solution.

To address #2, as a hack to get things working, I took the easy route and just redefined the base URL macro in mastodon.h to be the API URI of Cybrespace, the Mastodon server I've been running. This problem is "difficult" (annoying, mostly) because of the way I structured TWC initially: all of the API URIs are built from preprocessor macros and concatenated as string literals at compile time. Instead, I will need to concatenate only the common (path) component of each URI, and fill in the domain part at runtime based on user configuration. Not a lot of work, but it remains undone as of yet.
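To make the difference concrete, here is a minimal sketch of the two approaches. The macro names (`MASTODON_BASE_URL`, `URL_PATH_STATUSES`, `URL_STATUSES_COMPILED`) and the helper `BuildUrl` are hypothetical, not TWC's real identifiers. Adjacent string literals are joined at compile time, so the base URL is fixed when the library is built; keeping only the path as a literal and joining it with a base string supplied at runtime removes that restriction.

```c
#include <stddef.h>
#include <stdio.h>

/* Compile-time approach (hypothetical names): the base is baked into the
   final literal, so switching instances means recompiling. */
#define MASTODON_BASE_URL "https://cybre.space/api/v1"
#define URL_PATH_STATUSES "/statuses"
#define URL_STATUSES_COMPILED MASTODON_BASE_URL URL_PATH_STATUSES

/* Runtime approach: only the path is a literal; the instance base URL comes
   from user configuration. Returns what snprintf returns (the full length
   the URL needed), so a result >= OutSize means it was truncated. */
static int BuildUrl(char *Out, size_t OutSize,
                    const char *BaseUrl, const char *Path)
{
    return snprintf(Out, OutSize, "%s%s", BaseUrl, Path);
}
```

With this, `BuildUrl(Url, sizeof(Url), "https://mastodon.social/api/v1", URL_PATH_STATUSES)` produces `https://mastodon.social/api/v1/statuses`, while `URL_STATUSES_COMPILED` can only ever point at the instance chosen at build time.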

#3 actually turned out to be very straightforward. OAuth2 dispenses with the complicated set of OAuth parameters and cryptographic signing, and instead just issues a single access token that you include with each request. That made it very simple to write alternatives to twc_GenerateOAuthHeader and twc_OAuthHeaderMaxLength that simply return the user's access token with a string prefix, and to use those alternatives conditionally when compiling TWC as a Mastodon library.
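A minimal sketch of what such an OAuth2 replacement could look like follows. `BearerHeaderMaxLength` and `GenerateBearerHeader` are hypothetical stand-ins for the twc_ functions named above, and I'm assuming the "string prefix" is `Authorization: Bearer `, the standard header form for OAuth2 bearer tokens (RFC 6750).

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Assumed prefix: the standard OAuth2 bearer-token header (RFC 6750). */
#define BEARER_PREFIX "Authorization: Bearer "

/* Bytes needed for the header for a given token, including the NUL. */
static size_t BearerHeaderMaxLength(const char *AccessToken)
{
    return (sizeof(BEARER_PREFIX) - 1) + strlen(AccessToken) + 1;
}

/* Writes "Authorization: Bearer <token>" into Out. Unlike OAuth1 there is
   no nonce, timestamp, or signature: the token alone authorizes the call.
   Returns 0 on success, -1 if Out is too small. */
static int GenerateBearerHeader(char *Out, size_t OutSize,
                                const char *AccessToken)
{
    if (OutSize < BearerHeaderMaxLength(AccessToken))
        return -1;
    snprintf(Out, OutSize, "%s%s", BEARER_PREFIX, AccessToken);
    return 0;
}
```

The resulting string could then be handed to cURL as a custom header with `curl_slist_append` and `CURLOPT_HTTPHEADER`, which is why a "max length" companion function is handy: the caller can size the buffer before generating the header.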

And with that, it was working well enough to begin making successful API requests to Mastodon instances!

There's still much to be done, and I'll be posting updates as I continue to work on this. I also have a couple unrelated things in the pipeline that I may be posting more about in the coming weeks, so watch this space. But for now, thanks for reading!

April Update - Part 1: A Mammoth Task

Andrew Chronister — 8 years, 6 months ago

Past Me

Looking ahead, I plan to spend a solid chunk of time next month giving some TLC to some of my older projects, including TWC and 4vim. More on that as events transpire!

A wise man
The best-laid plans of mice and men / Go oft awry

So imagine you're me. You've just started a new quarter at university with an unprecedentedly light schedule -- three classes, all on Tuesdays and Thursdays. You've got all the time in the world on Mondays, Wednesdays, and Fridays, not to mention the weekends. You've got several projects you're itching to put time and code into. Things seem to be going pretty well, and you make a few commits to some dusty git repos.

March is nearing its close and the world is, if not normal or ordinary, business as usual, as much as it can be in the current political climate. Twitter's a bit of a stressful affair these days, but you still hop on occasionally to catch up on current events, thoughts from other programmers, and so on. So there you are scrolling down your Twitter feed, when someone mentions an interesting new thing called Mastodon.

"What's Mastodon?" you think to yourself, and click through.

The landing page gives you the basic rundown. It's a new open source social network with a 500-character post limit, touting some features that Twitter has hamfistedly messed with over the last few months (a chronological timeline, normal replies). It also mentions something about being "distributed," but you've no idea what that means.

Intrigued, you create an account. The first thing you notice is the UI: it's like Tweetdeck. "Well that's pretty nice," you think, although there's this annoying gap to the side and you can't seem to create new columns.

Speaking of columns, there's the home timeline and the notifications pane, but there are a couple of other options you can open in the third column that sound unfamiliar: "Local timeline" and "Federated timeline." Experimentally, you click on the federated timeline.

Posts ping by every few seconds. There's a lot of chatter about the site itself and introductions from other new people. Pictures and quotes and discussion. It seems to be the old Twitter firehose, reborn as a readable and interactive stream.

The local timeline is the same, but only for people on your instance. You're not sure what that means. It seems mostly the same as the federated timeline.

Later you learn that by "distributed," the landing page means that this website, mastodon.social, is just one of dozens of instances all running the Mastodon software. All of them, and a fair number of other servers running OStatus-compatible software, can exchange posts with each other and let users on one server follow users on another. It's a real, decentralized network.

You also notice the post button, rather than "Tweet", says "TOOT!". At first you can't get over this (b-b-but it means fart!) but the embarrassment quickly fades. You crack a joke or two about it like everyone else and then quickly grow used to saying "toot" everywhere you used to say "tweet".

You stay on mastodon.social a few days, meeting people (everyone there is very friendly and the community is small) and learning the ropes. It's easy to gain followers, although none of the people you know from Twitter have shown up, besides a couple of tech reporters writing short articles critical of its success. That's okay though, because you're enjoying talking to the people there enough that you haven't even been back to Twitter. The atmosphere is different. It's optimistic instead of apocalyptic.

A few days pass. Scrolling idly through the federated timeline, you realize mastodon.social has grown a lot in the past few days. Like a lot. The other instances haven't, as much. Things are actually starting to slow down as the server falls over due to load.

You decide to check out the git repo and see how hard it would be to deploy your own instance. The Production guide makes things seem pretty straightforward, although the Ruby and Node.js stack makes your stomach turn. You decide to give it a shot on your spare VPS, a cheap $5 DigitalOcean droplet you keep around mostly for backups and miscellaneous experiments.

After a few hours of trouble, you finally manage to get everything set up right. Your home page looks just like the flagship instance, but you're the only user there. You can import your follows list, though, and all the posts show up in your home timeline. The federated timeline is a lot less useful -- it only shows posts from people you follow anyway. It turns out this is just how server federation works: your server only brings in posts from accounts that people on your server follow.

Feeling confident, you begin to consider a new possibility. What if you ran your own public instance? It could work: there are a lot of people looking to get off of mastodon.social, and you know everything you'd need to do to get it up and running.

It's April 5th. Around noon, you start browsing for domains, and by 8pm you have a running instance on the novelty domain cybre.space. You've decided to go with an Amazon EC2 t2.medium instance, hoping that will suffice for a smallish community. You give it another hour or two while you make some customizations (colors, strings -- you change "TOOT!" to "PING!" for flavor, and "like" to "florp" to capitalize on an old Twitter joke), and then open it up to the public. Cybrespace is alive.

Likes are now florps. Timeline goes sideways. (It doesn't, really, but I found the joke too good to pass up, and now this is the only screenshot I have from then.)

Turns out running a web server, especially a customized web server that needs to be merged back with the rapidly-developing master branch, is hard work. Administration, and fostering the small cyberpunk community growing on your instance, takes most of your time over the next couple weeks. Besides what you need to do for work, your other coding falls by the wayside.

Eventually, things slow down a bit, your server is stable, and you can finally breathe again. You're running your own Twitter and the people there are awesome. Things are looking alright. You begin to feel guilty about having dropped all your work on 4vim, TWC, and HMN.

Looking at your TWC codebase, an idea forms.

[To be continued, in Part 2]

March Update

Andrew Chronister — 8 years, 7 months ago

Greetings everyone,

I unfortunately have no progress to report for March. I finished winter quarter with good grades thanks to spending most of my time the first few weeks of March doing homework or studying, but that left me with little time to work on additional projects like TWC. Over spring break, I spent my development time on prototyping an idea I've had ruminating for a while.

Looking ahead, I plan to spend a solid chunk of time next month giving some TLC to some of my older projects, including TWC and 4vim. More on that as events transpire!

-- Andrew

February Update

Andrew Chronister — 8 years, 8 months ago

Greetings everyone,

With Handmade Network v1.0 out, I've been taking a break from my software projects to focus on schoolwork. This quarter, I'm taking an Operating Systems class and a Compilers class, both of which are heavily project-based and require a great deal of my off-time.

However, TWC may benefit directly from these. In the OS class, I've been writing a lot of C, meant to be read by my project partner and the graders, which has been improving my code style and documentation quality. In the Compilers class, I've been learning about the inner workings of compilers, which will allow me to write more conformant code.

Next time I have a chunk of time to work on TWC, I will be working on the following:

- Cleaning up the twitter.h header to make it more suitable for use by language bindings. E.g. removing $ from identifiers, relying less on macro-based generics, narrowing down to a smaller list of types.

- Improving the metaprogramming layer for future code generation directions. In particular, adding the ability to insert code into an existing file so that e.g. the TWC_URL_XXX definitions can be created automatically.

- Generalizing some parts of the system to allow for supporting streaming REST endpoints.

- Writing a simple command line client for basic tasks like tweeting (with media). (This is in fact mostly done, it just needs a bit of polish).

- Exploring the possibility of automated testing by having the code generator create basic tests.

Happy Programming!

Tweet With C is out now!

Andrew Chronister — 8 years, 10 months ago

Last summer, when I was down in Florida for a NASA internship, I was challenged by my housemates to a summer code-off. The goal? To build the coolest piece of software in a mere 10 weeks. Having recently begun exploring the world of Twitter bots, and disappointed that there was no existing library to make Twitter API queries from C, I resolved to spend my summer building a library to do just that.

And now, 6-and-a-bit months later, in the dead of winter, I finally finished. Behold, Tweet With C!


This might seem kind of nuts -- who would want to use Twitter from such a low-level language? Well, it's not as nuts as you might think. I envisioned two major use cases when I began this project:

1. Code that wraps this library in a higher level language (many languages have compatibility layers for working with native/C functions).

2. Projects that want to interact with Twitter without using one of the potentially heavier libraries that exist in their native language, or that are written in languages with no existing Twitter API library, should be able to use the above to leverage this library.

Essentially, I'm using C because it's the lingua franca of the modern software world, and no one else as far as I've seen has filled this niche yet. And, of course, there might be some enterprising individuals who wish to use the library from other C code.

Because of my choice of language, I am not afforded many built-in tools for making expressive code. Past efforts to endow C with higher-level features are well documented elsewhere, but for this project I wanted to see what I could come up with that would:

- Reduce the amount of code that had to be written

- Automate the process of keeping the library up to date with changes to the Twitter API

- Make code more readable and/or more understandable

- Use only mechanisms built into the C programming language as of C99 or provided by the most commonly used tools in the C community

- Avoid adding unnecessary dependencies to the project

The best solution I found to fit these constraints was a custom code generator. Also written in C.

So in this post, I'll take you on the magical adventure through metaprogramming that I've been riding on-and-off since late this summer. Hang on to your cosmetic headwear.

Part 0: The Tools We've Got
Before I ever started writing the custom preprocessor/code generator for this project, I built a couple of things for myself that proved quite helpful while developing the library. Since they fall a bit into 'metaprogramming' territory, I thought they would be worth bringing up here.

Generics

It's not quite as 'fancy' and 'Turing-complete' as C++'s templates, but it's actually not too difficult to fake type-generic structures in C:

    // Buffer types (essentially a block of memory with a size)
    #define twc_buffer_t(T) struct { size_t Size; T* Ptr; }
    
    typedef twc_buffer_t(void) twc_buffer;
    typedef twc_buffer_t(char) twc_strbuf;
    typedef twc_buffer_t(const char) twc_string;

For my purposes, I only really needed this to make a few concrete data structures, which then had functions over them defined the usual way. There are a couple of ways you could define type-generic functions, but they can get convoluted and ugly very quickly (or aren't type-safe or very useful). Such is life in a procedural language from the early 1970s.
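As an illustration of "functions defined the usual way", here is a minimal sketch of two ordinary functions over the concrete `twc_string` type from the block above. `StringFromC` and `StringEquals` are hypothetical helpers named for this example, not necessarily functions TWC provides.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define twc_buffer_t(T) struct { size_t Size; T* Ptr; }
typedef twc_buffer_t(const char) twc_string;

/* Wraps a NUL-terminated C string without copying it. */
static twc_string StringFromC(const char *Str)
{
    twc_string Result = { strlen(Str), Str };
    return Result;
}

/* Compares two strings by length and bytes; no NUL terminator is needed,
   which is the point of carrying Size alongside Ptr. */
static bool StringEquals(twc_string A, twc_string B)
{
    return A.Size == B.Size &&
           (A.Size == 0 || memcmp(A.Ptr, B.Ptr, A.Size) == 0);
}
```

Because the string carries its own length, printing one is `printf("%.*s", (int)S.Size, S.Ptr)`. One reason to `typedef` each instantiation exactly once: every expansion of `twc_buffer_t(T)` declares a brand-new anonymous struct, so two separate `twc_buffer_t(char)` expansions are not compatible types, and functions are best written against the single typedef.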

Option types

I used ML for a bit in a programming languages class, and one thing I quite liked about it was its notion of 'option types'. Essentially, a `T option` is just like a `T` except that it can also have the value `NONE`. The fairly straightforward translation to C is a struct that holds the value and a bool indicating whether or not the option is set. My trick for generic-like types above works well for this:

    #define twc_option_t(T) struct { bool Exists; T Value; }

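I adopted the convention of appending a `$` character to the end of these type names to signify they were option types. (Strictly speaking, `$` in identifiers is a compiler extension rather than standard C, though GCC, Clang, and MSVC all accept it.)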

    typedef twc_option_t(bool) bool$;
    typedef twc_option_t(int) int$;
    typedef twc_option_t(float) float$;
    // ...

I also defined a few helper macros to facilitate creating and taking apart these types:


    #define TWC_NONE(T) (T){ .Exists = false }
    #define TWC_SOME(expr, T) (T){ .Exists = true, .Value = expr}
    #define TWC_OPTION(cond, expr, T) ((cond) ? TWC_SOME((expr), T) : TWC_NONE(T))

This is also one of the only parts of the project, to my knowledge, that _requires_ C99 (stdbool aside). If at some point in the future I want to try to make it all C89 compatible, I may have to either make these real functions, or multiline macros, neither of which strikes me as particularly appealing.
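Here is a small self-contained sketch of how these pieces fit together. It repeats the macros above, but names the option type `int_opt` instead of `int$` so that it compiles without relying on the `$` extension; `DescribeCount` is a hypothetical consumer, not part of TWC.

```c
#include <stdbool.h>
#include <stdio.h>

#define twc_option_t(T) struct { bool Exists; T Value; }
#define TWC_NONE(T) (T){ .Exists = false }
#define TWC_SOME(expr, T) (T){ .Exists = true, .Value = (expr) }
#define TWC_OPTION(cond, expr, T) ((cond) ? TWC_SOME((expr), T) : TWC_NONE(T))

typedef twc_option_t(int) int_opt;

/* Formats an optional "count" parameter: "count=<n>" when it is set,
   and an empty string when it is absent (so it is simply omitted). */
static void DescribeCount(int_opt Count, char *Out, size_t OutSize)
{
    if (Count.Exists)
        snprintf(Out, OutSize, "count=%d", Count.Value);
    else if (OutSize > 0)
        Out[0] = '\0';
}
```

Callers write `DescribeCount(TWC_SOME(20, int_opt), Buf, sizeof(Buf))` or `DescribeCount(TWC_NONE(int_opt), Buf, sizeof(Buf))`, and `TWC_OPTION(N > 0, N * 2, int_opt)` picks between the two at runtime. The ternary works because both compound literals have the same struct type.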

Alright, onto the main event.

Part 1: The Metadata
I knew pretty much from the get-go that I was going to need some metaprogramming on this project, because I wanted the library to have an interface that was slightly more advanced than a function taking strings for the URL and parameters. That is, I wanted to wrap the 100-some API calls in appropriately-typed function calls with appropriately-typed parameters and be able to keep that up to date pretty much on my own if I have to. So writing all of that by hand was out of the question.

However, it wasn't until I had the basics of the library down that I knew the specifics of what I'd have to generate. Once I'd implemented the OAuth signing and written a couple of function wrappers for the most common API calls -- `account/verify_credentials` and `statuses/update` -- it was pretty clear that the process was going to be incredibly formulaic.

Essentially, for any given API endpoint, I needed the following pieces of information to write the corresponding function:

- Endpoint path, e.g. `/statuses/update.json`

- List of parameters

- `GET` or `POST`?

Then, the code I needed to write for that endpoint would consist of a struct encompassing the parameters:

    typedef struct {
        // <type of param 1> <Param 1 Name>;
        // ...
        // <type of param N> <Param N Name>;
    } twc_<endpoint path>_params;

And a function taking that struct which compiles the members into a linked list of strings (on the stack, because it doesn't need to persist beyond this call):

    extern twc_call_result
    twc_<Endpoint Path>(twc_state* Twitter,
                        twc_<endpoint path>_params Params) 
    {
        twc_key_value_list ParamList = NULL;

        // <Serialization based on the type of param 1>
        ParamList = twc_KeyValueList_InsertSorted(ParamList, &<serialized param 1>);

        // ...

        // <Serialization based on the type of param n>
        ParamList = twc_KeyValueList_InsertSorted(ParamList, &<serialized param n>);

        return twc_MakeCall(Twitter, <relevant POST or GET value>, <relevant URL>, ParamList);
    }

(And of course, the corresponding function header stub to go in the header for #include purposes)

So my work was cut out for me.
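The sorted, stack-allocated parameter list can be sketched as follows. `kv_pair` and `KvInsertSorted` are my own hypothetical stand-ins, not TWC's `twc_key_value_pair` and `twc_KeyValueList_InsertSorted`. I'm assuming the list is kept sorted by key because OAuth 1.0a signing needs the request parameters in sorted order (RFC 5849). Each node lives in the caller's stack frame, so building the list needs no heap allocation, as long as it isn't used after the function returns.

```c
#include <stddef.h>
#include <string.h>

/* A singly linked key/value node. Nodes are owned by the caller (typically
   locals in the generated function), so the list itself never allocates. */
typedef struct kv_pair {
    const char *Key;
    const char *Value;
    struct kv_pair *Next;
} kv_pair;

/* Inserts Node into List keeping keys in ascending byte order (strcmp),
   and returns the possibly new head of the list. */
static kv_pair *KvInsertSorted(kv_pair *List, kv_pair *Node)
{
    if (List == NULL || strcmp(Node->Key, List->Key) < 0) {
        Node->Next = List;
        return Node;
    }
    kv_pair *Cur = List;
    while (Cur->Next != NULL && strcmp(Cur->Next->Key, Node->Key) <= 0)
        Cur = Cur->Next;
    Node->Next = Cur->Next;
    Cur->Next = Node;
    return List;
}
```

Mirroring the shape of the generated code, a caller would declare `kv_pair Status_KV = { "status", Text, NULL };` and then write `ParamList = KvInsertSorted(ParamList, &Status_KV);`, once per parameter.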

Part 2: The Data
Before I could generate anything, I needed a way to reason about the Twitter API programmatically. Unfortunately, as I found out when I asked on their forums, there was no publicly available schema. The only resource for finding out about the API was the web docs.

Crawling HTML for bits of information is not much fun even in languages that do it well, and for a language like C where I'd have to bring in at least one additional library to parse the HTML, it seemed like it would quickly become a nightmare. So I took another tack.

I relaxed my requirements slightly and wrote the web docs crawler in Python, supported by the `requests` and `beautifulsoup` libraries. My reasoning was that if I could produce a normalized, machine-readable format, it would definitely be helpful to people other than myself. I won't go on at too much length about this part, because that repository is already public on GitHub.

Once I had the docs in JSON format, I was ready to consume them in C. Now, C doesn't have built-in support for JSON (considering it predates JavaScript itself by a couple decades) but there are a number of libraries for this sitting around. Sean Barrett has made a list of single-file libraries that includes several good ones.

However, I had in fact already written a JSON parser for myself as part of the development of this website, and it was almost entirely C-compatible. It seemed fitting to use that instead of an extra dependency, even if that dependency would be pretty easy to include, because I would be able to control the API and make changes or additions as they became necessary here. So I copied my JSON parser into the project, removed the few C++ features that were being used, and refactored the data layouts a bit so that it could predict the amount of memory it would use to parse a given document.

Once that was done, I started writing out the code that would turn a parsed JSON document into a more usable set of data structures. The structs I defined ended up more or less precisely the same as the layout of the `api.json` I output from Python, with a couple of additions:

    // Twitter API endpoint descriptions
    typedef struct
    {
        twc_string Name;
        twc_string Type;
        twc_string Desc;
        twc_string$ Example;
        bool Required;

        twc_string FieldName;
    } api_parameter;

    typedef struct
    {
        twc_string Path;
        twc_string$ PathParamName;

        twc_string Desc;

        twc_http_method Method;
        bool Unique; // Whether any other endpoint has the same path but different HTTP method
        
        int ParamCount;
        api_parameter* Params;
    } api_endpoint;

Going from the parsed JSON to this is fairly trivial. I will, however, highlight a couple interesting roadblocks, corresponding to new parameters on the structs.

1. `FieldName` -- I use a certain `Capitalization.Style` in my code, and I wanted the generated code to match. Also, I didn't want any fields to inadvertently shadow variables or macros from elsewhere, so I generated this string from the Twitter parameter names for use in the produced code.

2. `Unique` -- Since I'm in C, I obviously can't overload functions with multiple sets of parameters. But there are some cases where the Twitter API has both a GET and a POST version of the same endpoint. In these cases, I wanted my generator to append Get or Post to the name of the generated function.

3. `PathParamName` -- This was a 'fun' one. About 3/4 of the way through writing the library, I noticed (or remembered) that some of the Twitter API functions accept URL slugs, like `statuses/destroy/:id.json`. Argh. For these functions, serialization must be different, since the parameter is expected to be inserted into the very URL used to make the query. More on that later.
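The first two fixups are simple string and list manipulations. Below is a minimal sketch, assuming Twitter's parameter names are snake_case (as in `in_reply_to_status_id`) and that the target style is PascalCase. `MakeFieldName`, `IsUniquePath`, and the reduced `endpoint_info` struct are hypothetical, not the generator's actual code.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* snake_case -> PascalCase: "in_reply_to_status_id" -> "InReplyToStatusId".
   Out must hold at least strlen(Name) + 1 bytes. */
static void MakeFieldName(const char *Name, char *Out)
{
    bool Upper = true;
    for (; *Name; ++Name) {
        if (*Name == '_') { Upper = true; continue; }
        *Out++ = Upper ? (char)toupper((unsigned char)*Name) : *Name;
        Upper = false;
    }
    *Out = '\0';
}

typedef enum { METHOD_GET, METHOD_POST } http_method;

/* A cut-down stand-in for api_endpoint: just what Unique depends on. */
typedef struct {
    const char *Path;
    http_method Method;
} endpoint_info;

/* True if no other endpoint shares this path with a different method,
   i.e. the generated function name needs no Get/Post suffix. */
static bool IsUniquePath(const endpoint_info *All, size_t Count, size_t Index)
{
    for (size_t I = 0; I < Count; ++I) {
        if (I != Index &&
            strcmp(All[I].Path, All[Index].Path) == 0 &&
            All[I].Method != All[Index].Method)
            return false;
    }
    return true;
}
```

Running `IsUniquePath` once per endpoint after parsing is quadratic in the number of endpoints, but with only a hundred-odd endpoints that hardly matters for a build-time tool.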

Alright, that's the data. So how do I get from data to code?

Part 3: The Code
My first impulse was to simply switch on the type right there in the code generator and spit some strings out into the generated document. Something along the lines of:

    // Sketch only: C can't switch on strings, so the real thing would
    // compare Param.Type against each type name with strcmp or similar.
    switch (Param.Type) {
        case "twc_string": {
            printf("twc_key_value_pair %s_KV = twc_KeyValueStr(%s, %s);\n", Param.FieldName, ParamString, ParamName);
            printf("ParamList = twc_KeyValueList_InsertSorted(ParamList, &%s_KV);\n", Param.FieldName);
        } break;

        case "int": {
            printf("twc_key_value_pair %s_KV;\n", Param.FieldName);
            printf("char Buf[13];\n");
            printf("memset(Buf, 0, sizeof(Buf));\n");
            printf("twc_SerializeInt(%s, Buf);\n", Param.FieldName);
            // ...
        } break;

        // ...
    }

As you might expect, this got very unwieldy very fast. Dealing with the code I wanted to generate directly as inline string literals was error prone and hard to manipulate. I want syntax highlighting, damn it! But I had another idea.

I don't know where I originally saw this, but I'm fairly sure I copied it from someone. The idea is, have certain segments of code in a source document flagged as being "templates" (in the traditional sense of the word, and not the C++ sense) for the production of code that serializes a certain type. Then have certain sentinel values inside that "template" that are replaced by a relevant name when the final code is output. For example:

    twc_param_serialization(twc_string)
    {
        twc_key_value_pair @FieldName@_KV = twc_KeyValueStr(@ParamString@, @ParamName@);
        ParamList = twc_KeyValueList_InsertSorted(ParamList, &@FieldName@_KV);
    }

This template is read in by the code generator, which is equipped with a decent C tokenizer. It builds a list of all the templates it finds, tagged by which type they are used to serialize. Then, when I need to go through and produce the final function corresponding to an API endpoint, I just look up the template corresponding to each type in the parameter list, and paste that into the final document.
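The substitution step itself can be sketched in a few lines. This version works on a raw template string with a small table of name/value bindings rather than on a token stream, so it is a simplification of what a tokenizer-based generator would do, and it assumes `@` only ever appears as a sentinel delimiter. `template_var` and `ExpandTemplate` are hypothetical names.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* One sentinel binding: @Name@ in the template becomes Value. */
typedef struct {
    const char *Name;
    const char *Value;
} template_var;

/* Copies Template to Out, replacing each @Name@ with its bound value.
   Sentinels with no binding are copied through unchanged. */
static void ExpandTemplate(FILE *Out, const char *Template,
                           const template_var *Vars, size_t VarCount)
{
    const char *P = Template;
    while (*P) {
        const char *End = (*P == '@') ? strchr(P + 1, '@') : NULL;
        if (End == NULL) {
            fputc(*P++, Out);
            continue;
        }
        size_t Len = (size_t)(End - (P + 1));
        const char *Value = NULL;
        for (size_t I = 0; I < VarCount && Value == NULL; ++I) {
            if (strlen(Vars[I].Name) == Len &&
                strncmp(Vars[I].Name, P + 1, Len) == 0)
                Value = Vars[I].Value;
        }
        if (Value != NULL)
            fputs(Value, Out);
        else
            fwrite(P, 1, (size_t)(End - P) + 1, Out); /* keep "@Name@" */
        P = End + 1;
    }
}
```

Binding `FieldName` and `ParamName` to `Status` and `ParamString` to `"status"` turns the `twc_string` template's first line into `twc_key_value_pair Status_KV = twc_KeyValueStr("status", Status);`.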

As a side note, I also have another kind of template (specified by `twc_url_serialization`) for parameters that need to end up in the URL. This is just because the local variables that need to be updated are different. One thing I might pursue is separating out the bare minimum bits necessary to serialize a given type, and then generate the stuff that goes around it to e.g. update the parameter list or fill in the URL slug, because there's a fair bit of repetition between templates.
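For URL slugs like `statuses/destroy/:id.json`, the URL-side serialization boils down to splicing the value into the path. Here is a minimal sketch, assuming a single `:name` slug per path and a value that is already URL-safe (a numeric ID, say). `FillUrlSlug` is a hypothetical helper, not TWC's actual `twc_url_serialization` output.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Replaces the first ":<SlugName>" in Path with Value, writing to Out.
   Returns 0 on success, -1 if the slug is missing or Out is too small. */
static int FillUrlSlug(char *Out, size_t OutSize, const char *Path,
                       const char *SlugName, const char *Value)
{
    char Needle[64];
    int NeedleLen = snprintf(Needle, sizeof(Needle), ":%s", SlugName);
    if (NeedleLen < 0 || (size_t)NeedleLen >= sizeof(Needle))
        return -1;

    const char *At = strstr(Path, Needle);
    if (At == NULL)
        return -1;

    /* prefix before the slug + value + everything after the slug */
    int Written = snprintf(Out, OutSize, "%.*s%s%s",
                           (int)(At - Path), Path, Value, At + NeedleLen);
    return (Written < 0 || (size_t)Written >= OutSize) ? -1 : 0;
}
```

For example, filling `id` with `12345` turns `statuses/destroy/:id.json` into `statuses/destroy/12345.json`. Only the local variables touched differ from the parameter-list case, which is the repetition between templates mentioned above.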

To give an example of how this works in practice, for `statuses/update.json`, I produce a struct creatively named `twc_statuses_update_params` that houses all of the optional parameters that you could pass to the Twitter API, and the following function definition:

    extern twc_call_result
    twc_Statuses_Update(twc_state* Twitter,
                        twc_string Status,
                        twc_statuses_update_params Params);

Then, when I need to generate the code that serializes all of the parameters (including the required `Status`) for sending off to cURL, I get something that looks like the following:

    extern twc_call_result
    twc_Statuses_Update(twc_state* Twitter,
                        twc_string Status,
                        twc_statuses_update_params Params)
    {
        twc_key_value_list ParamList = NULL;

        twc_key_value_pair Status_KV = twc_KeyValueStr("status", Status);
        ParamList = twc_KeyValueList_InsertSorted(ParamList, &Status_KV);

        // ... All of the other params

        return twc_MakeCall(Twitter, 1, TWC_URL_STATUSES_UPDATE, ParamList);
    }

This means that the surface area for updating the code when Twitter updates their API is *very* small. In fact, code changes only ever need to be made for the following reasons:

1. Twitter adds a function that takes a new type of data. (This is not a pressing concern, because it will only become apparent when I update my API schema script to correctly identify this new type -- otherwise, it'll just default to 'string', which delegates serialization to the library user)

2. I find a bug with the way a certain type of parameter is serialized. In this case, I fix it in one place (the template), and every single place that type of parameter appeared, the bug will be fixed.

That's it!

My overall return on investment on this has been very good. The code generator is around 800 LLOC, all of the templates together are about 150 LLOC, and the produced header and C file are just over 4000 LLOC. Fewer than 1000 hand-written lines producing over 4000 generated ones is a compression ratio under 25%, which is pretty awesome, if you ask me!

Part 4: The Conclusion
The downside to all this is that it's very difficult to test. A future direction I want to pursue is automatically generating test cases for each API endpoint, to make sure that there are no unexpected crashes or API errors returned. I'm trying to maximize the robustness and flexibility that I can give myself if I'm the only one maintaining the project. But who knows, maybe someone else will be interested and dedicated enough to help out?

For now, you can download the library from [Github](https://github.com/Chronister/twc) and try it out for yourself (Makefile provided for Linuckers, batch file for Windowheads). I'm also working on a command-line twitter client to allow for quick-n-easy tweeting (think `tweet "CLI is dead, long live CLI"`), so keep your eyes peeled for that.

Between developing stuff in C and taking way too many classes at university, I also r/w on twitter dot com as @chronaldragon so if you have any feedback or comments for me, please @ me or leave them in the comment box below.

Thanks for reading!