Introduction

As programmers interested in more than just the Microsoft world, you've probably been warned against working with windows (actual program windows, not Windows) on Linux directly. Instead we're told to just use GTK, Qt, or SDL2. While that is a worthwhile suggestion if you value your sanity, it has also led to a definite shortage of tutorials on how to do this without the help of libraries. That state of affairs only makes it harder to do this kind of programming, and to write new libraries. This is why I decided to write a little programming tutorial series on this topic.

This tutorial covers basic window creation, event interaction, basic drawing to the window, and keyboard interaction.

Before we start

There are a few things we need to discuss before we can start.

This tutorial is about the Xlib library for use with X, and not the newer xcb library. Even though Xlib is supposed to be "deprecated" in favor of xcb, the latter can't be used to interface with OpenGL and is in most regards still very similar to Xlib. As far as I can tell there is no way today to really do without Xlib. Since that is the case, I don't see the point in using two very similar libraries in one project, which makes Xlib the better choice in just about any situation. I also don't think there is any need to worry about the fact that Xlib is described as being deprecated. Xlib is still by far the more widely used of the two libraries. We will most likely see the death of the X system before we see the end of Xlib support.

With this in mind we have to get our hands on the documentation for Xlib. The official documentation can be found at: http://www.x.org/wiki/ProgrammingDocumentation/

Although to be honest I mostly used this version: https://tronche.com/gui/x/xlib/, since those pages turn up at the top of Google search.

When you go through these tutorials, always have the documentation open and read it as new functions come up. I can give you an idea on what to do and where to start, but it's always better to read about the specifics in the docs.

More relevant standards documentation that will be necessary can be found here:

https://specifications.freedesktop.org/wm-spec/1.3/

https://www.x.org/releases/X11R7.6/doc/xorg-docs/specs/ICCCM/icccm.html

These are necessary to control the window manager. What the window manager is will be discussed shortly.

To build these examples not much setup is necessary: just pass the code to your compiler and link with the option -lX11. You also need the Xlib development files installed, probably to be found in a package named libx11-dev or similar.

An overview

There are a couple of parts to the windowing system on Linux we care about. Be aware that the following is just a very rough overview that glosses over ~~some~~ most of the details. If you want to know more there are reasonably long Wikipedia articles on all of those things.

X: Sometimes called X.org, X-Server or just X. Those names all have slightly different meanings, but let's not worry about that. I will just refer to this as X. X as used in this tutorial is the software that stores the state of all windows, takes device input and makes it available to applications via an event system, and facilitates rendering the windows. X is also the part of the Linux system that the GPU driver typically connects to. In general you need to make an X application to use OpenGL on Linux. X is server software; clients can be on the same or on different computers.
The display: In X terminology a display is one X server together with the screens and input devices it manages. (Not to be confused with a display manager, which is the graphical login program that starts your session.) Our program connects to a display on the same or a different machine, and almost all of the Xlib calls need you to supply that connection, a Display pointer, as their first argument.
The window manager: Whereas X mostly just stores the state of the windows and passes events to them, the window manager is a program that actually cares about what your windows do, how they look, where they first appear on the screen and at what size. The window manager also adds what Linux land calls the 'window decoration', that is, the title bar and buttons. This is important to us because in some cases the API for interacting with the window manager is different from the one for interacting with X itself.
The X application: That's us! This is also where Xlib comes into play. X was originally designed for use over the network, and this still works to a degree today. Mostly what this means for us is that when we are using Xlib we are not calling functions in X directly. Rather, Xlib is a C wrapper around the network protocol used by X. Requests are buffered by Xlib and then carried out asynchronously by the X server, which is another process. When we want to make sure that our requests have really been sent, we need to flush that buffer of commands.

Finally some code

All code used in this article is available in full on GitHub. The part number is indicated in the file name. It's probably a good idea to have those files open for reference. https://github.com/eisbehr/xlibtut

Part 0 - Creating a Window

We'll go through the code step by step. To see it in full, see the files on GitHub.

#include<stdio.h>
#include<stdlib.h>

#include<X11/Xlib.h>
#include<X11/Xutil.h>
#include<X11/Xatom.h>

int
main(int argc, char** args)
{
}

This is probably the first small difference for people coming from Windows. There is no special main function for programs that want to create a window, just the standard C main() one. The next couple of snippets belong inside the main() function.

int width = 800;
int height = 600;

Display* display = XOpenDisplay(0);

if(!display)
{
  printf("No display available\n");
  exit(1);
}

Window root = DefaultRootWindow(display);
int defaultScreen = DefaultScreen(display);

First we define the width and height for later use.

Then we open the default display by passing 0 (a null pointer) as the display name, which makes Xlib use the DISPLAY environment variable. X is set up as a client-server model with a network protocol. This means the server needs to have an idea of what clients there are, and which windows can interact with which other windows. The X server also needs to know who it is talking to, so we have to keep the display pointer. We will pass this to most other Xlib functions. We also check that the call returned a valid display.

Next we get a root window. Windows are organized in a tree structure, so there has to be a parent to every window we create. We are making a normal "top level" window, so our parent needs to be the "desktop"; this is what the root window is. If we were making an application with a settings window, we could use our own main window as the parent of the settings window.

As a last thing in this snippet we get the default screen. A screen in X is generally just an area or a buffer to be rendered to. This can be through a graphics device onto a real monitor, or completely in software to an in-memory target.

int screenBitDepth = 24;
XVisualInfo visinfo = {0};
if(!XMatchVisualInfo(display, defaultScreen, screenBitDepth, TrueColor, &visinfo))
{
  printf("No matching visual info\n");
  exit(1);
}

Here we define what kind of requirements we have for our render target. This used to be a lot more interesting in 1984 (this is how old X is) when there were monochrome displays and displays of different color depths. Nowadays this is pretty much always the same: 24 bits for RGB-8-8-8 and TrueColor.

XMatchVisualInfo() takes our requirements, matches them against an internal list of visuals that are supported on this screen, and fills in visinfo with a matching one. It returns zero if our requirements can't be met, but that is very unlikely these days.
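To make "RGB-8-8-8" a bit more concrete: a TrueColor visual describes where each channel sits inside a pixel through three bit masks (the red_mask, green_mask and blue_mask fields of XVisualInfo). The following sketch packs a color using such masks. It is plain C without any Xlib dependency, packPixel and maskShift are names made up for this example, and it assumes that every channel is exactly 8 bits wide, which is the common case but not guaranteed.

```c
#include <assert.h>
#include <stdint.h>

/* Number of zero bits below the lowest set bit of a channel mask. */
int maskShift(unsigned long mask)
{
    int shift = 0;
    assert(mask != 0);
    while ((mask & 1UL) == 0) {
        mask >>= 1;
        shift++;
    }
    return shift;
}

/* Pack three 8-bit channels into one pixel value.
   Assumption: each mask covers 8 contiguous bits. */
uint32_t packPixel(uint8_t r, uint8_t g, uint8_t b,
                   unsigned long redMask,
                   unsigned long greenMask,
                   unsigned long blueMask)
{
    return ((uint32_t)r << maskShift(redMask)) |
           ((uint32_t)g << maskShift(greenMask)) |
           ((uint32_t)b << maskShift(blueMask));
}
```

With the usual masks 0xff0000, 0x00ff00 and 0x0000ff, a pure red pixel comes out as 0xff0000. In the tutorial's program you would pass visinfo.red_mask and friends instead of hard-coded masks.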

XSetWindowAttributes windowAttr;
windowAttr.background_pixel = 0;
windowAttr.colormap = XCreateColormap(display, root,
             visinfo.visual, AllocNone);
unsigned long attributeMask = CWBackPixel | CWColormap;

Window window = XCreateWindow(display, root,
                              0, 0,
                              width, height, 0,
                              visinfo.depth, InputOutput,
                              visinfo.visual, attributeMask, &windowAttr);

if(!window)
{
  printf("Window wasn't created properly\n");
  exit(1);
}

This is where we create the window. We set some attributes for the window in the XSetWindowAttributes structure. We set the background fill pixel to be black and create a colormap from the visualinfo we created earlier. Then we define the attributeMask, a mask that tells XCreateWindow which of the attributes we want to get used. We want to use all of those we set.

XCreateWindow is pretty self explanatory at this point if you look at the documentation for it. We just pass it all the info we have prepared so far. InputOutput means that this window will receive and handle incoming events and also output to the screen.

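We then check the window ID that came back. Keep in mind that requests are asynchronous, so most creation errors are reported later through Xlib's error handler rather than through this return value.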

  XStoreName(display, window, "Hello, World!");

  XMapWindow(display, window);
  XFlush(display);

  return 0;

First let's give our window an appropriate title. The window title is a feature of the window manager. XStoreName() is a convenience function that keeps this API hidden from us for a little longer. Now we map the window onto the display. This means that the window will become visible. We then flush to make sure all of our commands have gone through to the server. If we now run this… nothing happens. When we start this in a debugger and break at the last line, we see that the window was successfully created but closes immediately. So far so good.

Let's get it to actually show a window. Add the following after the XFlush();

while(1)
{
}

When we run our program now we can see that the window stays open. We can now press the close button on the window and it closes!

But wait… we made the program get stuck in an endless loop, how could it escape from that loop to close? If we look at our task manager we can see that the process belonging to our happy little window is still running. This is because the window itself is not directly connected to the process of our program. Closing the window through the window manager, and on the side of the X server, does not kill our process.

Part 1 - Closing the Application

To solve this inconvenience we need some way to leave the while loop when the window is closed. The way to do this is through events.

int windowOpen = 1;
while(windowOpen)
{
  XEvent ev = {0};
  // Handle every event that is already waiting in the queue.
  while(XPending(display) > 0)
  {
    XNextEvent(display, &ev);
    switch(ev.type)
    {
      case DestroyNotify: {
        XDestroyWindowEvent* e = (XDestroyWindowEvent*) &ev;
        if(e->window == window)
        {
          windowOpen = 0;
        }
      } break;
    }
  }
}

We replace the while loop from the last part with this one.

This is the basic event loop. The inner loop runs while there are pending events. Each time round, it takes an event off the event queue and switches on its type. What we care about is the DestroyNotify event, so we cast the general event to this specific type. Now we check if the event is targeted at our own window and set windowOpen to false if that is the case. What's interesting to note here is that each event tells us which window it is meant for. We should only receive events that belong to our own main window, but if we have opened several windows that might not be true, so we check to make sure. If you keep in mind why this check exists, you can leave it off for as long as you only have one window open.

To make event processing work there is something that has to be adjusted in code we've already written. In our windowAttr we need to add one extra value, and we have to add one more value to the attributeMask (this code is right before we create the window with XCreateWindow).

windowAttr.event_mask = StructureNotifyMask;
unsigned long attributeMask = CWBackPixel | CWColormap | CWEventMask;

When we set our window attributes we add another attribute, the event_mask. As you can look up in the documentation, the DestroyNotify event is part of the StructureNotify group, so we have to set the StructureNotifyMask on the window to tell X that we want to receive events of that type. This kind of opt-in limits the number of events we receive and thus the time we spend in the event loop working through events we don't actually care about. To the attribute mask that tells XCreateWindow() which attributes are set, we add CWEventMask.

Now we have a window that stays open, and an application that ends when the window is closed. Pretty neat, eh?

Part 2 - Minimum Size

Now that we have the nitty gritty part out of the way let's add some niceties to get more used to the way the rest of the API works. Let's start with something simple.

When we grab the edge of the window and resize it to be smaller we notice that there is no limit to how small we can make the window. Let's change that and give it a minimum size.

void
setSizeHint(Display* display, Window window,
            int minWidth, int minHeight,
            int maxWidth, int maxHeight)
{
  XSizeHints hints = {0};
  if(minWidth > 0 && minHeight > 0) hints.flags |= PMinSize;
  if(maxWidth > 0 && maxHeight > 0) hints.flags |= PMaxSize;

  hints.min_width = minWidth;
  hints.min_height = minHeight;
  hints.max_width = maxWidth;
  hints.max_height = maxHeight;

  XSetWMNormalHints(display, window, &hints);
}

…
(in main, before XMapWindow())

setSizeHint(display, window, 400, 300, 0, 0);

We fill out an XSizeHints structure and set it. By setting the flags only when a nonzero size is passed, we make it so that 0 means unset. When we call our function with the given parameters we get a window that can be as big as it wants, but no smaller than 400x300.

Size hints are another window manager feature, like the window title, and this is another one of those convenience functions.

Part 3 - Maximize Window

Let's try maximizing the window. No more convenience for this one. Now we'll learn about the API used to communicate with the window manager, atoms and properties. This is where you should definitely open the window manager documentation mentioned in the beginning.

Atoms are in and of themselves not difficult to understand, it's just the silly name that makes them seem complicated. Information and properties about our window are stored inside X in so called properties. Properties are identified by unique labels, those labels are called atoms. In essence, X simply calls key-value pairs atom-property pairs, for some reason. Probably better not to ask.

Status
toggleMaximize(Display* display, Window window)
{
    XClientMessageEvent ev = {0};
    // only_if_exists = True: return None instead of creating a missing atom
    Atom wmState = XInternAtom(display, "_NET_WM_STATE", True);
    Atom maxH  =  XInternAtom(display, "_NET_WM_STATE_MAXIMIZED_HORZ", True);
    Atom maxV  =  XInternAtom(display, "_NET_WM_STATE_MAXIMIZED_VERT", True);

    if(wmState == None) return 0;

    ev.type = ClientMessage;
    ev.format = 32;
    ev.window = window;
    ev.message_type = wmState;
    ev.data.l[0] = 2; // _NET_WM_STATE_TOGGLE 2 according to spec; Not defined in my headers
    ev.data.l[1] = maxH;
    ev.data.l[2] = maxV;
    ev.data.l[3] = 1;

    return XSendEvent(display, DefaultRootWindow(display), False,
                      SubstructureNotifyMask,
                      (XEvent *)&ev);
}

…
(in main, after XMapWindow())

toggleMaximize(display, window);

The Atom-property pairs are sent to X through the event system. There is a special event type that flows from the client to the server, the ClientMessage.

Atoms need to be queried by name. XInternAtom() also wants to know what to do if an atom does not exist: its third argument, only_if_exists, makes it return None for an unknown name when True, and create the atom when False. We do not want atoms to be created here, so this argument should be True. We are using standardized atoms that need to exist; if they don't, we probably made a typo and want to know about that. As far as I can tell atoms are just integer indices, but there is a special None value, so we check against that. We also only check the _NET_WM_STATE atom, since the presence of this atom implies the presence of the other two according to the specification. (https://specifications.freedesktop.org/wm-spec/1.3/ar01s05.html#idm140238712324208)
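To see what the only_if_exists flag does, here is a toy model of an atom table in plain C. This is not how the X server implements it, and none of these names are Xlib API (AtomTable, internAtom and TOY_NONE are made up for this sketch). It only mimics the observable behaviour: names map to small integers, and an unknown name either gets a new number or yields the "none" value.

```c
#include <assert.h>
#include <string.h>

#define TOY_NONE 0         /* stands in for Xlib's None */
#define TOY_MAX_ATOMS 64
#define TOY_MAX_NAME 64

typedef struct {
    char names[TOY_MAX_ATOMS][TOY_MAX_NAME];
    int count;
} AtomTable;

/* Look a name up. A missing name is only added when onlyIfExists is 0.
   Returns the atom's id (starting at 1) or TOY_NONE. */
unsigned long internAtom(AtomTable* table, const char* name, int onlyIfExists)
{
    assert(table != NULL && name != NULL);

    for (int i = 0; i < table->count; i++) {
        if (strcmp(table->names[i], name) == 0) {
            return (unsigned long)(i + 1);
        }
    }
    if (onlyIfExists || table->count == TOY_MAX_ATOMS ||
        strlen(name) >= TOY_MAX_NAME) {
        return TOY_NONE;
    }
    strcpy(table->names[table->count], name);
    table->count++;
    return (unsigned long)table->count;
}
```

The point of the model: with onlyIfExists set, a misspelled name comes back as the "none" value and can be caught, whereas without it the typo silently becomes a brand new atom.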

Now we're filling out the client message. type is ClientMessage which is a pre-#defined int.

The format is 32. This means 32 bits and tells the recipient of this message what kind of data we put into the ev.data.* array. This data array portion is a union with

union {
    char b[20];
    short s[10];
    long l[5];
} data;

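The possible values for format are 8, 16 or 32, which select the b, s or l array respectively. Note that format 32 always uses the long array, even on 64-bit systems where a long is 64 bits wide; Xlib only transmits the low 32 bits of each element.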
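If you want to check the sizes on your own machine, the snippet below copies the union into a stand-alone type so that no X headers are needed. ClientData and slotsForFormat are names made up for this sketch; treat it as something to poke around with, not as code the tutorial's program needs.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-alone copy of the data union inside XClientMessageEvent. */
typedef union {
    char b[20];
    short s[10];
    long l[5];
} ClientData;

/* Number of data slots a client message offers for a given format. */
size_t slotsForFormat(int format)
{
    ClientData d;
    assert(format == 8 || format == 16 || format == 32);

    if (format == 8)  return sizeof d.b / sizeof d.b[0];
    if (format == 16) return sizeof d.s / sizeof d.s[0];
    return sizeof d.l / sizeof d.l[0];
}
```

The slot counts are 20, 10 and 5. On a typical 64-bit Linux system sizeof(ClientData) is larger than 20 bytes, because each of the five longs takes 8 bytes there.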

Then we pass our window, and our atom as the message_type. The data array starts with 2 to signify a "toggle", as per the specification (they define _NET_WM_STATE_TOGGLE as 2, but my headers don't seem to include this define). Following this are the two atoms for horizontal and vertical maximization. The last number is a "source indication", a number that indicates if your program is a normal user application, something like a task bar, or following an older version of the protocol. The source indication for a normal application is 1. (spec: https://specifications.freedesktop.org/wm-spec/1.3/ar01s07.html#sourceindication)
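Since the headers don't provide these numbers, it can help to define them yourself. The values below are the ones given in the wm-spec linked above. The identifier names and the fillNetWmState helper are made up for this sketch (the spec's names start with an underscore followed by a capital letter, which C reserves, so that prefix is dropped). The helper only fills a plain long[5] array so that it compiles without X headers; in the real program you would write into ev.data.l instead.

```c
#include <assert.h>
#include <stddef.h>

/* Action codes for _NET_WM_STATE client messages (values from the wm-spec). */
enum {
    NET_WM_STATE_REMOVE = 0,
    NET_WM_STATE_ADD    = 1,
    NET_WM_STATE_TOGGLE = 2
};

/* Source indication values (values from the wm-spec). */
enum {
    SOURCE_UNSPECIFIED = 0,  /* client follows an older version of the spec */
    SOURCE_APPLICATION = 1,  /* normal application */
    SOURCE_PAGER       = 2   /* pagers, task bars and similar tools */
};

/* Fill the five data slots of a _NET_WM_STATE message.
   prop1 and prop2 are the atoms to change; pass 0 as prop2 to change only one. */
void fillNetWmState(long data[5], long action,
                    long prop1, long prop2, long source)
{
    assert(data != NULL);
    assert(action >= NET_WM_STATE_REMOVE && action <= NET_WM_STATE_TOGGLE);

    data[0] = action;
    data[1] = prop1;
    data[2] = prop2;
    data[3] = source;
    data[4] = 0;  /* unused */
}
```

With named constants it also becomes obvious how to write an "always maximize" or "always restore" variant: swap the toggle action for the add or remove one.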

When our event is done we send it on its merry way. Again, in accordance with the _NET_WM_STATE specification, we know that this kind of message has to be sent to the root window (which is something like the "desktop"). The wm-spec asks for the event mask SubstructureNotifyMask | SubstructureRedirectMask on messages sent to the root window, since those are the events a window manager listens for there. In my tests the maximizing only takes effect if a mask is given at all, and SubstructureNotifyMask, which is defined to mean "Any change in window structure wanted", was enough on its own (and works, which is a plus).

Make sure to familiarize yourself with the documentation and specification documents at this point so that you can understand where all this information comes from and how you could find it yourself. This is the most important skill to learn for when you need to solve different problems yourself.

Part 4 - Even better shutdown

As a first order of business, let's remove the call to toggleMaximize(). It was a good way to show the window manager API, but we don't need it right now.

  XMapWindow(display, window);

  //toggleMaximize(display, window);
  XFlush(display);

Now that we've learned about atoms and ClientMessage events we can improve the way our program shuts down and closes the window. If you've been running your program from a terminal you might have noticed that you're getting an error message when closing the window. This is because the window manager closes the window for you, and not in the most graceful manner. This is why there is a special way for the window manager to tell you that the window close button was pressed without actually closing the window itself. But we have to opt into this feature to use it, and still keep the old way of doing things in case that opt-in doesn't work.

//
// after XFlush()
//
Atom WM_DELETE_WINDOW = XInternAtom(display, "WM_DELETE_WINDOW", False);
if(!XSetWMProtocols(display, window, &WM_DELETE_WINDOW, 1))
{
    printf("Couldn't register WM_DELETE_WINDOW property\n");
}

//
// After DestroyNotify case
//
case ClientMessage: {
    XClientMessageEvent* e = (XClientMessageEvent*)&ev;
    if((Atom)e->data.l[0] == WM_DELETE_WINDOW)
    {
        XDestroyWindow(display, window);
        windowOpen = 0;
    }
} break;

What we're doing here should be mostly familiar. We're getting an Atom; details about it are in the spec. (https://www.x.org/releases/X11R7.6/doc/xorg-docs/specs/ICCCM/icccm.html#clientmessage_events) Then we use the XSetWMProtocols function to set a property on our window to tell the window manager that we would like to receive an event about window deletion so we can do it ourselves. This is what we then do in our event loop. We receive a ClientMessage, check that it's a message about window deletion, and then destroy our window ourselves and tell our loop to quit since the window is now gone. This should get rid of any error messages.

Part 5 - The buffer

To show how to get something into the window we are going to write directly into a memory buffer, which we then draw into the window using an Xlib function.

Right after the XFlush() we add the next big chunk of code.

  int pixelBits = 32;
  int pixelBytes = pixelBits/8;
  int windowBufferSize = width*height*pixelBytes;
  char* mem  = (char*)malloc(windowBufferSize);

  XImage* xWindowBuffer = XCreateImage(display, visinfo.visual, visinfo.depth,
                                       ZPixmap, 0, mem, width, height,
                                       pixelBits, 0);
  GC defaultGC = DefaultGC(display, defaultScreen);

The first couple of lines should be self explanatory. We simply define that our image will have 32 bit sized pixels, make that into bytes since we need it later, then combine it with our width and height to compute the amount of memory we need. We then malloc this amount of memory.

The meat is in the next two functions. Since we don't just want this image buffer for ourselves, but need to give it to X later to display in the window, we need to wrap it into an X compatible image structure.

We pass XCreateImage the display, and the visual from the visinfo we created for our window. We then give it the pixel depth, also from the visinfo. The next argument is an interesting one.

We pass ZPixmap for the format here. But why ZPixmap? According to documentation this argument can take one of XYBitmap, XYPixmap, or ZPixmap. I wouldn't fault you for thinking that what we're making here is an XYBitmap. We have a width and height after all, and it's a bitmap if I ever saw one.

To get to the bottom of why we choose a ZPixmap we need to get the difference between a bitmap and a pixmap out of the way. In X-speak a bitmap is an image with a single bit per pixel, so every pixel is either foreground or background. A pixmap has several bits per pixel and can therefore hold colors. We want 24-bit color, so we want a pixmap.

So what's left is the choice between ZPixmap and XYPixmap. Information on this is hard to find, but it is mentioned in passing in the X server specification. http://www.x.org/releases/X11R7.6/doc/xorg-server/Xserver-spec.html

Pixmap images are transferred to the server in one of two ways: XYPixmap or ZPixmap. XYPixmaps are a series of bitmaps, one for each bit plane of the image, using the bitmap padding rules from the connection setup. ZPixmaps are a series of bits, nibbles, bytes or words, one for each pixel, using the format rules (padding and so on) for the appropriate depth.

So XYPixmaps are for passing a number of bitplanes. Bitplanes are another one of those things that were pretty cool in 1984 when X first came out, but are basically not used anymore today. To not blow the scope of this article right open, let's just agree to disregard them. When we read further we see that the roundabout explanation of the ZPixmap fits our use case better. In our case it's a series of 4-byte words for each pixel.

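With this big one out of the way the next four arguments are a bit of a breather. The first is an offset, the number of pixels to skip at the beginning of each scanline, for which we pass 0. Then we simply pass the memory we allocated and the width and height of our window and image.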

The next argument is called bitmap_pad in the documentation. I want you to read how it is described.

Specifies the quantum of a scanline (8, 16, or 32). In other words, the start of one scanline is separated in client memory from the start of the next scanline by an integer multiple of this many bits.

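This is a very roundabout way of saying that every row of the image has to start on a multiple of this many bits. Our pixels are 32 bits wide, so each row is automatically a multiple of 32 bits and we can simply pass pixelBits here. You'll see many annoying descriptions like this when you read the X documentation.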

The description also mentions scanlines. For our purposes a scanline is simply a row, or horizontal line, in our bitmap. It has the size of width * pixelBytes.

The last argument is bytes_per_line. This argument is not actually about the bytes per line at all. It is the offset from the start of one line to the start of the next line. So if your image is 3 pixels wide, with 4-byte pixels and a 2 byte space between each line, this argument would be set to 3*4+2. But if the lines of our image are contiguous, as they are in our case, we can simply pass a 0 here and XCreateImage() will calculate the value itself.
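To make the relationship between width, pixel size, bitmap_pad and bytes_per_line concrete, here is a small stand-alone sketch of the usual computation for tightly packed rows: round the row's bit count up to a multiple of the pad, then convert to bytes. scanlineBytes is a made-up helper, not an Xlib function. As far as I can tell this is also what XCreateImage() works out for a ZPixmap when you pass 0, but take that as an assumption rather than something the documentation spells out.

```c
#include <assert.h>

/* Bytes from the start of one scanline to the start of the next, for rows of
   `width` pixels with `bitsPerPixel` bits each, padded so that every row is a
   multiple of `bitmapPad` bits (8, 16 or 32). */
int scanlineBytes(int width, int bitsPerPixel, int bitmapPad)
{
    assert(width >= 0 && bitsPerPixel > 0);
    assert(bitmapPad == 8 || bitmapPad == 16 || bitmapPad == 32);

    int rowBits = width * bitsPerPixel;
    int paddedBits = ((rowBits + bitmapPad - 1) / bitmapPad) * bitmapPad;
    return paddedBits / 8;
}
```

For our 800 pixel wide image with 32-bit pixels this gives 3200, the same as width*pixelBytes, which is why passing 0 is safe here. A 3 pixel wide image with 24-bit pixels and a pad of 32 would need 12 bytes per row instead of 9.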

The last thing we create in this snippet is a graphics context (GC). A GC holds drawing state for X, such as colors and clipping, and every Xlib call that draws to the screen needs one. We don't need anything special, so we just fetch the default GC of our screen.

Next we add a little something after the event loop to draw a bit of a pattern to our image memory. We want to see if it worked after all. This code simply draws a black and white grid over the complete image.

//
// Event loop here, add this code after it.
//

int pitch = width*pixelBytes;
for(int y=0; y<height; y++)
{
  char* row = mem+(y*pitch);
  for(int x=0; x<width; x++)
    {
      unsigned int* p = (unsigned int*) (row+(x*pixelBytes));
      if(x%16 && y%16)
        {
            *p = 0xffffffff;
        }
      else
        {
            *p = 0;
        }
    }
}
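If you'd rather keep the main loop tidy, the same pattern can live in a function of its own. The sketch below is a stand-alone variant (fillGrid and its cell parameter are names chosen for this example). It works on uint32_t pixels instead of casting from char*, and it assumes that rows are stored back to back, as they are in our buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Paint white cells separated by black lines every `cell` pixels.
   `pixels` must hold width*height 32-bit pixels, rows stored back to back. */
void fillGrid(uint32_t* pixels, int width, int height, int cell)
{
    assert(pixels != NULL && cell > 0);

    for (int y = 0; y < height; y++) {
        uint32_t* row = pixels + (size_t)y * (size_t)width;
        for (int x = 0; x < width; x++) {
            /* A pixel is on a grid line if either coordinate
               is a multiple of the cell size. */
            int onLine = (x % cell == 0) || (y % cell == 0);
            row[x] = onLine ? 0x00000000u : 0xffffffffu;
        }
    }
}
```

In the tutorial's program the call would look like fillGrid((uint32_t*)mem, width, height, 16), which produces the same picture as the loop above.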

As a last step in this part we need to draw the image in our own memory to the window. Add the next one right after our grid drawing code.

XPutImage(display, window,
          defaultGC, xWindowBuffer, 0, 0, 0, 0,
          width, height);

XPutImage() takes the display, window, and the defaultGC we created earlier. We draw to the window, so we need the GC. We then give it the XImage we created.

The next four values are offsets in the source image, as well as the destination image. We would use this if we only wanted to update part of the window from part of our image. But since we're keeping this simple it's just four zeros. Then follow the width and height again. Well, this one was refreshingly easy!

If you compile and run this you'll see a black grid on a white background.

Part 6 - Adapting to changing window size

One thing you'll notice when you play around with the window is that changing the window size does… not much. All the new space we get from a bigger window is simply black! This is not cool. It's black because we set the background pixel to be black when we create the window.

To change the size of our xWindowBuffer image when the window size changes we need to catch another event, one that gets sent to us on window size change, and then destroy the image and recreate it with the new width and height. If you're having trouble following how all these snippets fit together, don't forget that you can look at the full source code on GitHub. https://github.com/eisbehr/xlibtut

//
// Before windowOpen = 1; and the main loop
//

int sizeChange = 0;

//
// In the switch(ev.type)
//

case ConfigureNotify: {
     XConfigureEvent* e = (XConfigureEvent*) &ev;
     width = e->width;
     height = e->height;
     sizeChange = 1;
} break;

//
// After the event loop, before the grid drawing
//

if(sizeChange)
  {
    sizeChange = 0;
    XDestroyImage(xWindowBuffer); // Frees the memory we malloced
    windowBufferSize = width*height*pixelBytes;
    mem = (char*)malloc(windowBufferSize);

    xWindowBuffer = XCreateImage(display, visinfo.visual, visinfo.depth,
                                 ZPixmap, 0, mem, width, height,
                                 pixelBits, 0);
  }

First we declare sizeChange and set it to 0. Then we add another case to handle ConfigureNotify events. These contain the current width and height of the window. We set sizeChange to one here to signify that width and height changed. Since we may get several configure events in one frame (one run of our main program loop), we defer the actual image destruction and recreation until a wee bit later after the event loop.

In the if(sizeChange) we destroy the image, allocate memory again using the new width and height, then call XCreateImage the same way we did originally. XDestroyImage() frees the memory we malloced. If we wanted to avoid malloc and do more hands-on memory management, we would have to fill out the XImage structure ourselves.

With all this in we can now resize our window and the grid expands to fill the window! But maybe you already spotted a problem with what we're doing now.
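One refinement you may want: ConfigureNotify is also delivered when the window merely moves, so the width and height it reports are often unchanged. The sketch below only flags a rebuild when the size really differs. It is plain C so it can be tried without X; SizeState, noteConfigure and takeSizeChange are hypothetical names, and in the real event loop you would feed it e->width and e->height.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    int width;
    int height;
    int changed;  /* set when a configure event reported a different size */
} SizeState;

/* Record the size reported by a configure event. */
void noteConfigure(SizeState* s, int newWidth, int newHeight)
{
    assert(s != NULL);
    if (newWidth != s->width || newHeight != s->height) {
        s->width = newWidth;
        s->height = newHeight;
        s->changed = 1;
    }
}

/* Returns 1 once per size change, then 0 until the size changes again. */
int takeSizeChange(SizeState* s)
{
    assert(s != NULL);
    int changed = s->changed;
    s->changed = 0;
    return changed;
}
```

Several configure events in one frame still lead to a single rebuild, just as with the plain sizeChange flag, but moving the window around no longer reallocates the image.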

Part 7 - About that flicker…

The way we do things now there is a flicker when changing the window size. This flicker is more noticeable when your CPU is slower since we're running our main loop at full speed.

What happens is that as soon as the size of our window changes, the whole window goes to black (The window background pixel color). Even if we run in a debugger and break as soon as we hit the ConfigureNotify event the window is already black. Here the disconnected nature of the X system is a bit of an inconvenience. The execution of the X internal window logic and our code are not coupled.

Special thanks go to Miguel ("debiatan" of the handmade.network community) for the solution to this flicker problem. It's not a complete solution, but enough for the scope of this tutorial.

The culprit is the bit gravity attribute on the window. By default this is set to the value ForgetGravity, which tells X to discard the window content on resize. We don't want this to happen, so we have to set it to StaticGravity. This stops the flicker.

This means our window attributes that we set before creating the window now look like this:

  XSetWindowAttributes windowAttr;
  windowAttr.bit_gravity = StaticGravity;
  windowAttr.background_pixel = 0;
  windowAttr.colormap = XCreateColormap(display, root, 
                     visinfo.visual, AllocNone);
  windowAttr.event_mask = StructureNotifyMask; 
  unsigned long attributeMask = CWBitGravity | CWBackPixel | CWColormap | CWEventMask;

We simply need to set windowAttr.bit_gravity to StaticGravity and add CWBitGravity to the attributeMask.

Part 8 - KeyPress and KeyRelease

Now let's take a look at how we handle keyboard input. We take this one from the source of the events outwards.

As we did before with the ConfigureNotify event we need to set a flag on the window to tell X that we want to receive that type of event.

windowAttr.event_mask = StructureNotifyMask | KeyPressMask | KeyReleaseMask;

We want to know when a key is pressed, and when it is released again.

        case KeyPress: {
          XKeyPressedEvent* e = (XKeyPressedEvent*) &ev;

          if(e->keycode == XKeysymToKeycode(display, XK_Left)) printf("left arrow pressed\n");
          if(e->keycode == XKeysymToKeycode(display, XK_Right)) printf("right arrow pressed\n");
          if(e->keycode == XKeysymToKeycode(display, XK_Up)) printf("up arrow pressed\n");
          if(e->keycode == XKeysymToKeycode(display, XK_Down)) printf("down arrow pressed\n");
        } break;
        case KeyRelease: {
          XKeyReleasedEvent* e = (XKeyReleasedEvent*) &ev;

          if(e->keycode == XKeysymToKeycode(display, XK_Left)) printf("left arrow released\n");
          if(e->keycode == XKeysymToKeycode(display, XK_Right)) printf("right arrow released\n");
          if(e->keycode == XKeysymToKeycode(display, XK_Up)) printf("up arrow released\n");
          if(e->keycode == XKeysymToKeycode(display, XK_Down)) printf("down arrow released\n");
        } break;

We need to add cases in the event loop to handle these two types of events. This kind of key handling is very simple. We cast the incoming event like we are used to and check if it belongs to one of the keys we want to check for. In Xlib there are two different concepts involved here: one is the keycode and the other is the keysym, or key symbol. A keycode is a number that is assigned to an actual key on your keyboard. Your left arrow key has a different keycode from your right arrow key, and both of those are different from the A key. But what if we don't just want to know which physical key was pressed, but rather what symbol the user pressed? This is what keysyms are for. On different keyboard layouts one and the same symbol, like A, can be on different keys on the keyboard. To query a keycode from a symbol we use XKeysymToKeycode(). We pass it our display so it knows what keyboard layout to work with, and then we pass it one of the defined keysym macros. You can find the full list of defined keysyms in /usr/include/X11/keysymdef.h.

Make sure to run this and the following example from the terminal so you can see the printf output.

This is basic keyboard handling in the base X API. For what we do next, we have to go deeper.
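Since a keycode is just a small number, one thing you can do with the press and release events is remember which keys are currently held. The following is only a sketch under a few assumptions: KeyTable and the three keyTable functions are hypothetical helper names of my own, not part of Xlib, and the table size relies on X keycodes fitting into one byte (the protocol uses the range 8 to 255).

```c
#include <assert.h>
#include <string.h>

/* X keycodes fit into one byte, so a 256 entry table indexed by
   keycode is enough to track every key. */
#define KEY_TABLE_SIZE 256

/* Hypothetical helper type, not part of Xlib. */
typedef struct {
    unsigned char down[KEY_TABLE_SIZE]; /* 1 while the key is held, else 0 */
} KeyTable;

/* Marks every key as released. Call once before the event loop. */
void keyTableClear(KeyTable* table)
{
    assert(table != NULL);
    memset(table->down, 0, sizeof(table->down));
}

/* Call with e->keycode from the KeyPress (isDown = 1) and
   KeyRelease (isDown = 0) cases. Out of range values are ignored. */
void keyTableSet(KeyTable* table, unsigned int keycode, int isDown)
{
    assert(table != NULL);
    if(keycode < KEY_TABLE_SIZE) table->down[keycode] = isDown ? 1 : 0;
}

/* Returns 1 if the key is currently held, 0 otherwise. */
int keyTableIsDown(const KeyTable* table, unsigned int keycode)
{
    assert(table != NULL);
    return keycode < KEY_TABLE_SIZE && table->down[keycode];
}
```

In the event loop you would call keyTableSet(&keys, e->keycode, 1) in the KeyPress case and keyTableSet(&keys, e->keycode, 0) in the KeyRelease case. Anywhere else you can then ask keyTableIsDown(&keys, leftKeycode), where leftKeycode was looked up once with XKeysymToKeycode(display, XK_Left) instead of on every event.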

Part 9 - UTF-8 Characters from XInput (No, not the game controller one)

This is where things get slightly more annoying to deal with. UTF-8 is the default character encoding these days, and rightly so. But X is old, and so is the basic keyboard API. Back then UTF-8 didn't even exist, so at some point a second keyboard mechanism was added on top of the basic one, this time with support for UTF-8, but also more of a bother to set up. This mechanism is the X Input Method API (XIM, which I'll simply call X Input here). Even though it feels like an extension, it ships as part of Xlib itself, so these days it can be assumed to be present on any Linux system. Most of these functions are documented in the Xlib documentation. Xutf8LookupString() is a later addition that is not part of the original Xlib specification, but it has a manpage.

  // After setSizeHint(display, window, 400, 300, 0, 0);

  XIM xInputMethod = XOpenIM(display, 0, 0, 0);
  if(!xInputMethod)
    {
      printf("Input Method could not be opened\n");
      return 1; // assumes we are in main(); nothing below works without an input method
    }

  XIMStyles* styles = 0;
  // XGetIMValues() returns NULL on success, so a non-zero result means failure
  if(XGetIMValues(xInputMethod, XNQueryInputStyle, &styles, NULL) || !styles)
    {
      printf("Input Styles could not be retrieved\n");
      return 1; // styles must not be dereferenced below
    }

  // Look for the plain style: no special pre-edit or status handling
  XIMStyle bestMatchStyle = 0;
  for(int i=0; i<styles->count_styles; i++)
    {
      XIMStyle thisStyle = styles->supported_styles[i];
      if(thisStyle == (XIMPreeditNothing | XIMStatusNothing))
        {
          bestMatchStyle = thisStyle;
          break;
        }
    }
  XFree(styles);

  if(!bestMatchStyle)
    {
      printf("No matching input style could be determined\n");
      return 1;
    }

  XIC xInputContext = XCreateIC(xInputMethod, XNInputStyle, bestMatchStyle,
                                XNClientWindow, window,
                                XNFocusWindow, window,
                                NULL);
  if(!xInputContext)
    {
      printf("Input Context could not be created\n");
      return 1;
    }

A lot of code for very little gain. This gives us an X Input setup that is totally vanilla and does what the base keyboard API does without any setup.

First we open an X Input Method with XOpenIM(). It takes the display and then three values that have something to do with a resource database… we can pass 0 for these to get the defaults. With this input method we can call XGetIMValues(). Like many functions in the X Input API this is a variable arguments function: we state what kind of value we want by using the macro XNQueryInputStyle, supply a pointer for it to fill, and end the varargs with NULL. Note that XGetIMValues() returns NULL on success, which is why a non-zero result is treated as an error. Now we have a list of input styles. As far as I can tell, an input style describes how the input method presents text that is still being composed (pre-edit) and its own status information. In any case we need one later, so we have to get the list.

Now that we have a whole list of input styles, let's grab one to use. We iterate through all the styles we got back, and if a style consists of exactly the flags XIMPreeditNothing and XIMStatusNothing we take it and break out. The documentation is quite vague on this one as well. All I know is that those two flags define an input style that is not special in any way.

With the input method and input style queried we can create an input context, which is the part we actually need. XCreateIC() is another variable argument function. It takes the input method, followed by name and value pairs: we supply our bestMatchStyle as XNInputStyle, tell it what our window is with XNClientWindow, and with XNFocusWindow tell it to focus on that same window (send events to it when in focus). The varargs are finalized with NULL.

Now on to the reason why we need the input context.

          // In case KeyPress:
          // After XKeyPressedEvent* e = (XKeyPressedEvent*) &ev;

          // A UTF-8 character takes up to 4 bytes; the extra room keeps
          // space for the terminating zero that printf("%s") needs
          char symbol[8] = {0};
          Status status = 0;
          int length = Xutf8LookupString(xInputContext, e, symbol,
                                         sizeof(symbol) - 1, 0, &status);

          if(status == XBufferOverflow)
            {
              // The input produced more bytes than fit into the buffer;
              // length then holds the number of bytes that would be needed
              printf("Buffer overflow when looking up the keyboard symbol (%d bytes needed)\n", length);
            }
          else if(status == XLookupChars || status == XLookupBoth)
            {
              symbol[length] = 0; // the returned bytes are not zero terminated
              printf("%s\n", symbol);
            }

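The function Xutf8LookupString() takes an input context, the XKeyPressedEvent, a buffer for the returned UTF-8 bytes and the size of that buffer. After that comes an optional pointer that receives the corresponding KeySym (we pass 0 because we don't need it), and as the last argument a pointer to a Status that gets filled in. The return value is the number of bytes written to the buffer. Only pass it KeyPress events: according to the manpage, its behavior for KeyRelease events is undefined.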

What happens next is really simple again. We check the status, and if it is a character we print it out.
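Xutf8LookupString() hands back raw UTF-8 bytes, which is fine for printing, but sometimes you want the actual Unicode code point, for example to index into a font. Here is a minimal sketch of how that decoding could look. The names utf8SequenceLength and utf8DecodeFirst are my own and not part of Xlib, and the decoder is deliberately simple: it checks for truncated input and bad continuation bytes, but it does not reject overlong encodings or surrogates.

```c
#include <assert.h>
#include <stddef.h>

/* Number of bytes in a UTF-8 sequence, judged by its first byte.
   Returns 0 for a byte that cannot start a sequence. */
size_t utf8SequenceLength(unsigned char first)
{
    if(first < 0x80) return 1;           /* 0xxxxxxx: plain ASCII */
    if((first & 0xE0) == 0xC0) return 2; /* 110xxxxx */
    if((first & 0xF0) == 0xE0) return 3; /* 1110xxxx */
    if((first & 0xF8) == 0xF0) return 4; /* 11110xxx */
    return 0;                            /* continuation or invalid byte */
}

/* Decodes the first character in bytes[0..length) into a code point.
   Returns 0xFFFD (the Unicode replacement character) if the input is
   empty, truncated, or contains a bad continuation byte. */
unsigned long utf8DecodeFirst(const char* bytes, size_t length)
{
    assert(bytes != NULL);
    if(length == 0) return 0xFFFD;

    const unsigned char* p = (const unsigned char*) bytes;
    size_t needed = utf8SequenceLength(p[0]);
    if(needed == 0 || needed > length) return 0xFFFD;
    if(needed == 1) return p[0];

    /* The lead byte carries 5, 4 or 3 payload bits for sequences
       of 2, 3 or 4 bytes. */
    unsigned long codePoint = p[0] & (0xFF >> (needed + 1));
    for(size_t i = 1; i < needed; i++)
    {
        if((p[i] & 0xC0) != 0x80) return 0xFFFD; /* not 10xxxxxx */
        codePoint = (codePoint << 6) | (p[i] & 0x3F);
    }
    return codePoint;
}
```

You would call utf8DecodeFirst() with the buffer you gave to Xutf8LookupString() and the byte count it returned. Since the longest sequence is 4 bytes, this is also why the lookup buffer needs at least 4 bytes, plus one more if you want to zero terminate it for printf.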

Conclusion

This concludes our little tour through Xlib and related technologies. At this point you should be able to hit the ground running with a good set of basics and find your way through some of the ~~even more~~ obscure features and settings of this windowing ecosystem. As you can see, there are a lot of different layers to this system: old functions with a legacy, some superseded by extensions, and some features only available through interaction with a window manager (which might not be there at all). That's really why people are reluctant to interact with this technology themselves. It's not unified and the documentation is spread over many documents (and generally hard to find), but I hope you can see that it is doable with a little time investment!
