Software rendering on Linux

I would like to implement an interactive raymarcher that renders in real time. To make this work without artifacts, I assume I need double-buffering.

In another project I do "standard" 3D graphics with Xlib for creating a window and GLX for OpenGL.
At first I thought that Xlib alone would suffice for real-time graphics on the CPU, but after searching the web and reading the documentation I get the impression that it's not really supported. Apparently Xlib on its own doesn't even do double-buffering.

I'm a bit at a loss on what system API I should be using in order to do what I want. Basically I just need some piece of memory that I can write to that gets displayed on a window.
I've been contemplating going for OpenGL instead and uploading a texture every frame or something like that. I know you can implement a raymarcher in a fragment shader like they do on Shadertoy, but in my case I would like to be able to manipulate the contents of the scene through a UI, like adding primitives and moving them about, and more advanced things if I get that far.

Primarily I'm wondering how others would approach software rendering on Linux and, secondly, whether in my case I'd be better off just trying to make it work on the GPU instead.

Edited by Andreas on Reason: Initial post
If you want to do Xlib-only software rendering without OpenGL, keep the image in your own memory as plain bytes. Update them as you wish, wrap that memory in an XImage (XCreateImage), and then draw the image to the window (XPutImage). Performance won't be great, but that's the best you can do. There's no need to double-buffer, because drawing the image is the step that transfers your "background" image from memory to the window for display.

You can use the shared-memory extension (MIT-SHM) to share memory between your process and the X server. It will probably improve performance, but I'm not sure by how much. http://www.xfree86.org/current/mit-shm.html
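
As a rough sketch of the "plain bytes" idea above: assuming a 24-bit TrueColor visual stored as 32 bits per pixel in 0x00RRGGBB order (common, but worth checking against the XImage's masks and bits_per_pixel), filling the buffer could look like this. pack_rgb and fill_image are hypothetical helper names, not Xlib functions; bytes_per_line stands for the XImage field of the same name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed pixel layout: 32 bits per pixel, 0x00RRGGBB. */
static uint32_t pack_rgb(uint8_t r, uint8_t g, uint8_t b)
{
    return ((uint32_t)r << 16) | ((uint32_t)g << 8) | (uint32_t)b;
}

/* Fill a w x h image whose rows start bytes_per_line bytes apart.
   Any padding at the end of a row is left untouched. */
static void fill_image(void *data, int w, int h, int bytes_per_line, uint32_t color)
{
    assert(bytes_per_line >= w * 4);
    for (int y = 0; y < h; ++y) {
        uint32_t *row = (uint32_t *)((uint8_t *)data + (size_t)y * (size_t)bytes_per_line);
        for (int x = 0; x < w; ++x) {
            row[x] = color;
        }
    }
}
```

The same pointer would then be the data handed to XCreateImage (or image->data with the shared-memory extension), so nothing in this sketch touches X itself.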

Edited by Mārtiņš Možeiko on
Some of the code here might help you get started. He has some pretty clean code that is easy to read.
https://github.com/vurtun/nuklear/blob/master/demo/x11/nuklear_xlib.h

@mmozeiko: Thanks for the layout! Pity about the performance, but I think I'll use this as a starting point. No reason to complicate things at the get-go.

@joe513: Thanks! That looks like a good reference.

Edited by Andreas on
I've recently created a small software rendering engine on Linux here: https://github.com/baines/halcyon
It uses a combination of XShmPutImage and the XPresent extension for vsync.

Feel free to look over the code / copy it / ask me any questions.

Most of the Xlib / Linux specific video code is in https://github.com/baines/halcyon/blob/master/lib/linux/video_x11.c

Edited by Alex Baines on

@insofaras: Nice! Looks like just the kind of thing I'm looking for.

@insofaras: I've been replicating most of the code in video_x11_init(), and here's what I've got so far.

It looks like the setup runs without fault, but XShmPutImage() doesn't seem to do anything. My window gets filled with a random color even though I've set my shared memory to other values.

Probably missing something basic but I'm failing to see it right now.

Setup.
	XShmSegmentInfo segmentInfo;
	XImage* image;
	Pixmap pixelMap;
	Window window;
	XVisualInfo visualInfo;
	Display* display;
	
	Atom atomClose;
	
	void* buffer;
	
	// Init
	display = XOpenDisplay(NO_POINTER);
	
	ASSERT(display);
	ASSERT(XShmQueryExtension(display));
	ASSERT(XMatchVisualInfo(display, DefaultScreen(display), 24, TrueColor, &visualInfo));
	
	u32 width  = 1024;
	u32 height = 768;
	
	image =
		XShmCreateImage(
		display,
		visualInfo.visual,
		visualInfo.depth,
		ZPixmap,
		NO_POINTER,
		&segmentInfo,
		width, height);
	
	// Allocate image memory
	segmentInfo.shmid =
		shmget(
		IPC_PRIVATE,
		image->bytes_per_line * image->height,
		IPC_CREAT | 0777);
	
	segmentInfo.readOnly = false;
	
	// Attach memory to this process
	segmentInfo.shmaddr = (char*)(shmat(segmentInfo.shmid, 0, 0));
	image->data = segmentInfo.shmaddr;
	buffer = image->data;
	
	ASSERT(XShmAttach(display, &segmentInfo));
	
	// Mark for removal after process end
	shmctl(segmentInfo.shmid, IPC_RMID, NO_POINTER);
	
	// Window
	XSetWindowAttributes attribs;
	
	// CWBackPixel reads background_pixel, not background_pixmap
	attribs.background_pixel = BlackPixel(display, visualInfo.screen);
	
	attribs.colormap =
		XCreateColormap(
		display,
		RootWindow(display, visualInfo.screen),
		visualInfo.visual,
		AllocNone);
	
	attribs.event_mask = StructureNotifyMask | KeyPressMask | KeyReleaseMask;
	
	// CWColormap is needed for attribs.colormap to take effect
	window =
		XCreateWindow(
		display, RootWindow(display, visualInfo.screen),
		0, 0, width, height,
		0, visualInfo.depth, InputOutput, visualInfo.visual,
		CWBackPixel | CWColormap | CWEventMask, &attribs);
	
	XStoreName(display, window, "RayMarcher");
	
	XMapRaised(display, window);
	
	atomClose = XInternAtom(display, "WM_DELETE_WINDOW", 0);
	XSetWMProtocols(display, window, &atomClose, 1);
	
	pixelMap = XCreatePixmap(display, window, image->width, image->height, visualInfo.depth);
	


Drawing. Called 60 times per second. Also tried calling it once.
			XShmPutImage(
				display, pixelMap, DefaultGC(display, visualInfo.screen),
				image, 0, 0, 0, 0, image->width, image->height, true);
			
			XFlush(display);
Your code will push pixels into the X Pixmap at 60fps but there's no association between the pixmap and the Window.

In my code I'm using the XPresent stuff to copy the pixmap to the window at vblank.

If you want a simpler method, you could probably remove the pixmap entirely and switch to calling XShmPutImage(dpy, window, ...), which should push pixels directly to the window (if I'm not forgetting some reason you can't do that).

Alternatively you can set the window's background pixmap to the actual pixmap in the XSetWindowAttributes. I think XClearWindow might be needed to get it to redraw with this approach.

@insofaras: Ah, I didn't know you could pass a window as a "Drawable" as well. Now it works! I'll look into XPresent once I've got something animating where I can spot the difference.

Pixmaps and their purpose still elude me. I guess they're more like some mapping structure and not an image in its own right.

It appears that I have a bug in my multi-threaded tile rendering, and it's not clear to me what's going on. Since I'm completely new to this, I assume that I'm missing something basic.

In this gif, every tile first gets filled with yellow, then the actual rendering proceeds, and finally an outline is drawn in teal. As you can see, the top-row tiles appear to get interrupted partway through, while the rest of the tiles render correctly.

This is how my worker threads are executed. I suspect that the root of the problem might be the way I'm resuming the threads: I just increment the semaphore once per available thread. I'm not entirely sure how semaphores work. Does the OS associate a semaphore with whichever thread incremented or decremented it?
void RenderTiles(renderQueue* q)
{
	PRINT_STRING("Begin work");
	
	q->numTilesDone = 0; //volatile
	q->nextTile = 0; // volatile
	
	loop (i, 0, q->numThreads, 1)
	{
		Increment(&q->semaphore); // Calls sem_post(&s->handle);
	}
	
	PRINT_STRING("Wait for work");
	
	while (q->numTilesDone < q->numTilesTotal) // Both volatile
	{
	}
	
	PRINT_STRING("Work done");
}


This is the worker thread.
void* WorkerThread(void* params)
{
	CAST(threadParameters, parameters, params);
	
	PRINT_STRING("Thread %u started", parameters->thread->id);
	
	CAST(renderQueue, queue, parameters->data);
	
	while (true)
	{
		if (queue->nextTile < queue->numTilesTotal)
		{
			u32 indexTile = SyncedGetAndAdd(&queue->nextTile, 1);
			
			PRINT_STRING("Thread %u doing tile %u", parameters->thread->id, indexTile);
			
			renderTileWork* tile = queue->workQueue + indexTile;

			RenderTile(
				queue->image,
				queue->view,
				tile->xMin, tile->xMax,
				tile->yMin, tile->yMax);

			SyncedGetAndAdd(&queue->numTilesDone, 1); // Calls __sync_fetch_and_add
		}
		else
		{
			PRINT_STRING("Thread %u pausing", parameters->thread->id);
			Decrement(&queue->semaphore); // Calls sem_wait(&s->handle);
		}
	}
}
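
On the semaphore question above: as far as POSIX is concerned, a semaphore is just a counter with no notion of which thread posted or waited. A minimal sketch of that counting behaviour follows, using sem_trywait so that it can never block; posts_then_waits is a hypothetical name chosen for this illustration.

```c
#include <assert.h>
#include <semaphore.h>

/* Post `count` tokens, then count how many non-blocking waits succeed.
   Returns -1 if the semaphore could not be created. */
static int posts_then_waits(int count)
{
    sem_t sem;
    int taken = 0;

    assert(count >= 0);
    if (sem_init(&sem, 0, 0) != 0) {   /* process-private, initial value 0 */
        return -1;
    }
    for (int i = 0; i < count; ++i) {
        sem_post(&sem);                /* value += 1; wakes one waiter if there is one */
    }
    while (sem_trywait(&sem) == 0) {   /* value -= 1; fails once the value is 0 */
        ++taken;
    }
    sem_destroy(&sem);
    return taken;
}
```

Applied to the code above, posting numThreads times allows numThreads returns from sem_wait in total, regardless of which threads make those calls. A worker that is still busy when the posts happen simply finds a token waiting the next time it calls sem_wait and returns immediately.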


And finally the tile rendering.
void RenderTile(
    renderImage* image,
    renderView* view,
    u32 xMin, u32 xMax,
    u32 yMin, u32 yMax)
{
	DrawRectangle(image, xMin, yMin, xMax, yMax, V4(1.0f, 1.0f, 0.0f, 0.5f));

	loop (y, yMin, yMax, 1)
	{
		loop (x, xMin, xMax, 1)
		{
			RenderPixel(image, view, x, y, 0.0f);
		}
	}

	DrawLinedRectangle(image, xMin, yMin, xMax, yMax, V4(0.0f, 1.0f, 1.0f, 0.5f));
}
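
The thread doesn't show how the tile bounds in workQueue are produced, so the following is only a guess at that step. It assumes xMax and yMax are exclusive (which is how the loop macro in RenderTile appears to use them) and clamps the last row and column to the image size, so no two tiles share a pixel. tile_rect and build_tiles are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for renderTileWork; xMax and yMax are exclusive. */
typedef struct {
    uint32_t xMin, xMax;
    uint32_t yMin, yMax;
} tile_rect;

/* Split a width x height image into tiles of at most tileSize pixels per side.
   Writes at most maxTiles entries and returns the number of tiles required. */
static uint32_t build_tiles(tile_rect *out, uint32_t maxTiles,
                            uint32_t width, uint32_t height, uint32_t tileSize)
{
    uint32_t count = 0;

    assert(tileSize > 0);
    for (uint32_t y = 0; y < height; y += tileSize) {
        for (uint32_t x = 0; x < width; x += tileSize) {
            if (count < maxTiles) {
                tile_rect *t = &out[count];
                t->xMin = x;
                t->yMin = y;
                t->xMax = (x + tileSize < width)  ? x + tileSize : width;   /* clamp last column */
                t->yMax = (y + tileSize < height) ? y + tileSize : height;  /* clamp last row */
            }
            ++count;
        }
    }
    return count;
}
```

With disjoint tiles like these, worker threads never write to the same pixel, which rules out one possible source of half-drawn tiles.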


Here's a printout snippet. Notice that the first batch of tiles (as many as there are threads) is not executed in order, while the rest of the frame is.
Also, lowering the number of threads produces fewer incomplete tiles.
Begin work
Thread 0 doing tile 0
Thread 4 doing tile 3
Thread 1 doing tile 1
Thread 7 doing tile 5
Wait for work
Thread 6 doing tile 6
Thread 3 doing tile 2
Thread 2 doing tile 4
Thread 5 started
Thread 5 doing tile 7
Thread 5 doing tile 8
Thread 1 doing tile 9
Thread 0 doing tile 10
Thread 2 doing tile 11
Thread 3 doing tile 12
Thread 4 doing tile 13
Thread 0 doing tile 14
Thread 6 doing tile 15
Thread 5 doing tile 16
Thread 7 doing tile 17
Thread 4 doing tile 18
Thread 1 doing tile 19
Thread 2 doing tile 20
...

Edited by Andreas on
I'm not sure this is the source of the issue, but I think this could lead to some extra work / synchronization or even an out-of-bounds array access:

while (true)
{
	if (queue->nextTile < queue->numTilesTotal)
	{
		// <== no guarantee queue->nextTile still has the value that was tested above
		u32 indexTile = SyncedGetAndAdd(&queue->nextTile, 1);


I would rephrase it like this:

while (true)
{
	u32 indexTile = SyncedGetAndAdd(&queue->nextTile, 1);
	if (indexTile < queue->numTilesTotal)
	{
		// do work
	}
	else
	{
		Decrement(&queue->semaphore); // sleep until the next frame is posted
	}
}
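
To make the difference concrete, here is a small single-threaded replay of the interleaving described above, next to the claim-first version. It is only a sketch: tile_counter, claim_tile and replay_check_then_claim are hypothetical names, and __sync_fetch_and_add is used directly in place of the SyncedGetAndAdd wrapper.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the two renderQueue fields involved. */
typedef struct {
    volatile uint32_t nextTile;
    uint32_t numTilesTotal;
} tile_counter;

/* Claim first, then test the index that was actually claimed.
   Returns 1 and stores the index, or 0 when no tiles are left. */
static int claim_tile(tile_counter *c, uint32_t *index)
{
    uint32_t claimed = __sync_fetch_and_add(&c->nextTile, 1);
    if (claimed >= c->numTilesTotal) {
        return 0;
    }
    *index = claimed;
    return 1;
}

/* Replays, on one thread, two workers that both pass the old
   "nextTile < numTilesTotal" test before either one increments.
   Returns the index the second worker ends up with. */
static uint32_t replay_check_then_claim(tile_counter *c)
{
    int a_passes = c->nextTile < c->numTilesTotal;
    int b_passes = c->nextTile < c->numTilesTotal;
    assert(a_passes && b_passes);   /* call this with exactly one tile left */

    (void)__sync_fetch_and_add(&c->nextTile, 1);   /* worker A takes the last tile */
    return __sync_fetch_and_add(&c->nextTile, 1);  /* worker B gets an index past the end */
}
```

With one tile left out of two, the replay hands the second worker index 2, which is one past the end of a two-entry workQueue. claim_tile rejects that same index instead of using it.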

@marcc: That makes sense. I did have some random crashes before so perhaps that was it. Unfortunately it didn't fix the current issue.

Just noticed a thing. Currently the rendering only gets triggered when the camera is moving. While moving I get the artifacts but when it stops the last frame appears as it should. Printout shows the same order of execution.

Could this be an X11 issue? I'm updating the frame like this. The image buffer is created as shown in this post.
while (run)
{
    if () // Time to update
    {
        if (CameraControls(viewP, inputP))
        {
            RenderTiles(queueP);
				
            XShmPutImage(display, windowP->window, context, imageX, 0, 0, 0, 0, imageX->width, imageX->height, true);
        }
    }
}
This could be an issue if you're updating the image memory at the same time X11 reads it.

To be sure, you could copy the RenderTiles result to an intermediate buffer and then send that one to X11.
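
A minimal sketch of that intermediate copy, assuming the worker threads render into a private buffer with the same row stride and height as the buffer that X11 reads. copy_frame is a hypothetical helper, not part of Xlib.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy a finished frame from the private render target into the buffer
   that is handed to X11. Both buffers are assumed to have the same layout. */
static void copy_frame(void *present, const void *render,
                       size_t bytes_per_line, size_t height)
{
    assert(present != NULL && render != NULL);
    memcpy(present, render, bytes_per_line * height);
}
```

In the loop above this would run after RenderTiles returns and before XShmPutImage, with the shared image memory as the destination. Note that this only shortens the time during which the shared memory is being written; it does not wait for the server to finish reading. The MIT-SHM documentation linked earlier describes a completion event for that purpose.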

@marcc: That fixed the issue! Upon further inspection it appears that I've got a bug in my frame timing. Will need to investigate.