FPL - A C99 Single-Header-File Platform Abstraction Library

Finalspace

#14504

March 15, 2018

mmozeiko
Both.

Okay thats weird, so it may be some stupid thing on my machine (Ubuntu 17 x64 with nvidia graphics card) only?

*Sigh* i will test your code later and see if that crashes as well - which i dont expect it will.
Oh and i tested the pastebin code as well and suprisingly this wont crash at all.

The combination of X11/GLX functions i called in FPL seems to leak memory somehow on my dev machine.
But you are right regarding memory leaks - it would crash in win32 as well - or produce undefined behavior.

Finalspace

#14512

March 15, 2018

Tested your snipped, works fine. Displays a colored triangle and does not crash when i exit it.

I run valgrind and i get definitly leaking errors.

The call in line 9466 in final_platform_layer.hpp allocates memory indirectly:

GLXFBConfig* fbAllConfigs = glxApi.glXChooseFBConfig(display, screen, attributes, &fbAllConfigCount);

But somehow this memory is not freed, even though i call XFree(fbAllConfigs) on them.

What is wrong with that?

fpl_internal XVisualInfo *GLXGetVisualInfo(const subplatform_x11::X11AppState &x11AppState, const X11VideoOpenGLState &glState, Display *display, int screen) {
			XVisualInfo *result = nullptr;

			const GLXApi &glxApi = glState.glxApi;
			const subplatform_x11::X11Api &x11Api = x11AppState.api;

			GLint attributes[] = {
				GLX_RENDER_TYPE, GLX_RGBA_BIT,
				GLX_DRAWABLE_TYPE, GLX_WINDOW_BIT,
				GLX_X_VISUAL_TYPE, GLX_TRUE_COLOR,
				GLX_RED_SIZE, 8,
				GLX_GREEN_SIZE, 8,
				GLX_BLUE_SIZE, 8,
				GLX_ALPHA_SIZE, 8,
				GLX_STENCIL_SIZE, 8,
				GLX_DEPTH_SIZE, 24,
				GLX_DOUBLEBUFFER, True,
				0
			};

			// Find all frame buffer configs
			FPL_LOG("GLX", "Get FrameBuffer Configs for Display '%p' and Screen '%d':", display, screen);
			int fbAllConfigCount = 0;
			GLXFBConfig* fbAllConfigs = glxApi.glXChooseFBConfig(display, screen, attributes, &fbAllConfigCount);

			if(!fbAllConfigs || !fbAllConfigCount) {
				FPL_LOG("GLX", "Failed getting FrameBuffer Configs from Display '%p' and Screen '%d'!", display, screen);
				common::PushError("No FrameBuffer Configs returned");
				return nullptr;
			}
			FPL_LOG("GLX", "Successfully got '%d' FrameBuffer Configs from Display '%p' and Screen '%d'", fbAllConfigCount, display, screen);

			for(int fbConfigIndex = 0; fbConfigIndex < fbAllConfigCount; ++fbConfigIndex) {
				const GLXFBConfig testFBConfig = fbAllConfigs[fbConfigIndex];

				// Get visual info from frame buffer config
				XVisualInfo *visualInfo = (XVisualInfo*)glxApi.glXGetVisualFromFBConfig(display, testFBConfig);
				if(!visualInfo) {
					continue;
				}

				// Get alpha bits
				int testAlphaBits = GLXGetFrameBufferConfigAttrib(glxApi, display, testFBConfig, GLX_ALPHA_SIZE);

				// Found a suitable frame buffer config with visual
				if(testAlphaBits > 0) {
                    int doubleBuffer = GLXGetFrameBufferConfigAttrib(glxApi, display, testFBConfig, GLX_DOUBLEBUFFER);
                    int redBits = GLXGetFrameBufferConfigAttrib(glxApi, display, testFBConfig, GLX_RED_SIZE);
                    int greenBits = GLXGetFrameBufferConfigAttrib(glxApi, display, testFBConfig, GLX_GREEN_SIZE);
                    int blueBits = GLXGetFrameBufferConfigAttrib(glxApi, display, testFBConfig, GLX_BLUE_SIZE);
                    int alphaBits = GLXGetFrameBufferConfigAttrib(glxApi, display, testFBConfig, GLX_ALPHA_SIZE);
                    int depthBits = GLXGetFrameBufferConfigAttrib(glxApi, display, testFBConfig, GLX_DEPTH_SIZE);
                    int colorDepth = redBits + greenBits + blueBits + alphaBits;
					result = visualInfo;
                    FPL_LOG("GLX", "Successfully found a suitable Frame Buffer Config (Doublebuffer: %s, Color bits: %d, Depth bits: %d) with Visual '%p'", (doubleBuffer == True ? "Yes" : "No"), colorDepth, depthBits, visualInfo);
					break;
				} else {
                    // Release unused visual info (Important!)
                    x11Api.XFree(visualInfo);
                }
			}
			if (result == nullptr) {
				FPL_LOG("GLX", "Failed finding a suitable Frame Buffer Config for Display '%p'!", display);
				common::PushError("No FrameBuffer Configs returned");
            }

			// Release all frame buffer configs (Important!)
			x11Api.XFree(fbAllConfigs);

			return(result);
		}

Edited by Finalspace on March 15, 2018, 3:02pm

Mārtiņš Možeiko

#14519

March 15, 2018

Sometimes it is false positive. Maybe library initializes some global singleton that allocates memory just once, first time. And after that keeps it alive while process is running.

Finalspace

#14537

March 16, 2018

mmozeiko
Sometimes it is false positive. Maybe library initializes some global singleton that allocates memory just once, first time. And after that keeps it alive while process is running.

I saved the XVisualInfo in a structure which is stored globally, so that i can use it for window creation and for context creation. Window creation and context creation are two separate things. But i properly release it later, so i dont understand why this happens.

This takes a lot more thinking than i thought it would be :-(

Mārtiņš Možeiko

#14538

March 16, 2018

This can happen because of something like this:

static GlobalCache* cache;

GLXFBConfig* glXChooseFBConfig(...)
{
    if (cache == NULL)
    {
       cache = malloc(...);
       SetupCache(cache)
    }

    // .. do actual choosing
    GLXFBConfig* result = malloc(...);
    // TODO: fill it up
    return result;
}

void XFree(void* ptr)
{
    free(ptr);
}

As you can see the returned pointer is freed, but global cache is not. Because it is allocated only once. Its a "singleton pattern". But valgrind will report this as a leak. This is a very common false positive in system libraries - both on Linux, and Windows.

Edited by Mārtiņš Možeiko on March 16, 2018, 7:35am

Finalspace

#14545

March 16, 2018

mmozeiko
This can happen because of something like this:
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
static GlobalCache* cache;

GLXFBConfig* glXChooseFBConfig(...)
{
    if (cache == NULL)
    {
       cache = malloc(...);
       SetupCache(cache)
    }

    // .. do actual choosing
    GLXFBConfig* result = malloc(...);
    // TODO: fill it up
    return result;
}

void XFree(void* ptr)
{
    free(ptr);
}
As you can see the returned pointer is freed, but global cache is not. Because it is allocated only once. Its a "singleton pattern". But valgrind will report this as a leak. This is a very common false positive in system libraries - both on Linux, and Windows.

FPL does not use a singleton pattern, but uses 2 global variables - one for the event queue and one for the application state. The audio and video system is allocated separately, but the pointer are stored in one of the global variable structure.

And the very end all the global variables and memory are freed properly in reverse order (ReleasePlatformResources). Even the pointers are set to zero as well.

Maybe the way FPL allocates memory on POSIX is wrong???

		fpl_platform_api void *MemoryAllocate(const size_t size) {
			// @NOTE(final): MAP_ANONYMOUS ensures that the memory is cleared to zero.

			// Allocate empty memory to hold the size + some arbitary padding + the actual data
			size_t newSize = sizeof(size_t) + sizeof(uintptr_t) + size;
			void *basePtr = ::mmap(nullptr, newSize, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

			// Write the size at the beginning
			*(size_t *)basePtr = newSize;

			// The resulting address starts after the arbitary padding
			void *result = (uint8_t *)basePtr + sizeof(size_t) + sizeof(uintptr_t);
			return(result);
		}

		fpl_platform_api void MemoryFree(void *ptr) {
			// Free the base pointer which is stored to the left at the start of the size_t
			void *basePtr = (void *)((uint8_t *)ptr - (sizeof(uintptr_t) + sizeof(size_t)));
			size_t storedSize = *(size_t *)basePtr;
			::munmap(basePtr, storedSize);
		}

Edited by Finalspace on March 16, 2018, 1:08pm

Finalspace

#14551

March 16, 2018

I updated the branch, moved the event queue to the app state and removed the cached visual info.
But i still need to cache the "Visual *" pointer and save the GLXFBConfig handle -> The same way GLFW is doing it.

Unfortunatly result is the same: Segfault on exit :-(

valgrind points again to glXChooseFBConfig...

Edited by Finalspace on March 16, 2018, 4:44pm

Mārtiņš Možeiko

#14552

March 16, 2018

Finalspace
FPL does not use a singleton pattern,...

You misunderstood me. I'm not talking about FPL code here. I'm talking about X11 implementation (libX11.so or whatever). If they use singleton, then that will show as false positive in valgrind.

1	void *basePtr = ::mmap(nullptr, newSize, PROT_READ \| PROT_WRITE, MAP_PRIVATE \| MAP_ANONYMOUS, -1, 0);

As an unrelated topic - valgrind and similar tools won't show up this as memory leak in case you forgot to munmap it. They check the heap allocations for leaks, not mmap/sbrk/whatever syscalls.

Btw, I've looked more into your memory allocation functions and there are few issues/improvements:

1) first of all, your MemoryStackAllocate function is wrong. You cannot return memory allocated on stack from function. When function returns, memory is freed. It's pointless to return it. The best you can do is make a macro if you really want it.

2) MemoryAlignedAllocate is kind of pointless if it uses system allocator. Because system allocators return page aligned allocations. So they all will be at least 4k aligned.

Edited by Mārtiņš Možeiko on March 16, 2018, 5:17pm

Finalspace

#14564

March 17, 2018

mmozeiko
Finalspace
FPL does not use a singleton pattern,...

You misunderstood me. I'm not talking about FPL code here. I'm talking about X11 implementation (libX11.so or whatever). If they use singleton, then that will show as false positive in valgrind.
1
void *basePtr = ::mmap(nullptr, newSize, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
As an unrelated topic - valgrind and similar tools won't show up this as memory leak in case you forgot to munmap it. They check the heap allocations for leaks, not mmap/sbrk/whatever syscalls.

Btw, I've looked more into your memory allocation functions and there are few issues/improvements:

1) first of all, your MemoryStackAllocate function is wrong. You cannot return memory allocated on stack from function. When function returns, memory is freed. It's pointless to return it. The best you can do is make a macro if you really want it.

2) MemoryAlignedAllocate is kind of pointless if it uses system allocator. Because system allocators return page aligned allocations. So they all will be at least 4k aligned.

1. You are totally right, i have no idea what i was thinking, its pointless so i remove it.

2. Not really, 4k alignment which is the page size for (win32 and linux) is not enough to satisfy alignment contraints for a lot of things. Therefore there is "aligned_malloc" which uses the system allocator as its base under the hood as well to return aligned memory. Smaller page size alignment is very important, especially when you use SIMD which requires 16-byte aligned addresses.

When you look into my implementation the function ensures that the base pointer is aligned to the boundary you specify by offseting it by the correct amount.

For example the allocation of the app state is aligned to a 16-byte boundary and when you execute it, it will be offset by 8 bytes, due to the fact that the size does not satisfy the alignment boundary.
But one thing is true, right know there is no reason to align the app state memory by 16-bytes. I dont use SIMD or anything like that in FPL.

Edited by Finalspace on March 17, 2018, 6:33am

Mārtiņš Možeiko

#14566

March 17, 2018

Wait, I'm not talking about aligned_malloc. I'm talking about your functions which don't use aligned_malloc.

Maybe I don't understand the purpose of your MemoryAllocate/MemoryAlignedAllocate functions? From code it looks like they are supposed to be wrappers around system allocators (VirtualAlloc/mmap), not heap alloctor (HeapAlloc/malloc). Yes, heap allocators align only on max 8. That's why thereis aligned_malloc or posix_memalign. But system allocators on modern OS'es always align at least on 4k. I do not know situation where you would want more than 4k alignment. Unless you are writing kernel and dealing with DMA or smth.... but that would require a lot more changes anyway. So imho your AlignedAllocate function is pointless.

Also you are using MemoryAlignedAllocate in your FPL to allocate bunch of very tiny allocations - event queue, state, etc.. This is waste of memory (and cache space). I suggest to replace them with heap allocations.

Finalspace

#14567

March 17, 2018

mmozeiko
Wait, I'm not talking about aligned_malloc. I'm talking about your functions which don't use aligned_malloc.

Maybe I don't understand the purpose of your MemoryAllocate/MemoryAlignedAllocate functions? From code it looks like they are supposed to be wrappers around system allocators (VirtualAlloc/mmap), not heap alloctor (HeapAlloc/malloc). Yes, heap allocators align only on max 8. That's why thereis aligned_malloc or posix_memalign. But system allocators on modern OS'es always align at least on 4k. I do not know situation where you would want more than 4k alignment. Unless you are writing kernel and dealing with DMA or smth.... but that would require a lot more changes anyway. So imho your AlignedAllocate function is pointless.

Also you are using MemoryAlignedAllocate in your FPL to allocate bunch of very tiny allocations - event queue, state, etc.. This is waste of memory (and cache space). I suggest to replace them with heap allocations.

As of now i changed the memory handling a lot, audio/video/event-queue memory block is now included in the app state memory as well. The only tiny allocation made now is the pixel data allocation for software buffer, which needs to be aligned by 4 bytes - otherwise win32 stretchblt does not work properly. Unfortunatly i cannot include this in the app state memory block, because the software buffer can be resized at any time.

Regarding the purpose of MemoryAllocate/MemoryAlignedAllocate: Its meant to be a call directly to allocate a block of memory from the OS directly - without relying on malloc() or aligned_malloc(). malloc() and aligned_malloc() does a lot of house-keeping to ensure that memory is tracked, etc. I want the caller to by responsible for house-keeping if he want. The library should do the smallest amount of work to get the job done.

Also i am planning to hopefully get rid of the C-runtime requirement, so i cant use malloc() or aligned_malloc() anyway.

Edited by Finalspace on March 17, 2018, 7:58am

Mārtiņš Možeiko

#14568

March 17, 2018

Finalspace
Regarding the purpose of MemoryAllocate/MemoryAlignedAllocate: Its meant to be a call directly to allocate a block of memory from the OS directly ...

This is exactly my point. OS directly gives you 4k aligned memory (via VirtualAlloc or mmap). Why do you want to align it again in MemoryAlignedAllocate?

Edited by Mārtiņš Možeiko on March 17, 2018, 8:08am

Finalspace

#14607

March 19, 2018

mmozeiko
Finalspace
Regarding the purpose of MemoryAllocate/MemoryAlignedAllocate: Its meant to be a call directly to allocate a block of memory from the OS directly ...

This is exactly my point. OS directly gives you 4k aligned memory (via VirtualAlloc or mmap). Why do you want to align it again in MemoryAlignedAllocate?

Like i already said, the same reason why there exists aligned_alloc(). Aligned_alloc will internally call mmap/VirtualAlloc or may use malloc() as its base as well and do the same similiar shit i and others do (All custom aligned malloc are implemented the same way -> Google it).

Another example: In Object Pascal there is a function called GetMem() which allocates dynamic memory on the heap (Similar to malloc(). This function does internally call VirtualAlloc() or mmap() as well, so it has same characteristics as malloc(). The smartest guy i know on this planet (Benjamin Rosseaux) have written a custom aligned allocation function based on that function and it does exactly what it means to be: Give me a block of memory where the address is aligned to my specific alignment requirements (16-bytes for sample, for working with SIMD based code).

https://github.com/BeRo1985/pasmp/blob/master/src/PasMP.pas:

class procedure TPasMPMemory.AllocateAlignedMemory(var p;Size:TPasMPInt32;Align:TPasMPInt32=PasMPCPUCacheLineSize);
var Original,Aligned:pointer;
    Mask:ptruint;
begin
 if (Align and (Align-1))<>0 then begin
  Align:=TPasMPMath.RoundUpToPowerOfTwo(Align);
 end;
 Mask:=Align-1;
 inc(Size,((Align shl 1)+SizeOf(pointer)));
 GetMem(Original,Size);
 FillChar(Original^,Size,#0);
 Aligned:=pointer(ptruint(ptruint(Original)+SizeOf(pointer)));
 if (Align>1) and ((ptruint(Aligned) and Mask)<>0) then begin
  inc(ptruint(Aligned),ptruint(ptruint(Align)-(ptruint(Aligned) and Mask)));
 end;
 pointer(pointer(ptruint(ptruint(Aligned)-SizeOf(pointer)))^):=Original;
 pointer(pointer(@p)^):=Aligned;
end;

class procedure TPasMPMemory.FreeAlignedMemory(const p);
var pp:pointer;
begin
 pp:=pointer(pointer(@p)^);
 if assigned(pp) then begin
  pp:=pointer(pointer(ptruint(ptruint(pp)-SizeOf(pointer)))^);
  FreeMem(pp);
 end;
end;

Now i hope that this is clear now.

Edited by Finalspace on March 19, 2018, 5:55am

Mārtiņš Možeiko

#14608

March 19, 2018

Of course "aligned" alloc function makes sense if it internally uses function which does not guarantee aligment.

In C aligned_alloc exists because it uses malloc. And malloc does not guarantee big enough alignment.
Or from your example in pascal - AllocateAlignedMemory makes sens because GetMem does not guarantee big enough alignment.

Your MemoryAlignedAllocate uses MemoryAllocate which internally guarantees 4k alignment. You are aligning already aligned memory (with few hacks around storing size in "header", that should be improved on API level). It does not make sense.

Now i hope that this is clear now.

I don't think you understand what I'm trying to say :) I completely understand how aligned_alloc works or why it is necessary, or how to implement it. I have implemented such function many times myself. And I'm not saying "aligned_alloc" does not make sense. I am only saying that your fpl::MemoryAlignedAllocate does not make sense, because it does uses fpl::MemoryAllocate which is defined as system allocator (which guarantees alignment, always!) and it does not implement custom heap, like C malloc or pascal GetMem does.

Here's how I would implement your fpl::MemoryAlignedAllocate function on Windows:

void *MemoryAlignedAllocate(const size_t size, const size_t alignment) {
    FPL_ASSERT(alignment <= 4096);
    return VirtualAlloc(NULL, size, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
}

As you can see, it does not need to use alignment, unless somebody would ask for >4k aligned memory. If you really really would want to support, then sure handle it such special case. But really, why would anybody ask for more-than-page-size aligned memory...

Edited by Mārtiņš Možeiko on March 19, 2018, 8:55am

Finalspace

#14609

March 19, 2018

mmozeiko
Of course "aligned" alloc function makes sense if it internally uses function which does not guarantee aligment.

In C aligned_alloc exists because it uses malloc. And malloc does not guarantee big enough alignment.
Or from your example in pascal - AllocateAlignedMemory makes sens because GetMem does not guarantee big enough alignment.

Your MemoryAlignedAllocate uses MemoryAllocate which internally guarantees 4k alignment. You are aligning already aligned memory (with few hacks around storing size in "header", that should be improved on API level). It does not make sense.

Now i hope that this is clear now.
I don't think you understand what I'm trying to say :) I completely understand how aligned_alloc works or why it is necessary, or how to implement it. I have implemented such function many times myself. And I'm not saying "aligned_alloc" does not make sense. I am only saying that your fpl::MemoryAlignedAllocate does not make sense, because it does uses fpl::MemoryAllocate which is defined as system allocator (which guarantees alignment, always!) and it does not implement custom heap, like C malloc or pascal GetMem does.

Here's how I would implement your fpl::MemoryAlignedAllocate function on Windows:
1
2
3
4
void *MemoryAlignedAllocate(const size_t size, const size_t alignment) {
    FPL_ASSERT(alignment <= 4096);
    return VirtualAlloc(NULL, size, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
}
As you can see, it does not need to use alignment, unless somebody would ask for >4k aligned memory. If you really really would want to support, then sure handle it such special case. But really, why would anybody ask for more-than-page-size aligned memory...

Then i totally dont understand alignment at all. The only thing i know, is that CPU´s want memory address to start on an even boundary, 4-byte, 8-byte, 16-byte etc. (On X64, 8-byte is the minimum alignment???). Also i know that some data-structures needs to be specially aligned to make it work (Mat4 multiply on a struct with SIMD). Also i know that compilers does default alignment automatically (4-byte on x86, 8-byte on x64), etc. Also i know that padding is added to certain fields in a struct to ensure alignment boundaries as well.

What does malloc() internally do, to not guarantee that the memory is aligned??

The docs says: "The malloc() and calloc() functions return a pointer to the allocated memory, which is suitably aligned for any built-in type.". Meaning that alignment may be 8-bytes, due to the fact that the greatest built-in type is double or what???

I am lost. I am not a system level programmer, maybe everything i know about is just wrong??

I really want to look into the source of malloc() and aligned_alloc() to see what it actually do, but i searched in libC and GCC and found no source definition...

Edited by Finalspace on March 19, 2018, 9:18am