Explore your program's memory like you explore code with an IDE.

Complete

About memory hexplorer


At the most fundamental level, everything in a computer is represented using bits and bytes, including the code we write and the data structures in our running programs.

But we usually don’t think of our code as a bunch of bytes (just like we usually don’t think of our fellow human friends as piles of atoms). Instead, we have various sophisticated tools like IDEs and language servers that let us work with the code at higher levels of abstraction.

When it comes to the contents of our program’s memory, however, our tools barely help us beyond the bits and bytes stage. Most of us only have vague notions of what RAM contains. “Variables go on the stack, allocations are on the heap.” Okay, but where exactly? And what does that look like?

To illustrate this discrepancy between code and runtime structures, let’s take a look at what our current tools can do for us:

Instead of working with bits and bytes directly, we have editors that decode the binary text and render it as glyphs, laid out on independent lines: image.png

More advanced text editors and IDEs recognize program entities like types and functions, which they may list in an outline: image.png

And to reason about the relationships between program entities, some tools provide features like “go to definition” or “list all references.” (references of the function fib shown below): image.png

Now, let’s take a look at a typical debugger’s visualization of memory: image.png

Clearly, there's some structure to these values. But what do they mean? Are we looking at the stack, the heap, something else? How do these bytes relate to other bytes somewhere else in RAM?

My goal for this jam was to create a tool to answer these questions, and here are my results:

For context, my application visualizes the memory of the following C program, paused inside the malloc call in the 6th iteration of the for loop: image.png

So, what does the memory look like? image.png

Like in the memory visualizer from earlier, we can still see the raw byte values, 16 of them per row.
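For reference, the raw view described here (an address followed by 16 byte values per row) takes very little code. The following C sketch is not taken from hexplorer: `format_hex_row` and `hex_dump` are hypothetical names, and it dumps the running program's own memory rather than that of a paused debuggee.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define BYTES_PER_ROW 16

/* Formats up to 16 bytes as "ADDRESS: xx xx ..." into `out`.
   Returns the number of characters written (excluding the terminator). */
int format_hex_row(uintptr_t addr, const unsigned char *bytes, size_t count,
                   char *out, size_t out_size)
{
    assert(count <= BYTES_PER_ROW);
    int written = snprintf(out, out_size, "%016llx:", (unsigned long long)addr);
    for (size_t i = 0; i < count && written > 0 && (size_t)written < out_size; i++) {
        written += snprintf(out + written, out_size - (size_t)written,
                            " %02x", (unsigned)bytes[i]);
    }
    return written;
}

/* Prints `len` bytes starting at `start`, 16 per row. */
void hex_dump(const void *start, size_t len)
{
    const unsigned char *p = start;
    char row[128];
    for (size_t off = 0; off < len; off += BYTES_PER_ROW) {
        size_t n = len - off < BYTES_PER_ROW ? len - off : BYTES_PER_ROW;
        format_hex_row((uintptr_t)(p + off), p + off, n, row, sizeof row);
        puts(row);
    }
}
```

Calling `hex_dump(&some_object, sizeof some_object)` prints one row per 16 bytes. Everything the annotated view adds on top (colors, entity names, clickable pointers) has to come from debug info.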

But hey, look, we found the stack! The brown region is the main function's stack frame. And sure enough, i = 5.

In fact, the way I found the stack was using the jump list, which lists program entities, akin to the "outline" from earlier: image.png

Finally, for the third level of abstraction, entity relationships: pointer values like List.next can be clicked to jump to the referenced value (similar to "go to definition"). And the inspector shows incoming and outgoing pointers for the selected object (similar to "list all references"). To see these in action, watch the demo video in the last post from Sunday, below. image.png

All in all, I'm very happy with my results. Yes, the UI could look better (the contrast is especially bad, sorry), and yes, it could be faster. But those things weren't the goal.

The goal was to be able to see what memory looks like and to be able to explore it by following pointers. Going into this, I wasn't sure how useful such a tool would be, as the individual data structures would probably be better visualized by other means like lists and graphs. But to my surprise, I found the "annotated hex viewer" approach quite fruitful. I've already improved my mental model of memory. And I'm sure this tool would be invaluable for debugging memory safety issues (for example, you'd see what a dangling pointer was actually pointing to).

But before this tool can be integrated into a debugger, several open problems remain:

  1. It currently doesn't support unions or "inconsistent" memory mappings, where two pointers of different types point to the same memory location.

  2. It currently doesn't understand how large dynamic arrays are (the dark green byte down below is the first byte of an array). And, relying on DWARF debug info, it can't understand more complicated memory layouts like joined allocations (the orange Group is the start of a Swiss table hashmap buffer; immediately after the Group is an array of Slot<K, V>, the hashmap entries). image.png

  3. Other languages like Rust, which use a lot of type abstraction, could use some (automatic and/or manual) type simplification, as their type names are not exactly readable: image.png
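The first two open problems can be made concrete with a few lines of C. This is an illustrative sketch, not code from the demo program: `Number`, `IntArray`, and `int_array_new` are made-up names.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Problem 1: both members start at the same address, so the bytes there
   have two equally valid types. */
typedef union {
    uint32_t as_bits;
    float    as_float;
} Number;

/* Problem 2: the type of `items` only says "pointer to int".
   That `len` holds the element count is a convention of this code,
   not something the pointer type records. */
typedef struct {
    int   *items;
    size_t len;
} IntArray;

/* Allocates a zeroed array of `len` ints (len must be > 0). */
IntArray int_array_new(size_t len)
{
    assert(len > 0);
    IntArray a = { calloc(len, sizeof(int)), len };
    assert(a.items != NULL);
    return a;
}
```

Given `Number n;`, the expressions `&n.as_bits` and `&n.as_float` yield the same address with two different types. Given `IntArray a = int_array_new(4);`, a viewer that only knows `a.items` is an `int *` can annotate the first element but cannot tell how far the array extends.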

Alright, that's it from me, thanks for reading and take care ✌️


Recent Activity

and finally, interactivity &hexplorer
the new jump list allows you to quickly find global variables, stack frames, or instances of a given type.
you can now follow pointers by clicking their values.
and, among other things, the new inspector shows you the incoming and outgoing pointers for the selected object.

View original message on Discord

drawing the memory map from yesterday! &hexplorer
as i scroll through memory, we first see some global variables (including the malloc state).
those are followed by the stack (the cyan/green/brown bands are stack frames; the stack is quite a bit larger, i'm currently not drawing zero bytes).
and at the end, there's the heap, where we can see a linked List (created by the demo program).


mapped out memory using debug info. &hexplorer
i spent quite a while figuring out how to deal with aliasing.
turns out, my test program (almost) doesn't have any aliasing 😄

edit: yeah, no, of course it does. silly little bug.
for anyone interested, i build the mapping in two passes.
first, i collect all unique (ptr, type) pairs by following pointers depth first. (this uses a visited set, which caught the aliasing)
then i sort those by decreasing type size and create the mapping.
sorting makes sure i create mappings for structs before processing pointers to their fields (roughly), which is useful for storing "reverse pointers".
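The two passes described above can be sketched in C roughly as follows. This is a guess at the general shape, not hexplorer's actual code: `TypeInfo` is a hypothetical stand-in for what debug info provides, `Node` is a made-up test type, the sketch reads its own process memory instead of a debuggee's, and it stops after the sort (building the final mapping and the reverse pointers is left out).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical summary of a type: its size and where its pointer fields are. */
typedef struct TypeInfo {
    const char *name;
    size_t size;
    size_t ptr_count;                     /* number of pointer fields */
    size_t ptr_offsets[2];                /* byte offset of each pointer field */
    const struct TypeInfo *ptr_types[2];  /* pointee type of each field */
} TypeInfo;

typedef struct { const void *ptr; const TypeInfo *type; } Pair;

/* Made-up test type: `favorite` may point at a field inside another Node. */
typedef struct Node { int value; int *favorite; struct Node *next; } Node;

static const TypeInfo INT_TYPE = { "int", sizeof(int), 0, {0}, {0} };
static const TypeInfo NODE_TYPE = {
    "Node", sizeof(Node), 2,
    { offsetof(Node, favorite), offsetof(Node, next) },
    { &INT_TYPE, &NODE_TYPE },
};

/* Visited set as a linear scan, which is enough for a sketch. */
static bool seen(const Pair *pairs, size_t n, const void *ptr, const TypeInfo *type)
{
    for (size_t i = 0; i < n; i++)
        if (pairs[i].ptr == ptr && pairs[i].type == type) return true;
    return false;
}

/* Pass 1: follow pointers depth first, recording each unique (ptr, type). */
static void collect(const void *ptr, const TypeInfo *type,
                    Pair *pairs, size_t *n, size_t cap)
{
    if (ptr == NULL || seen(pairs, *n, ptr, type)) return;
    assert(*n < cap);
    pairs[(*n)++] = (Pair){ ptr, type };
    for (size_t i = 0; i < type->ptr_count; i++) {
        const void *target;
        /* Assumes all object pointers share one representation. */
        memcpy(&target, (const char *)ptr + type->ptr_offsets[i], sizeof target);
        collect(target, type->ptr_types[i], pairs, n, cap);
    }
}

/* Pass 2 ordering: larger types sort first. */
static int by_decreasing_size(const void *a, const void *b)
{
    size_t sa = ((const Pair *)a)->type->size;
    size_t sb = ((const Pair *)b)->type->size;
    return (sa < sb) - (sa > sb);
}

/* Fills `pairs` with the sorted unique pairs and returns how many there are. */
size_t build_pairs(const void *root, const TypeInfo *type, Pair *pairs, size_t cap)
{
    size_t n = 0;
    collect(root, type, pairs, &n, cap);
    qsort(pairs, n, sizeof *pairs, by_decreasing_size);
    return n;
}
```

With two nodes `a` and `b` that point at each other, and `a.favorite = &b.value`, the visited set stops the cycle and three pairs come out. `(&b, Node)` and `(&b.value, int)` share an address but differ in type, so both are kept, and the sort places both `Node` entries ahead of the `int` that lives inside one of them.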
