Every jam project of mine seems to involve drag and drop in some way. I build it from scratch every time, and every time there is some aspect of it that absolutely kicks my ass. I figured I should jot down my overall approach here so people (read: my future self) can maybe learn something from it.

In my opinion, any self-respecting drag and drop system needs to handle a couple of potentially tricky features:

  • "Gesture recognition", or rather, allowing clicks and double-clicks on draggable items. In particular I find it very important that an item is not recognized as dragging until the mouse has strayed by a few pixels from the location where it went down. It is very common for the mouse to drift by a pixel or two while someone is clicking, and if this cancels the click, users will be frustrated.

    In addition, I feel strongly that, once a drag starts, the dragged item should subsequently be offset by mouse_pos - original_mouse_down_pos, not mouse_pos - mouse_pos_when_drag_gesture_recognized. Once a gesture has been recognized, the app should act as if that gesture had been in effect from frame 0.

  • Cancellation, either by pressing escape or by dropping in an invalid place. This means that all UI state should revert to how it was before the mouse even went down, and possibly any application state too.
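The threshold and offset rules from the first bullet can be sketched roughly like this. This is a minimal sketch, not the actual Solitaire code: the 4-pixel threshold, the V2 alias, and both proc names are assumptions made up for illustration.

```odin
package drag_sketch

// Assumed 2D vector type; array arithmetic and .x/.y access are built in.
V2 :: [2]f32

// Assumed value; the text above only says "a few pixels".
DRAG_THRESHOLD_PX :: 4

// True once the mouse has strayed far enough from where it went down.
// Compares squared distances so no square root is needed.
exceeds_drag_threshold :: proc(mouse_pos, mouse_down_pos: V2) -> bool {
	d := mouse_pos - mouse_down_pos
	return d.x*d.x + d.y*d.y > DRAG_THRESHOLD_PX*DRAG_THRESHOLD_PX
}

// Position of the dragged item, measured from the original mouse-down
// position rather than from where the drag gesture was recognized.
dragged_item_pos :: proc(item_start_pos, mouse_pos, mouse_down_pos: V2) -> V2 {
	return item_start_pos + (mouse_pos - mouse_down_pos)
}

main :: proc() {
	down := V2{100, 100}

	// A pixel or two of drift while clicking must not start a drag.
	assert(!exceeds_drag_threshold(V2{102, 101}, down))

	// Past the threshold, the gesture becomes a drag.
	assert(exceeds_drag_threshold(V2{106, 100}, down))

	// The item moves by the full distance since mouse-down, so it does
	// not lag behind the cursor by the threshold distance.
	assert(dragged_item_pos(V2{20, 30}, V2{106, 100}, down) == V2{26, 30})
}
```

Measuring from the mouse-down position means the item jumps by a few pixels on the frame the drag is recognized, but from then on it stays exactly under the point where the user grabbed it.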

I implemented both of these things for Solitaire, but it took me a couple days due to a rotating cast of stupid bugs.

My final implementation

The entirety of the relevant state for mouse interactions is as follows:

// Input
mouse_pos:            sdl3.FPoint,
mouse_down:           [10]bool,
mouse_pressed:        [10]bool,
mouse_released:       [10]bool,

// UI state
potential_hot_items:  [dynamic]PotentialHotItem,
hot_item:             string,
hot_mouse_button:     int,
drag_supported:       bool,
dragging:             bool,
drag_canceled:        bool,
drag_start_item_pos:  V2,
drag_start_mouse_pos: V2,

// UI actions for hot item

// This only means that the mouse went up, NOT that the mouse went up in the
// right place. If your UI control cares about this, e.g. to avoid activating
// a button on release when it moved too far away, filter for this in the UI
// control.
clicked:              bool,

// The hot item had a drag started on it this frame.
drag_started:         bool,

// The hot item was dropped this frame.
drag_ended:           bool,

The "hot" item is the item currently under user interaction; the term is borrowed (possibly incorrectly) from Casey Muratori's original IMGUI video. For the hot item, we store a string ID that is unique to it (e.g. card:S2N1 for the ace of spades) as well as the mouse button that was used to initiate the interaction; this allows us to distinguish between left-clicks and right-clicks (and the associated click-and-drags). drag_supported is set to true for any item that can be dragged; this tells the gesture-recognition code whether to cancel a click if the mouse strays from its original position. potential_hot_items is a list of items which may potentially be interacted with; we collect them in a list because gestures are always recognized at the end of the frame. (More on that later.)

Side note: I prefer the term "hot" over "active" because I grew up making websites and CSS uses "active" to refer to the "mouse down right now" state, and if the mouse strays too far away from the button, we should no longer render it as "active" because the button will not activate on mouse up.
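Here is a rough sketch of how a button control might do that filtering on its own side, both for rendering and for deciding whether a click counts. Everything here (Rect, point_in_rect, Button_Visual, and the two button procs) is hypothetical and only illustrates the idea; it is not code from the game.

```odin
package button_sketch

V2 :: [2]f32

// Hypothetical rectangle type standing in for whatever the renderer uses.
Rect :: struct {
	x, y, w, h: f32,
}

point_in_rect :: proc(p: V2, r: Rect) -> bool {
	return p.x >= r.x && p.x < r.x + r.w &&
	       p.y >= r.y && p.y < r.y + r.h
}

Button_Visual :: enum {
	Normal,
	Pressed, // the CSS-style "active" look
}

// Render as pressed only while the button is hot, the mouse is down,
// and the cursor is still over the button.
button_visual :: proc(is_hot, mouse_is_down: bool, mouse_pos: V2, rect: Rect) -> Button_Visual {
	if is_hot && mouse_is_down && point_in_rect(mouse_pos, rect) {
		return .Pressed
	}
	return .Normal
}

// The gesture-level `clicked` flag only says the mouse went up. The
// control itself decides whether it went up in the right place.
button_activated :: proc(is_hot, clicked: bool, mouse_pos: V2, rect: Rect) -> bool {
	return is_hot && clicked && point_in_rect(mouse_pos, rect)
}

main :: proc() {
	rect := Rect{10, 10, 80, 24}

	// Mouse went down on the button, then wandered off before release.
	assert(button_visual(true, true, V2{200, 200}, rect) == .Normal)
	assert(!button_activated(true, true, V2{200, 200}, rect))

	// Mouse is held, then released, while still over the button.
	assert(button_visual(true, true, V2{20, 20}, rect) == .Pressed)
	assert(button_activated(true, true, V2{20, 20}, rect))
}
```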

Video demonstrating this

The remaining "UI state" variables are used to support drag-and-drop: dragging is true if a drag is in progress, drag_canceled is true if the user pressed Escape, and the pos variables are used to compute pixel-perfect offsets while dragging.

Finally, there are three "UI action" flags, which are mutually exclusive and the result of a completed gesture. (In retrospect, this could have been an enum.) The comments on these fields are self-explanatory, but the important thing to understand is that these are never, ever set mid-frame.
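For what it's worth, the enum version might look something like the following. This is only a sketch of the idea: UI_Action, UI_State, describe_action, and the signature of clear_ui_action are all assumptions, not the real code.

```odin
package ui_action_sketch

// One value instead of three mutually exclusive bools.
UI_Action :: enum {
	None,
	Clicked,
	Drag_Started,
	Drag_Ended,
}

// Trimmed-down, hypothetical version of the UI state.
UI_State :: struct {
	hot_item:  string,
	ui_action: UI_Action, // replaces clicked / drag_started / drag_ended
}

// Assumed shape of clear_ui_action(): mark the action as consumed.
clear_ui_action :: proc(s: ^UI_State) {
	s.ui_action = .None
}

// Hypothetical consumer. A plain `switch` on an enum must cover every
// case, so adding a new action later produces a compile error here.
describe_action :: proc(s: ^UI_State) -> string {
	result: string
	switch s.ui_action {
	case .None:
		result = "nothing happened"
	case .Clicked:
		result = "clicked"
	case .Drag_Started:
		result = "drag started"
	case .Drag_Ended:
		result = "drag ended"
	}
	clear_ui_action(s)
	return result
}

main :: proc() {
	s := UI_State{hot_item = "card:S2N1", ui_action = .Drag_Started}
	assert(describe_action(&s) == "drag started")

	// The action was consumed, so it cannot fire a second time.
	assert(s.ui_action == .None)
}
```

With an enum, two actions being set at once is simply unrepresentable, so the "mutually exclusive" rule no longer depends on discipline.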

The overall game loop, including input events and gesture recognition, goes according to the following pseudo-Odin:

frame :: proc() {
  free_all(context.temp_allocator)

  // Prep a list of potentially hot items
  potential_hot_items = make([dynamic]PotentialHotItem, context.temp_allocator)

  // Process all input events
  e: sdl3.Event
  for sdl3.PollEvent(&e) {
    // mouse_pos, mouse_down, mouse_pressed, and mouse_released will
    // be updated here
    process_event(&e)
  }

  do_ui()
  // UI code may do the following:
  // - `append(&potential_hot_items, { ...info })`
  // - Check `clicked`, `drag_started`, or `drag_ended` and do
  //   arbitrary actions in response
  // - `clear_ui_action()` if an event has been consumed

  // Start or update gestures
  if hot_item != "" {
    recognize_gestures()
    // Will set `clicked`, `drag_started`, or `drag_ended` if a gesture
    // is complete, or will update gesture-tracking state, or will clear
    // gesture state if gestures have all been acted on.
  } else {
    // Walk the list of interactive items attempting to start a new
    // gesture. The list is walked in reverse order because...well,
    // because there is no explicit depth-sorting in my renderer, so
    // walking these items in reverse will make sure I interact with
    // the front-most items first :)
    new_hotness: for item in reversed(potential_hot_items) {
      if sdl3.PointInRectFloat(mouse_pos, item.rect) {
        for btn in []int{MOUSE_LEFT, MOUSE_RIGHT} {
          if mouse_pressed[btn] {
            set_hot(item, btn)
            break new_hotness
          }
        }
      }
    }
  }

  // HACK: Clean up canceled drags :)
  if !dragging && user_is_still_dragging_cards() {
    put_the_cards_back()
  }
}

The most important thing for my own sanity was grouping all the gesture-recognition logic at the end of the frame, and therefore always choosing to handle a gesture on a subsequent frame. When you can have multiple potential gestures in flight, randomly checking for mouse_pressed or mouse_released mid-frame can easily lead to recognizing the wrong gesture or triggering UI actions on multiple items at once. It also seems to result in more "UI tearing", where UI state changes halfway through rendering.
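To make the end-of-frame idea concrete, here is one way recognize_gestures() could be structured. This is a guess at the shape, not the real implementation: the Ctx struct, the escape_pressed input, clear_gesture_state, the MOUSE_LEFT value, and the 4-pixel threshold are all assumptions.

```odin
package gesture_sketch

V2 :: [2]f32

DRAG_THRESHOLD_PX :: 4 // assumed; "a few pixels"
MOUSE_LEFT        :: 1 // assumed to match SDL's left button index

// Hypothetical subset of the state listed earlier.
Ctx :: struct {
	mouse_pos:            V2,
	mouse_released:       [10]bool,
	escape_pressed:       bool, // hypothetical input flag
	hot_item:             string,
	hot_mouse_button:     int,
	drag_supported:       bool,
	dragging:             bool,
	drag_canceled:        bool,
	drag_start_mouse_pos: V2,
	clicked:              bool,
	drag_started:         bool,
	drag_ended:           bool,
}

clear_gesture_state :: proc(c: ^Ctx) {
	c.hot_item = ""
	c.dragging = false
	c.drag_canceled = false
	c.clicked = false
	c.drag_started = false
	c.drag_ended = false
}

// Called once at the end of the frame, only while there is a hot item.
recognize_gestures :: proc(c: ^Ctx) {
	// A finished gesture was flagged last frame and the UI code has had
	// a full frame to react to it, so the interaction is over.
	if c.clicked || c.drag_ended || c.drag_canceled {
		clear_gesture_state(c)
		return
	}
	c.drag_started = false // only visible for a single frame

	btn := c.hot_mouse_button
	if c.dragging {
		if c.escape_pressed {
			c.dragging = false
			c.drag_canceled = true
		} else if c.mouse_released[btn] {
			c.dragging = false
			c.drag_ended = true
		}
		return
	}

	if c.mouse_released[btn] {
		c.clicked = true
		return
	}

	// Still holding the button: promote to a drag once past the threshold.
	if c.drag_supported {
		d := c.mouse_pos - c.drag_start_mouse_pos
		if d.x*d.x + d.y*d.y > DRAG_THRESHOLD_PX*DRAG_THRESHOLD_PX {
			c.dragging = true
			c.drag_started = true
		}
	}
}

main :: proc() {
	c := Ctx{
		hot_item             = "card:S2N1",
		hot_mouse_button     = MOUSE_LEFT,
		drag_supported       = true,
		drag_start_mouse_pos = V2{100, 100},
	}

	c.mouse_pos = V2{102, 100} // drift within the threshold
	recognize_gestures(&c)
	assert(!c.dragging)

	c.mouse_pos = V2{110, 100} // past the threshold
	recognize_gestures(&c)
	assert(c.dragging && c.drag_started)

	c.mouse_released[MOUSE_LEFT] = true // drop
	recognize_gestures(&c)
	assert(c.drag_ended && !c.drag_started)

	c.mouse_released[MOUSE_LEFT] = false // next frame: gesture is done
	recognize_gestures(&c)
	assert(c.hot_item == "")
}
```

Because the flags are only written here, the UI code in the next frame sees one consistent set of values from start to finish, and each flag stays visible for exactly one frame.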

Overall I feel like this system worked OK, but you can see at the end of frame() that cancellation is essentially a total hack. I don't have a great answer at the moment for how to do this better. This system can also still have UI tearing, since some game-state decisions are made mid-frame; I get around this by e.g. always rendering the currently-dragging cards last, which means that starting a drag mid-frame will never result in the card disappearing for a frame.

I'm actually not sure what else to say about this, so, the end?