Linux Scheduling Granularity

Lachlan

#25116

September 21, 2021

I'm following along with day 018 (Enforcing A Video Frame Rate) using Linux. I'm using nanosleep with the default scheduler policy, SCHED_OTHER. With a desired frame time of 16.67ms, the frame times alternate between about 16ms and about 30ms, e.g. 30ms, 16ms, 31ms, 16ms, 29ms. This happens with the system under no load at all. I assume this has something to do with the granularity of nanosleep.

I considered setting the scheduler policy to SCHED_FIFO with priority sched_get_priority_max(SCHED_FIFO), relying on nanosleep to yield the CPU to other tasks. However, with other processes such as Xorg running, I thought this might cause problems for the end user.

Instead, I implemented a while (1) loop that just burns through the ticks, i.e. it will 'melt' the CPU. Using this, I get much more consistent frame times. However, the frame time is shorter than it's supposed to be, e.g. 11ms.

INTERNAL long
timespec_diff(struct timespec *start, struct timespec *end)
{
  return (BILLION * (end->tv_sec - start->tv_sec)) +
         (end->tv_nsec - start->tv_nsec);
}

int
main(int argc, char *argv[])
{
  long desired_ns_per_frame = BILLION / 60.0f;

  struct timespec prev_timespec = {0};
  clock_gettime(CLOCK_MONOTONIC_RAW, &prev_timespec);

  while (true)
  {
    // work here  

    struct timespec end_timespec = {0};
    clock_gettime(CLOCK_MONOTONIC_RAW, &end_timespec);
    long ns_elapsed = timespec_diff(&prev_timespec, &end_timespec);

    long ns_delta = desired_ns_per_frame - ns_elapsed;
    while (timespec_diff(&prev_timespec, &end_timespec) < ns_delta)
    {
      clock_gettime(CLOCK_MONOTONIC_RAW, &end_timespec);
    }
  }
}

Can anyone explain this?

Does anyone have any suggestions/experiences with this on Linux? Thanks.

Edited by Ben Visness on September 21, 2021, 3:02pm Reason: improved code formatting

Mārtiņš Možeiko

#25123

September 21, 2021

What is the 11 msec? How is it measured?

Don't you need to set prev_timespec to end_timespec at the end of your loop?

In general there's not much you can do about this if you use software rendering. Alternatively, you might want to render only in response to "paint"/"damage" messages or events, and use a periodic timer to advance animations and updates.

Or, if a compositor is running, find a way to wait on the compositor's "swap-buffer" events.

My recommendation, even if you're doing software rendering, would be to use GL to upload and present the pixels to the window, just as the last step. This way the pixel upload will be much more efficient, and you'll be able to wait on vsync to really put the CPU to sleep.

Lachlan

#25130

September 23, 2021

Sorry, I forgot to include the code that swaps the timespec values. Here is a version that compiles:

#include <time.h>
#include <stdbool.h>
#include <stdio.h>

#define BILLION 1000000000L

long
timespec_diff(struct timespec *start, struct timespec *end)
{
  return (BILLION * (end->tv_sec - start->tv_sec)) +
         (end->tv_nsec - start->tv_nsec);
}

int
main(int argc, char *argv[])
{
  long desired_ns_per_frame = BILLION / 60.0f;

  struct timespec prev_timespec = {0};
  clock_gettime(CLOCK_MONOTONIC_RAW, &prev_timespec);

  while (true)
  {
    // work here  

    struct timespec end_timespec = {0};
    clock_gettime(CLOCK_MONOTONIC_RAW, &end_timespec);
    long ns_elapsed = timespec_diff(&prev_timespec, &end_timespec);

    long ns_delta = desired_ns_per_frame - ns_elapsed;
    while (timespec_diff(&prev_timespec, &end_timespec) < ns_delta)
    {
      clock_gettime(CLOCK_MONOTONIC_RAW, &end_timespec);
    }

    struct timespec final_timespec = {0};
    clock_gettime(CLOCK_MONOTONIC_RAW, &final_timespec);
    printf("ms: %f\n", timespec_diff(&prev_timespec, &final_timespec) / 1000000.0f);

    prev_timespec = end_timespec;
  }
}

With nanosleep I understand why the timings are off; however, with this busy loop I don't understand why the time is shorter, e.g. 11ms or 12ms instead of 16ms.

Edited by Lachlan on September 23, 2021, 2:49am

Replying to mmozeiko (#25123)

Mārtiņš Možeiko

#25131

September 23, 2021

That's because you should be comparing the final_timespec value to the value of final_timespec from the previous frame, not to end_timespec, as that excludes the time spent in the "work here" part.

struct timespec prev_final_timespec;
clock_gettime(CLOCK_MONOTONIC_RAW, &prev_final_timespec);
while (true)
{
    // ... work here

    // ... your ns_delta timing loop here

    struct timespec final_timespec = {0};
    clock_gettime(CLOCK_MONOTONIC_RAW, &final_timespec);

    // ... here show difference between final_timespec and prev_final_timespec

    prev_final_timespec = final_timespec;
}

Edited by Mārtiņš Možeiko on September 23, 2021, 3:15am

Lachlan

#25133

September 23, 2021

Sorry, I must be misunderstanding what you're saying, as I think that is what I'm doing...

If I change the last line:

prev_timespec = end_timespec;
// CHANGE TO
prev_timespec = final_timespec;

The result is more 'correct'; however, I'm still getting values less than 16ms.

Replying to mmozeiko (#25131)