Programming Comments - std::this_thread::sleep

Summary

Tick, tock, tick, tock... How much can you get done in a second? While I know how long a second is, and I have some idea as to how long a millisecond is, I found I was always forgetting and having to look up the units of measurement smaller than milliseconds. Here is an easy place to look them up:

1 second == 1 second.
1 second == 1,000 milliseconds.
1 second == 1,000,000 microseconds.
1 second == 1,000,000,000 nanoseconds.
1 second == 1,000,000,000,000 picoseconds.
1 second == 1,000,000,000,000,000 femtoseconds.

Worded differently in case it is useful to you, that same list becomes:

It takes 1,000 milliseconds to make 1 second.
It takes 1,000 microseconds to make 1 millisecond.
It takes 1,000 nanoseconds to make 1 microsecond.
It takes 1,000 picoseconds to make 1 nanosecond.
It takes 1,000 femtoseconds to make 1 picosecond.

At the time I wrote this in mid-2013, on an Intel/AMD 3.4 GHz desktop running Ubuntu 13.04-64bit, nanoseconds is the finest scale available. And even then, as you'll see, there are some important limitations.

Available calls

Seconds

When working with seconds, there are a few common calls available. The standard time-related calls from 40 years ago in C of course had time(3) which worked with seconds as the basic unit of time. To observe 1 second elapse, the following calls can be made:

sleep( 1 );
boost::this_thread::sleep( boost::posix_time::seconds( 1 ) );
std::this_thread::sleep_for( std::chrono::seconds( 1 ) );

Milliseconds

The next scale down -- milliseconds -- has the following example calls:

boost::this_thread::sleep( boost::posix_time::milliseconds( 1000 ) );
std::this_thread::sleep_for( std::chrono::milliseconds( 1000 ) );

Microseconds

The next scale down -- microseconds -- has the following example calls:

boost::this_thread::sleep( boost::posix_time::microseconds( 1000000 ) );
std::this_thread::sleep_for( std::chrono::microseconds( 1000000 ) );
gettimeofday( ... );
usleep( 1000000 );
select( ... );

Note that usleep(3) has been obsoleted. See the man page for details.

Nanoseconds

And at the finest end of the scale we have nanoseconds:

std::this_thread::sleep_for( std::chrono::nanoseconds( 1000000000 ) );
nanosleep( ... );
clock_nanosleep( ... );
pselect( ... );

How long is a nanosecond

Let's drop immediately down to the most precise unit of time available on the GNU C++11 compiler I'm using. A nanosecond is 1/1,000,000,000 of a second. To attempt to put some meaning behind this number, it means there are as many nanoseconds in 1 second as there are seconds in 31.7 years.

Place 2 high-resolution timestamps side-by-side using C++11 (though the same calls can be made with Boost) and see how much time elapses between 2 function calls:

for ( size_t iteration = 0; iteration < 500; iteration ++ )
{
    std::chrono::high_resolution_clock::time_point start = std::chrono::high_resolution_clock::now();
    std::chrono::high_resolution_clock::time_point end   = std::chrono::high_resolution_clock::now();
    std::chrono::nanoseconds duration = end - start;
    std::cout << "time elapsed: " << duration.count() << " ns" << std::endl;
}
	

On my 3.4 GHz desktop, I typically see between 40 ns and 43 ns have elapsed between the two consecutive calls to now():

40 42 40 40 40 40 41 42 40 40 40 40 ...

Of course, these results are specific to systems similar to mine. I wouldn't expect a 1 GHz system to perform the same way.

Yielding

If a yield() call is introduced between the timestamps like this:

for ( size_t iteration = 0; iteration < 500; iteration ++ )
{
    std::chrono::high_resolution_clock::time_point start = std::chrono::high_resolution_clock::now();
    std::this_thread::yield();
    std::chrono::high_resolution_clock::time_point end   = std::chrono::high_resolution_clock::now();
    std::chrono::nanoseconds duration = end - start;
    std::cout << "time elapsed: " << duration.count() << " ns" << std::endl;
}
	

I see results of ~ 440 ns between the two calls:

438 451 436 445 436 438 437 451 443 451 438 446 434 ...

Remember that yield doesn't cause the thread to sleep. It just loses the remainder of the current timeslice. I suspect that on a busy system running a multi-threaded application, calling yield() could introduce much longer delays than when I tested this on my mostly-idle desktop.

Sleeping

Sleeping with nanoseconds resulted in surprising numbers. Since consecutive calls to get a timestamp showed me that at best I was seeing a minimum of 40 ns, I certianly didn't expect that sleeping for any number of ns less than 40 would be precise. But this code:

for ( size_t iteration = 0; iteration < 500; iteration ++ )
{
    std::chrono::high_resolution_clock::time_point start = std::chrono::high_resolution_clock::now();
    std::this_thread::sleep_for( std::chrono::nanoseconds( 1 ) );
    std::chrono::high_resolution_clock::time_point end   = std::chrono::high_resolution_clock::now();
    std::chrono::nanoseconds duration = end - start;
    std::cout << "time elapsed: " << duration.count() << " ns" << std::endl;
}
	

...showed that attemping to sleep for 1 ns actually took between 54,000 and 55,000 nanoseconds! My results looked like this:

58911 54584 54238 54131 54125 54178 ...

Turns out this is the price to pay to make a system call into the kernel, remove the thread from the run queue, sleep for the specified duration, and place the thread back onto a queue where it can run. To put the numbers into context:

Side-by-side calls to take a timestamp with now() took 40 ns.
Calling yield() took 430 ns.
Calling sleep_for( 1 ns ) took 54,000+ ns.

Changing the sleep from 1 ns to 0 ns didn't have an effect. But using a negative value did have a big effect:

for ( size_t iteration = 0; iteration < 500; iteration ++ )
{
    std::chrono::high_resolution_clock::time_point start = std::chrono::high_resolution_clock::now();
    std::this_thread::sleep_for( std::chrono::nanoseconds( -1 ) );
    std::chrono::high_resolution_clock::time_point end   = std::chrono::high_resolution_clock::now();
    std::chrono::nanoseconds duration = end - start;
    std::cout << "time elapsed: " << duration.count() << " ns" << std::endl;
}
	

The call to sleep_for( -1 ) seems to short circuit and (relatively) quickly return to the caller, because this gave me the following results:

148 143 145 143 148 145 146 143 143 ...

So in essense, this causes a few CPU cycles to be wasted without having put the thread into proper sleep.

CPU Frequencies

Having an idle desktop while running these timing tests negatively affected the output. Modern CPUs of course use frequency scaling to save power when a system isn't in heavy use. I installed cpufrequtils and forced my system to use the highest cpu frequency:

sudo apt-get install cpufrequtils
sudo cpufreq-set -r -g performance

Once I did that, the number of nanoseconds between consecutive calls dropped from 40 ns to less than 20 ns:

14 19 14 14 19 14 14 19 14 ...

The call to yield() dropped from ~ 440 ns to 175+ ns:

172 175 171 183 174 175 172 189 173 177 ...

The call to sleep_for( 1 ns ) took 51,600+ ns instead of the previous 54,000+ ns:

51661 51674 51670 51656 51653 51653 51661 51656 ...

And the call to sleep_for( -1 ns ) dropped from ~ 145 ns to ~ 60 ns:

57 61 57 62 57 63 57 62 57 56 64 57 ...