A rookie mistake

Consider the following function checkEverySecondWithTimeout():

/**
 * Calls <code>checkMethod</code> every 1 second.
 * Returns when the check passes or the <code>checkTimeOut</code>
 * elapses, whichever is earlier.
 * <p/>
 * e.g, <code>checkEverySecondWithTimeout(() -> false, 5)</code> will return false after 5 seconds.
 *
 * @param checkMethod  A no-arg function that returns true to indicate pass; false otherwise
 * @param checkTimeOut Maximum number of seconds for which to execute the checks
 * @return true if the check passes, false if the timeout elapses
 * @throws InterruptedException
 */
public static boolean checkEverySecondWithTimeout(BooleanSupplier checkMethod, int checkTimeOut) throws InterruptedException {
    for (int i = 0; i < checkTimeOut; i++) {
        if (checkMethod.getAsBoolean()) {
            return true;
        }
        TimeUnit.SECONDS.sleep(1);
    }
    return false;
}

Can you spot the error in this function?

It makes the rookie mistake of assuming that sleep() is reliable. The call TimeUnit.SECONDS.sleep(1) will certainly put the thread to sleep for a second, that much we can rely upon. The problem is that the sleep can extend to beyond one second; perhaps 2 seconds on an autumn day and perhaps 5 seconds on a full moon night.

Essentially, there are no guarantees that the duration of sleep will be exact or even within a tolerance. This makes the checkTimeOut a lower bound at best and a long lasting siesta at worst.

A recent incident at work was traced in part to a function that, in essence, made the same mistake. It ran fine for eight months in production before failing. Since I was on support during this incident and helped diagnose the root cause, I naturally went back to see who wrote the code and when.

No prizes for guessing who it was ◔̯◔


Markdown formatting supported