Fastest way to determine if an integer is between two integers (inclusive) with known sets of values

There's an old trick to do this with only one comparison/branch. Whether it'll really improve speed may be open to question, and even if it does, it's probably too little to notice or care about, but when you're only starting with two comparisons, the chances of a huge improvement are pretty remote. The code looks like:

// use a < for an inclusive lower bound and exclusive upper bound
// use <= for an inclusive lower bound and inclusive upper bound
// alternatively, if the upper bound is inclusive and you can pre-calculate
//  upper-lower, simply add + 1 to upper-lower and use the < operator.
    if ((unsigned)(number-lower) <= (upper-lower))
        in_range(number);

With a typical, modern computer (i.e., anything using twos complement), the conversion to unsigned is really a nop -- just a change in how the same bits are viewed.

Note that in a typical case, you can pre-compute upper-lower outside a (presumed) loop, so that doesn't normally contribute any significant time. Along with reducing the number of branch instructions, this also (generally) improves branch prediction. In this case, the same branch is taken whether the number is below the bottom end or above the top end of the range.

As to how this works, the basic idea is pretty simple: a negative number, when viewed as an unsigned number, will be larger than anything that started out as a positive number.

In practice this method translates number and the interval to the point of origin and checks if number is in the interval [0, D], where D = upper - lower. If number below lower bound: negative, and if above upper bound: larger than D.


It's rare to be able to do significant optimizations to code on such a small scale. Big performance gains come from observing and modifying the code from a higher level. You may be able to eliminate the need for the range test altogether, or only do O(n) of them instead of O(n^2). You may be able to re-order the tests so that one side of the inequality is always implied. Even if the algorithm is ideal, gains are more likely to come when you see how this code does the range test 10 million times and you find a way to batch them up and use SSE to do many tests in parallel.


It depends on how many times you want to perform the test over the same data.

If you are performing the test a single time, there probably isn't a meaningful way to speed up the algorithm.

If you are doing this for a very finite set of values, then you could create a lookup table. Performing the indexing might be more expensive, but if you can fit the entire table in cache, then you can remove all branching from the code, which should speed things up.

For your data the lookup table would be 128^3 = 2,097,152. If you can control one of the three variables so you consider all instances where start = N at one time, then the size of the working set drops down to 128^2 = 16432 bytes, which should fit well in most modern caches.

You would still have to benchmark the actual code to see if a branchless lookup table is sufficiently faster than the obvious comparisons.