Why are Standard iterator ranges [begin, end) instead of [begin, end]?

The best argument easily is the one made by Dijkstra himself:

You want the size of the range to be a simple difference end − begin;
including the lower bound is more "natural" when sequences degenerate to empty ones, and also because the alternative (excluding the lower bound) would require the existence of a "one-before-the-beginning" sentinel value.

You still need to justify why you start counting at zero rather than one, but that wasn't part of your question.

The wisdom behind the [begin, end) convention pays off time and again when you have any sort of algorithm that deals with multiple nested or iterated calls to range-based constructions, which chain naturally. By contrast, using a doubly-closed range would incur off-by-ones and extremely unpleasant and noisy code. For example, consider a partition [n₀, n₁)[n₁, n₂)[n₂,n₃). Another example is the standard iteration loop for (it = begin; it != end; ++it), which runs end - begin times. The corresponding code would be much less readable if both ends were inclusive – and imagine how you'd handle empty ranges.

Finally, we can also make a nice argument why counting should start at zero: With the half-open convention for ranges that we just established, if you are given a range of N elements (say to enumerate the members of an array), then 0 is the natural "beginning" so that you can write the range as [0, N), without any awkward offsets or corrections.

In a nutshell: the fact that we don't see the number 1 everywhere in range-based algorithms is a direct consequence of, and motivation for, the [begin, end) convention.

Actually, a lot of iterator related stuff suddenly makes much more sense if you consider the iterators not pointing at the elements of the sequence but in between, with dereferencing accessing the next element right to it. Then the "one past end" iterator suddenly makes immediate sense:

   +---+---+---+---+
   | A | B | C | D |
   +---+---+---+---+
   ^               ^
   |               |
 begin            end

Obviously begin points to the beginning of the sequence, and end points to the end of the same sequence. Dereferencing begin accesses the element A, and dereferencing end makes no sense because there's no element right to it. Also, adding an iterator i in the middle gives

   +---+---+---+---+
   | A | B | C | D |
   +---+---+---+---+
   ^       ^       ^
   |       |       |
 begin     i      end

and you immediately see that the range of elements from begin to i contains the elements A and B while the range of elements from i to end contains the elements C and D. Dereferencing i gives the element right of it, that is the first element of the second sequence.

Even the "off-by-one" for reverse iterators suddenly becomes obvious that way: Reversing that sequence gives:

   +---+---+---+---+
   | D | C | B | A |
   +---+---+---+---+
   ^       ^       ^
   |       |       |
rbegin     ri     rend
 (end)    (i)   (begin)

I've written the corresponding non-reverse (base) iterators in parentheses below. You see, the reverse iterator belonging to i (which I've named ri) still points in between elements B and C. However due to reversing the sequence, now element B is on the right to it.

Why are Standard iterator ranges [begin, end) instead of [begin, end]?

Tags:

C++

Stl

Standards

Iterator

Related

Recent Posts