Why are Standard iterator ranges [begin, end) instead of [begin, end]?
The best argument easily is the one made by Dijkstra himself:
You want the size of the range to be a simple difference end − begin;
including the lower bound is more "natural" when sequences degenerate to empty ones, and also because the alternative (excluding the lower bound) would require the existence of a "one-before-the-beginning" sentinel value.
You still need to justify why you start counting at zero rather than one, but that wasn't part of your question.
The wisdom behind the [begin, end) convention pays off time and again when you have any sort of algorithm that deals with multiple nested or iterated calls to range-based constructions, which chain naturally. By contrast, using a doubly-closed range would incur off-by-ones and extremely unpleasant and noisy code. For example, consider a partition [n0, n1)[n1, n2)[n2,n3). Another example is the standard iteration loop for (it = begin; it != end; ++it)
, which runs end - begin
times. The corresponding code would be much less readable if both ends were inclusive – and imagine how you'd handle empty ranges.
Finally, we can also make a nice argument why counting should start at zero: With the half-open convention for ranges that we just established, if you are given a range of N elements (say to enumerate the members of an array), then 0 is the natural "beginning" so that you can write the range as [0, N), without any awkward offsets or corrections.
In a nutshell: the fact that we don't see the number 1
everywhere in range-based algorithms is a direct consequence of, and motivation for, the [begin, end) convention.
Actually, a lot of iterator related stuff suddenly makes much more sense if you consider the iterators not pointing at the elements of the sequence but in between, with dereferencing accessing the next element right to it. Then the "one past end" iterator suddenly makes immediate sense:
+---+---+---+---+
| A | B | C | D |
+---+---+---+---+
^ ^
| |
begin end
Obviously begin
points to the beginning of the sequence, and end
points to the end of the same sequence. Dereferencing begin
accesses the element A
, and dereferencing end
makes no sense because there's no element right to it. Also, adding an iterator i
in the middle gives
+---+---+---+---+
| A | B | C | D |
+---+---+---+---+
^ ^ ^
| | |
begin i end
and you immediately see that the range of elements from begin
to i
contains the elements A
and B
while the range of elements from i
to end
contains the elements C
and D
. Dereferencing i
gives the element right of it, that is the first element of the second sequence.
Even the "off-by-one" for reverse iterators suddenly becomes obvious that way: Reversing that sequence gives:
+---+---+---+---+
| D | C | B | A |
+---+---+---+---+
^ ^ ^
| | |
rbegin ri rend
(end) (i) (begin)
I've written the corresponding non-reverse (base) iterators in parentheses below. You see, the reverse iterator belonging to i
(which I've named ri
) still points in between elements B
and C
. However due to reversing the sequence, now element B
is on the right to it.