Why does string::compare return an int?
First, the specification is that it will return a value less
than, equal to or greater than 0, not necessarily -1 or 1.
Secondly, return values are rvalues, subject to integral
promotion, so there's no point in returning anything smaller.
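As a small illustration of the first point, portable code should look only at the sign of the result. This is a minimal sketch; the helper names `sign_of` and `check_compare_signs` are hypothetical, introduced only for this example:

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: collapses any int to -1, 0 or 1.
inline int sign_of(int value)
{
    return (value > 0) - (value < 0);
}

// Checks that rely only on the sign of std::string::compare's result.
inline void check_compare_signs()
{
    const std::string apple = "apple";
    const std::string banana = "banana";

    assert(apple.compare(banana) < 0);   // "apple" sorts before "banana"
    assert(banana.compare(apple) > 0);   // the reverse comparison is positive
    assert(apple.compare(apple) == 0);   // equal strings give exactly 0

    // The magnitude is unspecified; normalise it if -1/0/1 is really needed.
    assert(sign_of(apple.compare(banana)) == -1);
}
```

Calling `check_compare_signs()` from `main` should pass on any conforming implementation, whatever magnitude its `compare` happens to return.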
In C++ (as in C), every expression is either an rvalue or an lvalue. Historically, the terms refer to the fact that lvalues appear on the left of an assignment, whereas rvalues can only appear on the right. Today, a simple approximation for non-class types is that an lvalue has an address in memory and an rvalue doesn't. Thus, you cannot take the address of an rvalue, and cv-qualifiers (which condition "access") don't apply. In C++ terms, an rvalue which doesn't have class type is a pure value, not an object. The return value of a function is an rvalue, unless it has reference type. (Non-class types which fit in a register will almost always be returned in a register, for example, rather than in memory.)
For class types, the issues are a bit more complex, due to the
fact that you can call member functions on an rvalue. This
means that rvalues must in fact have addresses, for the this
pointer, and can be cv-qualified, since the cv-qualification
plays a role in overload resolution. Finally, C++11 introduces
several new distinctions, in order to support rvalue references;
these, too, are mainly applicable to class types.
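The lvalue/rvalue distinction can be observed at compile time. The following is only a sketch, assuming C++11 or later; the names `global_counter`, `by_value`, `by_reference`, `temporary_length` and `check_value_categories` are made up for the illustration. It relies on the fact that `decltype((expr))` yields `T&` when `expr` is an lvalue and plain `T` when it is a pure rvalue:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>

int global_counter = 0;

int  by_value()     { return global_counter; }  // the call expression is an rvalue
int& by_reference() { return global_counter; }  // the call expression is an lvalue

static_assert(std::is_lvalue_reference<decltype((global_counter))>::value,
              "a named variable is an lvalue");
static_assert(!std::is_reference<decltype((by_value()))>::value,
              "a non-reference return value is an rvalue");
static_assert(std::is_lvalue_reference<decltype((by_reference()))>::value,
              "a call returning a reference is an lvalue");

// For class types, a member function can be called directly on an rvalue.
inline std::size_t temporary_length()
{
    return std::string("abc").size();
}

inline void check_value_categories()
{
    by_reference() = 5;          // fine: assigns through the lvalue
    // by_value() = 5;           // would not compile: cannot assign to an rvalue of type int
    assert(by_value() == 5);
    assert(temporary_length() == 3);
}
```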
Integral promotion refers to the fact that when integral types
smaller than an int are used as rvalues in an expression, in
most contexts, they will be promoted to int. So even if
I have variables declared as short a, b; then in the expression
a + b, both a and b are promoted to int before the addition
occurs. Similarly, if I write a < 0, the comparison is done
on the value of a, converted to an int. In practice, there
are very few cases where this makes a difference, at least on
2's complement machines where integer arithmetic wraps (i.e.
all but a very few exotics today; I think the Unisys
mainframes are the only exceptions left). Still, even on the
more common machines:
short a = 1;
std::cout << sizeof( a ) << std::endl;
std::cout << sizeof( a + 0 ) << std::endl;
should give different results: the first is the equivalent of
sizeof( short ), the second sizeof( int ) (because of
integral promotion).
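The same promotion can also be checked through the types rather than the sizes. A minimal sketch, assuming C++11 or later (the function name `check_integral_promotion` is hypothetical):

```cpp
#include <cassert>
#include <type_traits>

inline void check_integral_promotion()
{
    short a = 1;
    short b = 2;

    // decltype and sizeof do not evaluate their operands; only the types matter.
    static_assert(std::is_same<decltype(a), short>::value,
                  "the variable itself is a short");
    static_assert(std::is_same<decltype(a + b), int>::value,
                  "a + b is computed as int");
    static_assert(std::is_same<decltype(+a), int>::value,
                  "unary plus also promotes");
    static_assert(sizeof(a + 0) == sizeof(int),
                  "consistent with the sizeof example");

    assert(a + b == 3);  // the value is unchanged, only the type differs
}
```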
These two issues are formally orthogonal; rvalues and lvalues
have nothing to do with integral promotion. Except...
integral promotion only applies to rvalues, and most (but not
all) of the cases where you would use an rvalue will result in
integral promotion. For this reason, there is really no point
in returning a numeric value in something smaller than int.
There is even a very good reason not to return it as
a character type. Overloaded operators, like <<, often behave
differently for character types, so you only want to return
characters as character types. You might compare the
difference:
char f() { return 'a'; }
std::cout << f() << std::endl; // displays "a"
std::cout << f() + 0 << std::endl; // displays "97" on my machine
The difference is that in the second case, the addition has
caused integral promotion to occur, which results in a different
overload of << being chosen.
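To make the consequence for a comparison function concrete, here is a sketch of what could happen if one returned its result as a character type. Everything named here (`compare_as_char`, `compare_as_int`, `printed`, `check_printing`) is hypothetical and exists only for the illustration:

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Two hypothetical three-way comparisons that differ only in return type.
inline signed char compare_as_char(int a, int b)
{
    return static_cast<signed char>((a > b) - (a < b));
}

inline int compare_as_int(int a, int b)
{
    return (a > b) - (a < b);
}

// Formats a value with operator<<, exactly as std::cout would.
template <typename T>
std::string printed(T value)
{
    std::ostringstream out;
    out << value;
    return out.str();
}

inline void check_printing()
{
    // The int result is formatted as a number.
    assert(printed(compare_as_int(2, 1)) == "1");

    // The signed char result selects the character overload of <<, so the
    // stream receives the character with code 1, not the digit '1'.
    assert(printed(compare_as_char(2, 1)) == std::string(1, '\x01'));
}
```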
It is intentional that it doesn't return -1, 0 or 1.
It allows the following (this example is not for strings, but the same reasoning applies to them):
int compare(int *a, int *b)
{
    return *a - *b;
}
which is a lot less cumbersome than:
int compare(int *a, int *b)
{
    if (*a == *b) return 0;
    if (*a > *b) return 1;
    return -1;
}
which is what you'd have to do (or something along those lines) if you had to return exactly -1, 0 or 1.
And it works for more complex types too:
struct Date   // struct, so that compare() can read the members
{
    int year;
    int month;
    int day;
};
int compare(const Date &a, const Date &b)
{
    if (a.year != b.year) return a.year - b.year;
    if (a.month != b.month) return a.month - b.month;
    return a.day - b.day;
}
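One caveat with any subtraction-based comparison: it assumes the difference fits in an int. For small ranges such as months and days that is always true, but for arbitrary int values (say INT_MIN and INT_MAX) the subtraction overflows, which is undefined behaviour for signed integers. A possible overflow-safe variant, still returning int, might look like this (the names `compare_safe` and `check_compare_safe` are hypothetical):

```cpp
#include <cassert>
#include <climits>

// Hypothetical overflow-safe variant: each comparison yields 0 or 1,
// so the result is always -1, 0 or 1 and no subtraction of the operands occurs.
inline int compare_safe(const int* a, const int* b)
{
    return (*a > *b) - (*a < *b);
}

inline void check_compare_safe()
{
    int low = INT_MIN;
    int high = INT_MAX;

    // low - high would overflow; the comparison-based form does not.
    assert(compare_safe(&low, &high) < 0);
    assert(compare_safe(&high, &low) > 0);
    assert(compare_safe(&low, &low) == 0);
}
```

Callers that test only the sign work unchanged with either form.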
In the string case, we can do this:
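#include <algorithm>
#include <string>

int compare(const std::string& a, const std::string& b)
{
    std::string::size_type len = std::min(a.length(), b.length());
    for (std::string::size_type i = 0; i < len; i++)
    {
        // Compare as unsigned char so the sign of the result does not
        // depend on whether plain char is signed.
        if (a[i] != b[i])
            return static_cast<unsigned char>(a[i]) - static_cast<unsigned char>(b[i]);
    }
    // We only get here if the strings are equal up to the end of the
    // shorter one. If the lengths differ, the longer string wins.
    return static_cast<int>(a.length()) - static_cast<int>(b.length());
}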
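However the value is produced, callers normally consume only its sign. As a sketch of that, here is the result of std::string::compare used as a sort predicate; the helper name `sort_by_compare` is hypothetical, and C++11 or later is assumed for the lambda:

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Sorts in ascending order; only the sign of compare()'s result is used.
inline void sort_by_compare(std::vector<std::string>& words)
{
    std::sort(words.begin(), words.end(),
              [](const std::string& a, const std::string& b)
              {
                  return a.compare(b) < 0;  // negative means a orders before b
              });
}
```

Because the predicate is `< 0` rather than `== -1`, it does not matter which negative value a particular implementation returns.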
int is intended to be the integer size the hardware handles most naturally. Historically that meant the machine word (the width of the CPU registers), although on most current 64-bit platforms int stays at 32 bits, which those CPUs still handle natively. Therefore int is usually passed along at least as fast as smaller types, because it doesn't require masking, sign extension and similar operations.
The smaller types exist mainly to allow RAM usage optimization for arrays and structs. In most cases they trade a few CPU cycles (in the form of alignment operations) for better RAM usage.
Unless you need to force your return value to be a signed or unsigned number of a certain size (char, short…), you are better off using int, which is why the standard library does it.
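The memory side of that trade-off is easy to see with sizeof. This is only a sketch: the table size of 1000 and the names `CompactTable`, `WordTable` and `bytes_saved` are arbitrary choices for the illustration, and the actual sizes depend on the platform:

```cpp
#include <cstddef>

// The standard guarantees this ordering of sizes, but not the exact values.
static_assert(sizeof(char) <= sizeof(short), "short is at least as large as char");
static_assert(sizeof(short) <= sizeof(int), "int is at least as large as short");

using CompactTable = short[1000];  // the smaller type pays off in bulk storage
using WordTable    = int[1000];

// Bytes saved by storing 1000 values as short instead of int.
inline std::size_t bytes_saved()
{
    return sizeof(WordTable) - sizeof(CompactTable);
}
```

On a typical platform with a 2-byte short and a 4-byte int this comes to 2000 bytes for the table, whereas a single return value saves nothing.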