Should I use int or unsigned int when working with STL container?

Unsigned types have three characteristics, one of which is qualitatively 'good' and one of which is qualitatively 'bad':

  • They can hold twice as many values as the same-sized signed type (good)
  • The size_t version (that is, 32-bit on a 32-bit machine, 64-bit on a 64-bit machine, etc) is useful for representing memory (addresses, sizes, etc) (neutral)
  • They wrap below 0, so subtracting 1 in a loop or using -1 to represent an invalid index can cause bugs (bad.) Signed types wrap too.

The STL uses unsigned types because of the first two points above: in order to not limit the potential size of array-like classes such as vector and deque (although you have to question how often you would want 4294967296 elements in a data structure); because a negative value will never be a valid index into most data structures; and because size_t is the correct type to use for representing anything to do with memory, such as the size of a struct, and related things such as the length of a string (see below.) That's not necessarily a good reason to use it for indexes or other non-memory purposes such as a loop variable. The reason it's best practice to do so in C++ is kind of a reverse construction, because it's what's used in the containers as well as other methods, and once used the rest of the code has to match to avoid the same problem you are encountering.

You should use a signed type when the value can become negative.

You should use an unsigned type when the value cannot become negative (possibly different to 'should not'.)

You should use size_t when handling memory sizes (the result of sizeof, often things like string lengths, etc.) It is often chosen as a default unsigned type to use, because it matches the platform the code is compiled for. For example, the length of a string is size_t because a string can only ever have 0 or more elements, and there is no reason to limit a string's length method arbitrarily shorter than what can be represented on the platform, such as a 16-bit length (0-65535) on a 32-bit platform. Note (thanks commenter Morwen) std::intptr_t or std::uintptr_t which are conceptually similar - will always be the right size for your platform - and should be used for memory addresses if you want something that's not a pointer. Note 2 (thanks commenter rubenvb) that a string can only hold size_t-1 elements due to the value of npos. Details below.

This means that if you use -1 to represent an invalid value, you should use signed integers. If you use a loop to iterate backwards over your data, you should consider using a signed integer if you are not certain that the loop construct is correct (and as noted in one of the other answers, they are easy to get wrong.) IMO, you should not resort to tricks to ensure the code works - if code requires tricks, that's often a danger signal. In addition, it will be harder to understand for those following you and reading your code. Both these are reasons not to follow @Jasmin Gray's answer above.

Iterators

However, using integer-based loops to iterate over the contents of a data structure is the wrong way to do it in C++, so in a sense the argument over signed vs unsigned for loops is moot. You should use an iterator instead:

std::vector<foo> bar;
for (std::vector<foo>::const_iterator it = bar.begin(); it != bar.end(); ++it) {
  // Access using *it or it->, e.g.:
  const foo & a = *it;

When you do this, you don't need to worry about casts, signedness, etc.

Iterators can be forward (as above) or reverse, for iterating backwards. Use the same syntax of it != bar.end(), because end() signals the end of the iteration, not the end of the underlying conceptual array, tree, or other structure.

In other words, the answer to your question 'Should I use int or unsigned int when working with STL containers?' is 'Neither. Use iterators instead.' Read more about:

  • Why use iterators instead of array indices in C++?
  • Why again (some more interesting points in the answers to this question)
  • Iterators in general - the different kinds, how to use them, etc.

What's left?

If you don't use an integer type for loops, what's left? Your own values, which are dependent on your data, but which in your case include using -1 for an invalid value. This is simple. Use signed. Just be consistent.

I am a big believer in using natural types, such as enums, and signed integers fit into this. They match our conceptual expectation more closely. When your mind and the code are aligned, you are less likely to write buggy code and more likely to expressively write correct, clean code.


Use the type that the container returns. In this case, size_t - which is an integer type that is unsigned. (To be technical, it's std::vector<MyType>::size_type, but that's usually defined to size_t, so you're safe using size_t. unsigned is also fine)

But in general, use the right tool for the right job. Is the 'index' ever supposed to be negative? If not, don't make it signed.

By the by, you don't have to type out 'unsigned int'. 'unsigned' is shorthand for the same variable type:

int myVar1;
unsigned myVar2;

The page linked to in the original question said:

Some people, including some textbook authors, recommend using unsigned types to represent numbers that are never negative. This is intended as a form of self-documentation. However, in C, the advantages of such documentation are outweighed by the real bugs it can introduce.

It's not just self-documentation, it's use the right tool for the right job. Saying that 'unsigned variables can cause bugs so don't use unsigned variables' is silly. Signed variables can also cause bugs. So can floats (more than integers). The only guaranteed bug-free code is code that doesn't exist.

Their example of why unsigned is evil, is this loop:

for (unsigned int i = foo.Length()-1; i >= 0; --i)

I have difficulty iterating backwards over a loop, and I usually make mistakes (with signed or unsigned integers) with it. Do I subtract one from size? Do I make it greater-than-AND-equal-to 0, or just greater than? It's a sloppy situation to begin with.

So what do you do with code that you know you have problems with? You change your coding style to fix the problem, make it simpler, and make it easier to read, and make it easier to remember. There is a bug in the loop they posted. The bug is, they wanted to allow a value below zero, but they chose to make it unsigned. It's their mistake.

But here's a simple trick that makes it easier to read, remember, write, and run. With unsigned variables. Here's the intelligent thing to do (obviously, this is my opinion).

for(unsigned i = myContainer.size(); i--> 0; )
{
    std::cout << myContainer[i] << std::endl;
}

It's unsigned. It always works. No negative to the starting size. No worrying about underflows. It just works. It's just smart. Do it right, don't stop using unsigned variables because someone somewhere once said they had a mistake with a for() loop and failed to train themselves to not make the mistake.

The trick to remembering it:

  1. Set 'i' to the size. (don't worry about subtracting one)
  2. Make 'i' point to 0 like an arrow. i --> 0 (it's a combination of post-decrementing (i--) and greater-than comparison (i > 0))

It's better to teach yourself tricks to code right, then to throw away tools because you don't code right.

Which would you want to see in your code?

for(unsigned i = myContainer.size()-1; i >= 0; --i)

Or:

for(unsigned i = myContainer.size(); i--> 0; )

Not because it's less characters to type (that'd be silly), but because it's less mental clutter. It's simpler to mentally parse when skimming through code, and easier to spot mistakes.


Try the code yourself