What's the difference between size_t and int in C++?
From the friendly Wikipedia:
The stdlib.h and stddef.h header files define a datatype called size_t which is used to represent the size of an object. Library functions that take sizes expect them to be of type size_t, and the sizeof operator evaluates to size_t.
The actual type of size_t is platform-dependent; a common mistake is to assume size_t is the same as unsigned int, which can lead to programming errors, particularly as 64-bit architectures become more prevalent.
Also, check Why size_t matters
size_t is the type used to represent sizes (as its name implies). It is platform- (and even potentially implementation-) dependent, and should be used only for this purpose. Since it represents a size, size_t is unsigned. Many stdlib functions, including malloc and the various string functions, take or return size_t, and the sizeof operator yields a size_t.
An int is signed by default, and even though its size is also platform-dependent, it is a fixed 32 bits on most modern machines (and although size_t is 64 bits on 64-bit architectures, int remains 32 bits on them).
To summarize: use size_t to represent the size of an object and int (or long) in other cases.
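As a minimal sketch of that rule of thumb (standard headers only, nothing beyond what the answer describes): sizes coming from strlen and sizeof land in a size_t, while an ordinary signed quantity stays an int.
    #include <cstddef>   // std::size_t
    #include <cstring>   // std::strlen
    #include <cstdio>

    int main() {
        const char* msg = "hello";

        std::size_t n = std::strlen(msg);   // strlen returns size_t
        std::size_t s = sizeof(double);     // sizeof yields size_t

        int delta = -3;                     // an ordinary signed quantity stays int

        std::printf("strlen = %zu, sizeof(double) = %zu, delta = %d\n", n, s, delta);
        return 0;
    }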
The size_t type is defined as the unsigned integral type of the result of the sizeof operator. In the real world, you will often see int defined as 32 bits (for backward compatibility) but size_t defined as 64 bits (so you can declare arrays and structures more than 4 GiB in size) on 64-bit platforms. If a long int is also 64 bits, this is called the LP64 convention; if long int is 32 bits but long long int and pointers are 64 bits, that's LLP64. You might also get the reverse: a program that uses 64-bit instructions for speed, but 32-bit pointers to save memory. Also, int is signed and size_t is unsigned.
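If you want to see which convention your own toolchain follows, a small sketch like this prints the widths it actually uses; on a typical LP64 build you would likely see int at 32 bits and long, void* and size_t at 64, while an LLP64 build keeps long at 32.
    #include <cstddef>
    #include <cstdio>

    int main() {
        // Widths are platform-dependent; this just reports the current compiler's choices.
        std::printf("int       : %zu bits\n", sizeof(int)         * 8);
        std::printf("long      : %zu bits\n", sizeof(long)        * 8);
        std::printf("long long : %zu bits\n", sizeof(long long)   * 8);
        std::printf("void*     : %zu bits\n", sizeof(void*)       * 8);
        std::printf("size_t    : %zu bits\n", sizeof(std::size_t) * 8);
        return 0;
    }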
There were historically a number of other platforms where addresses were wider or shorter than the native size of int. In fact, in the ’70s and early ’80s, this was more common than not: all the popular 8-bit microcomputers had 8-bit registers and 16-bit addresses, and the transition between 16 and 32 bits also produced many machines that had addresses wider than their registers. I occasionally still see questions here about Borland Turbo C for MS-DOS, whose huge memory model had 20-bit addresses stored in 32 bits on a 16-bit CPU (but which could support the 32-bit instruction set of the 80386); the Motorola 68000 had a 16-bit ALU with 32-bit registers and addresses; there were IBM mainframes with 15-bit, 24-bit or 31-bit addresses. You also still see different ALU and address-bus sizes in embedded systems.
Any time int is smaller than size_t, and you try to store the size or offset of a very large file or object in an unsigned int, there is the possibility that it could overflow and cause a bug. With an int, there is also the possibility of getting a negative number. If an int or unsigned int is wider, the program will run correctly but waste memory.
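A short sketch of that hazard, assuming a common setup where size_t is 64 bits and int/unsigned int are 32 bits (the exact numbers are platform-dependent):
    #include <cstddef>
    #include <cstdio>

    int main() {
        std::size_t big = 7000000000ULL;   // about 7 GB; fits in a 64-bit size_t

        unsigned int u = static_cast<unsigned int>(big);  // wraps modulo 2^32
        int          s = static_cast<int>(big);           // implementation-defined before C++20, typically negative here

        std::printf("as size_t      : %zu\n", big);   // 7000000000
        std::printf("as unsigned int: %u\n", u);      // 2705032704 after wrapping
        std::printf("as int         : %d\n", s);      // commonly -1589934592
        return 0;
    }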
You should generally use the correct type for the purpose if you want portability. A lot of people will recommend that you use signed math instead of unsigned (to avoid nasty, subtle bugs like 1U < -3). For that purpose, the standard library defines ptrdiff_t in <stddef.h> as the signed type of the result of subtracting one pointer from another.
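Both points fit in a few lines; the comparison below is exactly the kind of signed/unsigned mix that compiler warnings such as -Wsign-compare are meant to catch.
    #include <cstddef>   // std::ptrdiff_t
    #include <cstdio>

    int main() {
        // -3 is converted to a large unsigned value before the comparison,
        // so this counter-intuitively prints "true" (and a good compiler warns).
        std::printf("1U < -3 is %s\n", (1U < -3) ? "true" : "false");

        // ptrdiff_t is the signed result type of pointer subtraction,
        // so a negative distance is perfectly representable.
        int a[10];
        std::ptrdiff_t d = &a[2] - &a[7];   // -5
        std::printf("&a[2] - &a[7] = %td\n", d);
        return 0;
    }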
That said, a workaround might be to bounds-check all addresses and offsets against INT_MAX and either 0 or INT_MIN as appropriate, and turn on the compiler warnings about comparing signed and unsigned quantities in case you miss any. You should always, always, always be checking your array accesses for overflow in C anyway.