Why is the value of std::string::max_size "strange"?
One of the indices, the largest representable one to be more specific, is reserved for the std::string::npos value, which represents a "not found" result in some string functions. Furthermore, strings are internally null-terminated, so one position must be reserved for the null terminator.
This brings us to a theoretical maximum of radix^bits - 3 that the standard library could provide (unless those reserved positions could share the same value; I'm not 100% sure that would be impossible). Presumably the implementation has chosen to reserve two more indices for internal usage (or I've missed some necessarily reserved position). One potential use for such a reserved index that I could imagine is an overflow trap, which detects out-of-bounds accesses.
From a practical point of view: std::string::size_type is usually the same width as the address space, and under that assumption it's not practically possible to use the entire address space for a single string anyway. As such, the number reported by the library is usually not achievable; it is just an upper bound set by the standard library implementation, and the actual size limit of a string is subject to limitations from other sources, most often the amount of available RAM.
In addition to what eerorika wrote…
- Strings can (and in multiple cases do) use "strange" layouts. E.g., prior to GCC 5's C++11-conformant string implementation, a std::string was implemented as a single pointer to a heap block(1) that contained the character data, and possibly a NUL terminator, starting at the pointed-to address, but that character data was prefaced with size, capacity and a reference count (for copy-on-write, aka COW).
- In general, there's only one way to know what the specific implementation is doing: looking at its source code.
- Implementations are required to provide max_size() and incentivized to make max_size appear large enough for practical purposes. However, they often provide values that are unrealistically large. E.g., even the 2^32-5 figure seems absurd from a practical perspective on a 32-bit flat memory model, because it would assume that the entire rest of the program takes up 4 bytes or less (with one byte allotted for the string's NUL terminator). The 2^62 figure on AMD64 is equally absurd because even a hypothetical fully implemented long mode, which would require a future CPU, will "only" support 2^52 distinct physical addresses (technically, swapping or RAM compression could work, but is this really the intent?). BTW, the reason 2^62 may have been chosen as opposed to, let's say, 2^64 minus some small integer, is that the implementers at least realized that the kernel will always reserve part of the virtual address space for its own purposes.
Long story short… they have to provide a value, so they do, but they don't care enough to make it accurate and meaningful. At least you can assume that strings longer than max_size() are definitely impossible.
(1): Well, commonly – the statically allocated empty string being the physically tiny but conceptually big exception.