What are the mechanics of short string optimization in libc++?
The libc++ implementation is a bit complicated, I'll ignore its alternate design and suppose a little endian computer:
template <...>
class basic_string {
/* many many things */
struct __long
{
size_type __cap_;
size_type __size_;
pointer __data_;
};
enum {__short_mask = 0x01};
enum {__long_mask = 0x1ul};
enum {__min_cap = (sizeof(__long) - 1)/sizeof(value_type) > 2 ?
(sizeof(__long) - 1)/sizeof(value_type) : 2};
struct __short
{
union
{
unsigned char __size_;
value_type __lx;
};
value_type __data_[__min_cap];
};
union __ulx{__long __lx; __short __lxx;};
enum {__n_words = sizeof(__ulx) / sizeof(size_type)};
struct __raw
{
size_type __words[__n_words];
};
struct __rep
{
union
{
__long __l;
__short __s;
__raw __r;
};
};
__compressed_pair<__rep, allocator_type> __r_;
}; // basic_string
Note: __compressed_pair
is essentially a pair optimized for the Empty Base Optimization, aka template <T1, T2> struct __compressed_pair: T1, T2 {};
; for all intents and purposes you can consider it a regular pair. Its importance just comes up because std::allocator
is stateless and thus empty.
Okay, this is rather raw, so let's check the mechanics! Internally, many functions will call __get_pointer()
which itself calls __is_long
to determine whether the string is using the __long
or __short
representation:
bool __is_long() const _NOEXCEPT
{ return bool(__r_.first().__s.__size_ & __short_mask); }
// __r_.first() -> __rep const&
// .__s -> __short const&
// .__size_ -> unsigned char
To be honest, I am not too sure this is Standard C++ (I know the initial subsequence provision in union
but do not know how it meshes with an anonymous union and aliasing thrown together), but a Standard Library is allowed to take advantage of implementation defined behavior anyway.
The libc++ basic_string
is designed to have a sizeof
3 words on all architectures, where sizeof(word) == sizeof(void*)
. You have correctly dissected the long/short flag, and the size field in the short form.
what value would __min_cap, the capacity of short strings, take for different architectures?
In the short form, there are 3 words to work with:
- 1 bit goes to the long/short flag.
- 7 bits goes to the size.
- Assuming
char
, 1 byte goes to the trailing null (libc++ will always store a trailing null behind the data).
This leaves 3 words minus 2 bytes to store a short string (i.e. largest capacity()
without an allocation).
On a 32 bit machine, 10 chars will fit in the short string. sizeof(string) is 12.
On a 64 bit machine, 22 chars will fit in the short string. sizeof(string) is 24.
A major design goal was to minimize sizeof(string)
, while making the internal buffer as large as possible. The rationale is to speed move construction and move assignment. The larger the sizeof
, the more words you have to move during a move construction or move assignment.
The long form needs a minimum of 3 words to store the data pointer, size and capacity. Therefore I restricted the short form to those same 3 words. It has been suggested that a 4 word sizeof might have better performance. I have not tested that design choice.
_LIBCPP_ABI_ALTERNATE_STRING_LAYOUT
There is a configuration flag called _LIBCPP_ABI_ALTERNATE_STRING_LAYOUT
which rearranges the data members such that the "long layout" changes from:
struct __long
{
size_type __cap_;
size_type __size_;
pointer __data_;
};
to:
struct __long
{
pointer __data_;
size_type __size_;
size_type __cap_;
};
The motivation for this change is the belief that putting __data_
first will have some performance advantages due to better alignment. An attempt was made to measure the performance advantages, and it was difficult to measure. It won't make the performance worse, and it may make it slightly better.
The flag should be used with care. It is a different ABI, and if accidentally mixed with a libc++ std::string
compiled with a different setting of _LIBCPP_ABI_ALTERNATE_STRING_LAYOUT
will create run time errors.
I recommend this flag only be changed by a vendor of libc++.