Legality of COW std::string implementation in C++11
It's not allowed because, per 21.4.1 p6 of the C++11 standard, iterators and references into a basic_string may be invalidated only by the following uses of that string:
— as an argument to any standard library function taking a reference to non-const basic_string as an argument.
— Calling non-const member functions, except operator[], at, front, back, begin, rbegin, end, and rend.
For a COW string, calling non-const operator[] would require making a copy (and invalidating references), which is disallowed by the paragraph above. Hence, it's no longer legal to have a COW string in C++11.
The answers by Dave S and gbjbaanb are correct. (And Luc Danton's is correct too, although it's more a side-effect of forbidding COW strings rather than the original rule that forbids it.)
But to clear up some confusion, I'm going to add some further exposition. Various comments link to a comment of mine on the GCC bugzilla which gives the following example:
std::string s("str");
const char* p = s.data();
{
    std::string s2(s);
    (void) s[0];
}
std::cout << *p << '\n'; // p is dangling
The point of that example is to demonstrate why GCC's reference-counted (COW) string is not valid in C++11. The C++11 standard requires this code to work correctly. Nothing in the code permits p to be invalidated in C++11.
Using GCC's old reference-counted std::string implementation, that code has undefined behaviour, because p is invalidated, becoming a dangling pointer. (What happens is that when s2 is constructed it shares the data with s, but obtaining a non-const reference via s[0] requires the data to be unshared, so s does a "copy on write" because the reference s[0] could potentially be used to write into s. Then s2 goes out of scope, destroying the array pointed to by p.)
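To make that sequence concrete, here is a minimal sketch of a reference-counted string. CowString and pointer_survives_unshare are hypothetical names invented for illustration; this is not libstdc++'s actual implementation, only a toy with the same sharing behaviour. The comparisons are done while s2 is still alive, so the sketch itself never reads through a dangling pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <vector>

// Hypothetical toy type, NOT libstdc++'s real COW string.
class CowString {
public:
    explicit CowString(const char* text)
        : buf_(std::make_shared<std::vector<char>>(
              text, text + std::strlen(text) + 1)) {}

    // Copying only shares the buffer; shared_ptr bumps the reference count.
    CowString(const CowString&) = default;

    // Const access never needs to unshare.
    const char* data() const { return buf_->data(); }

    // The returned reference might be written through, so unshare first.
    char& operator[](std::size_t i) {
        if (buf_.use_count() > 1) {
            // The "copy on write": this object moves to a private copy.
            buf_ = std::make_shared<std::vector<char>>(*buf_);
        }
        return (*buf_)[i];
    }

private:
    std::shared_ptr<std::vector<char>> buf_;
};

// Replays the example above. Returns true if the pointer taken from
// s.data() still points into s's own buffer after the s[0] call.
bool pointer_survives_unshare() {
    CowString s("str");
    const char* p = s.data();
    bool still_valid = false;
    {
        CowString s2(s);               // s and s2 now share one buffer
        (void) s[0];                   // s is not unique, so it unshares
        still_valid = (s.data() == p); // s now owns a different buffer
        assert(s2.data() == p);        // only s2 still owns the old one
    }                                  // s2 dies and frees what p points to
    return still_valid;
}
```

Calling pointer_survives_unshare() returns false with this toy: after the block, p refers to a buffer that only s2 owned, which is exactly the dangling pointer described above.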
The C++03 standard explicitly permits that behaviour in 21.3 [lib.basic.string] p5 where it says that subsequent to a call to data() the first call to operator[]() may invalidate pointers, references and iterators. So GCC's COW string was a valid C++03 implementation.
The C++11 standard no longer permits that behaviour, because no call to operator[]() may invalidate pointers, references or iterators, irrespective of whether they follow a call to data().
So the example above must work in C++11, but does not work with libstdc++'s kind of COW string, therefore that kind of COW string is not permitted in C++11.
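The same sequence can be wrapped in a small check. This is a sketch that assumes it is compiled as C++11 or later against a conforming std::string; data_pointer_is_stable is a hypothetical helper name. With an old reference-counted string the dereference of p would be undefined behaviour, so the check is only meaningful on a conforming library.

```cpp
#include <cassert>
#include <string>

// Returns true if the pointer taken from s.data() is still the start of
// s's buffer after s has been copied and s[0] has been called.
bool data_pointer_is_stable() {
    std::string s("str");
    const char* p = s.data();
    {
        std::string s2(s);   // a conforming C++11 string cannot share here
        (void) s[0];         // not allowed to invalidate p in C++11
    }
    assert(*p == 's');       // well-defined only because p is still valid
    return p == s.data();
}
```

On a conforming C++11 library data_pointer_is_stable() returns true, because none of the operations between the data() call and the comparison is permitted to invalidate p.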
It is. CoW is an acceptable mechanism for making faster strings... but it makes multithreaded code slower (all the locking needed to check whether you're the only one writing kills performance when you use a lot of strings). This was the main reason CoW was killed off years ago.
The other reason is that operator[] hands you the string data directly, with nothing to stop you overwriting a string that someone else expects to remain unchanged. The same applies to c_str() and data().
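A minimal sketch of that hazard, assuming a sharing string that hands out a mutable reference without unsharing first. NaiveSharedString and write_leaks_into_copy are hypothetical names for illustration and do not correspond to any real library type.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>

// Hypothetical type: shares its buffer on copy and never unshares.
struct NaiveSharedString {
    std::shared_ptr<std::string> buf;

    explicit NaiveSharedString(const char* text)
        : buf(std::make_shared<std::string>(text)) {}

    // Returns a mutable reference straight into the shared buffer.
    char& operator[](std::size_t i) { return (*buf)[i]; }

    const std::string& str() const { return *buf; }
};

// Writes through one handle and reports what the other handle now sees.
std::string write_leaks_into_copy() {
    NaiveSharedString a("str");
    NaiveSharedString b(a);   // b shares a's buffer
    assert(a.buf == b.buf);
    a[0] = 'S';               // intended to modify a only
    return b.str();           // b observes the change as well
}
```

With this toy, write_leaks_into_copy() returns "Str" even though only a was written to. A real COW string avoids this by unsharing on every non-const access, since it cannot tell whether the returned reference will be written through.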
A quick Google search suggests that multithreading is basically the reason it was effectively (though not explicitly) disallowed.
The proposal says:
Proposal
We propose to make all iterator and element access operations safely concurrently executable.
We are increasing the stability of operations even in sequential code.
This change effectively disallows copy-on-write implementations.
followed by
The largest potential loss in performance due to a switch away from copy-on-write implementations is the increased consumption of memory for applications with very large read-mostly strings. However, we believe that for those applications ropes are a better technical solution, and recommend a rope proposal be considered for inclusion in Library TR2.
Ropes are part of STLport and SGI's STL.