C++ regex segfault on long sequences

Is this a bug? If so, should I report it?

Yes, this is a bug.

cout << '"' << regex_replace("Small text\n\nwith several\n\nlines." + string(22311, ' '), regex("\\s+", regex::optimize), " ") << '"' << endl;
  • Runs fine with libc++: http://coliru.stacked-crooked.com/a/f9ee5438745a5b22
  • Runs fine with Visual Studio 2015; you can test this by copying the code to and running it at: http://webcompiler.cloudapp.net/
  • Fails with libstdc++: http://coliru.stacked-crooked.com/a/3f4bbe5c46b6b627

This has been reported as a bug against libstdc++ here.

Is there a smart way to overcome the problem?

If you're asking for a different regex that works: I've tried a handful of variations, and all of them fail on libstdc++. So if you want to solve this with a regex, you'll need to compile against libc++.

But honestly, if you're using a regex to strip duplicate whitespace: "Now you have two problems."

A better solution could use adjacent_find, which runs fine with libstdc++ as well:

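// Take unsigned char: passing a negative char value to isspace is undefined behavior.
const auto func = [](const unsigned char a, const unsigned char b){ return isspace(a) && isspace(b); };

// Visit each run of two or more whitespace characters in test.
for(auto it = adjacent_find(begin(test), end(test), func); it != end(test); it = adjacent_find(it, end(test), func)) {
    *it = ' '; // The first character of the run becomes the single space.
    // Erase the rest of the run; erase returns the iterator following the erased range.
    it = test.erase(next(it), find_if_not(next(it), end(test), [](const unsigned char i) { return isspace(i); }));
}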

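For this input, it will return the same thing your regex would (a lone whitespace character, such as a single '\n', is left as-is rather than turned into a space):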

"Small text with several lines. "

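For reference, here is the adjacent_find approach as a self-contained sketch. The function name collapse_runs is made up for illustration and is not part of the original snippet. It assumes a plain std::string holding single-byte characters.

```cpp
#include <algorithm>
#include <cctype>
#include <iterator>
#include <string>

// Hypothetical helper: replaces every run of two or more whitespace
// characters with a single space. A lone whitespace character
// (for example a single '\n') is left unchanged.
std::string collapse_runs(std::string text) {
    // Take unsigned char: passing a negative char value to std::isspace is undefined.
    const auto is_space = [](unsigned char c) { return std::isspace(c) != 0; };
    const auto both_space = [&](char a, char b) { return is_space(a) && is_space(b); };

    for (auto it = std::adjacent_find(text.begin(), text.end(), both_space);
         it != text.end();
         it = std::adjacent_find(it, text.end(), both_space)) {
        *it = ' ';  // the first character of the run becomes the single space
        // Erase the rest of the run; erase returns the iterator after the erased range.
        it = text.erase(std::next(it),
                        std::find_if_not(std::next(it), text.end(), is_space));
    }
    return text;
}
```

Calling collapse_runs on the string from the question should give the output shown above, without going through std::regex at all.
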
But if you're going for simplicity, you could also use unique:

test.resize(distance(test.begin(), unique(test.begin(), test.end(), [](const auto& a, const auto& b) { return isspace(a) && isspace(b); })));

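Note that unique keeps the first whitespace character of each run instead of replacing it with a space, so this will return: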

"Small text
with several
lines. "
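
If you need the regex's exact result, where every whitespace run, even a single '\n', becomes one space, a plain single-pass loop is another option. This is a minimal sketch rather than either snippet above. The name normalize_whitespace is hypothetical, and it assumes single-byte characters and that std::isspace in the current C locale matches what \s matched for your input.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: replaces each run of one or more whitespace
// characters with exactly one space, in a single pass.
std::string normalize_whitespace(const std::string& text) {
    std::string out;
    out.reserve(text.size());
    bool in_run = false;  // true while we are inside a whitespace run
    for (unsigned char c : text) {  // unsigned char keeps std::isspace well-defined
        if (std::isspace(c)) {
            if (!in_run) out += ' ';  // emit one space per run
            in_run = true;
        } else {
            out += static_cast<char>(c);
            in_run = false;
        }
    }
    return out;
}
```

This builds a new string instead of editing in place, which avoids repeated erase calls on a long input.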