Using `std::search` over `string::find`
Right now (27th April 2017), at least GCCs libstdc++
(which is also used by clang by default), implements std::string::find
with a linear search and thus is much slower than using
std::string_view substr{"whatever"};
auto it = std::search(s.cbegin(), s.cend(),
std::boyer_moore_searcher(substr.begin(), substr.end()));
The problem is that the Boyer-Moore searcher allocates memory for internal data structures, and thus can fail with a std::bad_alloc
exception. However, std::string::find
is marked noexcept
, so using the already implemented Boyer-Moore searcher within std::string::find
isn't straight-forward.
string::find
uses linear search but it is several times faster than Boyer Moore for some cases (with the latest patch). I submitted a patch (first-element then memcomp) to both libstdc++ and libc++ which improved string::find
significantly. You can try the recent gcc (7.1) and you will get the improved performance. You can also measure the performance with the simple benchmarking suite I wrote:
https://github.com/hiraditya/std-benchmark
Especially for smaller strings, by the time Boyer Moore is busy constructing internal data structure, (sub) linear string::find will be done. Also for parsing HTML etc., where most of the searches are mismatches, string::find should be faster.
commit fc7ebc4b8d9ad7e2891b7f72152e8a2b7543cd65
Author: redi <redi@138bc75d-0d04-0410-961f-82ee72b054a4>
Date: Mon Jan 9 13:05:58 2017 +0000
PR66414 optimize std::string::find
2017-01-09 Jonathan Wakely <[email protected]>
Aditya Kumar <[email protected]>
PR libstdc++/66414
* include/bits/basic_string.tcc
(basic_string::find(const CharT*, size_type, size_type)): Optimize.
git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@244225 138bc75d-0d04-0410-961f-82ee72b054a4
PS: Using std::find
will always be slower than the current std::string::find
with the current implementation.