Using `std::search` over `string::find`

Right now (27th April 2017), at least GCCs libstdc++ (which is also used by clang by default), implements std::string::find with a linear search and thus is much slower than using

std::string_view substr{"whatever"};
auto it = std::search(s.cbegin(), s.cend(),
                      std::boyer_moore_searcher(substr.begin(), substr.end())); 

The problem is that the Boyer-Moore searcher allocates memory for internal data structures, and thus can fail with a std::bad_alloc exception. However, std::string::find is marked noexcept, so using the already implemented Boyer-Moore searcher within std::string::find isn't straight-forward.


string::find uses linear search but it is several times faster than Boyer Moore for some cases (with the latest patch). I submitted a patch (first-element then memcomp) to both libstdc++ and libc++ which improved string::find significantly. You can try the recent gcc (7.1) and you will get the improved performance. You can also measure the performance with the simple benchmarking suite I wrote: https://github.com/hiraditya/std-benchmark

Especially for smaller strings, by the time Boyer Moore is busy constructing internal data structure, (sub) linear string::find will be done. Also for parsing HTML etc., where most of the searches are mismatches, string::find should be faster.

commit fc7ebc4b8d9ad7e2891b7f72152e8a2b7543cd65
Author: redi <redi@138bc75d-0d04-0410-961f-82ee72b054a4>
Date:   Mon Jan 9 13:05:58 2017 +0000

    PR66414 optimize std::string::find

    2017-01-09  Jonathan Wakely  <[email protected]>
            Aditya Kumar  <[email protected]>

        PR libstdc++/66414
        * include/bits/basic_string.tcc
        (basic_string::find(const CharT*, size_type, size_type)): Optimize.

    git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@244225 138bc75d-0d04-0410-961f-82ee72b054a4

PS: Using std::find will always be slower than the current std::string::find with the current implementation.

Tags:

C++

String