Efficient way to filter out elements from std::vector

Yes, you can use std::remove_copy_if, e.g.

std::remove_copy_if(
  all_items.begin(), 
  all_items.end(), 
  std::back_inserter(filter_items),
  [&bad_ids](const mystruct& item) { return std::find(bad_ids.begin(), bad_ids.end(), item.id) != bad_ids.end(); });

Or you can use std::remove_if and erase the bad elements from the vector directly (the erase-remove idiom), e.g.

all_items.erase(
  std::remove_if(
    all_items.begin(), 
    all_items.end(), 
    [&bad_ids](const mystruct& item) { return std::find(bad_ids.begin(), bad_ids.end(), item.id) != bad_ids.end(); }), 
  all_items.end());


Expanding on @songyuanyao's correct answer: it never hurts to keep a small library of container helpers to make code more expressive.

#include <iostream>
#include <string>
#include <vector>
#include <algorithm>

struct mystruct {
    int id;
    std::string name;
};

template<class T, class A, class Pred>
std::vector<T, A> copy_unless(std::vector<T, A> container, Pred&& pred)
{
    container.erase(std::remove_if(container.begin(), container.end(), 
                                   std::forward<Pred>(pred)), 
                    container.end());
    return container;
}

template<class Container, class Pred>
bool any_match(Container&& container, Pred&& pred)
{
    return std::find_if(container.begin(), container.end(), pred) != container.end();
}

int main()
{
    std::vector<mystruct> all_items = {{151, "test1"}, {154, "test4"}, {152, "test2"}, {151, "test1"}, {151, "test1"}, {153, "test3"}};
    std::vector<int> bad_ids = {151, 152};

    auto is_bad = [&bad_ids](mystruct const& item)
    {
        auto match_id = [&item](int id){ return item.id == id; };
        return any_match(bad_ids, match_id);
    };

    auto filter_items = copy_unless(all_items, is_bad);

    for (auto&& f : filter_items) {
        std::cout << "Good item: " << f.id << std::endl;
    }
}

I'm sure I remember a library like this in Boost, but for the life of me I can't remember which one it is.


I'd suggest Boost Range:

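#include <boost/range/adaptor/filtered.hpp>
#include <boost/range/iterator_range.hpp> // boost::copy_range
#include <iostream>
#include <set>
#include <string>
#include <vector>

using boost::adaptors::filtered;

struct mystruct {
    int id;
    std::string name;
};
using myvec = std::vector<mystruct>;

int main() {
    myvec all_items = { { 151, "test1" }, { 154, "test4" }, { 152, "test2" },
                        { 151, "test1" }, { 151, "test1" }, { 153, "test3" } };

    // C++14 init-capture: the lambda owns its own set of bad ids
    auto is_good = [bad_ids = std::set<int> { 151, 152 }](mystruct const& v) {
        return bad_ids.end() == bad_ids.find(v.id);
    };

    // just filter on the fly (lazy, nothing is copied):
    for (auto& f : all_items | filtered(is_good)) {
        std::cout << "Good item: " << f.id << std::endl;
    }

    // actually copy:
    auto filter_items = boost::copy_range<myvec>(all_items | filtered(is_good));
}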

Prints:

Good item: 154
Good item: 153

Improving...

You could improve style by factoring things out a little:

Assuming you have a utility like contains:

template <typename... Arg, typename V> bool contains(std::set<Arg...> const &set, V const &v) {
    return set.end() != set.find(v);
}

template <typename... Arg, typename V> bool contains(std::vector<Arg...> const &vec, V const &v) {
    return vec.end() != std::find(vec.begin(), vec.end(), v);
}

Then it becomes more readable:

auto is_good = [&bad_ids](auto& v) { return !contains(bad_ids, v.id); };

for (auto& f : all_items | filtered(is_good)) {
    std::cout << "Good item: " << f.id << std::endl;
}

Now, I suspect the bad_ids list is dynamic in practice. But if it were fixed, you could write the predicate more "in-place" using Boost.Phoenix:

Peak Hipster:

for (auto& f : all_items | filtered(!contains_(std::set<int> { 151, 152 }, arg1->*&mystruct::id))) {
    std::cout << "Good item: " << f.id << std::endl;
}

I know. That's pushing it for no good reason, but hey. Just showing :)