Vector: initialization or reserve?

The "best" way would be:

vector<int> vec = {var1, var2, var3};

available with a C++11 capable compiler.

Not sure exactly what you mean by doing things in a header or implementation files. A mutable global is a no-no for me. If it is a class member, then it can be initialized in the constructor initialization list.

Otherwise, option 1 would be generally used if you know how many items you are going to use and the default values (0 for int) would be useful.
Using at here means that you can't guarantee the index is valid. A situation like that is alarming itself. Even though you will be able to reliably detect problems, it's definitely simpler to use push_back and stop worrying about getting the indexes right.

In case of option 2, generally it makes zero performance difference whether you reserve memory or not, so it's simpler not to reserve*. Unless perhaps if the vector contains types that are very expensive to copy (and don't provide fast moving in C++11), or the size of the vector is going to be enormous.

* From Stroustrups C++ Style and Technique FAQ:

People sometimes worry about the cost of std::vector growing incrementally. I used to worry about that and used reserve() to optimize the growth. After measuring my code and repeatedly having trouble finding the performance benefits of reserve() in real programs, I stopped using it except where it is needed to avoid iterator invalidation (a rare case in my code). Again: measure before you optimize.

Both variants have different semantics, i.e. you are comparing apples and oranges.

The first gives you a vector of n default-initialized values, the second variant reserves the memory, but does not initialize them.

Choose what better fits your needs, i.e. what is "better" in a certain situation.

Somehow, a non-answer answer that is completely wrong has remained accepted and most upvoted for ~7 years. This is not an apples and oranges question. This is not a question to be answered with vague cliches.

For a simple rule to follow:

Option #1 is faster... enter image description here

...but this probably shouldn't be your biggest concern.

Firstly, the difference is pretty minor. Secondly, as we crank up the compiler optimization, the difference becomes even smaller. For example, on my gcc-5.4.0, the difference is arguably trivial when running level 3 compiler optimization (-O3): enter image description here

So in general, I would recommending using method #1 whenever you encounter this situation. However, if you can't remember which one is optimal, it's probably not worth the effort to find out. Just pick either one and move on, because this is unlikely to ever cause a noticeable slowdown in your program as a whole.

These tests were run by sampling random vector sizes from a normal distribution, and then timing the initialization of vectors of these sizes using the two methods. We keep a dummy sum variable to ensure the vector initialization is not optimized out, and we randomize vector sizes and values to make an effort to avoid any errors due to branch prediction, caching, and other such tricks.

main.cpp:

/* 
 * Test constructing and filling a vector in two ways: construction with size
 * then assignment versus construction of empty vector followed by push_back
 * We collect dummy sums to prevent the compiler from optimizing out computation
 */

#include <iostream>
#include <vector>

#include "rng.hpp"
#include "timer.hpp"

const size_t kMinSize = 1000;
const size_t kMaxSize = 100000;
const double kSizeIncrementFactor = 1.2;
const int kNumVecs = 10000;

int main() {
  for (size_t mean_size = kMinSize; mean_size <= kMaxSize;
       mean_size = static_cast<size_t>(mean_size * kSizeIncrementFactor)) {
    // Generate sizes from normal distribution
    std::vector<size_t> sizes_vec;
    NormalIntRng<size_t> sizes_rng(mean_size, mean_size / 10.0); 
    for (int i = 0; i < kNumVecs; ++i) {
      sizes_vec.push_back(sizes_rng.GenerateValue());
    }
    Timer timer;
    UniformIntRng<int> values_rng(0, 5);
    // Method 1: construct with size, then assign
    timer.Reset();
    int method_1_sum = 0;
    for (size_t num_els : sizes_vec) {
      std::vector<int> vec(num_els);
      for (size_t i = 0; i < num_els; ++i) {
        vec[i] = values_rng.GenerateValue();
      }
      // Compute sum - this part identical for two methods
      for (size_t i = 0; i < num_els; ++i) {
        method_1_sum += vec[i];
      }
    }
    double method_1_seconds = timer.GetSeconds();
    // Method 2: reserve then push_back
    timer.Reset();
    int method_2_sum = 0;
    for (size_t num_els : sizes_vec) {
      std::vector<int> vec;
      vec.reserve(num_els);
      for (size_t i = 0; i < num_els; ++i) {
        vec.push_back(values_rng.GenerateValue());
      }
      // Compute sum - this part identical for two methods
      for (size_t i = 0; i < num_els; ++i) {
        method_2_sum += vec[i];
      }
    }
    double method_2_seconds = timer.GetSeconds();
    // Report results as mean_size, method_1_seconds, method_2_seconds
    std::cout << mean_size << ", " << method_1_seconds << ", " << method_2_seconds;
    // Do something with the dummy sums that cannot be optimized out
    std::cout << ((method_1_sum > method_2_sum) ? "" : " ") << std::endl;
  }

  return 0;
}

The header files I used are located here:

rng.hpp
timer.hpp

Vector: initialization or reserve?

Tags:

C++

Vector

Related

Recent Posts