How to pre-allocate memory for a std::string object?

std::string has a .reserve method for pre-allocation.

std::string s;
s.reserve(1048576); // reserve 1 MB
read_file_into(s);
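To make the effect of reserve concrete, here is a minimal sketch. The helper name appends_fit_in_reserved is made up for illustration (read_file_into above is likewise just a placeholder). It relies on the usual behaviour that reserve raises capacity() without changing size(), so appends that stay within the reserved capacity should not trigger another allocation.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: reserve room for `n` characters, then append them one
// at a time. Returns true if the capacity never changed during the appends,
// i.e. no reallocation was needed after the initial reserve().
bool appends_fit_in_reserved(std::size_t n) {
    std::string s;
    s.reserve(n);                          // capacity() >= n, size() is still 0
    const std::size_t cap = s.capacity();
    for (std::size_t i = 0; i < n; ++i)
        s.push_back('x');                  // grows size(), not capacity()
    return s.size() == n && s.capacity() == cap;
}
```

Calling appends_fit_in_reserved(1048576) mirrors the 1 MB example above.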

This isn't so much an answer in itself as a comment on, and comparison of, a couple of the other answers (as well as a quick demonstration of why I've recommended the style of code @Johannes - litb gives in his answer). @sbi posted an alternative that looked pretty good and, in particular, avoided the extra copy involved in reading into a stringstream and then calling .str() to get a string, so I decided to write up a quick comparison of the two:

[ Edit: I've added a third test case using @Tyler McHenry's istreambuf_iterator-based code, and added a line to print out the length of each string that was read to ensure that the optimizer didn't optimize away the reading because the result was never used.]

[ Edit2: And now, code from Martin York has been added as well...]

#include <fstream>
#include <sstream>
#include <string>
#include <iostream>
#include <iterator>
#include <time.h>

int main() {
    std::ostringstream os;
    std::ifstream file("equivs2.txt");

    clock_t start1 = clock();
    os << file.rdbuf();
    std::string s = os.str();
    clock_t stop1 = clock();

    std::cout << "\ns.length() = " << s.length();

    std::string s2;

    clock_t start2 = clock();
    file.seekg( 0, std::ios_base::end );
    const std::streampos pos = file.tellg();
    file.seekg(0, std::ios_base::beg);

    if( pos!=std::streampos(-1) )
        s2.reserve(static_cast<std::string::size_type>(pos));
    s2.assign(std::istream_iterator<char>(file), std::istream_iterator<char>());
    clock_t stop2 = clock();

    std::cout << "\ns2.length = " << s2.length();

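    // istream_iterator ran into end-of-file, which sets eofbit/failbit;
    // reset the stream state before reusing the stream.
    file.clear();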

    std::string s3;

    clock_t start3 = clock();   
    file.seekg(0, std::ios::end);   
    s3.reserve(file.tellg());
    file.seekg(0, std::ios::beg);

    s3.assign((std::istreambuf_iterator<char>(file)),
            std::istreambuf_iterator<char>());
    clock_t stop3 = clock();

    std::cout << "\ns3.length = " << s3.length();

    // New Test
    std::string s4;

    clock_t start4 = clock();
    file.seekg(0, std::ios::end);
    s4.resize(file.tellg());
    file.seekg(0, std::ios::beg);

    file.read(&s4[0], s4.length());
    clock_t stop4 = clock();

    std::cout << "\ns4.length = " << s4.length();

    std::cout << "\nTime using rdbuf: " << stop1 - start1;
    std::cout << "\nTime using istream_iterator: " << stop2 - start2;
    std::cout << "\nTime using istreambuf_iterator: " << stop3 - start3;
    std::cout << "\nTime using read: " << stop4 - start4;
    return 0;
}

Now the impressive part -- the results. First with VC++ (in case somebody cares, Martin's code is fast enough that I increased the file size to get a meaningful time for it):

s.length() = 7669436
s2.length = 6390688
s3.length = 7669436
s4.length = 7669436
Time using rdbuf: 184
Time using istream_iterator: 1332
Time using istreambuf_iterator: 249
Time using read: 48

Then with gcc (cygwin):

s.length() = 8278035
s2.length = 6390689
s3.length = 8278035
s4.length = 8278035
Time using rdbuf: 62
Time using istream_iterator: 2199
Time using istreambuf_iterator: 156
Time using read: 16

[ end of edit -- the conclusions remain, though the winner has changed -- Martin's code is clearly the fastest. ]

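The results are quite consistent with respect to which is fastest and slowest; the only inconsistency is in how much faster or slower one is than another. Though the placements are the same, the speed differences are much larger with gcc than with VC++. Note also that s2 is shorter than the other strings in both runs: std::istream_iterator<char> reads with operator>>, which skips whitespace by default, so that test does not actually read the whole file.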
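The fastest variant in the timings above (resize, then a single read) can be wrapped up as a small function. This is only a sketch: the name read_whole_file and the error handling are my own additions, and so is opening the file in binary mode, which the test program does not do. In text mode on some platforms the number of characters actually read can be smaller than the size reported by tellg(), which is why the string is trimmed to gcount() afterwards.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: reads a whole file into `out` with a single read().
// Returns false if the file cannot be opened or its size cannot be determined.
bool read_whole_file(const char* path, std::string& out) {
    std::ifstream file(path, std::ios::binary);  // assumption: binary mode
    if (!file)
        return false;

    file.seekg(0, std::ios::end);
    const std::streamoff size = file.tellg();    // -1 if the position is unknown
    if (size < 0)
        return false;
    file.seekg(0, std::ios::beg);

    out.resize(static_cast<std::string::size_type>(size));
    if (size > 0) {
        file.read(&out[0], static_cast<std::streamsize>(size));
        // Trim in case fewer characters were read than tellg() reported.
        out.resize(static_cast<std::string::size_type>(file.gcount()));
    }
    return true;
}
```

Unlike the reserve-based variants, this uses resize, so the string really has that many characters before read() fills them in.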


This should be all you need:

ostringstream os;
ifstream file("name.txt");
os << file.rdbuf();

string s = os.str();

This reads characters from the file and inserts them into the stringstream; afterwards, os.str() returns the string that was built behind the scenes. Note that I fell into the following trap: using the extraction operator skips initial whitespace. You have to use the insertion operator as above, or use the noskipws manipulator:

// Beware, skips initial whitespace!
file >> os.rdbuf();

// This does not skip it
file >> noskipws >> os.rdbuf(); 

These functions are described as reading the stream character by character, though (I'm not sure what optimizations are possible here), and I haven't timed them to determine their speed.
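The whitespace trap is easy to reproduce without a file. The sketch below uses string streams in place of the ifstream, and the two function names are made up for the demonstration:

```cpp
#include <sstream>
#include <string>

// Copies `text` via `in >> out.rdbuf()`. Leading whitespace is skipped unless
// `keep_whitespace` is true, in which case noskipws is applied first.
std::string copy_by_extraction(const std::string& text, bool keep_whitespace) {
    std::istringstream in(text);
    std::ostringstream out;
    if (keep_whitespace)
        in >> std::noskipws;
    in >> out.rdbuf();
    return out.str();
}

// Copies `text` via `out << in.rdbuf()`, which does not skip whitespace.
std::string copy_by_insertion(const std::string& text) {
    std::istringstream in(text);
    std::ostringstream out;
    out << in.rdbuf();
    return out.str();
}
```

With the input "  \n hello world\n", copy_by_extraction(text, false) drops the leading spaces and newline, while the other two calls return the text unchanged. Only the initial whitespace is affected; whitespace inside the text is copied in all three cases.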

Tags: C++, String