Why is istream/ostream slow
There are several reasons why [i]ostreams are slow by design:
Shared formatting state: every formatted output operation has to check all formatting state that might have been previously mutated by I/O manipulators. For this reason iostreams are inherently slower than
printf
-like APIs (especially with format string compilation like in Rust or {fmt} that avoid parsing overhead) where all formatting information is local.Uncontrolled use of locales: all formatting goes through an inefficient locale layer even if you don't want this, for example when writing a JSON file. See N4412: Shortcomings of iostreams.
Inefficient codegen: formatting a message with iostreams normally consists of multiple function calls because arguments and I/O manipulators are interleaved with parts of the message. For example, there are three function calls (godbolt) in
std::cout << "The answer is " << answer << ".\n";
compared to just one (godbolt) in the equivalent
printf
call:printf("The answer is %d.\n", answer);
Extra buffering and synchronization. This can be disabled with
sync_with_stdio(false)
at the cost of poor interoperability with other I/O facilities.
Actually, IOStreams don't have to be slow! It is a matter of implementing them in a reasonable way to make them fast, though. Most standard C++ library don't seem to pay too much attention to implement IOStreams. A long time ago when my CXXRT was still maintained it was about as fast as stdio - when used correctly!
Note that there are few performance traps for users laid out with IOStreams, however. The following guidelines apply to all IOStream implementations but especially to those which are tailored to be fast:
- When using
std::cin
,std::cout
, etc. you need to callstd::sync_with_stdio(false)
! Without this call, any use of the standard stream objects is required to synchronize with C's standard streams. Of course, when usingstd::sync_with_stdio(false)
it is assumed that you don't mixstd::cin
withstdin
,std::cout
withstdout
, etc. - Do not use
std::endl
as it mandates many unnecessary flushes of any buffer. Likewise, don't setstd::ios_base::unitbuf
or usestd::flush
unnecessarily. - When creating your own stream buffers (OK, few users do), make sure they do use an internal buffer! Processing individual characters jumps through multiple conditions and a
virtual
function which makes it hideously slow.
Perhaps this can give some idea of what you're dealing with:
#include <stdio.h>
#include <iomanip>
#include <iostream>
#include <iterator>
#include <fstream>
#include <time.h>
#include <string>
#include <algorithm>
unsigned count1(FILE *infile, char c) {
int ch;
unsigned count = 0;
while (EOF != (ch=getc(infile)))
if (ch == c)
++count;
return count;
}
unsigned int count2(FILE *infile, char c) {
static char buffer[8192];
int size;
unsigned int count = 0;
while (0 < (size = fread(buffer, 1, sizeof(buffer), infile)))
for (int i=0; i<size; i++)
if (buffer[i] == c)
++count;
return count;
}
unsigned count3(std::istream &infile, char c) {
return std::count(std::istreambuf_iterator<char>(infile),
std::istreambuf_iterator<char>(), c);
}
unsigned count4(std::istream &infile, char c) {
return std::count(std::istream_iterator<char>(infile),
std::istream_iterator<char>(), c);
}
unsigned int count5(std::istream &infile, char c) {
static char buffer[8192];
unsigned int count = 0;
while (infile.read(buffer, sizeof(buffer)))
count += std::count(buffer, buffer+infile.gcount(), c);
count += std::count(buffer, buffer+infile.gcount(), c);
return count;
}
unsigned count6(std::istream &infile, char c) {
unsigned int count = 0;
char ch;
while (infile >> ch)
if (ch == c)
++count;
return count;
}
template <class F, class T>
void timer(F f, T &t, std::string const &title) {
unsigned count;
clock_t start = clock();
count = f(t, 'N');
clock_t stop = clock();
std::cout << std::left << std::setw(30) << title << "\tCount: " << count;
std::cout << "\tTime: " << double(stop-start)/CLOCKS_PER_SEC << "\n";
}
int main() {
char const *name = "equivs2.txt";
FILE *infile=fopen(name, "r");
timer(count1, infile, "ignore");
rewind(infile);
timer(count1, infile, "using getc");
rewind(infile);
timer(count2, infile, "using fread");
fclose(infile);
std::ifstream in2(name);
timer(count3, in2, "ignore");
in2.clear();
in2.seekg(0);
timer(count3, in2, "using streambuf iterators");
in2.clear();
in2.seekg(0);
timer(count4, in2, "using stream iterators");
in2.clear();
in2.seekg(0);
timer(count5, in2, "using istream::read");
in2.clear();
in2.seekg(0);
timer(count6, in2, "using operator>>");
return 0;
}
Running this, I get results like this (with MS VC++):
ignore Count: 1300 Time: 0.309
using getc Count: 1300 Time: 0.308
using fread Count: 1300 Time: 0.028
ignore Count: 1300 Time: 0.091
using streambuf iterators Count: 1300 Time: 0.091
using stream iterators Count: 1300 Time: 0.613
using istream::read Count: 1300 Time: 0.028
using operator>> Count: 1300 Time: 0.619
and this (with MinGW):
ignore Count: 1300 Time: 0.052
using getc Count: 1300 Time: 0.044
using fread Count: 1300 Time: 0.036
ignore Count: 1300 Time: 0.068
using streambuf iterators Count: 1300 Time: 0.068
using stream iterators Count: 1300 Time: 0.131
using istream::read Count: 1300 Time: 0.037
using operator>> Count: 1300 Time: 0.121
As we can see in the results, it's not really a matter of iostreams being categorically slow. Rather, a great deal depends on exactly how you use iostreams (and to a lesser extent FILE *
as well). There's also a pretty substantial variation just between these to implementations.
Nonetheless, the fastest versions with each (fread
and istream::read
) are essentially tied. With VC++ getc
is quite a bit slower than either istream::read
or and istreambuf_iterator
.
Bottom line: getting good performance from iostreams requires a little more care than with FILE *
-- but it's certainly possible. They also give you more options: convenience when you don't care all that much about speed, and performance directly competitive with the best you can get from C-style I/O, with a little extra work.
While this question is quite old, I'm amazed nobody has mentioned iostream object construction.
That is, whenever you create an STL iostream
(and other stream variants), if you step into the code, the constructor calls an internal Init
function. In there, operator new
is called to create a new locale
object.
And likewise, is destroyed upon destruction.
This is hideous, IMHO. And certainly contributes to slow object construction/destruction, because memory is being allocated/deallocated using a system lock, at some point.
Further, some of the STL streams allow you to specify an allocator
, so why is the locale
created NOT using the specified allocator?
Using streams in a multithreaded environment, you could also imagine the bottleneck imposed by calling operator new
every time a new stream object is constructed.
Hideous mess if you ask me, as I am finding out myself right now!