sys.stdin.readline() and input(): which one is faster when reading lines of input, and why?
The builtin input and sys.stdin.readline functions don't do exactly the same thing, and which one is faster may depend on the details of exactly what you're doing. As aruisdante commented, the difference is smaller in Python 3 than it was in Python 2, which is when the quote you provide dates from, but there are still some differences.
The first difference is that input has an optional prompt parameter that will be displayed if the interpreter is running interactively. This leads to some overhead, even if the prompt is empty (the default). On the other hand, it may be faster than doing a print before each readline call, if you do want a prompt.
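The next difference is that input strips off the trailing newline from the end of the input. If you're going to strip that anyway, it may be faster to let input do it for you, rather than doing sys.stdin.readline().strip(). Note that strip() also removes any other leading and trailing whitespace, so rstrip("\n") is the closer equivalent of what input does.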
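To make the newline handling concrete, here is a minimal sketch that reads the same line both ways. It swaps sys.stdin for an io.StringIO so it runs without real input; the helper name read_both_ways is made up for this illustration.

```python
import io
import sys

def read_both_ways(text):
    """Read the same text once with input() and once with readline()."""
    original_stdin = sys.stdin
    try:
        sys.stdin = io.StringIO(text)   # in-memory stand-in for real stdin
        via_input = input()
        sys.stdin = io.StringIO(text)   # fresh stream for the second read
        via_readline = sys.stdin.readline()
    finally:
        sys.stdin = original_stdin      # always restore the real stdin
    return via_input, via_readline

via_input, via_readline = read_both_ways("  hello  \n")
print(repr(via_input))                   # newline removed, spaces kept
print(repr(via_readline))                # newline still attached
print(repr(via_readline.rstrip("\n")))   # same result as input()
print(repr(via_readline.strip()))        # surrounding spaces removed too
```

If you only want to drop the newline, rstrip("\n") keeps the two approaches interchangeable.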
A final difference is how the end of the input is indicated. input will raise an EOFError when you call it if there is no more input (stdin has been closed on the other end). sys.stdin.readline, on the other hand, will return an empty string at EOF, which you need to know to check for.
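As a rough sketch of what the two EOF conventions look like in a read-everything loop (the function names are hypothetical, and sys.stdin is replaced with an io.StringIO purely so the example runs on its own):

```python
import io
import sys

def read_all_with_input():
    """Collect lines until input() signals EOF by raising EOFError."""
    lines = []
    while True:
        try:
            lines.append(input())
        except EOFError:
            break
    return lines

def read_all_with_readline():
    """Collect lines until readline() signals EOF by returning ''."""
    lines = []
    while True:
        line = sys.stdin.readline()
        if line == "":              # EOF; a blank line would be "\n" instead
            break
        lines.append(line.rstrip("\n"))
    return lines

sample = "first\n\nthird\n"         # includes a blank line in the middle
original_stdin = sys.stdin
try:
    sys.stdin = io.StringIO(sample)
    from_input = read_all_with_input()
    sys.stdin = io.StringIO(sample)
    from_readline = read_all_with_readline()
finally:
    sys.stdin = original_stdin

print(from_input)
print(from_readline)
```

The check has to be against the empty string specifically: a blank line in the input comes back from readline as "\n", which is not EOF.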
There's also a third option, using the file iteration protocol on sys.stdin. This is likely to behave much like calling readline, but the looping logic is perhaps nicer.
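A small sketch of the iteration approach, assuming you just want to process every line until EOF. The function count_nonblank_lines is invented for this example, and it takes the stream as a parameter so it can be tried on an in-memory io.StringIO.

```python
import io

def count_nonblank_lines(stream):
    """Iterate over a file-like object line by line; the loop ends at EOF."""
    count = 0
    for line in stream:        # each line keeps its trailing newline
        if line.strip():       # skip lines that are only whitespace
            count += 1
    return count

# Demonstration with an in-memory stream; in a real script you would
# import sys and pass sys.stdin instead.
print(count_nonblank_lines(io.StringIO("a\n\nb\n")))
```

There is no explicit EOF check here at all, which is the "nicer logic" part: the for loop simply stops when the input runs out.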
I suspect that while differences in performance between your various options may exist, they're likely to be smaller than the time cost of simply reading the file from the disk (if it is large) and doing whatever you are doing with it. I suggest that you avoid the trap of premature optimization and just do what is most natural for your problem, and if the program is too slow (where "too slow" is very subjective), you do some profiling to see what is taking the most time. Don't put a whole lot of effort into deciding between the different ways of taking input unless it actually matters.
Every time input() runs, it checks whether it is attached to a TTY, which takes a syscall, so it is much slower than sys.stdin.readline(): https://github.com/python/cpython/blob/af2f5b1723b95e45e1f15b5bd52102b7de560f7c/Python/bltinmodule.c#L1981
As Linn1024 says, for reading large amounts of data input() is much slower.
A simple example is this:
import sys
# Read (and discard) as many lines as the first command-line argument says.
for i in range(int(sys.argv[1])):
    sys.stdin.readline()
This takes about 0.25μs per iteration:
$ time yes | py readline.py 1000000
yes 0.05s user 0.00s system 22% cpu 0.252 total
Changing that to sys.stdin.readline().strip() takes that to about 0.31μs.
Changing readline() to input() is about 10 times slower:
$ time yes | py input.py 1000000
yes 0.05s user 0.00s system 1% cpu 2.855 total
Notice that it's still pretty fast though (under 3μs per line), so you only really need to worry when you are reading a very large number of entries, like the million lines above.
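If you'd rather compare the two from inside Python than with the shell's time, something like the following sketch might do. The helpers time_reader and compare are made up for this example, and it feeds both readers from an io.StringIO. An in-memory stream is not a real pipe or terminal, so input() may not go through the same checks and the absolute numbers will differ from the measurements above. Treat it as a template to adapt, not as a reproduction of those timings.

```python
import io
import sys
import time

def time_reader(read_line, n):
    """Call read_line() n times and return the elapsed seconds."""
    start = time.perf_counter()
    for _ in range(n):
        read_line()
    return time.perf_counter() - start

def compare(make_stream, n):
    """Time sys.stdin.readline and input() over n lines each."""
    original_stdin = sys.stdin
    try:
        sys.stdin = make_stream()               # fresh stream for readline
        readline_seconds = time_reader(sys.stdin.readline, n)
        sys.stdin = make_stream()               # fresh stream for input()
        input_seconds = time_reader(input, n)
    finally:
        sys.stdin = original_stdin              # restore the real stdin
    return readline_seconds, input_seconds

n = 100_000
readline_seconds, input_seconds = compare(lambda: io.StringIO("y\n" * n), n)
print(f"readline: {readline_seconds / n * 1e6:.3f} us per line")
print(f"input:    {input_seconds / n * 1e6:.3f} us per line")
```

To measure the piped case from the examples above, keep time_reader but run it against the real sys.stdin with data piped in.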