python 3: reading bytes from stdin pipe with readahead
The exception doesn't come from Python, but from the operating system, which doesn't allow seeking on pipes. (If you redirect input from a regular file, it is seekable, even though it's standard input.) This is why you get the error in one case and not in the other, even though the classes are the same.
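To see the difference without involving a shell, here is a minimal sketch that opens one end of an `os.pipe()` and a temporary file with the same kind of buffered binary reader. The function name `describe_seekability` is made up for illustration, and the behaviour shown assumes a POSIX-like system.

```python
import os
import tempfile


def describe_seekability():
    """Compare a pipe and a regular file opened through the same io classes."""
    r_fd, w_fd = os.pipe()
    with os.fdopen(r_fd, "rb") as reader, os.fdopen(w_fd, "wb"):
        pipe_seekable = reader.seekable()
        try:
            reader.seek(0)
            pipe_seek_error = None
        except OSError as exc:  # io.UnsupportedOperation is a subclass of OSError
            pipe_seek_error = exc

    with tempfile.TemporaryFile("w+b") as f:
        f.write(b"hello")
        f.seek(0)  # works: regular files support seeking
        file_seekable = f.seekable()

    return pipe_seekable, pipe_seek_error, file_seekable


pipe_seekable, pipe_seek_error, file_seekable = describe_seekability()
print(pipe_seekable, type(pipe_seek_error).__name__, file_seekable)
```

Both objects are ordinary buffered readers; only the underlying file descriptor decides whether `seek()` succeeds.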
The classic Python 2 solution for readahead would be to wrap the stream in your own stream implementation that implements readahead:
import cStringIO
import os
import sys

class Peeker(object):
    def __init__(self, fileobj):
        self.fileobj = fileobj
        # Holds data that has been peeked but not yet read.
        self.buf = cStringIO.StringIO()

    def _append_to_buf(self, contents):
        # Append at the end without disturbing the current read position.
        oldpos = self.buf.tell()
        self.buf.seek(0, os.SEEK_END)
        self.buf.write(contents)
        self.buf.seek(oldpos)

    def peek(self, size):
        # Read ahead from the real stream and remember what was read.
        contents = self.fileobj.read(size)
        self._append_to_buf(contents)
        return contents

    def read(self, size=None):
        if size is None:
            return self.buf.read() + self.fileobj.read()
        # Serve peeked data first, then fall through to the real stream.
        contents = self.buf.read(size)
        if len(contents) < size:
            contents += self.fileobj.read(size - len(contents))
        return contents

    def readline(self):
        line = self.buf.readline()
        # The buffered data may end mid-line; finish it from the real stream.
        if not line.endswith('\n'):
            line += self.fileobj.readline()
        return line

sys.stdin = Peeker(sys.stdin)
In Python 3, supporting the full sys.stdin while peeking the undecoded stream is complicated: one would wrap stdin.buffer as shown above, then instantiate a new TextIOWrapper over your peekable stream, and install that TextIOWrapper as sys.stdin.
However, since you only need to peek at sys.stdin.buffer, the above code will work just fine, after changing cStringIO.StringIO to io.BytesIO and '\n' to b'\n'.
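As a rough sketch of that bytes-only port, the class below applies just the two substitutions mentioned (io.BytesIO and b'\n'). An `io.BytesIO` object stands in for sys.stdin.buffer so the example runs without a pipe; the sample data and variable names are invented for illustration. The TextIOWrapper step is not covered here.

```python
import io
import os


class Peeker:
    """Python 3 port of the wrapper above, for binary streams only."""

    def __init__(self, fileobj):
        self.fileobj = fileobj
        self.buf = io.BytesIO()  # was cStringIO.StringIO()

    def _append_to_buf(self, contents):
        oldpos = self.buf.tell()
        self.buf.seek(0, os.SEEK_END)
        self.buf.write(contents)
        self.buf.seek(oldpos)

    def peek(self, size):
        contents = self.fileobj.read(size)
        self._append_to_buf(contents)
        return contents

    def read(self, size=None):
        if size is None:
            return self.buf.read() + self.fileobj.read()
        contents = self.buf.read(size)
        if len(contents) < size:
            contents += self.fileobj.read(size - len(contents))
        return contents

    def readline(self):
        line = self.buf.readline()
        if not line.endswith(b"\n"):  # was '\n'
            line += self.fileobj.readline()
        return line


# Stand-in for sys.stdin.buffer (an assumption made so the sketch is testable).
fake_stdin_buffer = io.BytesIO(b"MAGIC rest of line\nsecond line\n")
stream = Peeker(fake_stdin_buffer)

header = stream.peek(5)         # look at the first bytes
first_line = stream.readline()  # the peeked bytes are still delivered
remainder = stream.read()
```

In a real script the last part would become something like `stream = Peeker(sys.stdin.buffer)`.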
user4815162342's solution, while extremely useful, appears to have an issue in that it differs from the current behaviour of the io.BufferedReader peek method.
The builtin method will return the same data (starting from the current read position) for sequential peek() calls.
user4815162342's solution will return sequential chunks of data for each sequential peek call. This implies the user must wrap peek again to concatenate the output if they wish to use the same data more than once.
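The builtin behaviour can be checked with a short experiment. This sketch wraps an in-memory `io.BytesIO` in an `io.BufferedReader`; note that the builtin `peek()` may return more or fewer bytes than requested, so the comparison only looks at the leading bytes.

```python
import io

reader = io.BufferedReader(io.BytesIO(b"abcdef"))

first = reader.peek(3)
second = reader.peek(3)     # position has not moved, so the same bytes come back

consumed = reader.read(3)   # only read() advances the position
after_read = reader.peek(3)

print(first[:3], second[:3], consumed, after_read[:3])
```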
Here is the fix to return builtin behaviour:
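# Methods for the Peeker class above: _buffered is new, peek replaces the original.

def _buffered(self):
    # Return the unread part of the buffer without moving the read position.
    oldpos = self.buf.tell()
    data = self.buf.read()
    self.buf.seek(oldpos)
    return data

def peek(self, size):
    buf = self._buffered()[:size]
    if len(buf) < size:
        # Not enough buffered yet: fetch only the missing bytes.
        contents = self.fileobj.read(size - len(buf))
        self._append_to_buf(contents)
        return self._buffered()
    return buf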
There are other optimisations that could be applied, e.g. removal of previously buffered data upon a read call that exhausts the buffer. The current implementation leaves any peeked data in the buffer after it has been read, where it still occupies memory but can no longer be reached.
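One possible way to apply that optimisation is sketched below for Python 3 byte streams. The class name `TrimmingPeeker` and the helper `_trim` are hypothetical; the idea is simply to replace the buffer with an empty one whenever a read has consumed everything in it.

```python
import io
import os


class TrimmingPeeker:
    def __init__(self, fileobj):
        self.fileobj = fileobj
        self.buf = io.BytesIO()

    def _append_to_buf(self, contents):
        oldpos = self.buf.tell()
        self.buf.seek(0, os.SEEK_END)
        self.buf.write(contents)
        self.buf.seek(oldpos)

    def _buffered(self):
        oldpos = self.buf.tell()
        data = self.buf.read()
        self.buf.seek(oldpos)
        return data

    def peek(self, size):
        buf = self._buffered()[:size]
        if len(buf) < size:
            self._append_to_buf(self.fileobj.read(size - len(buf)))
            return self._buffered()
        return buf

    def _trim(self):
        # Hypothetical helper: if every buffered byte has been consumed,
        # start over with an empty buffer so the old data can be freed.
        pos = self.buf.tell()
        end = self.buf.seek(0, os.SEEK_END)
        if pos == end:
            self.buf = io.BytesIO()
        else:
            self.buf.seek(pos)

    def read(self, size=None):
        if size is None:
            contents = self.buf.read() + self.fileobj.read()
        else:
            contents = self.buf.read(size)
            if len(contents) < size:
                contents += self.fileobj.read(size - len(contents))
        self._trim()
        return contents


stream = TrimmingPeeker(io.BytesIO(b"abcdefgh"))
a = stream.peek(3)
b = stream.peek(3)                      # same bytes as the first peek
c = stream.read(3)                      # consumes the peeked bytes
buffered_after = stream.buf.getvalue()  # buffer was reset after exhaustion
rest = stream.read()
```

A partially consumed buffer is left alone here; trimming only the consumed prefix would be a further refinement.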