python 3: reading bytes from stdin pipe with readahead

The exception doesn't come from Python, but from the operating system, which doesn't allow seeking on pipes. (If you redirect output from a regular pipe, it can be seeked, even though it's standard input.) This is why you get the error in one case and not in the other, even though the classes are the same.

The classic Python 2 solution for readahead would be to wrap the stream in your own stream implementation that implements readahead:

class Peeker(object):
    def __init__(self, fileobj):
        self.fileobj = fileobj
        self.buf = cStringIO.StringIO()

    def _append_to_buf(self, contents):
        oldpos = self.buf.tell()
        self.buf.seek(0, os.SEEK_END)
        self.buf.write(contents)
        self.buf.seek(oldpos)

    def peek(self, size):
        contents = self.fileobj.read(size)
        self._append_to_buf(contents)
        return contents

    def read(self, size=None):
        if size is None:
            return self.buf.read() + self.fileobj.read()
        contents = self.buf.read(size)
        if len(contents) < size:
            contents += self.fileobj.read(size - len(contents))
        return contents

    def readline(self):
        line = self.buf.readline()
        if not line.endswith('\n'):
            line += self.fileobj.readline()
        return line

sys.stdin = Peeker(sys.stdin)

In Python 3 supporting the full sys.stdin while peeking the undecoded stream is complicated—one would wrap stdin.buffer as shown above, then instantiate a new TextIOWrapper over your peekable stream, and install that TextIOWrapper as sys.stdin.

However, since you only need to peek at sys.stdin.buffer, the above code will work just fine, after changing cStringIO.StringIO to io.BytesIO and '\n' to b'\n'.

user4815162342's solution, while extremely useful, appears to have an issue in that it differs from the current behaviour of the io.BufferedReader peek method.

The builtin method will return the same data (starting from the current read position) for sequential peek() calls.

user4815162342's solution will return sequential chunks of data for each sequential peek call. This implies the user must wrap peek again to concatenate the output if they wish to use the same data more than once.

Here is the fix to return builtin behaviour:

def _buffered(self):
    oldpos = self.buf.tell()
    data = self.buf.read()
    self.buf.seek(oldpos)
    return data

def peek(self, size):
    buf = self._buffered()[:size]
    if len(buf) < size:
        contents = self.fileobj.read(size - len(buf))
        self._append_to_buf(contents)
        return self._buffered()
    return buf

See the full version here

There are other optimisations that could be applied, e.g. removal of previously buffered data upon a read call that exhausts the buffer. The current implementation leaves any peeked data in the buffer, but that data is inaccessible.

python 3: reading bytes from stdin pipe with readahead

Tags:

Python

Python 3.X

Related

Recent Posts