convert io.StringIO to io.BytesIO

As some pointed out, you need to do the encoding/decoding yourself.

However, you can achieve this in an elegant way - implementing your own TextIOWrapper for string => bytes.

Here is such a sample:

class BytesIOWrapper:
    def __init__(self, string_buffer, encoding='utf-8'):
        self.string_buffer = string_buffer
        self.encoding = encoding

    def __getattr__(self, attr):
        return getattr(self.string_buffer, attr)

    def read(self, size=-1):
        content = self.string_buffer.read(size)
        return content.encode(self.encoding)

    def write(self, b):
        content = b.decode(self.encoding)
        return self.string_buffer.write(content)

Which produces an output like this:

Click to copy

In [36]: bw = BytesIOWrapper(StringIO("some lengt˙˚hyÔstring in here"))

In [37]: bw.read(15)
Out[37]: b'some lengt\xcb\x99\xcb\x9ahy\xc3\x94'

In [38]: bw.tell()
Out[38]: 15

In [39]: bw.write(b'ME')
Out[39]: 2

In [40]: bw.seek(15)
Out[40]: 15

In [41]: bw.read()
Out[41]: b'MEring in here'

Hope it clears your thoughts!

It's interesting that though the question might seem reasonable, it's not that easy to figure out a practical reason why I would need to convert a StringIO into a BytesIO. Both are basically buffers and you usually need only one of them to make some additional manipulations either with the bytes or with the text.

I may be wrong, but I think your question is actually how to use a BytesIO instance when some code to which you want to pass it expects a text file.

In which case, it is a common question and the solution is codecs module.

The two usual cases of using it are the following:

Compose a File Object to Read

Click to copy

In [16]: import codecs, io

In [17]: bio = io.BytesIO(b'qwe\nasd\n')

In [18]: StreamReader = codecs.getreader('utf-8')  # here you pass the encoding

In [19]: wrapper_file = StreamReader(bio)

In [20]: print(repr(wrapper_file.readline()))
'qwe\n'

In [21]: print(repr(wrapper_file.read()))
'asd\n'

In [26]: bio.seek(0)
Out[26]: 0

In [27]: for line in wrapper_file:
    ...:     print(repr(line))
    ...:
'qwe\n'
'asd\n'

Compose a File Object to Write To

Click to copy

In [28]: bio = io.BytesIO()

In [29]: StreamWriter = codecs.getwriter('utf-8')  # here you pass the encoding

In [30]: wrapper_file = StreamWriter(bio)

In [31]: print('жаба', 'цап', file=wrapper_file)

In [32]: bio.getvalue()
Out[32]: b'\xd0\xb6\xd0\xb0\xd0\xb1\xd0\xb0 \xd1\x86\xd0\xb0\xd0\xbf\n'

In [33]: repr(bio.getvalue().decode('utf-8'))
Out[33]: "'жаба цап\\n'"

@foobarna answer can be improved by inheriting some io base-class

Click to copy

import io
sio = io.StringIO('wello horld')


class BytesIOWrapper(io.BufferedReader):
    """Wrap a buffered bytes stream over TextIOBase string stream."""

    def __init__(self, text_io_buffer, encoding=None, errors=None, **kwargs):
        super(BytesIOWrapper, self).__init__(text_io_buffer, **kwargs)
        self.encoding = encoding or text_io_buffer.encoding or 'utf-8'
        self.errors = errors or text_io_buffer.errors or 'strict'

    def _encoding_call(self, method_name, *args, **kwargs):
        raw_method = getattr(self.raw, method_name)
        val = raw_method(*args, **kwargs)
        return val.encode(self.encoding, errors=self.errors)

    def read(self, size=-1):
        return self._encoding_call('read', size)

    def read1(self, size=-1):
        return self._encoding_call('read1', size)

    def peek(self, size=-1):
        return self._encoding_call('peek', size)


bio = BytesIOWrapper(sio)
print(bio.read())  # b'wello horld'

It could be a generally useful tool to convert a character stream into a byte stream, so here goes:

Click to copy

import io

class EncodeIO(io.BufferedIOBase):
  def __init__(self,s,e='utf-8'):
    self.stream=s               # not raw, since it isn't
    self.encoding=e
    self.buf=b""                # encoded but not yet returned
  def _read(self,s): return self.stream.read(s).encode(self.encoding)
  def read(self,size=-1):
    b=self.buf
    self.buf=b""
    if size is None or size<0: return b+self._read(None)
    ret=[]
    while True:
      n=len(b)
      if size<n:
        b,self.buf=b[:size],b[size:]
        n=size
      ret.append(b)
      size-=n
      if not size: break
      b=self._read(min((size+1024)//2,size))
      if not b: break
    return b"".join(ret)
  read1=read

Obviously write could be defined symmetrically to decode input and send it to the underlying stream, although then you have to deal with having enough bytes for only part of a character.

convert io.StringIO to io.BytesIO

Compose a File Object to Read

Compose a File Object to Write To

Tags:

Python

Io

Encoding

Python 3.X

Stream

Related

Recent Posts