bytes vs bytearray in Python 2.6 and 3

For (at least) Python 3.7

According to the docs:

bytes objects are immutable sequences of single bytes

bytearray objects are a mutable counterpart to bytes objects.

And that's pretty much it as far as bytes vs bytearray goes. They're fairly interchangeable and designed to be flexible enough to be mixed in operations without throwing errors. In fact, there is a whole section in the official documentation dedicated to showing the similarities between the bytes and bytearray APIs.
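As a minimal sketch of that difference, assuming Python 3 (the variable names are only illustrative): item assignment fails on bytes and succeeds on bytearray, while comparison and concatenation accept either type.

```python
frozen = bytes(b"hi")
scratch = bytearray(b"hi")

# bytes is immutable: item assignment raises TypeError.
try:
    frozen[0] = 72
    bytes_mutated = True
except TypeError:
    bytes_mutated = False

# bytearray is mutable: the same assignment works in place.
scratch[0] = 72  # 72 is the ASCII code for 'H'

# The two types can be mixed in comparisons and concatenation.
same_content = (bytearray(b"hi") == b"hi")
joined = frozen + scratch  # the result takes the type of the left operand

print(bytes_mutated, scratch, same_content, joined)
```

Note that concatenation returns the type of the left operand, so swapping the operands would give a bytearray instead.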

Some clues as to why from the docs:

Since many major binary protocols are based on the ASCII text encoding, bytes objects offer several methods that are only valid when working with ASCII compatible data and are closely related to string objects in a variety of other ways.
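To illustrate what the docs mean by ASCII-oriented methods, here is a small sketch assuming Python 3. The request line is made-up sample data, not taken from any real capture.

```python
# Hypothetical sample data: the first line of an HTTP request.
request_line = b"GET /index.html HTTP/1.1"

# str-like methods work directly on bytes, as long as the data is ASCII.
method, path, version = request_line.split()
is_get = method.upper() == b"GET"
starts_ok = version.startswith(b"HTTP/")

# The same methods exist on bytearray and return a bytearray.
mutable_line = bytearray(request_line)
lowered = mutable_line.lower()

print(method, path, version, is_get, starts_ok, lowered)
```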


TL;DR

python2.6+ bytes = python2.6+ str = python3.x bytes != python3.x str

python2.6+ bytearray = python3.x bytearray

python2.x unicode = python3.x str
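The equalities above can be checked from the interpreter itself. This is a rough sketch meant to run unchanged on Python 2.6+ and on Python 3.3+ (the u'' prefix is a syntax error on 3.0 to 3.2); PY2, bytes_is_str and text_type are names chosen for illustration.

```python
import sys

PY2 = sys.version_info[0] == 2

# On Python 2.6+, bytes is just another name for str; on Python 3 it is not.
bytes_is_str = bytes is str

# The text type is called unicode on Python 2 and str on Python 3.
text_type = type(u"hi")

# b'' literals and bytearray exist under the same names on both versions.
literal_type = type(b"hi")
array_type = type(bytearray(b"hi"))

print(PY2, bytes_is_str, text_type.__name__, literal_type.__name__, array_type.__name__)
```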

Long Answer

bytes and str changed meaning in Python 3.x.

First, to answer your question briefly: in Python 2.6, bytes(b"hi") is an immutable array of bytes (8 bits each, i.e. octets). Each element is returned as a one-character str, because bytes is the same type as str in Python 2.6+. (This is not the case in Python 3.x.)

bytearray(b"hi") is a mutable array of bytes. But when you ask for the type of one of its elements, you get int, because Python represents each element of a bytearray as an integer in the range 0-255 (all possible values of an 8-bit integer). An element of a bytes object, by contrast, is represented in Python 2 as the ASCII character for that byte.

For example, consider in Python 2.6+

>>> barr=bytearray(b'hi')
>>> bs=bytes(b'hi')
>>> barr[0] # python shows you an int value for the 8 bits 0110 1000
104
>>> bs[0] # python shows you the ASCII character for the 8 bits 0110 1000
'h'
>>> chr(barr[0]) # chr converts 104 to its corresponding ASCII character
'h'
>>> bs[0]==chr(barr[0]) # compares the 1st byte of bs (a 1-char str) with the character for the int in the 1st byte of barr
True

Python 3.x is an entirely different story. As you might have suspected, it is odd that a str literal would mean bytes in Python 2.6+. Well, this answer explains that.

In Python 3.x, a str is Unicode text (in Python 2 it was just an array of bytes; note that Unicode and bytes are two completely different things). bytearray is a mutable array of bytes while bytes is an immutable array of bytes. They both have almost the same methods. If I run the same code again, here is the result in Python 3.x:

>>> barr=bytearray(b'hi')
>>> bs=bytes(b'hi')
>>> barr[0]
104
>>> bs[0]
104
>>> bs[0]==barr[0] # indexing bytes and bytearray both give an int in python 3.x
True

bytes and bytearray are the same thing in Python 3.x, except for their mutability.

What happened to str, you might ask? str in Python 3 became what unicode was in Python 2, and the unicode type was subsequently removed from Python 3 as it was redundant.
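A short sketch of that split, assuming Python 3: text and bytes are converted explicitly with encode and decode, and the two types do not mix. The sample word is arbitrary.

```python
text = "h\u00e9llo"            # str: a sequence of Unicode code points ('héllo')
data = text.encode("utf-8")    # bytes: the UTF-8 encoding of that text

n_chars = len(text)   # counts code points
n_bytes = len(data)   # counts bytes; 'é' needs two bytes in UTF-8

round_trip = data.decode("utf-8")

# Python 3 refuses to combine str and bytes implicitly.
try:
    mixed = data + text
    mixing_allowed = True
except TypeError:
    mixing_allowed = False

print(n_chars, n_bytes, round_trip == text, mixing_allowed)
```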

I'd like to write code that will translate well into Python 3. So, is the situation the same in Python 3?

It depends on what you are trying to do. Are you dealing with bytes or are you dealing with ASCII representation of bytes?

If you are dealing with bytes, then my advice is to use bytearray in Python 2, which behaves the same in Python 3. But you lose immutability, if that matters to you.
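As a sketch of why bytearray ports well: wrapping raw data in a bytearray gives integer elements on both Python 2.6+ and 3.x, so byte-level arithmetic behaves the same. The helper name byte_values is hypothetical, chosen only for this example.

```python
def byte_values(raw):
    """Return the integer value of every byte in raw (hypothetical helper)."""
    # Iterating over a bytearray yields ints on Python 2.6+ and 3.x alike,
    # whereas iterating over bytes yields 1-char strings on Python 2.
    return list(bytearray(raw))

values = byte_values(b"hi")

# A simple byte-level computation that gives the same result on both versions.
checksum = sum(bytearray(b"hi")) % 256

print(values, checksum)
```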

If you are dealing with ASCII or text, then represent your string as u'hi' in Python 2, which has the same meaning in Python 3. The 'u' prefix has a special meaning in Python 2: it instructs Python 2 to treat a string literal as the unicode type. In Python 3 the 'u' prefix has no effect (it is accepted again from Python 3.3 onwards for compatibility), because all string literals in Python 3 are Unicode by default (the type is confusingly called str in Python 3 and unicode in Python 2).


In Python 2.6 bytes is merely an alias for str.
This "pseudo type" was introduced to [partially] prepare programs [and programmers!] to be converted to/compatible with Python 3.0, where there is a strict distinction of semantics and use between str (which is always Unicode) and bytes (which is an array of octets, for storing data, but not text).

Similarly, the b prefix for string literals has no effect in 2.6, but it is a useful marker in the program: it explicitly flags the programmer's intent to have the string as a data string rather than a text string. This info can then be used by the 2to3 converter or similar utilities when the program is ported to Py3k.

You may want to check this SO Question for additional info.