Why is the size of 2⁶³ 36 bytes, but 2⁶³-1 is only 24 bytes?

Why does it get 12 more bytes for 2⁶³ compared to 2⁶³ - 1 and not just one?

On an LP64 system¹, a Python 2 int consists of exactly three pointer-sized pieces:

type pointer
reference count
actual value, a C long int

That's 24 bytes in total. On the other hand, a Python long consists of:

type pointer
reference count
digit count, a pointer-sized integer
inline array of value digits, each holding 30 bits of value, but stored in 32-bit units (one of the unused bits gets used for efficient carry/borrow during addition and subtraction)

2**63 requires 64 bits to store, which is more than two 30-bit digits (60 bits) can hold, so it needs three. Since each digit is 4 bytes wide, the whole Python long will take 24+3*4 = 36 bytes.

In other words, the difference comes from long having to separately store the size of the number (8 additional bytes) and from it being slightly less space-efficient about storing the value (12 bytes to store the digits of 2**63). Including the size, the value 2**63 in a long occupies 20 bytes. Comparing that to the 8 bytes occupied by any value of the simple int yields the observed 12-byte difference.
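The arithmetic above can be checked with a small model. This is only a sketch, not CPython code: the helper names (py2_int_size, digit_count, py2_long_size) are made up for illustration, and the constants are the LP64 assumptions stated in this answer (8-byte pointers and C long, 30-bit digits stored in 4 bytes). It deliberately avoids sys.getsizeof, so it runs on any Python version.

```python
POINTER_SIZE = 8   # assumed: LP64, so pointers and C long are 8 bytes
DIGIT_BITS = 30    # value bits held by each digit of a long
DIGIT_BYTES = 4    # each digit is stored in a 32-bit unit

def py2_int_size():
    # type pointer + reference count + C long value
    return 3 * POINTER_SIZE

def digit_count(value):
    # number of 30-bit digits needed for the magnitude of value
    bits = abs(value).bit_length()
    return (bits + DIGIT_BITS - 1) // DIGIT_BITS

def py2_long_size(value):
    # type pointer + reference count + digit count, then the digits
    return 3 * POINTER_SIZE + DIGIT_BYTES * digit_count(value)

print(digit_count(2**63))                      # 3
print(py2_long_size(2**63))                    # 36
print(py2_long_size(2**63) - py2_int_size())   # 12
```

The model also gives 28 bytes for a one-digit long such as 1L, which is the smallest size a nonzero long can have.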

It is worth noting that Python 3 only has one integer type, called int, which is variable-width, and implemented the same way as Python 2 long.
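On Python 3 the variable-width behaviour can be observed directly. The snippet below is a minimal sketch that assumes CPython 3; sys.int_info reports the digit layout of the running interpreter, and the exact byte counts printed depend on the build, so none are quoted here.

```python
import sys

# Digit layout of this interpreter's int type (CPython-specific).
print(sys.int_info.bits_per_digit, sys.int_info.sizeof_digit)

# Every int is variable-width, so a larger value takes more bytes.
small = sys.getsizeof(1)
large = sys.getsizeof(2**300)
print(small, large)
```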

¹ 64-bit Windows differs in that it retains a 32-bit long int, presumably for source compatibility with a large body of older code that used char, short, and long as "convenient" aliases for 8, 16, and 32-bit values that happened to work on both 16 and 32-bit systems. To get an actual 64-bit type on x86-64 Windows, one must use __int64 or (on newer compiler versions) long long or int64_t. Since Python 2 internally depends on Python int fitting into a C long in various places, sys.maxint remains 2**31-1, even on 64-bit Windows. This quirk is also fixed in Python 3, which has no concept of maxint.
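To see which data model a given interpreter was built for, the width of a C long can be compared with the width of a pointer. This is a small sketch using the standard ctypes module; the variable names are illustrative only.

```python
import ctypes

long_bytes = ctypes.sizeof(ctypes.c_long)       # width of a C long
pointer_bytes = ctypes.sizeof(ctypes.c_void_p)  # width of a pointer

# LP64 (e.g. 64-bit Linux): both are 8.
# 64-bit Windows: long is 4 while pointers are 8.
print(long_bytes, pointer_bytes)
```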

While I didn't find it in the documentation, here is my explanation.

Python 2 promotes int to long implicitly when the value exceeds what an int can store. A long has no single fixed size: on a 64-bit build it takes 24 bytes of header plus 4 bytes per 30-bit digit, so the first long in the run below (2**32, which needs two digits) is 32 bytes. From then on, the size of your variable is determined by its value and can go up and down.

from sys import getsizeof as size
a = 1
n = 32

# going up
for i in range(10):
    if not i:
        print 'a = %100s%13s%4s' % (str(a), type(a), size(a))
    else:
        print 'a = %100s%14s%3s' % (str(a), type(a), size(a))
    a <<= n

# going down
for i in range(11):
    print 'a = %100s%14s%3s' % (str(a), type(a), size(a))
    a >>= n


a =                                                                                                    1 <type 'int'>  24
a =                                                                                           4294967296 <type 'long'> 32
a =                                                                                 18446744073709551616 <type 'long'> 36
a =                                                                        79228162514264337593543950336 <type 'long'> 40
a =                                                              340282366920938463463374607431768211456 <type 'long'> 44
a =                                                    1461501637330902918203684832716283019655932542976 <type 'long'> 48
a =                                           6277101735386680763835789423207666416102355444464034512896 <type 'long'> 52
a =                                 26959946667150639794667015087019630673637144422540572481103610249216 <type 'long'> 56
a =                       115792089237316195423570985008687907853269984665640564039457584007913129639936 <type 'long'> 60
a =              497323236409786642155382248146820840100456150797347717440463976893159497012533375533056 <type 'long'> 64
a =    2135987035920910082395021706169552114602704522356652769947041607822219725780640550022962086936576 <type 'long'> 68
a =              497323236409786642155382248146820840100456150797347717440463976893159497012533375533056 <type 'long'> 64
a =                       115792089237316195423570985008687907853269984665640564039457584007913129639936 <type 'long'> 60
a =                                 26959946667150639794667015087019630673637144422540572481103610249216 <type 'long'> 56
a =                                           6277101735386680763835789423207666416102355444464034512896 <type 'long'> 52
a =                                                    1461501637330902918203684832716283019655932542976 <type 'long'> 48
a =                                                              340282366920938463463374607431768211456 <type 'long'> 44
a =                                                                        79228162514264337593543950336 <type 'long'> 40
a =                                                                                 18446744073709551616 <type 'long'> 36
a =                                                                                           4294967296 <type 'long'> 32
a =                                                                                                    1 <type 'long'> 28

As you can see, the type stays long once the value has been shifted past the int range, even after it shrinks back to 1. The first long in the run is 32 bytes, but the size follows the value and can end up higher or lower than that (28 bytes for the final 1L).

So, to answer your question: an int is always 24 bytes, while a long takes 24 bytes of header plus 4 bytes for each 30-bit digit of its value. That gives 28 bytes for a one-digit long, 32 for two digits, and so on, and the size goes up and down with the value.
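The sizes in the output above follow a simple rule, which can be reproduced without Python 2. The sketch below assumes a 64-bit build where a long is 24 bytes of header plus 4 bytes per 30-bit digit; modeled_long_size is a hypothetical helper, not an interpreter function.

```python
HEADER_BYTES = 24  # assumed: type pointer, reference count, digit count
DIGIT_BITS = 30
DIGIT_BYTES = 4

def modeled_long_size(value):
    # round the bit length up to a whole number of 30-bit digits
    digits = (value.bit_length() + DIGIT_BITS - 1) // DIGIT_BITS
    return HEADER_BYTES + DIGIT_BYTES * digits

# Same values as the loop above: 1 shifted left by 32 bits, k times.
sizes = [modeled_long_size(1 << (32 * k)) for k in range(11)]
print(sizes)  # [28, 32, 36, 40, 44, 48, 52, 56, 60, 64, 68]
```

The steady 4-byte step is partly a coincidence of shifting by 32 bits, which is close to one 30-bit digit. Under this model a shift occasionally adds two digits at once; going from 1 << 448 to 1 << 480 would add 8 bytes.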

As for your sub-question, a long cannot grow by a single byte, because its value is stored in whole 4-byte digits. Crossing from int to long at 2**63 replaces the 8-byte C long with an 8-byte digit count plus three 4-byte digits, so the object gains 12 bytes rather than one.

Tags: Python, Python Internals, Python 2.7, CPython
