Why is the size of 2⁶³ 36 bytes, but 2⁶³-1 is only 24 bytes?
Why does 2⁶³ take 12 more bytes than 2⁶³ - 1, and not just one?
On an LP64 system¹, a Python 2 int consists of exactly three pointer-sized pieces:
- type pointer
- reference count
- actual value, a C long int
That's 24 bytes in total. On the other hand, a Python long consists of:
- type pointer
- reference count
- digit count, a pointer-sized integer
- inline array of value digits, each holding 30 bits of value, but stored in 32-bit units (one of the unused bits gets used for efficient carry/borrow during addition and subtraction)
2**63 requires 64 bits to store, so it fits in three 30-bit digits. Since each digit is 4 bytes wide, the whole Python long will take 24+3*4 = 36 bytes.
In other words, the difference comes from long having to separately store the size of the number (8 additional bytes) and from it being slightly less space-efficient about storing the value (12 bytes to store the digits of 2**63). Including the size, the value 2**63 in a long occupies 20 bytes. Comparing that to the 8 bytes occupied by any value of the simple int yields the observed 12-byte difference.
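The arithmetic above can be captured in a short sketch. It assumes an LP64 build of CPython 2 with 30-bit digits stored in 4-byte units, as described; the names `py2_int_size` and `py2_long_size` are made up for illustration and are not part of any Python API.

```python
# Sketch of the object sizes described above (assumes LP64 and 30-bit digits).
POINTER_SIZE = 8   # bytes per pointer-sized field on LP64
DIGIT_BITS = 30    # bits of value held by each digit
DIGIT_SIZE = 4     # bytes used to store one digit


def py2_int_size():
    """Type pointer + reference count + C long value."""
    return 3 * POINTER_SIZE


def py2_long_size(value):
    """Type pointer + reference count + digit count + inline digits."""
    bits = abs(value).bit_length()
    ndigits = (bits + DIGIT_BITS - 1) // DIGIT_BITS   # ceiling division
    return 3 * POINTER_SIZE + ndigits * DIGIT_SIZE


print(py2_int_size())                          # 24
print(py2_long_size(2**63))                    # 36
print(py2_long_size(2**63) - py2_int_size())   # 12
```

The model only does arithmetic, so it runs on any Python version; it does not inspect real objects.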
It is worth noting that Python 3 only has one integer type, called int, which is variable-width, and implemented the same way as Python 2 long.
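To see this on Python 3, the sketch below compares the two values from the question. It assumes CPython 3, where `sys.getsizeof` and `sys.int_info` are available. Because every Python 3 int uses the digit-array layout, there is no switch of representation at 2**63, so both values should report the same size. The exact byte counts depend on the interpreter version and build, so none are shown.

```python
import sys

# Bits of value per internal digit and bytes per digit for this build.
print(sys.int_info.bits_per_digit, sys.int_info.sizeof_digit)

below = 2**63 - 1   # needs 63 bits
at = 2**63          # needs 64 bits

# Both need the same number of digits (three 30-bit or five 15-bit digits),
# so on CPython 3 their sizes match.
same_size = sys.getsizeof(below) == sys.getsizeof(at)
print(same_size)

# The size only grows once more digits are needed.
grows = sys.getsizeof(2**200) > sys.getsizeof(at)
print(grows)
```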
¹ 64-bit Windows differs in that it retains a 32-bit long int, presumably for source compatibility with a large body of older code that used char, short, and long as "convenient" aliases for 8, 16, and 32-bit values that happened to work on both 16 and 32-bit systems. To get an actual 64-bit type on x86-64 Windows, one must use __int64 or (on newer compiler versions) long long or int64_t. Since Python 2 internally depends on Python int fitting into a C long in various places, sys.maxint remains 2**31-1, even on 64-bit Windows. This quirk is also fixed in Python 3, which has no concept of maxint.
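As a rough illustration of how the width of the C long moves the int/long boundary, the sketch below models which Python 2 type an integer literal or the result of int arithmetic would get. The helper `py2_integer_type` is hypothetical and only mirrors the rule described here; it does not model the fact that a value which is already a long stays a long.

```python
def py2_integer_type(value, c_long_bits):
    """Name of the Python 2 type that would hold `value` (hypothetical helper)."""
    maxint = 2 ** (c_long_bits - 1) - 1   # what sys.maxint would report
    minint = -maxint - 1
    return "int" if minint <= value <= maxint else "long"


# LP64 (64-bit C long): the switch happens at 2**63.
print(py2_integer_type(2**63 - 1, 64))   # int
print(py2_integer_type(2**63, 64))       # long

# 64-bit Windows (32-bit C long): the switch already happens at 2**31.
print(py2_integer_type(2**31 - 1, 32))   # int
print(py2_integer_type(2**31, 32))       # long
```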
While I didn't find it in the documentation, here is my explanation.
Python 2 promotes int to long implicitly when the value exceeds what can be stored in an int. From then on, the size of your variable is determined by its value, which can go up and down. In the run below, the first long (2**32) takes 32 bytes.
from sys import getsizeof as size
a = 1
n = 32
# going up
for i in range(10):
    if not i:
        print 'a = %100s%13s%4s' % (str(a), type(a), size(a))
    else:
        print 'a = %100s%14s%3s' % (str(a), type(a), size(a))
    a <<= n
# going down
for i in range(11):
    print 'a = %100s%14s%3s' % (str(a), type(a), size(a))
    a >>= n
a = 1 <type 'int'> 24
a = 4294967296 <type 'long'> 32
a = 18446744073709551616 <type 'long'> 36
a = 79228162514264337593543950336 <type 'long'> 40
a = 340282366920938463463374607431768211456 <type 'long'> 44
a = 1461501637330902918203684832716283019655932542976 <type 'long'> 48
a = 6277101735386680763835789423207666416102355444464034512896 <type 'long'> 52
a = 26959946667150639794667015087019630673637144422540572481103610249216 <type 'long'> 56
a = 115792089237316195423570985008687907853269984665640564039457584007913129639936 <type 'long'> 60
a = 497323236409786642155382248146820840100456150797347717440463976893159497012533375533056 <type 'long'> 64
a = 2135987035920910082395021706169552114602704522356652769947041607822219725780640550022962086936576 <type 'long'> 68
a = 497323236409786642155382248146820840100456150797347717440463976893159497012533375533056 <type 'long'> 64
a = 115792089237316195423570985008687907853269984665640564039457584007913129639936 <type 'long'> 60
a = 26959946667150639794667015087019630673637144422540572481103610249216 <type 'long'> 56
a = 6277101735386680763835789423207666416102355444464034512896 <type 'long'> 52
a = 1461501637330902918203684832716283019655932542976 <type 'long'> 48
a = 340282366920938463463374607431768211456 <type 'long'> 44
a = 79228162514264337593543950336 <type 'long'> 40
a = 18446744073709551616 <type 'long'> 36
a = 4294967296 <type 'long'> 32
a = 1 <type 'long'> 28
As you can see, the type stays long once the value has become too big for an int, even after it shrinks back to 1. The first long took 32 bytes, but the size changes with the value and can be higher than, lower than, or equal to 32.
So, to answer your question: an int is always 24 bytes, while the smallest nonzero long is 28 bytes, because a long also needs space for storing large values. That space starts at 4 bytes and goes up and down in 4-byte steps according to the value (hence 32 bytes for 2**32 in the output above).
As for your sub-question: giving every number its own unique size is impossible, so a long does not grow one byte at a time. Instead, each size of long covers a whole range of numbers. Once you go over the limit of the current size, the next size up is used, which also accounts for much larger numbers and therefore takes a few bytes more.
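The ranges mentioned above can be made concrete with a small sketch. It assumes the layout seen in the output (a 24-byte header plus 4 bytes per step) and that each 4-byte step holds 30 bits of value; the function name `largest_value_for_size` is made up for illustration.

```python
DIGIT_BITS = 30    # assumed bits of value per 4-byte step
DIGIT_SIZE = 4     # bytes added per step, as seen in the output
HEADER_SIZE = 24   # type pointer + reference count + digit count


def largest_value_for_size(size):
    """Largest magnitude a long of `size` bytes can hold under these assumptions."""
    ndigits = (size - HEADER_SIZE) // DIGIT_SIZE
    return 2 ** (DIGIT_BITS * ndigits) - 1


# Print the upper limit of each size class.
for size in (28, 32, 36, 40):
    print(size, largest_value_for_size(size))
```

Under these assumptions a 36-byte long covers everything from 2**60 up to 2**90 - 1, which is the range 2**63 falls into.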