Clarification for "it should be possible to change the value of 1" from the CPython documentation
It means that integers in Python are actual objects with a "value" field holding the integer's value. In Java, you could express Python's integers like so (leaving out a lot of details, of course):
class PyInteger {
    private int value;

    public PyInteger(int val) {
        this.value = val;
    }

    public PyInteger __add__(PyInteger other) {
        return new PyInteger(this.value + other.value);
    }
}
In order not to have hundreds of Python integers with the same value around, CPython caches small integers (currently those from -5 through 256), along the lines of:
PyInteger[] cache = {
    new PyInteger(0),
    new PyInteger(1),
    new PyInteger(2),
    ...
};
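You can watch this cache at work from within Python itself. A minimal sketch, assuming CPython, where the cache currently spans -5 through 256 (an implementation detail, not a language guarantee):

n = 255
a = n + 1        # 256, computed at run time
b = n + 1        # computed again, independently
print(a is b)    # True on CPython: both additions return the cached object

n = 256
c = n + 1        # 257, outside the cache
d = n + 1
print(c is d)    # False on a typical CPython: two freshly created objects

The additions are done at run time on purpose: with plain literals, CPython's compiler may fold and deduplicate constants, which would blur the picture.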
However, what would happen if you did something like this (let's ignore for a moment that value is private):
PyInteger one = cache[1]; // the PyInteger representing 1
one.value = 3;
Suddenly, every time you used 1 in your program, you would actually get back 3, because the object representing 1 has an effective value of 3.
Indeed, you can do that in Python! That is: it is possible to change the effective numerical value of an integer in Python. There is an answer in this reddit post; I copy it here for completeness (original credits go to Veedrac):
import ctypes

def deref(addr, typ):
    return ctypes.cast(addr, ctypes.POINTER(typ))

deref(id(29), ctypes.c_int)[6] = 100

29
#>>> 100

29 ** 0.5
#>>> 10.0
The Python specification itself does not say anything about how integers are to be stored or represented internally. It also does not say which integers should be cached, or that any should be cached at all. In short: there is nothing in the Python specification that defines what should happen if you do something silly like this ;-).
We could even go slightly further...
In reality, the field value above is actually an array of integers, emulating an arbitrarily large integer value (for a 64-bit integer, you simply combine two 32-bit fields, etc.). However, when integers get large and outgrow a standard 32-bit integer, caching is no longer a viable option. Even if you used a dictionary, comparing integer arrays for equality would be too much overhead for too little gain.
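You can see this variable-width representation reflected in the object size. A small sketch using sys.getsizeof, which in CPython reports the raw size of the object in bytes (the exact numbers vary by version and platform):

import sys

# The object grows as more internal digits are needed to hold the value.
for n in (1, 2**30, 2**60, 2**600):
    print(n.bit_length(), sys.getsizeof(n))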
You can actually check this yourself by using is to compare identities:
>>> 3 * 4 is 12
True
>>> 300 * 400 is 120000
False
>>> 300 * 400 == 120000
True
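(As an aside: from Python 3.8 onwards, comparing against a literal with is triggers a SyntaxWarning at compile time, precisely because the outcome is an implementation detail; the transcript above stems from an older interpreter.)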
In a typical Python system, there is exactly one object representing the number 12. 120000, on the other hand, is hardly ever cached. So, above, 300 * 400 yields a new object representing 120000, which is different from the object created for the number on the right-hand side.
Why is this relevant? If you change the value of a small number like 1 or 29, it will affect all calculations that use that number. You will most likely seriously break your system (until you restart it). But if you change the value of a large integer, the effects will be minimal.
Changing the value of 12 to 13 means that 3 * 4 will yield 13. Changing the value of 120000 to 130000 has much less effect: 300 * 400 will still yield (a new) 120000 and not 130000.
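To make this concrete, here is a sketch in the spirit of Veedrac's snippet, again assuming a 64-bit CPython where the first digit sits at c_int offset 6. Run it only in a throwaway session, since it corrupts the interpreter's state:

import ctypes

def deref(addr, typ):
    return ctypes.cast(addr, ctypes.POINTER(typ))

deref(id(12), ctypes.c_int)[6] = 13   # patch the cached object for 12

print(3 * 4)       # 13: the result is the (patched) cached object
print(300 * 400)   # 120000: a fresh object, unaffected by the patch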
As soon as you take other Python implementations into the picture, things get even harder to predict. MicroPython, for instance, does not keep objects for small numbers but emulates them on the fly, and PyPy might well just optimise your changes away.
Bottom line: the exact behaviour of numbers you tinker with is truly undefined; it depends on several factors and on the exact implementation.
Answer to a question in the comments: What is the significance of 6 in Veedrac's code above?
All objects in Python share a common memory layout. The first field is a reference counter that tells you how many other objects are currently referring to this object. The second field is a reference to the object's class or type. Since integers do not have a fixed size, the third field is the size of the data part (the relevant definitions live in CPython's source, in Include/object.h for general objects and Include/longintrepr.h for integers/longs):
struct longObject {
    native_int ref_counter;   // offset: +0 / +0
    PyObject*  type;          // offset: +1 / +2
    native_int size;          // offset: +2 / +4
    unsigned short value[];   // offset: +3 / +6
}
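You can mirror this layout with a ctypes.Structure and inspect the header fields directly. A sketch for a typical 64-bit CPython; the field names merely mirror CPython's PyObject_VAR_HEAD and are not an official API:

import ctypes

class PyLongHeader(ctypes.Structure):
    _fields_ = [
        ("ob_refcnt", ctypes.c_ssize_t),  # reference counter
        ("ob_type",   ctypes.c_void_p),   # pointer to the type object
        ("ob_size",   ctypes.c_ssize_t),  # number of digits in the value
        # ... the digit array follows here
    ]

hdr = PyLongHeader.from_address(id(29))
print(hdr.ob_size)               # 1: a single digit suffices for 29
print(hdr.ob_type == id(int))    # True: the type pointer is class int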
On a 32-bit system, native_int and PyObject* both occupy 32 bits, whereas on a 64-bit system they naturally occupy 64 bits. So, if we access the data in 32-bit units (using ctypes.c_int) on a 64-bit system, the actual value of the integer is to be found at offset +6. If you change the type to ctypes.c_long, on the other hand, the offset is +3.
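The c_int view can be checked the same way. A sketch, assuming a little-endian 64-bit CPython:

import ctypes

def deref(addr, typ):
    return ctypes.cast(addr, ctypes.POINTER(typ))

print(deref(id(29), ctypes.c_int)[6])   # 29: the value, read in 32-bit units
print(deref(id(29), ctypes.c_int)[4])   # 1: the size field (number of digits)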
Because id(x) in CPython returns the memory address of x, you can actually check this yourself. Based on the deref function above, let's do:
>>> deref(id(29), ctypes.c_long)[3]
29
>>> deref(id(29), ctypes.c_long)[1]
10277248
>>> id(int) # memory address of class "int"
10277248