Clarification for "it should be possible to change the value of 1" from the CPython documentation
It means that integers in Python are actual objects with a "value" field holding the integer's value. In Java, you could express Python's integers like so (leaving out a lot of details, of course):
class PyInteger {
    private int value;

    public PyInteger(int val) {
        this.value = val;
    }

    public PyInteger __add__(PyInteger other) {
        return new PyInteger(this.value + other.value);
    }
}
In order not to have hundreds of Python integers with the same value around, CPython caches small integers (currently those from -5 through 256), along the lines of:
PyInteger[] cache = {
    new PyInteger(0),
    new PyInteger(1),
    new PyInteger(2),
    ...
};
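You can watch this cache at work from within Python itself. A minimal sketch, assuming CPython, where the cache currently spans -5 through 256 (an implementation detail, not a language guarantee):

n = 255
a = n + 1        # 256, computed at run time
b = n + 1        # computed again, independently
print(a is b)    # True on CPython: both additions return the cached object

n = 256
c = n + 1        # 257, outside the cache
d = n + 1
print(c is d)    # False on a typical CPython: two freshly created objects

The additions are done at run time on purpose: with plain literals, CPython's compiler may fold and deduplicate constants, which would blur the picture.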
However, what would happen if you did something like this (let's ignore for a moment that value is private):
PyInteger one = cache[1]; // the PyInteger representing 1
one.value = 3;
Suddenly, every time you used 1 in your program, you would actually get back 3, because the object representing 1 has an effective value of 3.
Indeed, you can do that in Python! That is: it is possible to change the effective numerical value of an integer in Python. There is an answer in this reddit post; I copy it here for completeness (original credits go to Veedrac):
import ctypes

def deref(addr, typ):
    return ctypes.cast(addr, ctypes.POINTER(typ))

deref(id(29), ctypes.c_int)[6] = 100

29
#>>> 100

29 ** 0.5
#>>> 10.0
The Python specification itself does not say anything about how integers are to be stored or represented internally. It also does not say which integers should be cached, or that any should be cached at all. In short: there is nothing in the Python specification that defines what should happen if you do something silly like this ;-).
We could even go slightly further...
In reality, the field value above is actually an array of integers, emulating an arbitrarily large integer value (for a 64-bit integer, you simply combine two 32-bit fields, etc.). However, when integers get large and outgrow a standard 32-bit integer, caching is no longer a viable option. Even if you used a dictionary, comparing integer arrays for equality would be too much overhead for too little gain.
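You can see this variable-width representation reflected in the object size. A small sketch using sys.getsizeof, which in CPython reports the raw size of the object in bytes (the exact numbers vary by version and platform):

import sys

# The object grows as more internal digits are needed to hold the value.
for n in (1, 2**30, 2**60, 2**600):
    print(n.bit_length(), sys.getsizeof(n))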
You can actually check this yourself by using is to compare identities:
>>> 3 * 4 is 12
True
>>> 300 * 400 is 120000
False
>>> 300 * 400 == 120000
True
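(As an aside: from Python 3.8 onwards, comparing against a literal with is triggers a SyntaxWarning at compile time, precisely because the outcome is an implementation detail; the transcript above stems from an older interpreter.)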
In a typical Python system, there is exactly one object representing the number 12. 120000, on the other hand, is hardly ever cached. So, above, 300 * 400 yields a new object representing 120000, which is different from the object created for the number on the right-hand side.
Why is this relevant? If you change the value of a small number like 1 or 29, it will affect all calculations that use that number. You will most likely seriously break your system (until you restart it). But if you change the value of a large integer, the effects will be minimal.
Changing the value of 12 to 13 means that 3 * 4 will yield 13. Changing the value of 120000 to 130000 has much less effect: 300 * 400 will still yield (a new) 120000 and not 130000.
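To make this concrete, here is a sketch in the spirit of Veedrac's snippet, again assuming a 64-bit CPython where the first digit sits at c_int offset 6. Run it only in a throwaway session, since it corrupts the interpreter's state:

import ctypes

def deref(addr, typ):
    return ctypes.cast(addr, ctypes.POINTER(typ))

deref(id(12), ctypes.c_int)[6] = 13   # patch the cached object for 12

print(3 * 4)       # 13: the result is the (patched) cached object
print(300 * 400)   # 120000: a fresh object, unaffected by the patch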
As soon as you take other Python implementations into the picture, things get even harder to predict. MicroPython, for instance, does not keep objects for small numbers but emulates them on the fly, and PyPy might well just optimise your changes away.
Bottom line: the exact behaviour of numbers you tinker with is truly undefined; it depends on several factors and on the exact implementation.
Answer to a question in the comments: What is the significance of 6 in Veedrac's code above?
All objects in Python share a common memory layout. The first field is a reference counter that tells you how many other objects are currently referring to this object. The second field is a reference to the object's class or type. Since integers do not have a fixed size, the third field is the size of the data part (the relevant definitions live in CPython's source, in Include/object.h for general objects and Include/longintrepr.h for integers/longs):
struct longObject {
    native_int ref_counter;   // offset: +0 / +0
    PyObject*  type;          // offset: +1 / +2
    native_int size;          // offset: +2 / +4
    unsigned short value[];   // offset: +3 / +6
}
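You can mirror this layout with a ctypes.Structure and inspect the header fields directly. A sketch for a typical 64-bit CPython; the field names merely mirror CPython's PyObject_VAR_HEAD and are not an official API:

import ctypes

class PyLongHeader(ctypes.Structure):
    _fields_ = [
        ("ob_refcnt", ctypes.c_ssize_t),  # reference counter
        ("ob_type",   ctypes.c_void_p),   # pointer to the type object
        ("ob_size",   ctypes.c_ssize_t),  # number of digits in the value
        # ... the digit array follows here
    ]

hdr = PyLongHeader.from_address(id(29))
print(hdr.ob_size)               # 1: a single digit suffices for 29
print(hdr.ob_type == id(int))    # True: the type pointer is class int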
On a 32-bit system, native_int and PyObject* both occupy 32 bits, whereas on a 64-bit system they naturally occupy 64 bits. So, if we access the data in 32-bit units (using ctypes.c_int) on a 64-bit system, the actual value of the integer is to be found at offset +6. If you change the type to ctypes.c_long, on the other hand, the offset is +3.
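The c_int view can be checked the same way. A sketch, assuming a little-endian 64-bit CPython:

import ctypes

def deref(addr, typ):
    return ctypes.cast(addr, ctypes.POINTER(typ))

print(deref(id(29), ctypes.c_int)[6])   # 29: the value, read in 32-bit units
print(deref(id(29), ctypes.c_int)[4])   # 1: the size field (number of digits)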
Because id(x) in CPython returns the memory address of x, you can actually check this yourself. Based on the deref function above, let's do:
>>> deref(id(29), ctypes.c_long)[3]
29
>>> deref(id(29), ctypes.c_long)[1]
10277248
>>> id(int) # memory address of class "int"
10277248