Why doesn't the id of an immutable string change after the second += in Python 3?

This is only possible due to a weird, slightly-sketchy optimization for string concatenation in the bytecode evaluation loop. The INPLACE_ADD implementation special-cases two string objects:

case TARGET(INPLACE_ADD): {
    PyObject *right = POP();
    PyObject *left = TOP();
    PyObject *sum;
    if (PyUnicode_CheckExact(left) && PyUnicode_CheckExact(right)) {
        sum = unicode_concatenate(tstate, left, right, f, next_instr);
        /* unicode_concatenate consumed the ref to left */
    }
    else {
        ...

and calls a unicode_concatenate helper that delegates to PyUnicode_Append, which tries to mutate the original string in-place:

void
PyUnicode_Append(PyObject **p_left, PyObject *right)
{
    ...
    if (unicode_modifiable(left)
        && PyUnicode_CheckExact(right)
        && PyUnicode_KIND(right) <= PyUnicode_KIND(left)
        /* Don't resize for ascii += latin1. Convert ascii to latin1 requires
           to change the structure size, but characters are stored just after
           the structure, and so it requires to move all characters which is
           not so different than duplicating the string. */
        && !(PyUnicode_IS_ASCII(left) && !PyUnicode_IS_ASCII(right)))
    {
        /* append inplace */
        if (unicode_resize(p_left, new_len) != 0)
            goto error;

        /* copy 'right' into the newly allocated area of 'left' */
        _PyUnicode_FastCopyCharacters(*p_left, left_len, right, 0, right_len);
    }
    ...
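The condition guarding the "append inplace" branch can be modelled in plain Python. This is only a sketch of the visible checks in the excerpt above, not CPython code: `kind` and `may_append_in_place` are hypothetical helper names, `left_modifiable` stands in for the result of `unicode_modifiable(left)`, and `right` is assumed to be an exact `str`. The elided parts of the function (including any shortcuts taken before this check) are not modelled.

```python
def kind(s):
    """Bytes per character CPython would use to store s (1, 2, or 4)."""
    top = max(map(ord, s), default=0)
    return 1 if top < 0x100 else 2 if top < 0x10000 else 4


def may_append_in_place(left, right, left_modifiable=True):
    """Model of the condition in front of the 'append inplace' branch.

    left_modifiable is a stand-in for unicode_modifiable(left), which
    cannot be queried from Python code.
    """
    return (
        left_modifiable
        and kind(right) <= kind(left)
        # ascii += latin1 is excluded, as the C comment explains
        and not (left.isascii() and not right.isascii())
    )


print(may_append_in_place("abc", "def"))     # ascii += ascii
print(may_append_in_place("abc", "\xe9"))    # ascii += latin1
print(may_append_in_place("\u20ac", "abc"))  # 2-byte kind += ascii
```

Under this model, only the middle call is rejected: both strings use one byte per character, but the left side is ASCII and the right side is not.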

The optimization only happens if unicode_concatenate can guarantee there are no other references to the LHS. Your initial a="d" had other references, since Python uses a cache of 1-character strings in the Latin-1 range, so the optimization didn't trigger. The optimization can also fail to trigger in a few other cases, such as if the LHS has a cached hash, or if realloc needs to move the string (in which case most of the optimization's code path executes, but it doesn't succeed in performing the operation in-place).
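A small experiment makes this observable. The sketch below (the function name `count_id_reuse` is made up for illustration) appends one character at a time to a local variable and counts how often `id` stayed the same. The count depends on the CPython version and on what the allocator does, so no particular number should be expected; the only thing that is certain is that the first `+=` cannot reuse the id, because the original `"d"` is still referenced elsewhere.

```python
def count_id_reuse(n):
    """Append one character n times; count how often id() stayed the same."""
    s = "d"  # one-character string: other references to it exist
    reused = 0
    for _ in range(n):
        before = id(s)  # an int; it does not keep the string alive
        s += "x"
        if id(s) == before:
            reused += 1
    return s, reused


result, reused = count_id_reuse(20)
# reused is implementation- and allocator-dependent
print(len(result), reused)
```

On an interpreter without the optimization, `reused` would simply be 0 and the resulting string would be identical.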

This optimization violates the normal rules for id and +=. Normally, += on immutable objects is supposed to create a new object before clearing the reference to the old object, so the new and old objects should have overlapping lifetimes, forbidding equal id values. With the optimization in place, the string after the += can end up with the same id as the string before the +=.
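The flip side is that the shortcut is never visible through a second reference. In the sketch below (`append_with_alias` is a hypothetical name), `b` keeps the old string alive, so `+=` has to produce a distinct object and the old value stays intact:

```python
def append_with_alias():
    a = "".join(["sp", "am"])  # built at run time
    b = a                      # second reference to the same object
    a += "!"                   # b still needs the old string, so a new one is created
    return a, b


a, b = append_with_alias()
print(a, b, a is b)  # spam! spam False
```

This is why the optimization stays invisible to code that does not inspect `id`: it only applies when nothing else could still observe the old string.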

The language developers decided they cared more about people who would put string concatenation in a loop, see bad performance, and assume Python sucks, than they cared about this obscure technical point.

Tags: Python, String, Immutability, Python 3.8
