Python's immutable strings and their slices

That's how slices work. Slices always perform a shallow copy, allowing you to do things like

>>> x = [1,2,3]
>>> y = x[:]
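Incidentally, CPython already short-circuits the degenerate case for immutable strings: a full slice hands back the very same object, because nothing can mutate it anyway. A quick check in the interpreter:

>>> x[:] is x      # a full list slice is a new (copied) list
False
>>> s = "abc"
>>> s[:] is s      # CPython returns the same string object
True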

Now it would be possible to make an exception for strings, but is it really worth it? Eric Lippert blogged about his decision not to do that for .NET; I guess his argument is valid for Python as well.

See also this question.


The underlying string representation is null-terminated, even though it also keeps track of the length, so you cannot have a string object that references a sub-string of another string unless it is a suffix. This already limits the usefulness of the proposal, since it would add a lot of complications to handle suffixes and non-suffixes differently (and giving up null-terminated strings has consequences of its own).
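You can see the consequence from pure Python: every slice, even a prefix, gets its own freshly allocated buffer. A quick check with sys.getsizeof (exact sizes vary across versions and builds):

import sys

a = "x" * 1000000
b = a[:500000]             # a prefix slice is still a full, independent copy
print(sys.getsizeof(a))    # roughly a million bytes plus the object header
print(sys.getsizeof(b))    # roughly half that: its own buffer, not a view into a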

Allowing references to sub-strings of a string would complicate garbage collection and string handling considerably. For every string you would have to keep track of how many objects refer to each character, or to each range of indices. This means complicating the struct of string objects and every operation that deals with them, which would probably mean a big slowdown.
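For comparison, today's reference counting works at the granularity of whole objects, which is what keeps it cheap; sys.getrefcount makes this visible:

import sys

s = "hello world"
t = s                       # a second reference to the *whole* string object
print(sys.getrefcount(s))   # one counter per object, never per character or range
                            # (the count includes the temporary reference made by the call)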

Add the fact that, starting with Python 3.3 (PEP 393), strings have three different internal representations, and things become too messy to be maintainable; the proposal probably doesn't give enough benefit to be accepted.
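The three representations store 1, 2, or 4 bytes per character, depending on the widest character present; sys.getsizeof shows the growth (exact figures vary by version):

import sys

print(sys.getsizeof("a" * 100))           # ASCII text: 1 byte per character
print(sys.getsizeof("\u0394" * 100))      # BMP text (Greek capital delta): 2 bytes per character
print(sys.getsizeof("\U0001F600" * 100))  # astral text (an emoji): 4 bytes per character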


Another problem with this kind of "optimization" shows up when you want to deallocate "big strings":

a = "Some string" * 10 ** 7
b = a[10000]
del a

After these operations, the one-character string b would, under the proposed optimization, prevent a, a huge string, from being deallocated. Surely you could copy small strings like that, but what if b = a[:10000] (or some other big number)? 10000 characters looks like a big string that ought to use the optimization and avoid the copy, yet it would keep megabytes of data from being released. The garbage collector would have to keep checking whether it is worth deallocating a big string object and making copies instead, and all these operations must be as fast as possible, otherwise you end up losing time performance.
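You can reproduce this exact pinning effect today with memoryview over a bytearray (Python 3 shown): the tiny view keeps the whole buffer alive.

import sys

a = bytearray(10 ** 7)        # a ~10 MB buffer
v = memoryview(a)[:100]       # a 100-byte view into it
del a                         # the name is gone...
print(sys.getsizeof(v.obj))   # ...but the 10 MB object is still alive, pinned by v
v.release()                   # only now can the underlying buffer be freed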

99% of the time, the strings used in programs are "small" (at most ~10k characters), so copying is really fast, while the optimization you propose only starts to pay off for really big strings (e.g. taking substrings of size 100k from huge texts) and is slower for really small strings, which is the common case, i.e. the case that should be optimized.
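If you want to check that claim, a rough timeit sketch (absolute numbers vary by machine and version) makes the asymmetry visible:

import timeit

small = "x" * 1000
big = "x" * 10 ** 7

# copying 500 characters: typically a fraction of a microsecond per slice
print(timeit.timeit(lambda: small[:500], number=100000))

# copying 5 million characters: every slice allocates and copies megabytes
print(timeit.timeit(lambda: big[:5 * 10 ** 6], number=100))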


If you think this is important, you are free to propose a PEP, show an implementation, and demonstrate the resulting changes in speed/memory usage. If it is really worth the effort, it may be included in a future version of Python.


If you are worried about memory (in the case of really large strings), use a buffer() (Python 2; the Python 3 equivalent for bytes is sketched below):

>>> a = "12345"
>>> b = buffer(a, 2, 2)
>>> b
<read-only buffer for 0xb734d120, size 2, offset 2 at 0xb734d4a0>
>>> print b
34
>>> print b[:]
34
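buffer() only exists in Python 2; in Python 3, memoryview plays the same zero-copy role for bytes-like objects (it does not work on str, though):

>>> a = b"12345"
>>> b = memoryview(a)[2:4]   # a zero-copy view of two bytes
>>> b.tobytes()
b'34'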

Knowing about this gives you alternatives to string methods such as split().

If you want to split() a string but keep the original string object (as you may still need it), you could do:

def split_buf(s, needle):
    start = 0                 # buffer() needs an int offset, so don't start with None
    add = len(needle)
    res = []
    while True:
        index = s.find(needle, start)
        if index < 0:
            break
        res.append(buffer(s, start, index - start))   # zero-copy view of this piece
        start = index + add
    res.append(buffer(s, start, len(s) - start))      # trailing piece, as split() returns it
    return res

or, using .index():

def split_buf(s, needle):
    start = 0                 # again, buffer() needs an int offset
    add = len(needle)
    res = []
    try:
        while True:
            index = s.index(needle, start)
            res.append(buffer(s, start, index - start))
            start = index + add
    except ValueError:        # .index() raises instead of returning -1
        res.append(buffer(s, start, len(s) - start))  # trailing piece
    return res
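With the trailing-piece handling above, either version behaves like split(); each element is a zero-copy view into s until you convert it:

>>> parts = split_buf("spam, eggs, ham", ", ")
>>> [str(p) for p in parts]
['spam', 'eggs', 'ham']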