Python list.clear() time and space complexity?
It's O(1) neglecting memory management. It's not quite right to say it's O(N) accounting for memory management, because accounting for memory management is complicated.
Most of the time, for most purposes, we treat the costs of memory management separately from the costs of the operations that triggered it. Otherwise, just about everything you could possibly do becomes O(who even knows), because almost any operation could trigger a garbage collection pass or an expensive destructor or something. Heck, even in languages like C with "manual" memory management, there's no guarantee that any particular malloc or free call will be fast.
There's an argument to be made that refcounting operations should be treated differently. After all, list.clear explicitly performs a number of Py_XDECREF operations equal to the list's length, and even if no objects are deallocated or finalized as a result, the refcounting itself will necessarily take time proportional to the length of the list.
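You can see those per-element reference count changes from Python itself. This is just a rough sketch; sys.getrefcount is a CPython-specific introspection function, and the value it returns includes the temporary reference created by the call itself, which is why the baseline is subtracted out:

import sys

obj = object()
baseline = sys.getrefcount(obj)          # includes the temporary reference from the call

lst = [obj] * 1_000                      # the list now holds 1000 extra references
print(sys.getrefcount(obj) - baseline)   # -> 1000

lst.clear()                              # one Py_XDECREF per slot
print(sys.getrefcount(obj) - baseline)   # -> 0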
If you count the Py_XDECREF operations list.clear performs explicitly, but ignore any destructors or other code that might be triggered by the refcounting operations, and you assume PyMem_FREE is constant time, then list.clear is O(N), where N is the original length of the list. If you discount all memory management overhead, including the explicit Py_XDECREF operations, list.clear is O(1). If you count all memory management costs, then the runtime of list.clear cannot be asymptotically bounded by any function of the list's length.
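To illustrate that last point with a hypothetical example (the Noisy class below is made up for demonstration), a list whose elements have expensive destructors makes clear() as slow as those destructors, regardless of how short the list is:

import time

class Noisy:
    def __del__(self):
        # an arbitrarily expensive finalizer, run when the refcount hits zero
        time.sleep(0.001)

l = [Noisy() for _ in range(100)]
t0 = time.time()
l.clear()                     # 100 Py_XDECREFs, each one running __del__ here
print(time.time() - t0)       # roughly 0.1 s for only 100 elements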
As the other answers have noted, it takes O(n) time to clear a list of length n. But I think there is an additional point to be made about amortized complexity here.
If you start with an empty list, and do N append or clear operations in any order, then the total running time across all of those operations is always O(N), giving an average per operation of O(1), however long the list gets in the process, and however many of those operations are clear.
Like clear, the worst case for append is also O(n) time, where n is the length of the list. That's because when the capacity of the underlying array needs to be increased, we have to allocate a new array and copy everything across. But the cost of copying each element can be "charged" to one of the append operations which got the list to a length where the array needs to be resized, in such a way that N append operations starting from an empty list always take O(N) time.
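You can watch the over-allocation that makes this work by checking when the list's underlying array actually grows. This is only a sketch; the exact growth pattern and sizes reported by sys.getsizeof are CPython implementation details:

import sys

l = []
last = sys.getsizeof(l)
for i in range(64):
    l.append(None)
    size = sys.getsizeof(l)
    if size != last:
        # the array was reallocated; this happens increasingly rarely as the list grows
        print(f"resized at length {len(l)}: {last} -> {size} bytes")
        last = size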
Likewise, the cost of decrementing an element's refcount in the clear method can be "charged" to the append operation which inserted that element in the first place, because each element can only get cleared once. The conclusion is that if you are using a list as an internal data structure in your algorithm, and your algorithm repeatedly clears that list inside a loop, then for the purpose of analysing your algorithm's time complexity you should count clear on that list as an O(1) operation, just as you'd count append as an O(1) operation in the same circumstances.
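As a rough timing sketch of that accounting (total_work is a made-up helper), reusing one scratch list and clearing it inside a loop costs time proportional to the total number of appends, not to the number of clears times the list length:

import time

def total_work(num_rounds, items_per_round):
    scratch = []
    t0 = time.time()
    for _ in range(num_rounds):
        for i in range(items_per_round):
            scratch.append(i)
        scratch.clear()          # O(len) here, but "paid for" by the appends above
    return time.time() - t0

# doubling the total number of appends roughly doubles the total time
print(total_work(1_000, 1_000))
print(total_work(2_000, 1_000))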
As you correctly noticed, the CPython implementation of list.clear is O(n). The code iterates over the elements in order to decrease the reference count of each one, without a way to avoid it. There is no doubt that it is an O(n) operation and, given a large enough list, you can measure the time spent in clear() as a function of list size:
import time

for size in 1_000_000, 10_000_000, 100_000_000, 1_000_000_000:
    l = [None] * size
    t0 = time.time()
    l.clear()
    t1 = time.time()
    print(size, t1 - t0)
The output shows linear complexity; on my system with Python 3.7 it prints the following:
1000000 0.0023756027221679688
10000000 0.02452826499938965
100000000 0.23625731468200684
1000000000 2.31496524810791
The time per element is of course tiny because the loop is coded in C and each iteration does very little work. But, as the above measurement shows, even a minuscule per-element factor eventually adds up. A small per-element constant is not a reason to ignore the cost of an operation, or the same would apply to the loop that shifts the list elements in l.insert(0, ...), which is also very efficient - and yet few would claim insertion at the beginning to be O(1). (And clear potentially does more work because a decref will run an arbitrary chain of destructors for an object whose reference count actually reaches zero.)
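The same measurement approach exposes the linear cost hiding behind insert(0, ...)'s small constant. This is only a rough sketch: inserting n elements at the front takes time quadratic in n precisely because each individual insert is linear, so doubling the count roughly quadruples the total time:

import time

for size in 10_000, 20_000, 40_000:
    l = []
    t0 = time.time()
    for i in range(size):
        l.insert(0, i)    # each insert shifts every existing element one slot
    t1 = time.time()
    print(size, t1 - t0)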
On a philosophical level, one could argue that the costs of memory management should be ignored when assessing complexity because otherwise it would be impossible to analyze anything with certainty, as any operation could trigger a GC. This argument has merit; GC does come occasionally and unpredictably, and its cost can be considered amortized across all allocations. In a similar vein, complexity analysis tends to ignore the complexity of malloc because the parameters it depends on (like memory fragmentation) are typically not directly related to allocation size or even to the number of already allocated blocks. However, in the case of list.clear there is only one allocated block, no GC is triggered, and the code still visits each and every list element. Even with the assumption of O(1) malloc and amortized O(1) GC, list.clear still takes time proportional to the number of elements in the list.
The article linked from the question is about Python the language and doesn't mention a particular implementation. Python implementations that don't use reference counting, such as Jython or PyPy, are likely to have a true O(1) list.clear, and for them the claim from the article would be entirely correct. So, when explaining the Python list on a conceptual level, it is not wrong to say that clearing the list is O(1) - after all, all the object references are in a contiguous array, and you free it only once. This is the point your blog post probably should make, and that is what the linked article is trying to say. Taking the cost of reference counting into account too early might confuse your readers and give them completely wrong ideas about Python's lists (e.g. they could imagine that they are implemented as linked lists).
Finally, at some point one must accept that the memory management strategy does change the complexity of some operations. For example, destroying a linked list in C++ is O(n) from the perspective of the caller; discarding it in Java or Go would be O(1). And not in the trivial sense of a garbage-collected language just postponing the same work for later - it is quite possible that a moving collector will only traverse reachable objects and will indeed never visit the elements of the discarded linked list. Reference counting makes discarding large containers algorithmically similar to manual collection, and GC can remove that. While CPython's list.clear has to touch every element to avoid a memory leak, it is quite possible that PyPy's garbage collector never needs to do anything of the sort, and thus has a true O(1) list.clear.