Python list.clear() time and space complexity?
It's O(1) neglecting memory management. It's not quite right to say it's O(N) accounting for memory management, because accounting for memory management is complicated.
Most of the time, for most purposes, we treat the costs of memory management separately from the costs of the operations that triggered it. Otherwise, just about everything you could possibly do becomes O(who even knows), because almost any operation could trigger a garbage collection pass or an expensive destructor or something. Heck, even in languages like C with "manual" memory management, there's no guarantee that any particular malloc or free call will be fast.
There's an argument to be made that refcounting operations should be treated differently. After all, list.clear explicitly performs a number of Py_XDECREF operations equal to the list's length, and even if no objects are deallocated or finalized as a result, the refcounting itself will necessarily take time proportional to the length of the list.
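You can see those per-element reference count changes from Python itself. This is just a rough sketch; sys.getrefcount is a CPython-specific introspection function, and the value it returns includes the temporary reference created by the call itself, which is why the baseline is subtracted out:

import sys

obj = object()
baseline = sys.getrefcount(obj)          # includes the temporary reference from the call

lst = [obj] * 1_000                      # the list now holds 1000 extra references
print(sys.getrefcount(obj) - baseline)   # -> 1000

lst.clear()                              # one Py_XDECREF per slot
print(sys.getrefcount(obj) - baseline)   # -> 0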
If you count the Py_XDECREF operations list.clear performs explicitly, but ignore any destructors or other code that might be triggered by the refcounting operations, and you assume PyMem_FREE is constant time, then list.clear is O(N), where N is the original length of the list. If you discount all memory management overhead, including the explicit Py_XDECREF operations, list.clear is O(1). If you count all memory management costs, then the runtime of list.clear cannot be asymptotically bounded by any function of the list's length.
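To illustrate that last point with a hypothetical example (the Noisy class below is made up for demonstration), a list whose elements have expensive destructors makes clear() as slow as those destructors, regardless of how short the list is:

import time

class Noisy:
    def __del__(self):
        # an arbitrarily expensive finalizer, run when the refcount hits zero
        time.sleep(0.001)

l = [Noisy() for _ in range(100)]
t0 = time.time()
l.clear()                     # 100 Py_XDECREFs, each one running __del__ here
print(time.time() - t0)       # roughly 0.1 s for only 100 elements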
As the other answers have noted, it takes O(n) time to clear a list of length n. But I think there is an additional point to be made about amortized complexity here.
If you start with an empty list, and do N append or clear operations in any order, then the total running time across all of those operations is always O(N), giving an average per operation of O(1), however long the list gets in the process, and however many of those operations are clear.
Like clear, the worst case for append is also O(n) time, where n is the length of the list. That's because when the capacity of the underlying array needs to be increased, we have to allocate a new array and copy everything across. But the cost of copying each element can be "charged" to one of the append operations which got the list to a length where the array needs to be resized, in such a way that N append operations starting from an empty list always take O(N) time.
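You can watch the over-allocation that makes this work by checking when the list's underlying array actually grows. This is only a sketch; the exact growth pattern and sizes reported by sys.getsizeof are CPython implementation details:

import sys

l = []
last = sys.getsizeof(l)
for i in range(64):
    l.append(None)
    size = sys.getsizeof(l)
    if size != last:
        # the array was reallocated; this happens increasingly rarely as the list grows
        print(f"resized at length {len(l)}: {last} -> {size} bytes")
        last = size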
Likewise, the cost of decrementing an element's refcount in the clear method can be "charged" to the append operation which inserted that element in the first place, because each element can only get cleared once. The conclusion is that if you are using a list as an internal data structure in your algorithm, and your algorithm repeatedly clears that list inside a loop, then for the purpose of analysing your algorithm's time complexity you should count clear on that list as an O(1) operation, just as you'd count append as an O(1) operation in the same circumstances.
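As a rough timing sketch of that accounting (total_work is a made-up helper), reusing one scratch list and clearing it inside a loop costs time proportional to the total number of appends, not to the number of clears times the list length:

import time

def total_work(num_rounds, items_per_round):
    scratch = []
    t0 = time.time()
    for _ in range(num_rounds):
        for i in range(items_per_round):
            scratch.append(i)
        scratch.clear()          # O(len) here, but "paid for" by the appends above
    return time.time() - t0

# doubling the total number of appends roughly doubles the total time
print(total_work(1_000, 1_000))
print(total_work(2_000, 1_000))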
As you correctly noticed, the CPython implementation of list.clear is O(n). The code iterates over the elements in order to decrease the reference count of each one, without a way to avoid it. There is no doubt that it is an O(n) operation and, given a large enough list, you can measure the time spent in clear() as a function of list size:
import time

for size in 1_000_000, 10_000_000, 100_000_000, 1_000_000_000:
    l = [None] * size
    t0 = time.time()
    l.clear()
    t1 = time.time()
    print(size, t1 - t0)
The output shows linear complexity; on my system with Python 3.7 it prints the following:
1000000 0.0023756027221679688
10000000 0.02452826499938965
100000000 0.23625731468200684
1000000000 2.31496524810791
The time per element is of course tiny because the loop is coded in C and each iteration does very little work. But, as the above measurement shows, even a minuscule per-element factor eventually adds up. A small per-element constant is not a reason to ignore the cost of an operation, or the same would apply to the loop that shifts the list elements in l.insert(0, ...), which is also very efficient - and yet few would claim insertion at the beginning to be O(1). (And clear potentially does more work because a decref will run an arbitrary chain of destructors for an object whose reference count actually reaches zero.)
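The same measurement approach exposes the linear cost hiding behind insert(0, ...)'s small constant. This is only a rough sketch: inserting n elements at the front takes time quadratic in n precisely because each individual insert is linear, so doubling the count roughly quadruples the total time:

import time

for size in 10_000, 20_000, 40_000:
    l = []
    t0 = time.time()
    for i in range(size):
        l.insert(0, i)    # each insert shifts every existing element one slot
    t1 = time.time()
    print(size, t1 - t0)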
On a philosophical level, one could argue that the costs of memory management should be ignored when assessing complexity because otherwise it would be impossible to analyze anything with certainty, as any operation could trigger a GC. This argument has merit; GC does come occasionally and unpredictably, and its cost can be considered amortized across all allocations. In a similar vein, complexity analysis tends to ignore the complexity of malloc because the parameters it depends on (like memory fragmentation) are typically not directly related to allocation size or even to the number of already allocated blocks. However, in the case of list.clear there is only one allocated block, no GC is triggered, and the code still visits each and every list element. Even with the assumption of O(1) malloc and amortized O(1) GC, list.clear still takes time proportional to the number of elements in the list.
The article linked from the question is about Python the language and doesn't mention a particular implementation. Python implementations that don't use reference counting, such as Jython or PyPy, are likely to have a true O(1) list.clear, and for them the claim from the article would be entirely correct. So, when explaining the Python list on a conceptual level, it is not wrong to say that clearing the list is O(1) - after all, all the object references are in a contiguous array, and you free it only once. This is the point your blog post probably should make, and that is what the linked article is trying to say. Taking the cost of reference counting into account too early might confuse your readers and give them completely wrong ideas about Python's lists (e.g. they could imagine that they are implemented as linked lists).
Finally, at some point one must accept that the memory management strategy does change the complexity of some operations. For example, destroying a linked list in C++ is O(n) from the perspective of the caller; discarding it in Java or Go would be O(1). And not in the trivial sense of a garbage-collected language just postponing the same work for later - it is quite possible that a moving collector will only traverse reachable objects and will indeed never visit the elements of the discarded linked list. Reference counting makes discarding large containers algorithmically similar to manual collection, and GC can remove that. While CPython's list.clear has to touch every element to avoid a memory leak, it is quite possible that PyPy's garbage collector never needs to do anything of the sort, and thus has a true O(1) list.clear.