How can I take an item from a Vec in Rust?
The reason fn take<T>(vec: Vec<T>, index: usize) -> Option<T>
does not exist in the standard library is that it is not very useful in general. For example, supposing that you have a Vec<String>
of length 10, it means throwing away 9 strings and only using 1. This seems wasteful.
In general, the standard library will try to provide an API that is useful in a maximum of scenarios, and in this instance it would be more logical to have a fn take<T>(vec: &mut Vec<T>, index: usize) -> Option<T>
.
The only question is how to preserve the invariant, of course:
- it can be preserved by exchanging with the last element, which is what
Vec::swap_remove
does, - it can be preserved by shifting the successor elements in, which is what
Vec::drain
does.
Those are very flexible, and can be adapted to fill more specific scenarios, such as yours.
Adapting swap_remove
:
fn take<T>(mut vec: Vec<T>, index: usize) -> Option<T> {
if index < vec.len() {
Some(vec.swap_remove(index))
} else {
None
}
}
Adapting drain
:
fn take<T>(mut vec: Vec<T>, index: usize) -> Option<T> {
if index < vec.len() {
vec.drain(index..index+1).next()
} else {
None
}
}
Noting that the former is more efficient: it's O(1).
I'm looking for a method that consumes the
Vec
and returns one element, without the overhead of restoringVec
's invariants the wayremove
andswap_remove
do.
This reeks of premature micro-optimization to me.
First of all, note that it is necessary to destroy the elements of the vector; you can accomplish this in two ways:
swap_remove
, then iterate over each element to destroy them,- Iterate over each element to destroy them, skipping the specific
index
.
It is not clear to me that the latter would be faster than the former; if anything it looks more complicated, with more branches (I advise two loops), which may throw off the predictor and may be less amenable to vectorization.
Secondly, before complaining about the overhead of restoring the Vec
's invariant, have you properly profiled the solution?
If we look at the swap_remove
variant, there are 3 steps:
swap_remove
(O(1)),- destroy each remaining element (O(N)),
- free the backing memory.
Step 2 may be optimized out if the element has no Drop
implementation, but otherwise I would be it's a toss whether (2) or (3) is dominating the cost.
TL;DR: I am afraid that you are fighting ghost issues, profile before trying to optimize.
You can write your function like this:
fn take<T>(mut vec: Vec<T>, index: usize) -> Option<T> {
if vec.get(index).is_none() {
None
} else {
Some(vec.swap_remove(index))
}
}
The code you see here (get
and swap_remove
) is guaranteed O(1).
However, kind of hidden, vec
is dropped at the end of the function and this drop operation is likely not O(1), but O(n) (where n is vec.len()
). If T
implements Drop
, then drop()
is called for every element still inside the vector, meaning dropping the vector is guaranteed O(n). If T
does not implement Drop
, then the Vec
only needs to deallocate the memory. The time complexity of the dealloc
operation depends on the allocator and is not specified, so we cannot assume it is O(1).
To mention another solution using iterators:
fn take<T>(vec: Vec<T>, index: usize) -> Option<T> {
vec.into_iter().nth(index)
}
I was about to write this:
While
Iterator::nth()
usually is a linear time operation, the iterator over a vector overrides this method to make it a O(1) operation.
But then I noticed, that this is only true for the iterator which iterates over slices. The std::vec::IntoIter
iterator which would be used in the code above, doesn't override nth()
. It has been attempted here, but it doesn't seem to be that easy.
So, as of right now, the iterator solution above is a O(n) operation! Not to mention the time needed to drop the vector, as explained above.