Why is Python set intersection faster than Rust HashSet intersection?
Hashing aside, Python races past previous versions of Rust when you intersect a tiny and a huge set the wrong way around. E.g. this code on playground:
use std::collections::HashSet;
fn main() {
let tiny: HashSet<i32> = HashSet::new();
let huge: HashSet<i32> = (0..1_000).collect();
for (left, right) in &[(&tiny, &huge), (&huge, &tiny)] {
let sys_time = std::time::SystemTime::now();
assert_eq!(left.intersection(right).count(), 0);
let elapsed = sys_time.elapsed().unwrap();
println!(
"{:9}ns starting from {:4} element set",
elapsed.subsec_nanos(),
left.len(),
);
}
}
when run with the 1.32 or earlier versions of Rust rather than a current version, reveals that you really want to invoke the intersection method on the smaller of the two sets (even in the borderline case that one set is empty). I got nice performance gains by calling this function instead of the intersection method:
fn smart_intersect<'a, T, S>(
s1: &'a HashSet<T, S>,
s2: &'a HashSet<T, S>,
) -> std::collections::hash_set::Intersection<'a, T, S>
where
T: Eq + std::hash::Hash,
S: std::hash::BuildHasher,
{
if s1.len() < s2.len() {
s1.intersection(s2)
} else {
s2.intersection(s1)
}
}
The method in Python treats the two sets equally (at least in version 3.7).
PS Why is this? Say small set Sa has A items, big set Sb has B items, it takes Th time to hash one key, Tl(X) time to locate a hashed key in a set with X elements. Then:
Sa.intersection(&Sb)
costs A * (Th + Tl(B))Sb.intersection(&Sa)
costs B * (Th + Tl(A))
Assuming the hash function is good and the buckets plenty (because if we're worrying about performance of intersection, so we should have made sure that the sets are efficient to begin with) then Tl(B) should be on par with Tl(A), or at least Tl(X) should scale much less than linearly with set size. Therefore it's A versus B that determines the cost of the operation.
PS The same problem and workaround existed for is_disjoint
and also a bit for union
(it's cheaper to copy the big set and add a few elements, than it is to copy the small set and add a lot, but not hugely). A pull request was merged in, so this discrepancy has disappeared since Rust 1.35.
When I move the set-building out of the loop and only repeat the intersection, for both cases of course, Rust is faster than Python 2.7.
I've only been reading Python 3 (setobject.c), but Python's implementation has some things going for it.
It uses the fact that both Python set objects use the same hash function, so it does not recompute the hash. Rust HashSet
s have instance-unique keys for their hash functions, so during intersection they must rehash keys from one set with the other set's hash function.
On the other hand, Python must call out to a dynamic key comparison function like PyObject_RichCompareBool
for each matching hash, while the Rust code uses generics and will specialize the hash function and comparison code for i32
. The code for hashing an i32
in Rust looks relatively cheap, and much of the hashing algorithm (handling longer input than 4 bytes) is removed.
It appears it's the construction of the sets that sets Python and Rust apart. And in fact not just construction, there's some significant code running to destruct the Rust HashSet
s as well. (This can be improved, filed bug here: #31711)
The performance problem boils down to the default hashing implementation of HashMap
and HashSet
. Rust's default hash algorithm is a good general-purpose one that also prevents against certain types of DOS attacks. However, it doesn't work great for very small or very large amounts of data.
Some profiling showed that make_hash<i32, std::collections::hash::map::RandomState>
was taking up about 41% of the total runtime. As of Rust 1.7, you can choose which hashing algorithm to use. Switching to the FNV hashing algorithm speeds up the program considerably:
extern crate fnv;
use std::collections::HashSet;
use std::hash::BuildHasherDefault;
use fnv::FnvHasher;
fn main() {
let mut len_sums = 0;
for _ in 0..100000 {
let set_1: HashSet<i32, BuildHasherDefault<FnvHasher>> = (0..1000).collect();
let set_2: HashSet<i32, BuildHasherDefault<FnvHasher>> = (500..1500).collect();
let intersection_len = set_1.intersection(&set_2).count();
len_sums += intersection_len;
}
println!("{}", len_sums);
}
On my machine, this takes 2.714s compared to Python's 9.203s.
If you make the same changes to move the set building out of the loop, the Rust code takes 0.829s compared to the Python code's 3.093s.