Is it possible to share a HashMap between threads without locking the entire HashMap?

I don't see how your request is possible, at least not without some exceedingly clever lock-free data structures; what should happen if multiple threads need to insert new values that hash to the same location?

In previous work, I've used a RwLock<HashMap<K, Mutex<V>>>. When inserting a value into the map, you take the exclusive (write) lock for a short period. The rest of the time, multiple threads can hold reader locks on the HashMap, and thus on a given element. If a thread needs to mutate an element's data, it can get exclusive access through that element's Mutex.

Here's an example:

use std::{
    collections::HashMap,
    sync::{Arc, Mutex, RwLock},
    thread,
    time::Duration,
};

fn main() {
    let data = Arc::new(RwLock::new(HashMap::new()));

    let threads: Vec<_> = (0..10)
        .map(|i| {
            let data = Arc::clone(&data);
            thread::spawn(move || worker_thread(i, data))
        })
        .collect();

    for t in threads {
        t.join().expect("Thread panicked");
    }

    println!("{:?}", data);
}

fn worker_thread(id: u8, data: Arc<RwLock<HashMap<u8, Mutex<i32>>>>) {
    loop {
        // Assume that the element already exists
        let map = data.read().expect("RwLock poisoned");

        if let Some(element) = map.get(&id) {
            let mut element = element.lock().expect("Mutex poisoned");

            // Perform our normal work updating a specific element.
            // The entire HashMap only has a read lock, which
            // means that other threads can access it.
            *element += 1;
            thread::sleep(Duration::from_secs(1));

            return;
        }

        // If we got this far, the element doesn't exist

        // Get rid of our read lock and switch to a write lock.
        // We want to minimize the time we hold the writer lock.
        drop(map);
        let mut map = data.write().expect("RwLock poisoned");

        // We use HashMap::entry to handle the case where another thread
        // inserted the same key while we were unlocked.
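        // Sleeping while still holding the write lock simulates a slow
        // insertion; no other thread can read or write the map meanwhile.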
        thread::sleep(Duration::from_millis(50));
        map.entry(id).or_insert_with(|| Mutex::new(0));
        // Let the loop start us over to try again
    }
}

This takes about 2.7 seconds to run on my machine, even though it starts 10 threads that each wait for 1 second while holding the exclusive lock to the element's data.

This solution isn't without issues, however. When there's a huge amount of contention for that one master lock, getting a write lock can take a while and completely kills parallelism.

In that case, you can switch to a RwLock<HashMap<K, Arc<Mutex<V>>>>. Once you have a read or write lock, you can clone the Arc of the value, return it, and release the HashMap lock.
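As a minimal sketch of that variant (adapting the worker_thread from the example above; the insert-on-miss path is omitted for brevity), you clone the Arc while the read lock is held, drop the map guard, and only then lock the element's Mutex:

use std::{
    collections::HashMap,
    sync::{Arc, Mutex, RwLock},
};

fn worker_thread(id: u8, data: Arc<RwLock<HashMap<u8, Arc<Mutex<i32>>>>>) {
    // Clone the Arc while holding the read lock, then release the map.
    let element = {
        let map = data.read().expect("RwLock poisoned");
        map.get(&id).cloned()
    };

    if let Some(element) = element {
        // The HashMap is completely unlocked at this point; only this
        // one element is locked while we work on it.
        let mut element = element.lock().expect("Mutex poisoned");
        *element += 1;
    }
}

The write lock is now only ever held long enough to insert an Arc, never while working on an element.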

The next step up would be to use a crate like arc-swap, which says:

Then one would lock, clone the [RwLock<Arc<T>>] and unlock. This suffers from CPU-level contention (on the lock and on the reference count of the Arc) which makes it relatively slow. Depending on the implementation, an update may be blocked for arbitrary long time by a steady inflow of readers.

The ArcSwap can be used instead, which solves the above problems and has better performance characteristics than the RwLock, both in contended and non-contended scenarios.
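For illustration, here is a rough sketch of that approach; this is my own example, not code from the arc-swap documentation, and it assumes the arc-swap crate is added as a dependency. Readers call load to get a cheap handle to the current map, while a writer publishes an entirely new map with rcu, here built by cloning the old one, which only makes sense when writes are rare:

use std::collections::HashMap;

use arc_swap::ArcSwap;

fn main() {
    // In a real program this would be shared between threads,
    // e.g. wrapped in an Arc or stored in a static.
    let data: ArcSwap<HashMap<u8, i32>> = ArcSwap::from_pointee(HashMap::new());

    // Writer: build a complete new map and publish it atomically.
    // `rcu` retries if another writer swapped the map in the meantime.
    data.rcu(|current| {
        let mut new_map = HashMap::clone(current);
        new_map.insert(1, 42);
        new_map
    });

    // Reader: `load` is cheap and does not take any lock.
    let map = data.load();
    println!("{:?}", map.get(&1));
}

The price is that every update copies the whole map, so this shines for read-mostly data.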

I often advocate for performing some kind of smarter algorithm. For example, you could spin up N threads, each with its own HashMap, and shard the work among them. For the simple example above, the shard could be chosen with id % N_THREADS. There are also more complicated sharding schemes that depend on your data.

As Go has done a good job of evangelizing: do not communicate by sharing memory; instead, share memory by communicating.
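To make the sharding idea concrete, here is a small sketch of my own (none of the names below come from the answer above) using std::sync::mpsc channels: each worker owns a plain, unsynchronized HashMap, and work is routed to a worker with id % N_THREADS:

use std::{collections::HashMap, sync::mpsc, thread};

const N_THREADS: usize = 4;

fn main() {
    // One channel and one worker per shard. Each worker owns its
    // HashMap outright, so the map itself needs no locking.
    let (senders, workers): (Vec<_>, Vec<_>) = (0..N_THREADS)
        .map(|_| {
            let (tx, rx) = mpsc::channel::<u8>();
            let handle = thread::spawn(move || {
                let mut map = HashMap::new();
                for id in rx {
                    *map.entry(id).or_insert(0) += 1;
                }
                map
            });
            (tx, handle)
        })
        .unzip();

    // Route each piece of work to its shard by key.
    for id in 0u8..40 {
        let shard = id as usize % N_THREADS;
        senders[shard].send(id).expect("worker exited");
    }

    // Dropping the senders closes the channels, letting the workers finish.
    drop(senders);

    for worker in workers {
        println!("{:?}", worker.join().expect("Thread panicked"));
    }
}

Because each HashMap has exactly one owner, there are no locks at all; the channels are the only synchronization.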


Maybe you want to consider rust-evmap:

A lock-free, eventually consistent, concurrent multi-value map.

The trade-off is eventual consistency: readers do not see changes until the writer refreshes the map. A refresh is atomic, and the writer decides when to perform it and thus when to expose the new data to the readers.