Is it safe, to share an array between threads?
It depends how you define "array" and "share". So far as array goes, there are two cases that need to be considered separately:
- Fixed size arrays (declared
my @a[$size]
); this includes multi-dimensional arrays with fixed dimensions (such asmy @a[$xs, $ys]
). These have the interesting property that the memory backing them never has to be resized. - Dynamic arrays (declared
my @a
), which grow on demand. These are, under the hood, actually using a number of chunks of memory over time as they grow.
So far as sharing goes, there are also three cases:
- The case where multiple threads touch the array over its lifetime, but only one can ever be touching it at a time, due to some concurrency control mechanism or the overall program structure. In this case the arrays are never shared in the sense of "concurrent operations using the arrays", so there's no possibility to have a data race.
- The read-only, non-lazy case. This is where multiple concurrent operations access a non-lazy array, but only to read it.
- The read/write case (including when reads actually cause a write because the array has been assigned something that demands lazy evaluation; note this can never happen for fixed size arrays, as they are never lazy).
Then we can summarize the safety as follows:
| Fixed size | Variable size |
---------------------+----------------+---------------+
Read-only, non-lazy | Safe | Safe |
Read/write or lazy | Safe * | Not safe |
The * indicating the caveat that while it's safe from Perl 6's point of view, you of course have to make sure you're not doing conflicting things with the same indices.
So in summary, fixed size arrays you can safely share and assign to elements of from different threads "no problem" (but beware false sharing, which might make you pay a heavy performance penalty for doing so). For dynamic arrays, it is only safe if they will only be read from during the period they are being shared, and even then if they're not lazy (though given array assignment is mostly eager, you're not likely to hit that situation by accident). Writing, even to different elements, risks data loss, crashes, or other bad behavior due to the growing operation.
So, considering the original example, we see my @copy;
and my @length;
are dynamic arrays, so we must not write to them in concurrent operations. However, that happens, so the code can be determined not safe.
The other posts already here do a decent job of pointing in better directions, but none nailed the gory details.
Just have the code that is marked with the start
statement prefix return the values so that Perl 6 can handle the synchronization for you. Which is the whole point of that feature.
Then you can wait for all of the Promises, and get all of the results using an await
statement.
my @promise = do for @vb -> $r {
start
do # to have the 「for」 block return its values
for $r[0]..^$r[1] -> $i {
$i, my_sub( @orig[$i], $len )
}
}
my @results = await @promise;
for @results -> ($i,$copy,$len) {
@copy[$i] = $copy;
@length[$i] = $len;
}
The start
statement prefix is only sort-of tangentially related to parallelism.
When you use it you are saying, “I don't need these results right now, but probably will later”.
That is the reason it returns a Promise (asynchrony), and not a Thread (concurrency)
The runtime is allowed to delay actually running that code until you finally ask for the results, and even then it could just do all of them sequentially in the same thread.
If the implementation actually did that, it could result in something like a deadlock if you instead poll the Promise by continually calling it's .status
method waiting for it to change from Planned
to Kept
or Broken
, and only then ask for its result.
This is part of the reason the default scheduler will start to work on any Promise codes if it has any spare threads.
I recommend watching jnthn's talk “Parallelism, Concurrency,
and Asynchrony in Perl 6”.
slides
This answer applies to my understanding of the situation on MoarVM, not sure what the state of art is on the JVM backend (or the Javascript backend fwiw).
- Reading a scalar from several threads can be done safely.
- Modifying a scalar from several threads can be done without having to fear for a segfault, but you may miss updates:
$ perl6 -e 'my $i = 0; await do for ^10 { start { $i++ for ^10000 } }; say $i'
46785
The same applies to more complex data structures like arrays (e.g. missing values being pushed) and hashes (missing keys being added).
So, if you don't mind missing updates, changing shared data structures from several threads should work. If you do mind missing updates, which I think is what you generally want, you should look at setting up your algorithm in a different way, as suggested by @Zoffix Znet and @raiph.