TPL Dataflow, guarantee completion only when ALL source data blocks completed
The issue is exactly what casperOne said in his answer. Once the first transform block completes, the processor block goes into “finishing mode”: it will process remaining items in its input queue, but it won't accept any new items.
There is a simpler fix than splitting your processor block in two, though: don't set PropagateCompletion on the links into the processor block, but instead complete the processor block manually when both transform blocks complete:
Task.WhenAll(transformBlock1.Completion, transformBlock2.Completion)
    .ContinueWith(_ => processorBlock.Complete());
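For context, here is a rough end-to-end sketch of that fix on the diamond-shaped pipeline from the question. The class name, the "1:"/"2:" prefixes, and the ConcurrentQueue used to collect results are made up for illustration; the 50 ms and 20 ms delays mirror the wait times described in the question.

```csharp
using System.Collections.Concurrent;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class ManualCompletionDemo
{
    // Runs the pipeline and returns everything the processor block received.
    public static async Task<string[]> RunAsync(int itemCount)
    {
        var received = new ConcurrentQueue<string>();

        var broadcastBlock = new BroadcastBlock<int>(i => i);
        var transformBlock1 = new TransformBlock<int, string>(async i =>
        {
            await Task.Delay(50); // slower branch
            return "1:" + i;
        });
        var transformBlock2 = new TransformBlock<int, string>(async i =>
        {
            await Task.Delay(20); // faster branch
            return "2:" + i;
        });
        var processorBlock = new ActionBlock<string>(s => received.Enqueue(s));

        // Completion still propagates from the broadcast block to both transforms...
        var propagate = new DataflowLinkOptions { PropagateCompletion = true };
        broadcastBlock.LinkTo(transformBlock1, propagate);
        broadcastBlock.LinkTo(transformBlock2, propagate);

        // ...but NOT from the transforms to the processor block.
        transformBlock1.LinkTo(processorBlock);
        transformBlock2.LinkTo(processorBlock);

        // Complete the processor only once BOTH transforms are done.
        var completeProcessor = Task
            .WhenAll(transformBlock1.Completion, transformBlock2.Completion)
            .ContinueWith(t => processorBlock.Complete());

        for (int i = 0; i < itemCount; i++)
            await broadcastBlock.SendAsync(i);
        broadcastBlock.Complete();

        await processorBlock.Completion;
        return received.ToArray();
    }
}
```

With unbounded transform blocks, each posted item should reach the processor once per branch, so RunAsync(5) is expected to return 10 strings.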
The issue here is the combination of two things: you are setting the PropagateCompletion property each time you call the LinkTo method to link the blocks, and the wait times in your transformation blocks are different.
From the documentation for the Complete method on the IDataflowBlock interface (emphasis mine):
Signals to the IDataflowBlock that it should not accept nor produce any more messages nor consume any more postponed messages.
Because you stagger the wait times in each of the TransformBlock<TInput, TOutput> instances, transformBlock2 (waiting for 20 ms) finishes before transformBlock1 (waiting for 50 ms). transformBlock2 completes first and then sends the completion signal to processorBlock, which then says "I'm not accepting anything else" (and transformBlock1 hasn't produced all of its messages yet).
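To see the "I'm not accepting anything else" behavior in isolation, here is a minimal sketch (the class, method, and tuple element names are made up for illustration). It only relies on Post returning a bool that tells you whether the block accepted the message:

```csharp
using System.Threading.Tasks.Dataflow;

public static class CompletedBlockDemo
{
    // Returns whether a message was accepted before and after Complete().
    public static (bool acceptedBefore, bool acceptedAfter) Run()
    {
        var block = new ActionBlock<string>(s => { });

        // The block is still open, so this message is accepted.
        bool acceptedBefore = block.Post("first");

        // After Complete(), the block finishes what it already has
        // and declines everything offered afterwards.
        block.Complete();
        bool acceptedAfter = block.Post("second");

        return (acceptedBefore, acceptedAfter);
    }
}
```

This is the same thing that happens to the remaining output of transformBlock1 once transformBlock2 has propagated its completion to processorBlock.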
Note that transformBlock2 finishing before transformBlock1 is not absolutely guaranteed; it's feasible that the thread pool (assuming you're using the default scheduler) will process the tasks in a different order (but more than likely it will not, as it will steal work from the queues once the 20 ms items are done).
Your pipeline looks like this:
          broadcastBlock
           /          \
transformBlock1      transformBlock2
           \          /
          processorBlock
In order to get around this, you want to have a pipeline that looks like this:
          broadcastBlock
           /          \
transformBlock1      transformBlock2
       |                    |
processorBlock1      processorBlock2
This is accomplished by creating two separate ActionBlock<TInput> instances, like so:
// The action; it could also be a method, which makes it easier to share.
Action<string> a = i => Console.WriteLine(i);
// Create the processor blocks.
var processorBlock1 = new ActionBlock<string>(a);
var processorBlock2 = new ActionBlock<string>(a);
// Linking
broadCastBlock.LinkTo(transformBlock1,
    new DataflowLinkOptions { PropagateCompletion = true });
broadCastBlock.LinkTo(transformBlock2,
    new DataflowLinkOptions { PropagateCompletion = true });
transformBlock1.LinkTo(processorBlock1,
    new DataflowLinkOptions { PropagateCompletion = true });
transformBlock2.LinkTo(processorBlock2,
    new DataflowLinkOptions { PropagateCompletion = true });
You then need to wait on both processor blocks instead of just one:
Task.WhenAll(processorBlock1.Completion, processorBlock2.Completion).Wait();
A very important note here: when creating an ActionBlock<TInput>, the MaxDegreeOfParallelism property on the ExecutionDataflowBlockOptions instance passed to it defaults to one.
This means that calls to the Action<T> delegate that you pass to the ActionBlock<TInput> are serialized within that block: only one will execute at a time.
Because you now have two ActionBlock<TInput> instances pointing to the same Action<T> delegate, you aren't guaranteed thread-safety: the two blocks can invoke the delegate concurrently.
If your method is thread-safe, then you don't have to do anything (and you could even set the MaxDegreeOfParallelism property to DataflowBlockOptions.Unbounded, since there's no reason to block).
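As a rough illustration of that option, the sketch below builds an ActionBlock whose delegate is thread-safe (it only does an Interlocked increment) and lifts the parallelism limit. The class name and the counter are made up for illustration:

```csharp
using System.Threading;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class UnboundedParallelismDemo
{
    // Posts itemCount messages and returns how many were processed.
    public static async Task<int> RunAsync(int itemCount)
    {
        int processed = 0;

        var options = new ExecutionDataflowBlockOptions
        {
            // Let the block run as many invocations in parallel as the scheduler allows.
            MaxDegreeOfParallelism = DataflowBlockOptions.Unbounded
        };

        // The delegate must be thread-safe, since it may now run concurrently.
        var block = new ActionBlock<int>(
            i => { Interlocked.Increment(ref processed); },
            options);

        for (int i = 0; i < itemCount; i++)
            block.Post(i);

        block.Complete();
        await block.Completion;
        return processed;
    }
}
```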
If it's not thread-safe, and you need to guarantee it, you need to resort to traditional synchronization primitives, like the lock statement.
In this case, you'd do it like so (although it's clearly not needed here, as the WriteLine method on the Console class is thread-safe):
// The lock.
var l = new object();
// The action; it could also be a method, which makes it easier to share.
Action<string> a = i => {
    // Ensure one call at a time.
    lock (l) Console.WriteLine(i);
};
// And so on...
An addition to svick's answer: to be consistent with the behaviour you get with the PropagateCompletion option, you also need to forward exceptions in case a preceding block faulted. An extension method like the following takes care of that as well:
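// Requires: using System.Linq; using System.Threading.Tasks; using System.Threading.Tasks.Dataflow;
public static void CompleteWhenAll(this IDataflowBlock target, params IDataflowBlock[] sources) {
    if (target == null) return;
    // Nothing to wait for: complete right away.
    if (sources.Length == 0) { target.Complete(); return; }
    Task.Factory.ContinueWhenAll(
        sources.Select(b => b.Completion).ToArray(),
        tasks => {
            // Collect the exceptions of every faulted source.
            var exceptions = (from t in tasks where t.IsFaulted select t.Exception).ToList();
            if (exceptions.Count != 0) {
                // Forward the faults, as PropagateCompletion would.
                target.Fault(new AggregateException(exceptions));
            } else {
                target.Complete();
            }
        }
    );
}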
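If you prefer async/await over ContinueWhenAll, the same idea can be sketched as below. This is a variant, not the method above: CompleteWhenAllAsync and the demo class are hypothetical names, and the demo faults a BufferBlock by hand just to show the fault reaching the target.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class DataflowCompletionExtensions
{
    // Completes (or faults) target once every source has finished.
    public static async Task CompleteWhenAllAsync(
        this IDataflowBlock target, params IDataflowBlock[] sources)
    {
        Task all = Task.WhenAll(sources.Select(s => s.Completion));
        try { await all; }
        catch { /* the outcome is inspected below */ }

        if (all.IsFaulted) target.Fault(all.Exception); // forward the faults
        else target.Complete();
    }
}

public static class FaultForwardingDemo
{
    // Returns true if the target ends up faulted after one source faults.
    public static async Task<bool> TargetFaultsWhenSourceFaultsAsync()
    {
        var source1 = new BufferBlock<int>();
        var source2 = new BufferBlock<int>();
        var target = new ActionBlock<int>(i => { });

        Task forwarding = target.CompleteWhenAllAsync(source1, source2);

        // Fault one source by hand, complete the other normally.
        ((IDataflowBlock)source1).Fault(new InvalidOperationException("boom"));
        source2.Complete();
        await forwarding;

        try { await target.Completion; }
        catch { /* expected: the target was faulted */ }
        return target.Completion.IsFaulted;
    }
}
```

Unlike the ContinueWhenAll version, this one returns a Task, so the caller can await or observe the forwarding step itself.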