In SQL Server, is parallelism per operator, or something else?
He says that in a parallel plan, every operator gets DOP threads.
No. This is at best misleading, but closer to being simply wrong.
In a serial plan, every operator 'gets' one thread, but that thread is the same thread for all operators. The principle is similar for parallel plans.
Each parallel operator is run by DOP
threads, but those threads are not exclusive to a particular operator, they are shared among operators within the same parallel branch.
Branch boundaries are delimited by Parallelism operators (Demand, Repartition, and Gather Streams). The diagram below shows a parallel plan with three branches:
Reproduced from the article referenced at the end of this answer
So if you have
MAXDOP 8
and your query has 4 parallel operators it'll use 32 threads at once.
No. You can't just multiply DOP
by the number of operators to get the number of threads. The number of threads reserved for parallel branches is the number of parallel branches (not operators) multiplied by DOP
.
The number of threads that can be active at the same time for a single parallel query is limited to DOP
in SQL Server 2005 and later. SQL Server achieves this by allocating threads to DOP
schedulers.
I also read that it might just be 8 for the whole query, which seems like too few.
At DOP = 8
this would be correct for a plan with a single parallel branch. There can be multiple parallel plan branches in a parallel plan. For a plan with n
parallel branches, the thread reservation for parallel workers is n * DOP
.
See Parallel Execution Plans – Branches and Threads by Paul White.
Note: The number of branches reported in execution plans is normally the number of branches that may possibly execute concurrently (due to blocking operators, threads in one branch may sometimes be safely recycled for a later branch).
All of my production boxes are on SQL Server 2008 (not even R2) and there's no money to upgrade. But I got a 3 CPU VM that management doesn't know about with SQL Server 2017 developer edition (it's free!) tucked away. I did my testing on that.
Earlier this week one of our developers wrote a lot of CROSS JOIN queries and they caused some issues, but I can use a similar query for some good to mostly answer this question. The developer did something like this:
SELECT MIN(t1.high + t2.high)
FROM master..spt_values t1
CROSS JOIN master..spt_values t2
OPTION (MAXDOP 3);
Like I said the query caused all kinds of warning sirens and I had to wake up in the middle of the night. Here's what the query plan looks like on my 2017 "instance":
So I got five operators with the racing arrows which at first makes me think that there are 5 parallel operators in the plan. SQL Server can be tricky though. The nested loop join is actually a parallel operator, so there's a total of 6 parallel operators. With a query MAXDOP of 3, if it worked the way described in your question I'd get 3 * 6 = 18 workers. I can look in the XML of the actual plan to find the following:
<QueryPlan DegreeOfParallelism="3" MemoryGrant="56" CachedPlanSize="32" CompileTime="3" CompileCPU="3" CompileMemory="272">
<ThreadStat Branches="1" UsedThreads="3">
<ThreadReservation NodeId="0" ReservedThreads="3" />
</ThreadStat>
That makes it seem like only 3 threads were used during query execution. Also, I got this query off the internet and ran it while the CROSS JOIN query was executing:
SELECT *
FROM sys.dm_os_tasks d
INNER JOIN sys.dm_exec_sessions s ON d.session_id = s.session_id
WHERE s.is_user_process = 1 AND s.session_id <> @@SPID;
It showed only four rows with exec_context_id values between 0 and 3. So this query, even with many parallel operators, only used three parallel worker threads. It might be true that more complex queries can use more workers than MAXDOP, but I think that it's safe to say that queries don't get MAXDOP workers for each parallel operator.
UPDATE:
I found a trace flag off the internet (don't use in production!) and used it to get another parallel plan:
That one had six parallel worker threads! So it does seem like queries with different patterns can get different worker threads, but it's still not MAXDOP threads for each operator.