Why does SQL Server return some rows while still executing the query, and sometimes not?
The answer, as usual (alright, most of the time), lies in the execution plan.
There are certain operators that require all rows to arrive at them before they can start processing those rows, and passing them downstream, for example:
- Hash Join (on building the hash table)
- Hash Match
- Sort (Except Hash Flow Distinct)
They're either called blocking, or stop and go operators because of this, and they're often chosen when the optimizer thinks it'll have to process a whole lot of data to find your data.
There are other operators that are able to begin streaming, or passing any found rows along immediately
- Nested Loops
- Index supported Merge Joins
- Stream Aggregates
When queries start returning data immediately, but don't finish immediately, it's usually a sign that the the optimizer chose a plan to locate and return some rows quickly using operators that have a lower start up cost.
This can happen because of row goals introduced either by you, or by the optimizer.
It can also happen if a bad plan is chosen for some reason (lack of SARGability, parameter sniffing, insufficient statistics, etc.), but that takes more digging to figure out.
For more information, check out Rob Farley's blog here
And Paul White's series on row goals here, here, here, and here.
It should also be noted that, if you're talking about SSMS, rows only appear once an entire buffer has been filled, not just willy-nilly.
If I understand what you're observing, this is how Management Studio renders rows, and has little to do with how SQL Server returns rows. In fact often when you are returning large results to SSMS and attempting to render them in a grid, SSMS can't keep up and SQL Server ends up waiting for the app to process more rows. In this case you'll see SQL Server accumulating ASYNC_NETWORK_IO
waits.
You can control it somewhat by using Results to Text instead of Results to Grid, since SSMS can draw text faster than it can draw grids, but you'll likely find this can affect readability depending on the number of columns and the data types involved. Both are impacted by when SSMS decides to actually write results out to that pane, which depends on how full the output buffer is.
When you have multiple statements, and you want to force the buffer to render output results to the messages pane, you can use a little printing trick in between statements:
RAISERROR('', 0, 1) WITH NOWAIT;
But this won't help when you're trying to get SSMS to render rows more quickly when all the output is coming from a single statement.
More directly, you can control it by limiting how many results you are rendering in SSMS. I often see people complain about how long it takes to return a million rows to the grid. What on earth anyone is going to do with a million rows in an SSMS grid, I have no idea.
There are some hacks like OPTION (FAST 100)
, which will optimize for retrieving those first 100 rows (or any 100 rows if there is no outer ORDER BY
), but this can come at the cost of much slower retrieval for the remainder of the rows and a plan that is more inefficient overall, so isn't really a go-to option IMHO.