SQL Server: Covering indexes including all columns?
I've done this on specific indexes before now, to aid oft-run heavy queries. Effectively what they have done is create multiple clustered indexes: when any of those indexes is used to find rows no extra work is needed looking up the rest of the data in the real clustered index (or the heap if there is no real clustered index).
is this a sensible strategy?
For some indexes where needed to support certain query patterns, certainly yes.
But to do this with all indexes, I would just as certainly say no.
It is going to be wasteful of space to do where not actually needed, and will slow inserts/updates significantly. It may slow down as many read queries as it helps too, because each index page holds less records so any query needing to reference a chunk of the index for filtering but not using all other columns will have to access more pages. This will make your database more memory-hungry: those pages will need to be loaded into the buffer pool, potentially ejecting other useful pages if memory is low. If compression is used on those indexes to try mitigate the effect on storage and memory requirements, then it will be pushing extra load to the CPUs instead.
as access is via an ORM that by default (but not always) retrieves all columns
This is a common pattern with poorly optimised use of an ORM (or just naive ORMs) and in these cases I've seen SQL Server's index adviser (and similar 3rd party tools) suggest indexes with many INCLUDE
d columns, so I would agree with your suggestion that this is why the indexes have been created this way.
But while it may make all such queries slightly faster and some of them significantly faster, I suspect that in many cases any benefit is so small as to not be worth the extra memory footprint required by your common working set, the space on disk, and the IO between disk and memory.
Also remember that the ORM might not be selecting out all columns of all tables tha a query touches, so that benefit might only hold for the main target of the current request, and the larger indexes may penalise the query when other objects are used for filtering but not returning data (SELECT * FROM table1 WHERE id IN (SELECT someID FROM table2 WHERE someColumn='DesiredValue')
perhaps).
Another consideration for the excess space used, particularly if the data is large, is that it will have an impact on your backup strategy: storage and transfer costs for those backups, potential restore times, and so forth.
should we be prepared for any differences between the two [on-prem & AzureSQL]
Generally I think the considerations here are going to be the same in each case, though any excess memory/IO cost imposed by the large indexes may be more directly visible in Azure where you can tweak the service tier and therefore infrastructure cost more easily rather than having a relatively fixed set of hardware resource. If using standard/premium tiers instead of vcore based pricing then you'll be affected more by the IO cost in standard as premium includes significantly more IO per DTU. If you are using multi-region backups or redundancy or other non-local features in Azure than there might be a bandwidth cost associated with the extra space taken by uneccesarily wide indexes.
The question is, is this a sensible strategy?.... (though the question has been raised that if this strategy was optimal, wouldn't it be the default by now?)
In most cases this is not a sensible strategy. The reason being, that in general OLTP databases, the rows returned to the end user are not going to be a whole lot. (Generalization)
The question you should ask yourself is,if you are seeking on the key columns, how many rows will be returned by that seek operation? And repeat that for the queries seeking on that column.
Consider the following table, returning a whole lot of columns, where SelectiveIDField= ...
select columnA,columnC, ... columnZ
FROM dbo.BigTable
Where SelectiveIDField= '225122141';
If only one row will be returned by the seek on selectiveIDField
, is the additional key lookup such a bad thing?
(guessing you have clustered indexes here, otherwise RID lookup)
It will just do one extra key lookup, one extra execution + the join operator. Even if it is 10 or even 100, would it be that huge of an impact? This also depends on how much your query is executed and how important the execution time is.
In the case of it being negligible, just create the index on SelectiveIDField
and call it a day, it should not be worth the read gains in comparison to the write losses.
So in short, creating indexes on the entire table should in my opinion not be a default approach unless you really see an issue with a query and can improve it drastically by adding an entire covering index.