Does it make sense to add indexes to a clustered columnstore index table?
My understanding of a clustered columnstore table (and please correct me if I am wrong) is that each column is stored in some physically ordered manner...
All columns are sorted in the same order. I say this to distinguish from a possible reading of your statement that each column is sorted in the best way for that column, which isn't true.
Each column is encoded and compressed individually in a segment within each row group. The order of operations is:
- Encode values in all columns (e.g. value offset, or using a dictionary).
- Determine the single 'optimal' row ordering.
- Compress each column (e.g. RLE, bit-pack).
The common sort order chosen may be great for some segments, but not others from a compression point of view (primarily run length encoding). SQL Server uses Vertipaq technology to choose a sort order that gives a good overall compression result.
...implying that each column already has what amounts to a clustered index on it
This isn't the right mental picture because column store 'indexes' do not support seeks, only b-trees do. SQL Server can locate an individual row in a column store for lookup purposes (as in the linked Q & A) but that isn't the same as supporting key seeks in general.
Demo of a lookup on a clustered columnstore index.
The distinction is between finding row #n in row group #m versus finding rows with a given value 'x' for a particular column. A column store index can do the former, but not the latter.
Additionally, b-tree indexes can return ordered data; column store indexes cannot (due to encoding) -- except for the trivial case of a lookup where at most one row will be returned.
Does it make sense to add indexes to a clustered columnstore index table?
Additional b-tree indexes make sense when we need to locate a single value, or a small range of values. The alternative would be to scan the column store index (albeit with possible row group elimination if the stars align correctly - as well as partition elimination if the table is partitioned).
More generally, b-tree indexes support OLTP-type queries; column store is for analytics. Providing both on the same table enables a hybrid approach (HTAP - Hybrid Transactional/Analytic Processing).
From the Microsoft Research Paper Columnstore and B+ tree – Are Hybrid Physical Designs Important? (PDF):
B+ trees outshine columnstores when query predicates are selective even when all data is memory resident; and the trade-off shifts further in favor of B+ trees when data is not memory resident. Likewise, B+ trees can be a better option for providing data in sorted order when server memory is constrained. On the other hand, columnstores are often an order of magnitude faster for large scans whether or not the data is memory resident. For updates, B+ trees are significantly cheaper. Secondary columnstores incur much lower update cost compared to primary columnstore indexes, but are still much slower than B+ trees. This empirical study indicates that for certain workloads, hybrid physical designs can provide significant performance gains.