What is columnstore index and how is different from clustered and non-clustered

Clustered Columnstore Index is a new feature in SQL Server 2014. Columnstore index allows data to be stored in a columnar format instead of traditional row-based storage. Column store indexes (Non-clustered) were originally introduced in SQL 2012 to address high query performance under high volume requirements typical of Data warehousing/Reporting.

Major points:

It stores data in columnar data structure which aids reads very faster. Stores data in compressed format and hence your total IO cost will be very minimal.

Columnstore data structure is the same data structure where data and indexes everything stored, unlike data stored separately and indexes stored separately etc.,

It will be very useful for more columns table where you select only limited columns daily, for example if there is ProductSalesFact table, you normally select for this product what is the count of sales, or for this quarter what is the sales etc, Eventhough it has hundreds of columns it access only two required columns.

My blog on columnstore index which provide performance study of 300 million records with columnstore vs rowstore

https://sqlserver101.wordpress.com/2016/01/25/why-clustered-columnstore-index-in-sql-server-2014/

MSDN link for various versions of columnstore and paths:

https://msdn.microsoft.com/en-us/library/dn934994.aspx

Columnstore index is very well explained here: http://www.patrickkeisler.com/2014/04/what-is-non-clustered-columnstore-index.html

The traditional clustered and non-clustered index you mentioned are both rowstore index, where the database stores the index row by row. The index will spread over several partitions, so even when we select only one column, the database still have to scan over all partitions to get the data, hence making a lot of I/O's.

Columnstore index, on the other hand, stores the index column by column. Normally, this will have all data of a column stored within one partition since all the data of one column combined is not that large. Now, when we select 1 column from the index, the database can return the data from one partition, which reduces a lot of I/O's. Moreover, columnstore index often has a significant compression ratio, therefore the I/O is even more efficient and the whole index can be saved in memory, which helps make queries 10x to 100x faster.

Columnstore index does not always perform better than rowstore. Columnstore index is suitable for scenarios like data warehousing and BI, where the data are often processed in mass, such as for aggregations. However, it performs worse than rowstore index in scenarios where data are often searched individual rows.

One thing worth noticing is that non-clustered columnstore index locks your table from being changed (but there are some work-around solutions to change the data), while clustered columnstore index still allows you to edit the data without dropping or disabling the index.

For more information on this topic, please refer to the article above and also try reading the MSDN documents.

Assume you have a table like below with col1 as primary key

col1  col2  col3
1      2     3
4      5     6

Normal index will be stored like below,assuming a page can hold only one row

   row1  1   2  3--page1-- all columns reside in one page
   row2  4   5  6--page2

so when you want to read some thing like sum(col3),SQLServer will need to read page1 and page 2 ,that's a cost of two pages..

Now with column store indexes,The same table will be stored like below

page1  page2   page3
1       2       3
4       5       6

Now if you want to do a sum of col3,it just has to read one page(page3)

Benefit of using column store indexes is, you may touch only necessary pages from Disk .Memory is also efficiently used ,since you will not be storing/reading unwanted data

What is columnstore index and how is different from clustered and non-clustered

Tags:

Sql Server

Related

Recent Posts