What exactly is a wide column store?
Most (if not all) Wide-column stores are indeed row-oriented stores in that every parts of a record are stored together. You can see that as a 2-dimensional key-value store. The first part of the key is used to distribute the data across servers, the second part of the key lets you quickly find the data on the target server.
Wide-column stores will have different features and behaviors. However, Apache Cassandra, for example, allows you to define how the data will be sorted. Take this table for example:
| id | country | timestamp | message |
|----+---------+------------+---------|
| 1 | US | 2020-10-01 | "a..." |
| 1 | JP | 2020-11-01 | "b..." |
| 1 | US | 2020-09-01 | "c..." |
| 2 | CA | 2020-10-01 | "d..." |
| 2 | CA | 2019-10-01 | "e..." |
| 2 | CA | 2020-11-01 | "f..." |
| 3 | GB | 2020-09-01 | "g..." |
| 3 | GB | 2020-09-02 | "h..." |
|----+---------+------------+---------|
If your partitioning key is (id)
and your clustering key is (country, timestamp)
, the data will be stored like this:
[Key 1]
1:JP,2020-11-01,"b..." | 1:US,2020-09-01,"c..." | 1:US,2020-10-01,"a..."
[Key2]
2:CA,2019-10-01,"e..." | 2:CA,2020-10-01,"d..." | 2:CA,2020-11-01,"f..."
[Key3]
3:GB,2020-09-01,"g..." | 3:GB,2020-09-02,"h..."
Or in table form:
| id | country | timestamp | message |
|----+---------+------------+---------|
| 1 | JP | 2020-11-01 | "b..." |
| 1 | US | 2020-09-01 | "c..." |
| 1 | US | 2020-10-01 | "a..." |
| 2 | CA | 2019-10-01 | "e..." |
| 2 | CA | 2020-10-01 | "d..." |
| 2 | CA | 2020-11-01 | "f..." |
| 3 | GB | 2020-09-01 | "g..." |
| 3 | GB | 2020-09-02 | "h..." |
|----+---------+------------+---------|
If you change the primary key (composite of partitioning and clustering key) to (id, timestamp) WITH CLUSTERING ORDER BY (timestamp DESC)
(id is the partitioning key, timestamp is the clustering key in descending order), the result would be:
[Key 1]
1:US,2020-09-01,"c..." | 1:US,2020-10-01,"a..." | 1:JP,2020-11-01,"b..."
[Key2]
2:CA,2019-10-01,"e..." | 2:CA,2020-10-01,"d..." | 2:CA,2020-11-01,"f..."
[Key3]
3:GB,2020-09-01,"g..." | 3:GB,2020-09-02,"h..."
Or in table form:
| id | country | timestamp | message |
|----+---------+------------+---------|
| 1 | US | 2020-09-01 | "c..." |
| 1 | US | 2020-10-01 | "a..." |
| 1 | JP | 2020-11-01 | "b..." |
| 2 | CA | 2019-10-01 | "e..." |
| 2 | CA | 2020-10-01 | "d..." |
| 2 | CA | 2020-11-01 | "f..." |
| 3 | GB | 2020-09-01 | "g..." |
| 3 | GB | 2020-09-02 | "h..." |
|----+---------+------------+---------|
Let's start with the definition of a wide column database.
Its architecture uses (a) persistent, sparse matrix, multi-dimensional mapping (row-value, column-value, and timestamp) in a tabular format meant for massive scalability (over and above the petabyte scale).
A relational database is designed to maintain the relationship between the entity and the columns that describe the entity. A good example is a Customer table. The columns hold values describing the Customer's name, address, and contact information. All of this information is the same for each and every customer.
A wide column database is one type of NoSQL database.
Maybe this is a better image of four wide column databases.
My understanding is that the first image at the top, the Column model, is what we called an entity/attribute/value table. It's an attribute/value table within a particular entity (column).
For Customer information, the first wide-area database example might look like this.
Customer ID Attribute Value
----------- --------- ---------------
100001 name John Smith
100001 address 1 10 Victory Lane
100001 address 3 Pittsburgh, PA 15120
Yes, we could have modeled this for a relational database. The power of the attribute/value table comes with the more unusual attributes.
Customer ID Attribute Value
----------- --------- ---------------
100001 fav color blue
100001 fav shirt golf shirt
Any attribute that a marketer can dream up can be captured and stored in an attribute/value table. Different customers can have different attributes.
The Super Column model keeps the same information in a different format.
Customer ID: 100001
Attribute Value
--------- --------------
fav color blue
fav shirt golf shirt
You can have as many Super Column models as you have entities. They can be in separate NoSQL tables or put together as a Super Column family.
The Column Family and Super Column family simply gives a row id to the first two models in the picture for quicker retrieval of information.