What is the point of column families?

I've just come across some useful information in the RocksDB FAQ. (RocksDB is a key-value store.)

Here are some relevant extracts.

Q: What are column families used for?

A: The most common reasons of using column families: (1) use different compaction setting, comparators, compression types, merge operators, or compaction filters in different parts of data; (2) drop a column family to delete its data; (3) one column family to store metadata and another one to store the data.
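To make reasons (1) and (2) concrete, here is a minimal toy sketch in Python. It is an in-memory model I made up for illustration, not the RocksDB API: `ToyStore`, `create_family`, `scan` and `drop_family` are all hypothetical names, and the "comparator" is just a sort key function.

```python
# Toy in-memory model of column families. NOT the RocksDB API;
# every name here is hypothetical and only illustrates the idea.
class ToyStore:
    def __init__(self):
        # family name -> {"sort_key": function or None, "data": dict}
        self.families = {}

    def create_family(self, name, sort_key=None):
        # Each family carries its own "comparator" (here: a sort key function).
        self.families[name] = {"sort_key": sort_key, "data": {}}

    def put(self, family, key, value):
        self.families[family]["data"][key] = value

    def scan(self, family):
        cf = self.families[family]
        # Return keys in the order defined by this family's own comparator.
        return sorted(cf["data"], key=cf["sort_key"])

    def drop_family(self, name):
        # Dropping the family discards all of its data in one step.
        del self.families[name]


store = ToyStore()
store.create_family("events", sort_key=int)        # numeric order
store.create_family("names", sort_key=str.lower)   # case-insensitive order

for k in ["10", "9", "100"]:
    store.put("events", k, "payload")
for k in ["bob", "Alice", "carol"]:
    store.put("names", k, "payload")

events_order = store.scan("events")   # ["9", "10", "100"]
names_order = store.scan("names")     # ["Alice", "bob", "carol"]

store.drop_family("events")           # all event data is gone at once
remaining = list(store.families)      # ["names"]
```

The point is only that settings live on the family rather than on the whole database, and that a family is a cheap unit of deletion.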

Q: What's the difference between storing data in multiple column family and in multiple rocksdb database?

A: The main differences will be backup, atomic writes and performance of writes. The advantage of using multiple databases: database is the unit of backup or checkpoint. It's easier to copy a database to another host than a column family. Advantages of using multiple column families: (1) write batches are atomic across multiple column families on one database. You can't achieve this using multiple RocksDB databases. (2) If you issue sync writes to WAL, too many databases may hurt the performance.
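The atomicity point can be sketched with another toy model. Again this is not RocksDB code: `ToyDB` and its `write` method are hypothetical, and I use a copy-and-swap to get all-or-nothing behaviour, whereas a real store would rely on its write-ahead log.

```python
import copy


# Toy model of one database holding several column families.
# NOT the RocksDB API; names and the copy-and-swap approach are assumptions.
class ToyDB:
    def __init__(self, families):
        self.families = {name: {} for name in families}

    def write(self, batch):
        # Apply the whole batch to a staged copy, then swap it in.
        # If any operation fails, the live data is left untouched.
        staged = copy.deepcopy(self.families)
        for family, key, value in batch:
            if family not in staged:
                raise KeyError(f"unknown column family: {family}")
            staged[family][key] = value
        self.families = staged


db = ToyDB(["meta", "data"])

# One batch touches two families and becomes visible as a unit.
db.write([("data", "user:1", "alice"), ("meta", "count", 1)])

# A batch that fails halfway leaves neither family modified.
try:
    db.write([("data", "user:2", "bob"), ("missing", "count", 2)])
except KeyError:
    pass

data_after = db.families["data"]   # {"user:1": "alice"}
meta_after = db.families["meta"]   # {"count": 1}
```

With two separate databases there is no single place to stage the batch, which is why the FAQ says you can't get this guarantee across databases.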

Q: I have different key spaces. Should I separate them by prefixes, or use different column families?

A: If each key space is reasonably large, it's a good idea to put them in different column families. If it can be small, then you should consider to pack multiple key spaces into one column family, to avoid the trouble of maintaining too many column families.
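For the "pack multiple key spaces into one column family" option, the usual trick is a key prefix plus a range scan over sorted keys. A rough sketch, assuming a `prefix:key` layout (the class name, the `:` separator and the method names are my own choices, not anything from RocksDB):

```python
import bisect


# Toy model of several small key spaces packed into one sorted key range.
# Hypothetical names; the "prefix:key" layout is an assumption.
class PrefixedKeyspace:
    def __init__(self):
        self.keys = []     # full keys, kept in sorted order
        self.values = {}   # full key -> value

    def put(self, prefix, key, value):
        full = f"{prefix}:{key}"
        if full not in self.values:
            bisect.insort(self.keys, full)
        self.values[full] = value

    def scan(self, prefix):
        # Seek to the first key with this prefix, then read until it changes.
        start = f"{prefix}:"
        i = bisect.bisect_left(self.keys, start)
        out = []
        while i < len(self.keys) and self.keys[i].startswith(start):
            full = self.keys[i]
            out.append((full[len(start):], self.values[full]))
            i += 1
        return out


ks = PrefixedKeyspace()
ks.put("user", "2", "bob")
ks.put("order", "17", "book")
ks.put("user", "1", "alice")

users = ks.scan("user")     # [("1", "alice"), ("2", "bob")]
orders = ks.scan("order")   # [("17", "book")]
```

The trade-off is that all prefixes share one set of settings and you can no longer drop a key space in a single operation.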


I know you're not looking for a parallel with SQL, but this article explains plainly and simply the purpose and practical benefit of column families.

From "Understanding the Cassandra Data Model from a SQL Perspective" on RubyScale:

What is a Column Family for then? Just a table prefix? A Column Family has a number of settings that go with it that alter its behavior. There are cache settings for the keys (the UUIDs in this example), cache settings for the entire rows (the entire table in this example), and most importantly, sorting. In Cassandra there is no OFFSET, only LIMIT and the equivalent of BETWEEN. In this example, the column names are just strings but they could also be integers or timestamps and they are always stored in sort order. One Column Family might have timestamp-sorted data where you query things by time slice and another might be address book data where you query things in alphabetical order. The only sorting you get to do after the fact is reversing a particular slice.
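The "BETWEEN plus LIMIT, optionally reversed" access pattern from that quote can be sketched over a plain sorted list. This is only a model of the idea, not Cassandra client code; `slice_columns` and its parameters are hypothetical.

```python
import bisect


def slice_columns(sorted_names, start, end, limit, reverse=False):
    """Return up to `limit` column names in [start, end] from a sorted list.

    Toy model of a slice query: a range (BETWEEN), a LIMIT, and an
    optional reversal of the slice. There is no OFFSET.
    """
    lo = bisect.bisect_left(sorted_names, start)
    hi = bisect.bisect_right(sorted_names, end)
    window = sorted_names[lo:hi]
    if reverse:
        window = window[::-1]
    return window[:limit]


# Column names stored in sort order, here timestamps.
timestamps = [1000, 1010, 1020, 1030, 1040]

first_two = slice_columns(timestamps, 1010, 1030, limit=2)
# [1010, 1020]

last_two = slice_columns(timestamps, 1010, 1030, limit=2, reverse=True)
# [1030, 1020]
```

Because the data is already stored in sort order, a slice is just a seek plus a bounded read. As far as I understand it, that is also how you page without OFFSET: start the next slice at the last column name you saw.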
