order by clause not working in Cassandra query
Simply put, Cassandra only enforces sort order within a partition key.
PRIMARY KEY (layer_name, layer_position)
) WITH CLUSTERING ORDER BY (layer_position DESC)
In this case, layer_name
is your partition key. If you specify layer_name
in your WHERE clause, your results for that value of layer_name
will be ordered by layer_position
.
SELECT * FROM layer WHERE layer_name = 'layer1';
You don't need to specify ORDER BY. All ORDER BY really can do at the query level is apply a different sort direction (ascending vs. descending).
Cassandra works this way, because it is designed to read data in whatever order it is sorted on disk. Your partition keys are sorted by hashed token value, which is why results from an unbound WHERE clause appear to be ordered randomly.
EDIT
I have to fetch data using
state_id
column and it should be order bylayer_position
.
Cassandra tables are optimized for a specific query. While this results in high performance, the drawback is that query flexibility is limited. The way to solve for this, is to duplicate your data into an additional table designed to serve that particular query.
CREATE TABLE layer_by_state_id (
layer_name text,
layer_position text,
state_id text,
PRIMARY KEY (state_id, layer_position, layer_name)
) WITH CLUSTERING ORDER BY (layer_position DESC, layer_name ASC);
This table will allow queries like this to work:
SELECT * FROM layer WHERE state_id='thx1138';
And the results will be sorted by layer_position
, within the requested state_id
.
Now I am making a couple of assumptions that you will want to investigate:
- I am assuming that
state_id
is a good partitioning key. Meaning that it has high-enough cardinality to offer good distribution in the cluster, but low-enough cardinality that it returns enough CQL rows to make sorting worthwhile. - I am assuming that the combination of
state_id
andlayer_position
is not enough to uniquely identify each row. Therefore I am ensuring uniqueness by addinglayer_name
as an additional clustering key. You may or may not need this, but I'm guessing that you will. - I am assuming that using
state_id
as a partitioning key will not exhibit unbound growth so as to approach Cassandra's limit of 2 billion cells per partition. If that is the case, you may need to add an additional partition "bucket."