Why is Cassandra not linearizable when quorum based read and writes are used
EDIT: In hindsight this isn't the best explanation. I recommend reading Anurag's answer below which is much more concise.
Since normal Cassandra operations don't observe existing state that it is changing, quorum consistency alone is not considered 'Linearizable'.
For example, if you were to to adjust the balance of a bank account, you would need to know the current balance in order to adjust it. Consider a client that executes the following operations:
A. SELECT balance FROM account WHERE id='x' (assume this returns 5.12)
B. UPDATE account SET balance=4.12 WHERE id='x' (subtract 1$ from balance)
The problem is, from the perspective of Cassandra, the operation B
doesn't effectively 'see' A
since it's not considering the existing state of the data or any other operations that may be occurring for that matter. Another client could be updating balance for the same account during the submission of B
.
Lightweight Transactions in Cassandra 2.0 describes how lightweight transactions provide 'linearizable consistency' by providing constructs that ensure that operations are performed in sequence for a given partition and are not interrupted by others. So instead of my previous example, you can now do:
A. SELECT balance FROM account WHERE id='x' (assume this returns 5.12)
B. UPDATE account SET balance=4.12 WHERE id='x' IF balance=5.12
The use of IF balance=5.12
instructs Cassandra to begin a lightweight transaction, which uses a paxos consesus protocol for leadership election and to ensure operations are applied sequentially. If the state of balance does not meet the condition, the update will not be applied (indicated in a successful response with a was_applied
boolean column). If C* is not able to achieve this within some timeout (due to contention or some other factors), the operation will fail, will not be applied, and the client will be surfaced a timeout.
Edit considering Cassandra foreground Read Repair:
Writes that fail because only a partial set of replicas are updated could lead to two different readers seeing two different values of data. This is because of the lack of rollbacks in simple quorum-based consistency approaches. This behavior breaks the linearizability guarantees for single-key reads. As described in this discussion, a distributed consensus protocol such as Raft or Paxos is a must-have for such a guarantee.
Also, other phenomena such as clock drift and leap second can break the Cassandra session consistency.
Earlier Answer (without considering Cassandra foreground read repair):
Summary: In Cassandra write may not feel atomic. Some nodes get writes faster than others thus even if we rely on quorum the result depends on the set of nodes that return values and what values they hold at that point.
Also, to explain definition of linearizability adding to definition in bold
If operation B started after operation A successfully completed, then operation B must see the system in the same state as it was on completion of operation A, or a newer state (but never old state again) .
Copying from Martin Klepmann's Data Intensive Applications book
Linearizability and quorums Intuitively, it seems as though strict quorum reads and writes should be linearizable in a Dynamo-style model. However, when we have variable network delays, it is possible to have race conditions, as demonstrated in Figure 9-6.
In Figure 9-6, the initial value of x is 0, and a writer client is updating x to 1 by sending the write to all three replicas (n = 3, w = 3). Concurrently, client A reads from a quorum of two nodes (r = 2) and sees the new value 1 on one of the nodes. Also concurrently with the write, client B reads from a different quorum of two nodes, and gets back the old value 0 from both.
The quorum condition is met (w + r > n), but this execution is nevertheless not linearizable: B’s request begins after A’s request completes, but B returns the old value while A returns the new value. (It’s once again the Alice and Bob situation from Figure 9-1.)
Interestingly, it is possible to make Dynamo-style quorums linearizable at the cost of reduced performance: a reader must perform read repair (see “Read repair and antientropy” on page 178) synchronously, before returning results to the application [23], and a writer must read the latest state of a quorum of nodes before sending its writes [24, 25]. However, Riak does not perform synchronous read repair due to the performance penalty [26]. Cassandra does wait for read repair to complete on quorum reads [27], but it loses linearizability if there are multiple concurrent writes to the same key, due to its use of last-write-wins conflict resolution.
Moreover, only linearizable read and write operations can be implemented in this way; a linearizable compare-and-set operation cannot, because it requires a consensus algorithm [28].
In summary, it is safest to assume that a leaderless system with Dynamo-style replication does not provide linearizability.
And a bit more explaination about Linearizability vs Serializability: