Data Versioning in Cassandra with CQL3

How would this work Dexter?

It is actually very similar to your solution actually except you can store all versions and be able to fetch the 'latest' version just from one table (document_versions).

In most cases I think you can get what you want in a single SELECT except use case #2 where fetching the most recent version of a document where a pre SELECT is needed on document_versions first.

SECOND ATTEMPT

(I removed the code from the first attempt, apologies to anyone who was following in the comments).

CREATE TABLE documents (
        itemid_version text,
        xml_payload text,
        insert_time timestamp,
        PRIMARY KEY (itemid_version)
);

CREATE TABLE document_versions (
        itemid text,
        version int,
        PRIMARY KEY (itemid, version)
) WITH CLUSTERING ORDER BY (version DESC);


INSERT INTO documents (itemid_version, xml_payload, insert_time) VALUES ('doc1-1', '<?xml>1st</xml>', '2014-05-21 18:00:00');
INSERT INTO documents (itemid_version, xml_payload, insert_time) VALUES ('doc1-2', '<?xml>2nd</xml>', '2014-05-21 18:00:00');
INSERT INTO documents (itemid_version, xml_payload, insert_time) VALUES ('doc2-1', '<?xml>1st</xml>', '2014-05-21 18:00:00');
INSERT INTO documents (itemid_version, xml_payload, insert_time) VALUES ('doc2-2', '<?xml>2nd</xml>', '2014-05-21 18:00:00');

INSERT INTO document_versions (itemid, version) VALUES ('doc1', 1);
INSERT INTO document_versions (itemid, version) VALUES ('doc1', 2);
INSERT INTO document_versions (itemid, version) VALUES ('doc2', 1);
INSERT INTO document_versions (itemid, version) VALUES ('doc2', 2);
  1. user needs to get the single (1) doc he wants, he knows the item id and version (not necessarily the latest)

    SELECT * FROM documents WHERE itemid_version = 'doc1-2';

  2. user needs to get the single (1) doc he wants, he knows the item id but does not know the latest version (You would feed concatenated itemid + version in result of first query into second query)

    SELECT * FROM document_versions WHERE itemid = 'doc2' LIMIT 1;

    SELECT * FROM documents WHERE itemid_version = 'doc2-2';

  3. user needs the version history of a single (1) doc.

    SELECT * FROM document_versions WHERE itemid = 'doc2';

  4. user needs to get the list (1 or more) of docs he wants, he knows the item id AND version (not necessarily the latest)

    SELECT * FROM documents WHERE itemid_version IN ('doc1-2', 'doc2-1');

Cheers,


Lets see if we can come up with a model in a top down fashion starting from your queries:

CREATE TABLE document_versions (
  itemid uuid,
  name text STATIC,
  vewrsion int,   
  xml_payload text,
  insert_time timestamp,
  PRIMARY KEY ((itemid), version)
) WITH CLUSTERING ORDER BY (version DESC);
  1. Use case 1: user needs to get the single (1) doc he wants, he knows the item id and version (not necessarily the latest)

    SELECT * FROM document_versions 
      WHERE itemid = ? and version = ?;
    
  2. Use case 2: user needs to get the single (1) doc he wants, he knows the item id but does not know the latest version

    SELECT * FROM document_versions
      WHERE itemid = ? limit 1;
    
  3. Use case 3: user needs the version history of a single (1) doc.

    SELECT * FROM document_versions 
      WHERE itemid = ?
    
  4. Use case 4: user needs to get the list (1 or more) of docs he wants, he knows the item id AND version (not necessarily the latest)

    SELECT * FROM documents 
      WHERE itemid = 'doc1' and version IN ('1', '2');
    

    One table for all these queries is the correct approach. I would suggest taking the Datastax free online course: DS220 Data Modeling