Cassandra preventing duplicates
Let me make sure I understand: if the pair (userId, placeId) must be unique (meaning you never want two rows with the same pair), what is visitTime doing in the primary key? And why would you run a query with ORDER BY visitTime DESC if there will only ever be one row?
If what you need is to prevent duplicates, you have two options.
1 - Lightweight transaction: an INSERT with IF NOT EXISTS will do what you want. But, as I explained here, lightweight transactions are really slow because of the special handling Cassandra gives them (each one needs a Paxos consensus round among the replicas).
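2 - USING TIMESTAMP
Writetime enforcement (be careful with it!***): the 'trick' is to force a decreasing TIMESTAMP on every write, so that the first write for a key carries the highest timestamp and any later write for the same key is ignored.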
Let me give an example:
INSERT INTO users (uid, placeid , visittime , otherstuffs ) VALUES ( 1, 2, 1000, 'PLEASE DO NOT OVERWRITE ME') using TIMESTAMP 100;
This produces the following output:
select * from users;
 uid | placeid | otherstuffs                | visittime
-----+---------+----------------------------+-----------
   1 |       2 | PLEASE DO NOT OVERWRITE ME |      1000
Now let's decrease the timestamp:
INSERT INTO users (uid, placeid , visittime , otherstuffs ) VALUES ( 1, 2, 2000, 'I WANT OVERWRITE YOU') using TIMESTAMP 90;
The data in the table has not been updated, because an operation with a higher timestamp (100) already exists for the pair (uid, placeid). As you can see, the output has not changed:
select * from users;
 uid | placeid | otherstuffs                | visittime
-----+---------+----------------------------+-----------
   1 |       2 | PLEASE DO NOT OVERWRITE ME |      1000
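To make the rule behind this behaviour concrete, here is a minimal in-memory sketch in Java. It is not Cassandra code: the class LastWriteWinsSketch and its methods are hypothetical names, the whole row is treated as a single cell (Cassandra reconciles per column), and writes with equal timestamps are simply rejected, which is not how Cassandra breaks ties.

```java
import java.util.HashMap;
import java.util.Map;

// Toy in-memory model of timestamp reconciliation. This is NOT Cassandra code.
class LastWriteWinsSketch {
    // A stored value plus the write timestamp that produced it.
    private static final class Cell {
        final String value;
        final long timestamp;

        Cell(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    // Keyed by "uid:placeId", mirroring the (uid, placeid) pair in the example.
    private final Map<String, Cell> rows = new HashMap<>();

    // Applies the write only if its timestamp is strictly higher than the stored one.
    // Returns true when the stored value changed.
    // Assumption: ties are rejected here; real Cassandra resolves ties differently.
    boolean write(int uid, int placeId, String value, long timestamp) {
        String key = uid + ":" + placeId;
        Cell current = rows.get(key);
        if (current != null && current.timestamp >= timestamp) {
            return false; // a write with an equal or higher timestamp already exists
        }
        rows.put(key, new Cell(value, timestamp));
        return true;
    }

    // Returns the stored value, or null if nothing was written for the pair.
    String read(int uid, int placeId) {
        Cell cell = rows.get(uid + ":" + placeId);
        return cell == null ? null : cell.value;
    }

    public static void main(String[] args) {
        LastWriteWinsSketch table = new LastWriteWinsSketch();
        table.write(1, 2, "PLEASE DO NOT OVERWRITE ME", 100);
        table.write(1, 2, "I WANT OVERWRITE YOU", 90); // lower timestamp, ignored
        System.out.println(table.read(1, 2)); // prints "PLEASE DO NOT OVERWRITE ME"
    }
}
```

The second call returns false and leaves the first value in place, which is the same outcome as the two INSERT statements above.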
If performance matters, use solution 2; if it doesn't, use solution 1. For solution 2 you could calculate a decreasing timestamp for each write by subtracting the system time in milliseconds from a fixed number, e.g.:
Long decreasingTimestamp = 2_000_000_000_000L - System.currentTimeMillis();
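The same idea can be wrapped in a small helper so that the calculation is easy to test. This is only a sketch: the class DecreasingTimestamp and its method names are hypothetical, and the range check is an extra safeguard that is not part of the one-liner above.

```java
// Hypothetical helper around the one-line calculation; names are illustrative only.
class DecreasingTimestamp {
    // Fixed upper bound from the snippet above. As epoch milliseconds it falls in 2033,
    // so the scheme stops producing positive values after that point.
    static final long CEILING = 2_000_000_000_000L;

    // Maps a wall-clock time in epoch milliseconds to a value that shrinks as time advances.
    static long from(long epochMillis) {
        if (epochMillis < 0 || epochMillis >= CEILING) {
            throw new IllegalArgumentException("epochMillis out of range: " + epochMillis);
        }
        return CEILING - epochMillis;
    }

    // Convenience wrapper using the local system clock.
    static long now() {
        return from(System.currentTimeMillis());
    }

    public static void main(String[] args) {
        long first = from(1_700_000_000_000L);
        long second = from(1_700_000_000_001L); // one millisecond later
        System.out.println(first);          // prints 300000000000
        System.out.println(second < first); // prints true
    }
}
```

Note that two writes issued within the same millisecond get the same value, so the timestamp alone does not decide between them, and clocks that differ between client machines can reorder writes in the same way.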
*** This solution might lead to unexpected behaviour if, for instance, you want to delete and then reinsert data. It is important to know that once you delete data, you will be able to write it again only if the write operation has a higher timestamp than the deletion (if no timestamp is specified, the one used is the current time of the machine).
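The delete-then-reinsert pitfall can be illustrated with another toy model in Java. Again, this is not Cassandra code: TombstoneSketch and its methods are hypothetical names, and the model only assumes that a deletion leaves a marker with its own timestamp which hides any write whose timestamp is not higher.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of a deletion marker shadowing older writes. This is NOT Cassandra code.
class TombstoneSketch {
    private final Map<String, String> values = new HashMap<>();
    private final Map<String, Long> writtenAt = new HashMap<>();
    private final Map<String, Long> deletedAt = new HashMap<>();

    // Applies the insert only if its timestamp is higher than both the latest
    // deletion and the latest write for the key. Returns true when applied.
    boolean insert(String key, String value, long timestamp) {
        Long tombstone = deletedAt.get(key);
        if (tombstone != null && timestamp <= tombstone) {
            return false; // shadowed by the deletion
        }
        Long previous = writtenAt.get(key);
        if (previous != null && timestamp <= previous) {
            return false; // an equal or newer write already exists
        }
        values.put(key, value);
        writtenAt.put(key, timestamp);
        return true;
    }

    // Records a deletion marker and removes the stored value if the marker is not older.
    void delete(String key, long timestamp) {
        Long previous = writtenAt.get(key);
        if (previous != null && previous <= timestamp) {
            values.remove(key);
            writtenAt.remove(key);
        }
        deletedAt.merge(key, timestamp, Long::max); // keep the highest deletion timestamp
    }

    String read(String key) {
        return values.get(key);
    }

    public static void main(String[] args) {
        TombstoneSketch table = new TombstoneSketch();
        table.insert("1:2", "first visit", 100);
        table.delete("1:2", 1_000);                                   // deletion with a higher timestamp
        System.out.println(table.insert("1:2", "second visit", 90));    // prints false
        System.out.println(table.insert("1:2", "second visit", 1_001)); // prints true
    }
}
```

With decreasing write timestamps and a deletion that uses the machine's current time, every later insert falls into the first case, so the row stays invisible unless the reinsert is given an explicitly higher timestamp.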
HTH,
Carlo