Locking an S3 object best practice?
Some, but not all, of the original answer below contains information that is no longer entirely applicable to Amazon S3 as of December 2020.
Effective immediately, all S3 GET, PUT, and LIST operations, as well as operations that change object tags, ACLs, or metadata, are now strongly consistent. What you write is what you will read, and the results of a LIST will be an accurate reflection of what's in the bucket.

https://aws.amazon.com/blogs/aws/amazon-s3-update-strong-read-after-write-consistency/
However, that enhancement doesn't fully resolve this concern. An important race condition is still possible, though the window for it is reduced.
Amazon S3 does not support object locking for concurrent writers. If two PUT requests are simultaneously made to the same key, the request with the latest timestamp wins. If this is an issue, you must build an object-locking mechanism into your application.

https://docs.aws.amazon.com/AmazonS3/latest/userguide/Welcome.html#ConsistencyModel
The enhancements to S3 that eliminated eventual consistency do not eliminate the problem of concurrent writers -- so you still need a lock mechanism. Also, as noted in the original question, objects in S3 cannot actually be renamed atomically -- they can only be copied internally and atomically to a new object with a different key, after which the old object is deleted, so both objects can exist for a nonzero length of time.
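To make that concrete, here is a minimal boto3 sketch (not from the original answer; bucket and key names are hypothetical) of what an S3 "rename" actually involves:

```python
# Sketch: an S3 "rename" is really copy-then-delete. Names are hypothetical.
import boto3

s3 = boto3.client("s3")

bucket = "example-bucket"
old_key = "incoming/report.csv"
new_key = "processing/report.csv"

# Step 1: server-side copy to the new key.
s3.copy_object(
    Bucket=bucket,
    Key=new_key,
    CopySource={"Bucket": bucket, "Key": old_key},
)

# Step 2: delete the original key. Between steps 1 and 2 both objects exist,
# so another worker listing the bucket can still see (and claim) old_key.
s3.delete_object(Bucket=bucket, Key=old_key)
```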
Of course, since the original answer was posted, SQS released FIFO queues which guarantee exactly-once delivery of messages to a properly written application.
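If FIFO queues fit your use case, a rough boto3 sketch (queue name, group ID, and message contents are hypothetical) of the producer and consumer side might look like this:

```python
# Sketch, not part of the original answer: feeding S3 object keys through an
# SQS FIFO queue. Queue name and message contents are hypothetical.
import boto3

sqs = boto3.client("sqs")

# FIFO queue names must end in ".fifo". ContentBasedDeduplication lets SQS
# derive the deduplication ID from a hash of the message body.
queue = sqs.create_queue(
    QueueName="object-events.fifo",
    Attributes={
        "FifoQueue": "true",
        "ContentBasedDeduplication": "true",
    },
)
queue_url = queue["QueueUrl"]

# Producer: one message per object key. Duplicate bodies sent within the
# deduplication interval are accepted but delivered only once.
sqs.send_message(
    QueueUrl=queue_url,
    MessageBody="incoming/report.csv",
    MessageGroupId="s3-objects",  # messages in a group are processed in order
)

# Consumer: receive, process, then delete so the message is not redelivered
# after the visibility timeout expires.
resp = sqs.receive_message(QueueUrl=queue_url, MaxNumberOfMessages=1)
for msg in resp.get("Messages", []):
    key = msg["Body"]
    # ... process the object identified by `key` ...
    sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])
```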
order not guaranteed, possibility of messages delivered more than once, and more than one EC2 getting the same message
The odds of actually getting the same message more than once are low. It's merely "possible," but not very likely. If it's essentially only an annoyance when, on isolated occasions, you happen to process a file more than once, then SQS seems like an entirely reasonable option.
Otherwise, you'll need an external mechanism.
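One common form of external mechanism -- offered here as an illustrative assumption, not something prescribed above -- is a conditional write to a store that supports atomic check-and-set, such as DynamoDB (table and attribute names below are hypothetical):

```python
# Sketch of one possible external locking mechanism (an assumption, not part
# of the original answer): a DynamoDB conditional write as a per-key mutex.
import boto3
from botocore.exceptions import ClientError

dynamodb = boto3.client("dynamodb")

def try_lock(object_key: str, worker_id: str) -> bool:
    """Return True if this worker acquired the lock for object_key."""
    try:
        dynamodb.put_item(
            TableName="s3-object-locks",  # hypothetical table keyed on ObjectKey
            Item={
                "ObjectKey": {"S": object_key},
                "Owner": {"S": worker_id},
            },
            # The write succeeds only if no item with this key exists yet,
            # so exactly one concurrent worker wins.
            ConditionExpression="attribute_not_exists(ObjectKey)",
        )
        return True
    except ClientError as err:
        if err.response["Error"]["Code"] == "ConditionalCheckFailedException":
            return False  # another worker already holds the lock
        raise
```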
Setting a "locked" header on the object has a problem of its own -- when you overwrite an object with a copy of itself (that's what happens when you change the metadata -- a new copy of the object is created, with the same key) then you are subject to the slings and arrows of eventual consistency.
Q: What data consistency model does Amazon S3 employ?
Amazon S3 buckets in all Regions provide read-after-write consistency for PUTS of new objects and eventual consistency for overwrite PUTS and DELETES.
https://aws.amazon.com/s3/faqs/
Updating metadata is an "overwrite PUT." Your new header may not immediately be visible, and if two or more workers each set their own unique header (e.g. x-amz-meta-locked: i-12345678), it's entirely possible for a scenario like the following to play out (W1, W2 = Worker #1 and #2):
W1: HEAD object (no lock header seen)
W2: HEAD object (no lock header seen)
W1: set header
W2: set header
W1: HEAD object (sees its own lock header)
W2: HEAD object (sees its own lock header)
The same or a similar failure can occur with several different permutations of timing.
Objects can't be effectively locked in an eventual consistency environment like this.
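For concreteness, here is roughly what that flawed check-then-set looks like in boto3 (bucket, key, and header names are hypothetical); the gap between the HEAD and the metadata-replacing copy is exactly where the timeline above goes wrong:

```python
# Sketch of the flawed "lock header" approach described above. Bucket, key,
# and header names are hypothetical.
import boto3

s3 = boto3.client("s3")
bucket, key = "example-bucket", "incoming/report.csv"
my_id = "i-12345678"

head = s3.head_object(Bucket=bucket, Key=key)
if "locked" not in head["Metadata"]:  # W1 and W2 can both pass this check
    # Changing metadata means copying the object onto itself with new metadata.
    s3.copy_object(
        Bucket=bucket,
        Key=key,
        CopySource={"Bucket": bucket, "Key": key},
        Metadata={"locked": my_id},
        MetadataDirective="REPLACE",
    )
    # A later HEAD may still show the other worker's header -- or, under the
    # old eventual-consistency model, no header at all.
```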
Object tags can assist here, since changing a tag doesn't create a new copy of the object. A tag is a key/value pair associated with an object; i.e., you would need to use object-level tagging.
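A minimal tagging sketch (boto3; names are hypothetical) might look like the following. Note that tagging avoids rewriting the object, but there is still no atomic compare-and-set, so a read-tags-then-write-tags sequence can race just like the metadata approach:

```python
# Sketch: marking an object with a tag instead of metadata. Names hypothetical.
import boto3

s3 = boto3.client("s3")
bucket, key = "example-bucket", "incoming/report.csv"

# Read the existing tag set.
tags = s3.get_object_tagging(Bucket=bucket, Key=key)["TagSet"]

if not any(t["Key"] == "locked" for t in tags):
    # Replace the whole tag set with one that includes our lock marker.
    s3.put_object_tagging(
        Bucket=bucket,
        Key=key,
        Tagging={"TagSet": tags + [{"Key": "locked", "Value": "i-12345678"}]},
    )
```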