Using S3 as a database vs. database (e.g. MongoDB)
You are "considering using AWS S3 bucket instead of a NoSQL database", but the fact is that Amazon S3 effectively is a NoSQL database.
It is a very large Key-Value store. The Key is the filename, the Value is the contents of the file.
If your needs are simply "Store a value with this key" and "Retrieve a value with this key", then it would work just fine!
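As a rough illustration of that "store a value with this key / retrieve a value with this key" usage, here is a minimal sketch. `S3KeyValueStore`, `FakeS3Client`, the bucket name and the key are all made up for the example; the wrapper only assumes a client object that behaves like boto3's S3 client (`put_object` / `get_object`), and the fake client is there purely so the snippet runs without AWS credentials.

```python
import io


class S3KeyValueStore:
    """Treat one bucket as a key-value store: key = object name, value = bytes."""

    def __init__(self, client, bucket):
        # `client` is assumed to behave like boto3.client("s3").
        self.client = client
        self.bucket = bucket

    def put(self, key, value: bytes):
        # "Store a value with this key"
        self.client.put_object(Bucket=self.bucket, Key=key, Body=value)

    def get(self, key) -> bytes:
        # "Retrieve a value with this key"
        response = self.client.get_object(Bucket=self.bucket, Key=key)
        return response["Body"].read()


class FakeS3Client:
    """Hypothetical in-memory stand-in so the sketch runs without AWS."""

    def __init__(self):
        self._objects = {}

    def put_object(self, Bucket, Key, Body):
        self._objects[(Bucket, Key)] = bytes(Body)
        return {}

    def get_object(self, Bucket, Key):
        # The real client also returns the body as a readable stream.
        return {"Body": io.BytesIO(self._objects[(Bucket, Key)])}


store = S3KeyValueStore(FakeS3Client(), "orders-archive")
store.put("orders/2019/12345.json", b'{"status": "delivered"}')
print(store.get("orders/2019/12345.json"))
```

Against real S3 you would pass `boto3.client("s3")` instead of the fake. Note that the whole value is written and read in one go, which is exactly the access pattern where S3 works fine.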
In fact, old orders on Amazon.com (more than a year old) are apparently archived to Amazon S3 since they are read-only (no returns, no changes).
While slower than DynamoDB, Amazon S3 certainly costs significantly less for storage!
Context: we use S3 as a "database" of sorts (literally key/value structured storage).
It should be noted that S3 does actually have search and, depending on how you structure your data, queries, in the form of S3 Select (and, if you have the time, Athena).
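To give an idea of what an S3 Select query looks like, here is a small sketch built around boto3's `select_object_content` call. The helper names, bucket, key and SQL are invented for the example, and it assumes the object holds one JSON document per line. The real call is left as a comment so the snippet runs offline; the sample events are hand-written in the shape the response stream is assumed to have.

```python
def build_select_request(bucket, key, sql):
    """Keyword arguments for the S3 client's `select_object_content` call."""
    return {
        "Bucket": bucket,
        "Key": key,
        "ExpressionType": "SQL",
        "Expression": sql,
        # Assumption: the object is JSON Lines (one document per line).
        "InputSerialization": {"JSON": {"Type": "LINES"}},
        "OutputSerialization": {"JSON": {}},
    }


def collect_records(event_stream):
    """Join the `Records` payloads of a select response into one bytes value."""
    chunks = []
    for event in event_stream:
        # Other event types (stats, end markers) carry no row data.
        if "Records" in event:
            chunks.append(event["Records"]["Payload"])
    return b"".join(chunks)


request = build_select_request(
    "orders-archive",
    "orders/2019.jsonl",
    "SELECT s.id FROM S3Object s WHERE s.status = 'delivered'",
)
# With a real client (not executed here):
#   response = boto3.client("s3").select_object_content(**request)
#   data = collect_records(response["Payload"])

# Hand-written events standing in for a real response stream.
sample_events = [
    {"Records": {"Payload": b'{"id": 1}\n'}},
    {"Stats": {}},
    {"Records": {"Payload": b'{"id": 7}\n'}},
    {"End": {}},
]
print(collect_records(sample_events))
```

The query runs against a single object, so how useful this is depends heavily on how you lay your data out across keys.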
Edit: prior to December 2020, S3 was eventually consistent. Now it is strongly consistent. The following disadvantages no longer apply, but are kept here for historical reasons.
Before December, 2020, the biggest disadvantage/architectural challenge was that S3 was eventually consistent (which was actually the reason why you could not "update" a file). This manifested itself in some behaviours which your architecture needed to tolerate:
- Operations were cached by key, so if you attempted to get an object that did not exist and then created it, for a period of time* any gets on that object would return that it did not exist.
- There was no global cache, so you could get two different versions of the same object for a period of time* after it had been overwritten.
- List operations provided a semi-unstable iterator. If you were going to list on a large number of objects in a bucket that was being updated, then chances are you were not going to visit all the objects by the end of the iterator.
*The period of time was purposely left undefined by AWS; from observation, however, it was rarely more than a minute.
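For the historical record, one way an architecture could tolerate the "object not visible yet" behaviour was to retry reads with a backoff. This is only a sketch of that idea, not something S3 requires today: `get_with_retry` and `flaky_fetch` are hypothetical names, and `KeyError` stands in for whatever "not found" error your client raises.

```python
import time


def get_with_retry(fetch, attempts=5, delay=0.5, missing=(KeyError,)):
    """Call `fetch()` until it stops raising a 'missing' error or attempts run out."""
    for attempt in range(attempts):
        try:
            return fetch()
        except missing:
            if attempt == attempts - 1:
                raise  # still missing after the last attempt
            time.sleep(delay * (2 ** attempt))  # exponential backoff


calls = {"count": 0}


def flaky_fetch():
    # Simulates a key that reads as missing twice before becoming visible.
    calls["count"] += 1
    if calls["count"] < 3:
        raise KeyError("not visible yet")
    return b"value"


result = get_with_retry(flaky_fetch, delay=0)
print(result, calls["count"])
```

This only helps with the first bullet (a freshly created key reading as missing). It does nothing for stale versions after an overwrite or for incomplete listings, which needed their own workarounds.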