Data Warehouse: How can i query daily snapshots?
Let's think out of the box. Instead of having a "snapshot", let's have a "log". What you currently have is "current" state of things; adding a "log" would provide the "history", from which could be derived the 'lost' info.
One way to implement the log is to have a TRIGGER
on INSERT
or UPDATE
of the table, and have the trigger write to the log file. This log will not be pleasant for the ad hoc queries, so have a nightly job (or maybe hourly) that summarizes the changes for the day -- net gain (or loss) of number of posts, etc. The "day2" info and the "last month" info can then be derived from this summary table quite quickly. Or perhaps a second level of summarization that declares what the state was for each day. I doubt if UNION
would be needed. The "snapshot" would not be involved.
So what I was looking for, is a new type of system that is related to Datawarehousing: Data Lake System.
You can learn more on Wikipedia:
A data lake is a method of storing data within a system that facilitates the colocation of data in variant schemas and structural forms, usually object blobs or files. Hadoop and the AWS S3 platform can be used to build data lake repositories.