How would I track all price changes in a db in order to get the price of 'x' product on 'y' date
If I understand the scenario appropriately, you should define a table that retains a Price time series; therefore, I agree, this has a lot to do with the temporal aspect of the database you are working with.
Business rules
Let us start analyzing the situation from the conceptual level. So, if, in your business domain,
- a Product is purchased at one-to-many Prices,
- each Price of purchase becomes Current at an exact StartDate, and
- the Price EndDate (which indicates the Date when a Price ceases to be Current) is equal to the StartDate of the immediately subsequent Price,
then that means that
- there are no Gaps between the distinct Periods during which the Prices are Current (the time series is continuous or conjunct), and
- the EndDate of a Price is a derivable datum.
The IDEF1X diagram shown in Figure 1, although highly simplified, depicts such a scenario:
Expository logical layout
And the following SQL-DDL logical-level design, based on said IDEF1X diagram, illustrates a feasible approach that you can adapt to your own exact needs:
-- At the physical level, you should define a convenient
-- indexing strategy based on the data manipulation tendencies
-- so that you can supply an optimal execution speed of the
-- queries declared at the logical level; thus, some testing
-- sessions with considerable data load should be carried out.
CREATE TABLE Product (
ProductNumber INT NOT NULL,
Etcetera CHAR(30) NOT NULL,
--
CONSTRAINT Product_PK PRIMARY KEY (ProductNumber)
);
CREATE TABLE Price (
ProductNumber INT NOT NULL,
StartDate DATE NOT NULL,
Amount INT NOT NULL, -- Retains the amount in cents, but there are other options regarding the type of use.
--
CONSTRAINT Price_PK PRIMARY KEY (ProductNumber, StartDate),
CONSTRAINT Price_to_Product_FK FOREIGN KEY (ProductNumber)
REFERENCES Product (ProductNumber),
CONSTRAINT AmountIsValid_CK CHECK (Amount >= 0)
);
The Price
table has a composite PRIMARY KEY made up of two columns, i.e., ProductNumber
(constrained, in turn, as a FOREIGN KEY that makes a reference to Product.ProductNumber
) and StartDate
(pointing out the particular Date in which a certain Product was purchased at a specific Price).
In case that Products are purchased at different Prices during the same Day, instead of the StartDate
column, you may include one labeled as StartDateTime
that keeps the Instant when a given Product was purchased at an exact Price. The PRIMARY KEY would then have to be declared as (ProductNumber, StartDateTime)
.
As demonstrated, the aforementioned table is an ordinary one, because you can declare SELECT, INSERT, UPDATE and DELETE operations to manipulate its data directly, hence it (a) permits avoiding the installation of additional components and (b) can be used in all the major SQL platforms with some few adjustments, if necessitated.
Data manipulation samples
To exemplify some manipulation operations that appear useful, let us say that you have INSERTed the following data in the Product
and Price
tables, respectively:
INSERT INTO Product
(ProductNumber, Etcetera)
VALUES
(1750, 'Price time series sample');
INSERT INTO Price
(ProductNumber, StartDate, Amount)
VALUES
(1750, '20170601', 1000),
(1750, '20170603', 3000),
(1750, '20170605', 4000),
(1750, '20170607', 3000);
Since the Price.EndDate
is a derivable data point, then you must to obtain it via, precisely, a derived table that can be created as a view in order to produce the “full” time series, as exemplified below:
CREATE VIEW PriceWithEndDate AS
SELECT P.ProductNumber,
P.Etcetera AS ProductEtcetera,
PR.Amount AS PriceAmount,
PR.StartDate,
(
SELECT MIN(StartDate)
FROM Price InnerPR
WHERE P.ProductNumber = InnerPR.ProductNumber
AND InnerPR.StartDate > PR.StartDate
) AS EndDate
FROM Product P
JOIN Price PR
ON P.ProductNumber = PR.ProductNumber;
Then the following operation that SELECTs directly from that view
SELECT ProductNumber,
ProductEtcetera,
PriceAmount,
StartDate,
EndDate
FROM PriceWithEndDate
ORDER BY StartDate DESC;
supplies the next result set:
ProductNumber ProductEtcetera PriceAmount StartDate EndDate
------------- ------------------ ----------- ---------- ----------
1750 Price time series… 4000 2017-06-07 NULL -- (*)
1750 Price time series… 3000 2017-06-05 2017-06-07
1750 Price time series… 2000 2017-06-03 2017-06-05
1750 Price time series… 1000 2017-06-01 2017-06-03
-- (*) A ‘sentinel’ value would be useful to avoid the NULL marks.
Now, let us assume that you are interested in getting the whole Price
data for the Product
primarily identified by ProductNumber
1750 on Date
2 June 2017. Seeing that a Price
assertion (or row) is current or effective during the entire Interval that runs from (i) its StartDate
to (ii) its EndDate
, then this DML operation
SELECT ProductNumber,
ProductEtcetera,
PriceAmount,
StartDate,
EndDate
FROM PriceWithEndDate
WHERE ProductNumber = 1750 -- (1)
AND StartDate <= '20170602' -- (2)
AND EndDate >= '20170602'; -- (3)
-- (1), (2) and (3): You can supply parameters in place of fixed values to make the query more versatile.
yields the result set that follows
ProductNumber ProductEtcetera PriceAmount StartDate EndDate
------------- ------------------ ----------- ---------- ----------
1750 Price time series… 1000 2017-06-01 2017-06-03
which addresses said requirement.
As shown, the PriceWithEndDate
view plays a paramount role in obtaining most of the derivable data, and can be SELECTed FROM in a fairly ordinary way.
Taking into account that your platform of preference is PostgreSQL, this content from the official documentation site contains information about “materialized” views, which can help to optimize the execution speed by means of physical level mechanisms, if said aspect becomes problematic. Other SQL database management systems (DBMSs) offer physical instruments that are very alike, although different terminology may be applied, e.g., “indexed” views in Microsoft SQL Server.
You can see the discussed DDL and DML code samples in action in this db<>fiddle and in this SQL Fiddle.
Related resources
In this Q & A we discuss a business context that includes the changes of Product Prices but has a more extense scope, so you may find it of interest.
These Stack Overflow posts cover very relevant points regarding the type of a column that holds a currency datum in PostgreSQL.
Responses to comments
This looks similar to work that I did, but I found it much more convenient / efficient to work with a table where a price (in this case) has a startdate column and an enddate column - so you're just looking for rows with targetdate >= startdate and targetdate <= enddate. Of course, if the data isn't stored with those fields (including enddate 31st December 9999, not Null, where no actual end date exists), then you'd have to do work to produce it. I actually made it run every day, with end date = today's date by default. Also, my description requires enddate 1 = startdate 2 minus 1 day. – @Robert Carnegie, on 2017-06-22 20:56:01Z
The method I propose above addresses a business domain of the characteristics previously described, consequently applying your suggestion about declaring the EndDate
as a column —which is different from a “field”— of the base table named Price
would imply that the logical structure of the database would not be reflecting the conceptual schema correctly, and a conceptual schema must be defined and reflected with precision, including the differentiation of (1) base information from (2) derivable information.
Apart from that, such a course of action would introduce duplication, since the EndDate
could then be obtained by virtue of (a) a derivable table and also by virtue of (b) the base table named Price
, with the therefore duplicated EndDate
column. While that is a possibility, if a practitioner decides to follow said approach, he or she should decidedly warn the database users about the inconveniences and inefficiencies it involves. One of those inconveniences and inefficiencies is, e.g., the urgent need to develop a mechanism that ensures, at all times, that each Price.EndDate
value is equal to that of the Price.StartDate
column of the immediately successive row for the Price.ProductNumber
value at hand.
In contrast, the work to produce the derived data in question as I put forward is, honestly, not special at all, and is required to (i) guarantee the correct correspondence between the logical and conceptual levels of abstraction of the database and (ii) ensure data integrity, both aspects that as noted before are decidedly of great importance.
If the efficiency aspect you are talking about is related to the execution speed of some data manipulation operations, then it must be managed at the appropriate place, i.e., at the physical level, via, e.g., an advantageous indexing strategy, based on (1) the particular query tendencies and (2) the particular physical mechanisms provided by the DBMS of use. Otherwise, sacrificing the appropriate conceptual-logical mapping and compromising the integrity of the data involved easily turns a robust system (i.e., a valuable organizational asset) into a non-reliable resource.
Discontinuous or disjunct time series
On the other hand, there are circumstances where retaining the EndDate
of each row in a time series table is not only more conenient and efficient but demanded, although that depends entirely on business-environment-specific requirements of course. One example of that kind of circumstances comes about when
- both the StartDate and the EndDate pieces of information are held before (and retained via) every INSERTion, and
- there can be Gaps in the middle of the distinct Periods during which the Prices are Current (i.e., the time series is discontinuous or disjunct).
I have represented said scenario in the IDEF1X diagram displayed in Figure 2.
In that case, yes, the hypothetical Price
table must be declared in a manner that is similar to this:
CREATE TABLE Price (
ProductNumber INT NOT NULL,
StartDate DATE NOT NULL,
EndDate DATE NOT NULL,
Amount INT NOT NULL,
--
CONSTRAINT Price_PK PRIMARY KEY (ProductNumber, StartDate, EndDate),
CONSTRAINT Price_to_Product_FK FOREIGN KEY (ProductNumber)
REFERENCES Product (ProductNumber),
CONSTRAINT DatesOrder_CK CHECK (EndDate >= StartDate)
);
And, yes, that logical DDL design simplifies administration at the physical level, because you can put up an indexing strategy that encompasses the EndDate
column (which, as shown, is declared in a base table) in relatively easier configurations.
Then, a SELECT operation like the one below
SELECT P.ProductNumber,
P.Etcetera,
PR.Amount,
PR.StartDate,
PR.EndDate
FROM Price PR
JOIN Product P
WHERE P.ProductNumber = 1750
AND StartDate <= '20170602'
AND EndDate >= '20170602';
may be used to derive the whole Price
data for the Product
primarily identified by ProductNumber
1750 on Date
2 June 2017.
I believe you're going to want to look at Temporal Tables. These provide functionality to do exactly what you're looking for and are available in Postgres with the proper extensions.
This concept looks to be pretty DB agnostic as well, as it's offered on a variety of RDBMS platforms.