Calculating stock quantity based on change log
Sometimes you can improve query performance just by doing a little bit of tuning instead of changing your entire query. I noticed in your actual query plan that your query spills to tempdb in three places. Here's one example:
Resolving those tempdb spills may improve performance. If Quantity is always non-negative then you can replace UNION with UNION ALL, which will likely change the hash union operator to something else that doesn't require a memory grant. Your other tempdb spills are caused by issues with cardinality estimation. You're on SQL Server 2014 and using the new CE, so it may be difficult to improve the cardinality estimates because the query optimizer won't use multi-column statistics. As a quick fix, consider using the MIN_GRANT_PERCENT query hint made available in SQL Server 2014 SP2. Your query's memory grant is only 49104 KB and the max available grant is 5054840 KB, so hopefully bumping it up won't impact concurrency too much. 10% is a reasonable starting guess, but you may need to adjust it up and down depending on your hardware and data. Putting that all together, this is what your query might look like:
WITH t AS
(
SELECT ToPositionId AS PositionId, SUM(Quantity) AS Quantity, ProductId
FROM ProductPositionLog
GROUP BY ToPositionId, ProductId
UNION ALL
SELECT FromPositionId AS PositionId, -SUM(Quantity) AS Quantity, ProductId
FROM ProductPositionLog
GROUP BY FromPositionId, ProductId
)
SELECT t.ProductId, t.PositionId, SUM(t.Quantity) AS Quantity
FROM t
WHERE NOT t.PositionId = 0
GROUP BY t.ProductId, t.PositionId
HAVING SUM(t.Quantity) > 0
OPTION (MIN_GRANT_PERCENT = 10);
If you wish to improve performance further, I recommend trying out indexed views instead of building and maintaining your own checkpoint table. Indexed views are significantly easier to get right than a custom solution involving your own materialized table or triggers. They will add a small amount of overhead to all DML operations, but they may allow you to remove some of the nonclustered indexes that you currently have. Indexed views appear to be supported in the web edition of the product.
There are some restrictions on indexed views (for example, the view definition can't contain UNION ALL), so you'll need to create a pair of them. Below is an example implementation, along with the fake data that I used for testing:
CREATE TABLE dbo.ProductPositionLog (
LogId BIGINT NOT NULL,
ProductId BIGINT NOT NULL,
FromPositionId BIGINT NOT NULL,
ToPositionId BIGINT NOT NULL,
Quantity INT NOT NULL,
FILLER VARCHAR(20),
PRIMARY KEY (LogId)
);
INSERT INTO dbo.ProductPositionLog WITH (TABLOCK)
SELECT RN, RN % 100, RN % 3999, 3998 - (RN % 3999), RN % 10, REPLICATE('Z', 20)
FROM (
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) RN
FROM master..spt_values t1
CROSS JOIN master..spt_values t2
) q;
CREATE INDEX NCI1 ON dbo.ProductPositionLog (ToPositionId, ProductId) INCLUDE (Quantity);
CREATE INDEX NCI2 ON dbo.ProductPositionLog (FromPositionId, ProductId) INCLUDE (Quantity);
GO
CREATE VIEW ProductPositionLog_1
WITH SCHEMABINDING
AS
SELECT ToPositionId AS PositionId, SUM(Quantity) AS Quantity, ProductId, COUNT_BIG(*) CNT
FROM dbo.ProductPositionLog
WHERE ToPositionId <> 0
GROUP BY ToPositionId, ProductId
GO
CREATE UNIQUE CLUSTERED INDEX IDX_V1
ON ProductPositionLog_1 (PositionId, ProductId);
GO
CREATE VIEW ProductPositionLog_2
WITH SCHEMABINDING
AS
SELECT FromPositionId AS PositionId, SUM(Quantity) AS Quantity, ProductId, COUNT_BIG(*) CNT
FROM dbo.ProductPositionLog
WHERE FromPositionId <> 0
GROUP BY FromPositionId, ProductId
GO
CREATE UNIQUE CLUSTERED INDEX IDX_V2
ON ProductPositionLog_2 (PositionId, ProductId);
GO
Without the indexed views the query takes about 2.7 seconds to finish on my machine. I get a similar plan to yours except mine runs in serial:
I believe that you'll need to query the indexed views with the NOEXPAND hint because you aren't on Enterprise Edition. Here's one way to do that:
WITH t AS
(
SELECT PositionId, Quantity, ProductId
FROM ProductPositionLog_1 WITH (NOEXPAND)
UNION ALL
SELECT PositionId, -Quantity AS Quantity, ProductId
FROM ProductPositionLog_2 WITH (NOEXPAND)
)
SELECT t.ProductId, t.PositionId, SUM(t.Quantity) AS Quantity
FROM t
GROUP BY t.ProductId, t.PositionId
HAVING SUM(t.Quantity) > 0;
This query has a simpler plan and finishes in under 400 ms on my machine:
The best part is that you won't have to change any of the application code that loads data into the ProductPositionLog table. You simply need to verify that the DML overhead of the pair of indexed views is acceptable.
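If you want a rough feel for that overhead before committing to the views, one simple (and admittedly crude) test is to time a representative insert with the indexed views in place and then repeat it after dropping them or their clustered indexes. This is only a sketch; the LogId value below is a placeholder, so pick one that doesn't exist in your data:
-- Crude overhead check: run once with the indexed views in place and once without,
-- then compare the elapsed time reported by STATISTICS TIME.
SET STATISTICS TIME ON;

BEGIN TRANSACTION;

-- The LogId value is arbitrary for illustration; use one that doesn't collide with real data.
INSERT INTO dbo.ProductPositionLog
    (LogId, ProductId, FromPositionId, ToPositionId, Quantity, FILLER)
VALUES
    (987654321, 1, 0, 25, 5, REPLICATE('Z', 20));

-- Roll back so the test row doesn't stick around.
ROLLBACK TRANSACTION;

SET STATISTICS TIME OFF;
The actual execution plan for the insert also shows the operators that maintain IDX_V1 and IDX_V2, which makes it easy to see where the extra work goes.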
I don't really think your current approach is all that inefficient; it seems like a pretty straightforward way to do it. Another approach might be to use an UNPIVOT clause, but I'm not sure it would be a performance improvement. I implemented both approaches with the code below (just over 5 million rows), and each returned in about 2 seconds on my laptop, so I'm not sure what is so different about my data set compared to the real one. I didn't even add any indexes (other than a primary key on LogId).
IF NOT EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[ProductPositionLog]') AND type in (N'U'))
BEGIN
CREATE TABLE [dbo].[ProductPositionLog] (
[LogId] int IDENTITY(1, 1) NOT NULL PRIMARY KEY,
[ProductId] int NULL,
[FromPositionId] int NULL,
[ToPositionId] int NULL,
[Date] datetime NULL,
[Quantity] int NULL
)
END;
GO
SET IDENTITY_INSERT [ProductPositionLog] ON
INSERT INTO [ProductPositionLog] ([LogId], [ProductId], [FromPositionId], [ToPositionId], [Date], [Quantity])
VALUES (1, 123, 0, 1, '2018-01-01 08:10:22', 5)
INSERT INTO [ProductPositionLog] ([LogId], [ProductId], [FromPositionId], [ToPositionId], [Date], [Quantity])
VALUES (2, 123, 0, 2, '2018-01-03 15:15:10', 9)
INSERT INTO [ProductPositionLog] ([LogId], [ProductId], [FromPositionId], [ToPositionId], [Date], [Quantity])
VALUES (3, 123, 1, 3, '2018-01-07 21:08:56', 3)
INSERT INTO [ProductPositionLog] ([LogId], [ProductId], [FromPositionId], [ToPositionId], [Date], [Quantity])
VALUES (4, 123, 3, 0, '2018-02-09 10:03:23', 2)
INSERT INTO [ProductPositionLog] ([LogId], [ProductId], [FromPositionId], [ToPositionId], [Date], [Quantity])
VALUES (5, 123, 2, 3, '2018-02-09 10:03:23', 4)
SET IDENTITY_INSERT [ProductPositionLog] OFF
GO
INSERT INTO ProductPositionLog (ProductId, FromPositionId, ToPositionId, [Date], Quantity)
SELECT ProductId + 1,
FromPositionId + CASE WHEN FromPositionId = 0 THEN 0 ELSE 1 END,
ToPositionId + CASE WHEN ToPositionId = 0 THEN 0 ELSE 1 END,
[Date], Quantity
FROM ProductPositionLog
GO 20
-- Henrik's original solution.
WITH t AS
(
SELECT ToPositionId AS PositionId, SUM(Quantity) AS Quantity, ProductId
FROM ProductPositionLog
GROUP BY ToPositionId, ProductId
UNION
SELECT FromPositionId AS PositionId, -SUM(Quantity) AS Quantity, ProductId
FROM ProductPositionLog
GROUP BY FromPositionId, ProductId
)
SELECT t.ProductId, t.PositionId, SUM(t.Quantity) AS Quantity
FROM t
WHERE NOT t.PositionId = 0
GROUP BY t.ProductId, t.PositionId
HAVING SUM(t.Quantity) > 0
GO
-- Same results via unpivot
SELECT ProductId, PositionId,
SUM(CAST(TransferType AS INT) * Quantity) AS Quantity
FROM
(SELECT ProductId, Quantity, FromPositionId AS [-1], ToPositionId AS [1]
FROM ProductPositionLog) p
UNPIVOT
(PositionId FOR TransferType IN
([-1], [1])
) AS unpvt
WHERE PositionId <> 0
GROUP BY ProductId, PositionId
HAVING SUM(CAST(TransferType AS INT) * Quantity) > 0;
As far as the checkpoints go, it seems like a reasonable idea to me. Since you say that the updates and deletes are really infrequent, I would just add a trigger on ProductPositionLog that fires on update and delete and that adjusts the checkpoint table appropriately. And just to be extra sure, I would recalculate the checkpoint and cache tables from scratch occasionally.
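To make that a bit more concrete, below is a rough sketch of what such a trigger could look like. The checkpoint table here (dbo.StockCheckpoint with ProductId, PositionId and Quantity columns) is an assumption for illustration, so adjust the names to your actual schema; it also assumes every (ProductId, PositionId) pair already has a row in the checkpoint table and ignores any date cutoff your checkpoints may use.
-- Sketch only: dbo.StockCheckpoint (ProductId, PositionId, Quantity) is an assumed
-- checkpoint table. If your checkpoint only covers log rows up to a certain date,
-- also filter deleted/inserted on that cutoff.
CREATE TRIGGER dbo.TR_ProductPositionLog_Checkpoint
ON dbo.ProductPositionLog
AFTER UPDATE, DELETE
AS
BEGIN
    SET NOCOUNT ON;

    WITH delta AS
    (
        -- Reverse the effect of the old rows (deleted holds the pre-change values).
        SELECT ProductId, ToPositionId AS PositionId, -Quantity AS Quantity
        FROM deleted
        WHERE ToPositionId <> 0
        UNION ALL
        SELECT ProductId, FromPositionId, Quantity
        FROM deleted
        WHERE FromPositionId <> 0
        -- Apply the effect of the new rows (inserted is empty for a DELETE).
        UNION ALL
        SELECT ProductId, ToPositionId, Quantity
        FROM inserted
        WHERE ToPositionId <> 0
        UNION ALL
        SELECT ProductId, FromPositionId, -Quantity
        FROM inserted
        WHERE FromPositionId <> 0
    )
    UPDATE c
    SET c.Quantity = c.Quantity + d.Quantity
    FROM dbo.StockCheckpoint AS c
    JOIN
    (
        SELECT ProductId, PositionId, SUM(Quantity) AS Quantity
        FROM delta
        GROUP BY ProductId, PositionId
    ) AS d
        ON d.ProductId = c.ProductId
       AND d.PositionId = c.PositionId;
END;
For the occasional from-scratch rebuild, you can simply truncate the checkpoint table and repopulate it with the same aggregate query you already use to calculate quantities, cut off at the checkpoint date.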