Get incremental counts of an aggregated value in a joined table
Using functions or expressions in JOIN conditions is usually a bad idea, I say usually because some optimisers can handle it fairly well and utilize indexes anyhow. I would suggest creating a table for the weights. Something like:
CREATE TABLE weights
( weight int not null primary key
);
INSERT INTO weights (weight) VALUES (0),(10),(20),...(1270);
Make sure you have indexes on posts_reasons
:
CREATE UNIQUE INDEX ... ON posts_reasons (reason_id, post_id);
A query like:
SELECT w.weight
, COUNT(1) as post_count
FROM weights w
JOIN ( SELECT pr.post_id, SUM(r.weight) as sum_weight
FROM reasons r
JOIN posts_reasons pr
ON r.id = pr.reason_id
GROUP BY pr.post_id
) as x
ON w.weight > x.sum_weight
GROUP BY w.weight;
My machine at home is probably 5-6 years old, it has an Intel(R) Core(TM) i5-3470 CPU @ 3.20GHz and 8Gb of ram.
uname -a Linux dustbite 4.16.6-302.fc28.x86_64 #1 SMP Wed May 2 00:07:06 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
I tested against:
https://drive.google.com/open?id=1q3HZXW_qIZ01gU-Krms7qMJW3GCsOUP5
MariaDB [test3]> select @@version;
+-----------------+
| @@version |
+-----------------+
| 10.2.14-MariaDB |
+-----------------+
1 row in set (0.00 sec)
SELECT w.weight
, COUNT(1) as post_count
FROM weights w
JOIN ( SELECT pr.post_id, SUM(r.weight) as sum_weight
FROM reasons r
JOIN posts_reasons pr
ON r.id = pr.reason_id
GROUP BY pr.post_id
) as x
ON w.weight > x.sum_weight
GROUP BY w.weight;
+--------+------------+
| weight | post_count |
+--------+------------+
| 0 | 1 |
| 10 | 2591 |
| 20 | 4264 |
| 30 | 4386 |
| 40 | 5415 |
| 50 | 7499 |
[...]
| 1270 | 119283 |
| 1320 | 119286 |
| 1330 | 119286 |
[...]
| 2590 | 119286 |
+--------+------------+
256 rows in set (9.89 sec)
If performance is critical and nothing else helps you could create a summary table for:
SELECT pr.post_id, SUM(r.weight) as sum_weight
FROM reasons r
JOIN posts_reasons pr
ON r.id = pr.reason_id
GROUP BY pr.post_id
You can maintain this table via triggers
Since there is a certain amount of work that needs to be done for each weight in weights, it may be beneficial to limit this table.
ON w.weight > x.sum_weight
WHERE w.weight <= (select MAX(sum_weights)
from (SELECT SUM(weight) as sum_weights
FROM reasons r
JOIN posts_reasons pr
ON r.id = pr.reason_id
GROUP BY pr.post_id) a
)
GROUP BY w.weight
Since I had a lot of unnecesary rows in my weights table (max 2590), the restriction above cut the execution time from 9 down to 4 seconds.
In MySQL, variables can be used in queries both to be calculated from values in columns and to be used in expression for new, calculated columns. In this case, using a variable results in an efficient query:
SELECT
weight,
@cumulative := @cumulative + post_count AS post_count
FROM
(SELECT @cumulative := 0) AS x,
(
SELECT
FLOOR(reason_weight / 10) * 10 AS weight,
COUNT(*) AS post_count
FROM
(
SELECT
p.id,
SUM(r.weight) AS reason_weight
FROM
posts AS p
INNER JOIN posts_reasons AS pr ON p.id = pr.post_id
INNER JOIN reasons AS r ON pr.reason_id = r.id
GROUP BY
p.id
) AS d
GROUP BY
FLOOR(reason_weight / 10)
ORDER BY
FLOOR(reason_weight / 10) ASC
) AS derived
;
The d
derived table is actually your post_weights
view. Therefore, if you are planning on keeping the view, you can use it instead of the derived table:
SELECT
weight,
@cumulative := @cumulative + post_count AS post_count
FROM
(SELECT @cumulative := 0),
(
SELECT
FLOOR(reason_weight / 10) * 10 AS weight,
COUNT(*) AS post_count
FROM
post_weights
GROUP BY
FLOOR(reason_weight / 10)
ORDER BY
FLOOR(reason_weight / 10) ASC
) AS derived
;
A demo of this solution, which uses a concise edition of the reduced version of your setup, can be found and played with at SQL Fiddle.