RANK used in 'where' returns invalid column, but exists in results set
You cannot use aliases in WHERE
clauses, nor can you use Window functions.
To demonstrate, consider this example (fiddle).
CREATE TABLE payment (amount INTEGER, pay_date DATE);
INSERT INTO payment VALUES (54, '2019-09-01'), (56, '2019-09-01'), (154, '2019-09-02'),
(156, '2019-09-02'), (254, '2019-09-03'), (256, '2019-09-03');
Then run the two following queries:
SELECT
*,
SUM(amount) OVER (PARTITION BY pay_date) AS the_sums
FROM payment
WHERE the_sums > 200; -- ERROR: column "the_sums" does not exist
and
SELECT
*,
SUM(amount) OVER (PARTITION BY pay_date) AS the_sums
FROM payment
WHERE SUM(amount) OVER (PARTITION BY pay_date) > 200;
-- ERROR: window functions are not allowed in WHERE
You didn't post your error message or PostgreSQL version, but your problem is that timestamp_d1_rank
and timestamp_d2_rank
are aliases.
You have two options:
- First Option: use the aliases/Window functions within a subquery (also in fiddle)
SELECT * FROM
(
SELECT
*,
SUM(amount) OVER (PARTITION BY pay_date) AS the_sums
FROM payment
) AS tab
WHERE the_sums < 200;
Result:
amount pay_date the_sums
54 2019-09-01 110
56 2019-09-01 110
- Second Option: you can use a
CTE
(aka theWITH
clause as outlined here or as in @ypercubeᵀᴹ 's answer above or also see the fiddle.
For a good explanation of this whole area, see here.
You cannot use window/ranking functions or their aliases in the WHERE
clause because they are evaluated in SELECT
, after the WHERE
has been evaluated.
You can use a subquery (derived table or CTE) to process a second WHERE
clause, after the window functions have been evaluated:
SELECT d1, d2,
"DID",
t1, t2,
coords_centroid_a,
coords_centroid_b
FROM
(
SELECT a.timestamp_intersecting_date d1, b.timestamp_intersecting_date d2,
a."DID",
a.timestamp_intersecting_max t1, b.timestamp_intersecting_min t2,
RANK () OVER (
PARTITION BY a.timestamp_intersecting_date
ORDER BY a.timestamp_intersecting_max DESC
) timestamp_d1_rank ,
RANK () OVER (
PARTITION BY b.timestamp_intersecting_date
ORDER BY b.timestamp_intersecting_max ASC
) timestamp_d2_rank,
a.coords_centroid AS coords_centroid_a,
b.coords_centroid AS coords_centroid_b
FROM
signals a
INNER JOIN signals b ON (a."DID" = b."DID")
WHERE (b.timestamp_intersecting_date = a.timestamp_intersecting_date + INTERVAL '1 DAY')
AND a."DID" = b."DID"
) AS t
WHERE t.timestamp_d1_rank = 1
AND t.timestamp_d2_rank = 1
ORDER BY "DID", t1 DESC, t2 ASC ;
The confusion with syntax rules concerning input and output column names has been addressed properly by existing answers. This is about a lurking performance problem.
There is an transient CROSS JOIN
in your query that scales terribly with growing number of rows per ("DID", date_col)
. Example: If a "DID"
has 100 coordinates per day, the query ends up processing 10.000 (!) combinations with the next day.
Based on this assumed table definition:
CREATE TABLE signals (
"DID" serial PRIMARY KEY
, date_col date NOT NULL -- ?
, time_col time NOT NULL -- ?
, coords_centroid geography -- ?
);
And assuming there are only few rows per ("DID", date_col)
, this query should be much faster already:
SELECT a."DID", a.date_col, t1, coords1, t2, coords2
FROM ( -- first row per ("DID", date_col)
SELECT DISTINCT ON ("DID", date_col)
"DID", date_col, time_col AS t2, coords_centroid AS coords2
FROM signals
ORDER BY "DID", date_col, time_col
) b
CROSS JOIN LATERAL ( -- corresponding last row from previous day
SELECT "DID", date_col, time_col AS t1, coords_centroid AS coords1
FROM signals a
WHERE a."DID" = b."DID"
AND a.date_col = b.date_col - 1
ORDER BY time_col DESC
) a;
Needs an index on ("DID", date_col, time_col)
.
More might be done here, depending on your actual setup and actual data distribution. I'll elaborate if you are interested and provide missing information ...