Select rows for which at least one row per set meets a condition
Basically you are looking for the expression:
client_status IS DISTINCT FROM 'n'
The column client_status
should really be data type boolean
, not text
, which would allow the simpler expression:
client_status IS NOT FALSE
The manual has details in the chapter Comparison Operators.
Assuming your actual table has a UNIQUE
or PK
constraint, we arrive at:
CREATE TABLE test (
company_id integer NOT NULL,
client_id integer NOT NULL,
client_status boolean,
PRIMARY KEY (company_id, client_id)
);
Queries
All of these do the same (what you asked), which is the fastest depends on data distribution:
SELECT company_id, client_id
FROM test t
WHERE EXISTS (
SELECT FROM test
WHERE company_id = t.company_id
AND client_status IS NOT FALSE
);
Or:
SELECT company_id, client_id
FROM test t
JOIN (
SELECT company_id
FROM test t
GROUP BY 1
HAVING bool_or(client_status IS NOT FALSE)
) c USING (company_id);
Or:
SELECT company_id, client_id
FROM test t
JOIN (
SELECT DISTINCT company_id, client_status
FROM test t
ORDER BY company_id, client_status DESC
) c USING (company_id)
WHERE c.client_status IS NOT FALSE;
Boolean values sort FALSE
-> TRUE
-> NULL
in ascending sort order. So FALSE
comes last in descending order. If there's is any other value available, then that one's picked first ...
- Sorting null values after all others, except special
The added PK is implemented with a useful index for these queries. If you want faster, yet, add a partial index for query 1:
CREATE INDEX test_special_idx ON test (company_id, client_id)
WHERE client_status IS NOT FALSE;
You could use window functions, too, but that would be slower. Example with first_value()
:
SELECT company_id, client_id
FROM (
SELECT company_id, client_id
, first_value(client_status) OVER (PARTITION BY company_id
ORDER BY client_status DESC) AS stat
FROM test t
) sub
WHERE stat IS NOT FALSE;
For lots of rows per company_id
, one of these techniques may be faster, still:
- Optimize GROUP BY query to retrieve latest record per user
I think this can be simplified a bit:
select company_id
from test
group by company_id
having count(*) filter (where client_status!='n' or client_status is null) > 0;
I may have misunderstood you but I imagine something like:
select *
from test x
where exists (
select 1
from test y
where x.company_id = y.company_id
and coalesce(client_status, 'y') <> 'n'
);
will work. coalesce is use to map null to 'y', but anything different than 'n' should do
Using an OLAP function can save us a "join":
select company_id, client_id
from (
select x.*
, count(nullif(coalesce(client_status,'y'),'n'))
over (partition by company_id) as cnt
from test x
)
where cnt > 0;
Here we map null -> 'y' and 'n' -> null. Since count(x) will count rows where x is not null, we count rows where client_status <> 'n'. I used an OLAP function to avoid GROUP BY, which means that we only need to reference the table once.