Select rows for which at least one row per set meets a condition

Basically you are looking for the expression:

client_status IS DISTINCT FROM 'n'

The column client_status should really be data type boolean, not text, which would allow the simpler expression:

client_status IS NOT FALSE

The manual has details in the chapter Comparison Operators.

Assuming your actual table has a UNIQUE or PK constraint, we arrive at:

CREATE TABLE test (
  company_id    integer NOT NULL, 
  client_id     integer NOT NULL, 
  client_status boolean,
  PRIMARY KEY (company_id, client_id)
);

Queries

All of these do the same (what you asked), which is the fastest depends on data distribution:

SELECT company_id, client_id
FROM   test t
WHERE  EXISTS (
   SELECT FROM test
   WHERE  company_id = t.company_id
   AND    client_status IS NOT FALSE
   );

Or:

SELECT company_id, client_id
FROM   test t
JOIN  (
   SELECT company_id
   FROM   test t
   GROUP  BY 1
   HAVING bool_or(client_status IS NOT FALSE)
   ) c USING (company_id);

Or:

SELECT company_id, client_id
FROM   test t
JOIN  (
   SELECT DISTINCT company_id, client_status 
   FROM   test t
   ORDER  BY company_id, client_status DESC
   ) c USING (company_id)
WHERE  c.client_status IS NOT FALSE;

Boolean values sort FALSE -> TRUE -> NULL in ascending sort order. So FALSE comes last in descending order. If there's is any other value available, then that one's picked first ...

Sorting null values after all others, except special

The added PK is implemented with a useful index for these queries. If you want faster, yet, add a partial index for query 1:

CREATE INDEX test_special_idx ON test (company_id, client_id)
WHERE  client_status IS NOT FALSE;

You could use window functions, too, but that would be slower. Example with first_value():

SELECT company_id, client_id
FROM  (
   SELECT company_id, client_id
        , first_value(client_status) OVER (PARTITION BY company_id
                                           ORDER BY client_status DESC) AS stat
   FROM   test t
   ) sub
WHERE  stat IS NOT FALSE;

For lots of rows per company_id, one of these techniques may be faster, still:

Optimize GROUP BY query to retrieve latest record per user

I think this can be simplified a bit:

select company_id 
from test 
group by company_id 
having count(*) filter (where client_status!='n' or client_status is null) > 0;

I may have misunderstood you but I imagine something like:

 select * 
 from test x 
 where exists ( 
     select 1 
     from test y 
     where x.company_id = y.company_id 
       and coalesce(client_status, 'y') <> 'n'
 );

will work. coalesce is use to map null to 'y', but anything different than 'n' should do

Using an OLAP function can save us a "join":

select company_id, client_id 
from (
    select x.*
         , count(nullif(coalesce(client_status,'y'),'n')) 
               over (partition by company_id) as cnt 
    from test x
) 
where cnt > 0;

Here we map null -> 'y' and 'n' -> null. Since count(x) will count rows where x is not null, we count rows where client_status <> 'n'. I used an OLAP function to avoid GROUP BY, which means that we only need to reference the table once.

Select rows for which at least one row per set meets a condition

Queries

Tags:

Postgresql

Postgresql 9.4

Related

Recent Posts