Does "group by" automatically guarantee "order by"?
group by
does not order the data neccessarily. A DB is designed to grab the data as fast as possible and only sort if necessary.
So add the order by
if you need a guaranteed order.
It definitely doesn't. I have experienced that, once one of my queries suddenly started to return not-ordered results, as the data in the table grows by.
It depends on the database vendor.
For example PostgreSQL does not automatically sort the grouped result. Here you have to use order by to get the data sorted.
But Sybase and Microsoft SQL Server do. Here you can use order by to change the default sorting.
An efficient implementation of group by would perform the group-ing by sorting the data internally. That's why some RDBMS return sorted output when group-ing. Yet, the SQL specs don't mandate that behavior, so unless explicitly documented by the RDBMS vendor I wouldn't bet on it to work (tomorrow). OTOH, if the RDBMS implicitly does a sort it might also be smart enough to then optimize (away) the redundant order by. @jimmyb
An example using PostgreSQL proving that concept
Creating a table with 1M records, with random dates in a day range from today - 90 and indexing by date
CREATE TABLE WITHDRAW AS
SELECT (random()*1000000)::integer AS IDT_WITHDRAW,
md5(random()::text) AS NAM_PERSON,
(NOW() - ( random() * (NOW() + '90 days' - NOW()) ))::timestamp AS DAT_CREATION, -- de hoje a 90 dias atras
(random() * 1000)::decimal(12, 2) AS NUM_VALUE
FROM generate_series(1,1000000);
CREATE INDEX WITHDRAW_DAT_CREATION ON WITHDRAW(DAT_CREATION);
Grouping by date truncated by day of month, restricting select by dates in a two days range
EXPLAIN
SELECT
DATE_TRUNC('DAY', W.dat_creation), COUNT(1), SUM(W.NUM_VALUE)
FROM WITHDRAW W
WHERE W.dat_creation >= (NOW() - INTERVAL '2 DAY')::timestamp
AND W.dat_creation < (NOW() - INTERVAL '1 DAY')::timestamp
GROUP BY 1
HashAggregate (cost=11428.33..11594.13 rows=11053 width=48)
Group Key: date_trunc('DAY'::text, dat_creation)
-> Bitmap Heap Scan on withdraw w (cost=237.73..11345.44 rows=11053 width=14)
Recheck Cond: ((dat_creation >= ((now() - '2 days'::interval))::timestamp without time zone) AND (dat_creation < ((now() - '1 day'::interval))::timestamp without time zone))
-> Bitmap Index Scan on withdraw_dat_creation (cost=0.00..234.97 rows=11053 width=0)
Index Cond: ((dat_creation >= ((now() - '2 days'::interval))::timestamp without time zone) AND (dat_creation < ((now() - '1 day'::interval))::timestamp without time zone))
Using a larger restriction date range, it chooses to apply a SORT
EXPLAIN
SELECT
DATE_TRUNC('DAY', W.dat_creation), COUNT(1), SUM(W.NUM_VALUE)
FROM WITHDRAW W
WHERE W.dat_creation >= (NOW() - INTERVAL '60 DAY')::timestamp
AND W.dat_creation < (NOW() - INTERVAL '1 DAY')::timestamp
GROUP BY 1
GroupAggregate (cost=116522.65..132918.32 rows=655827 width=48)
Group Key: (date_trunc('DAY'::text, dat_creation))
-> Sort (cost=116522.65..118162.22 rows=655827 width=14)
Sort Key: (date_trunc('DAY'::text, dat_creation))
-> Seq Scan on withdraw w (cost=0.00..41949.57 rows=655827 width=14)
Filter: ((dat_creation >= ((now() - '60 days'::interval))::timestamp without time zone) AND (dat_creation < ((now() - '1 day'::interval))::timestamp without time zone))
Just by adding ORDER BY 1
at the end (there is no significant difference)
GroupAggregate (cost=116522.44..132918.06 rows=655825 width=48)
Group Key: (date_trunc('DAY'::text, dat_creation))
-> Sort (cost=116522.44..118162.00 rows=655825 width=14)
Sort Key: (date_trunc('DAY'::text, dat_creation))
-> Seq Scan on withdraw w (cost=0.00..41949.56 rows=655825 width=14)
Filter: ((dat_creation >= ((now() - '60 days'::interval))::timestamp without time zone) AND (dat_creation < ((now() - '1 day'::interval))::timestamp without time zone))
PostgreSQL 10.3