How to unnest and GROUP BY elements of a JSON array?
The data type of the column people
is json
, as is the result of json_array_elements(people)
. And there is no equality operator (=
) for the data type json
. So you also cannot run GROUP BY
on it. More:
- How to remove known elements from a JSON[] array in PostgreSQL?
jsonb
has an equality operator, so the "workaround" in your answer is to cast to jsonb
and use the equivalent jsonb_array_elements()
. The cast adds cost:
jsonb_array_elements(people::jsonb)
Since Postgres 9.4 we also have json_array_elements_text(json)
returning array elements as text
. Related:
- How to turn JSON array into Postgres array?
So:
SELECT p.name, count(*) AS c
FROM band b, json_array_elements_text(b.people) p(name)
GROUP BY p.name;
It seems more convenient to get names as text
instead of jsonb
objects (double-quoted in text representation) and your "desired output" indicates you want / need text
in the result to begin with.
GROUP BY
on text
data is also cheaper than on jsonb
, so this alternative "workaround" should be faster for two reasons. (Test with EXPLAIN (ANALYZE, TIMING OFF)
.)
For the record, there was nothing wrong with your original answer. The comma (,
) is just as "correct" as CROSS JOIN LATERAL
. Having been defined earlier in standard SQL does no make it inferior. See:
- What does [FROM x, y] mean in Postgres?
Neither is it more portable to other RDBMS, and since jsonb_array_elements()
or json_array_elements_text()
aren't portable to other RDBMS to begin with, that's also irrelevant. The short query does not get any clearer with CROSS JOIN LATERAL
IMO, but the last bit is just my personal opinion.
I used the more explicit table and column alias p(name)
and the table-qualified reference p.name
to defend against possible duplicate names. name
is such a common word, it might also pop up as column name in the underlying table band
, in which case it would silently resolve to band.name
. The simple form json_array_elements_text(people) name
only attaches a table alias, the column name is still value
, as returned from the function. But name
resolves to it's single column value
when used in the SELECT
list. It happens to work as expected. But a true column name name
(if band.name
should exist) would bind first. While that won't bite in the given example, it can be a a loaded foot gun in other cases.
Don't use the generic "name" as identifier to begin with. Maybe that was just for the simple test case.
If the column people
can hold anything but a plain JSON array, either query would trigger an exception. If you cannot guarantee data integrity, you might want to defend with json_typeof()
:
SELECT p.name, count(*) AS c
FROM band b, json_array_elements_text(b.people) p(name)
WHERE json_typeof(b.people) = 'array'
GROUP BY 1; -- optional short syntax since you seem to prefer short syntax
Excludes violating rows from the query.
Related:
- When can we use an identifier number instead of its name in PostgreSQL?
Based on @ypercubeᵀᴹ comment I ended up with :
SELECT name, count(*) as c
FROM band
CROSS JOIN LATERAL jsonb_array_elements(people::jsonb) as name
GROUP BY name;
Just used jsonb_array_elements
instead of unnest
.