Use google bigquery to build histogram graph

You could also use the quantiles aggregation operator to get a quick look at the distribution of ages.

SELECT
  quantiles(age, 10)
FROM mytable

Each row of this query would correspond to the age at that point in the list of ages. The first result is the age 1/10ths of the way through the sorted list of ages, the second is the age 2/10ths through, 3/10ths, etc.

See the 2019 update, with #standardSQL --Fh

The subquery idea works, as does "CASE WHEN" and then doing a group by:

SELECT COUNT(field1), bucket 
FROM (
    SELECT field1, CASE WHEN age >=  0 AND age < 10 THEN 1
                        WHEN age >= 10 AND age < 20 THEN 2
                        WHEN age >= 20 AND age < 30 THEN 3
                        ...
                        ELSE -1 END as bucket
    FROM table1) 
GROUP BY bucket

Alternately, if the buckets are regular -- you could just divide and cast to an integer:

SELECT COUNT(field1), bucket 
FROM (
    SELECT field1, INTEGER(age / 10) as bucket FROM table1)
GROUP BY bucket

With #standardSQL and an auxiliary stats query, we can define the range the histogram should look into.

Here for the time to fly between SFO and JFK - with 10 buckets:

WITH data AS ( 
    SELECT *, ActualElapsedTime datapoint
    FROM `fh-bigquery.flights.ontime_201903`
    WHERE FlightDate_year = "2018-01-01" 
    AND Origin = 'SFO' AND Dest = 'JFK'
)
, stats AS (
  SELECT min+step*i min, min+step*(i+1)max
  FROM (
    SELECT max-min diff, min, max, (max-min)/10 step, GENERATE_ARRAY(0, 10, 1) i
    FROM (
      SELECT MIN(datapoint) min, MAX(datapoint) max
      FROM data
    )
  ), UNNEST(i) i
)

SELECT COUNT(*) count, (min+max)/2 avg
FROM data 
JOIN stats
ON data.datapoint >= stats.min AND data.datapoint<stats.max
GROUP BY avg
ORDER BY avg

If you need round numbers, see: https://stackoverflow.com/a/60159876/132438

enter image description here

Use google bigquery to build histogram graph

Tags:

Sql

Google Bigquery

Related

Recent Posts