Clickhouse as time-series storage

There are more than one ways to use CH as a time series database. My personal preference is to use one string array for metric names and one Float64 array for metric values.

This is a sample time series table:

CREATE TABLE ts1(
    entity String,
    ts UInt64, -- timestamp, milliseconds from January 1 1970
    m Array(String), -- names of the metrics
    v Array(Float32), -- values of the metrics
    d Date MATERIALIZED toDate(round(ts/1000)), -- auto generate date from ts column
    dt DateTime MATERIALIZED toDateTime(round(ts/1000)) -- auto generate date time from ts column
) ENGINE = MergeTree(d, entity, 8192)

Here we are loading two metrics (load, temperature) for an entity(cpu):

INSERT INTO ts1(entity, ts, m, v) 
VALUES ('cpu', 1509232010254, ['load','temp'], [0.85, 68])

And querying cpu load:

SELECT 
    entity, 
    dt, 
    ts, 
    v[indexOf(m, 'load')] AS load
FROM ts1 
WHERE entity = 'cpu'

┌─entity─┬──────────────────dt─┬────────────ts─┬─load─┐
│ cpu    │ 2017-10-28 23:06:50 │ 1509232010254 │ 0.85 │
└────────┴─────────────────────┴───────────────┴──────┘

Get data as array of tuples:

SELECT 
    entity, 
    dt, 
    ts, 
    arrayMap((mm, vv) -> (mm, vv), m, v) AS metrics
FROM ts1 

┌─entity─┬──────────────────dt─┬────────────ts─┬─metrics─────────────────────┐
│ cpu    │ 2017-10-28 23:06:50 │ 1509232010254 │ [('load',0.85),('temp',68)] │
└────────┴─────────────────────┴───────────────┴─────────────────────────────┘

Get data as rows of tuples:

SELECT 
    entity, 
    dt, 
    ts, 
    arrayJoin(arrayMap((mm, vv) -> (mm, vv), m, v)) AS metric
FROM ts1 

┌─entity─┬──────────────────dt─┬────────────ts─┬─metric────────┐
│ cpu    │ 2017-10-28 23:06:50 │ 1509232010254 │ ('load',0.85) │
│ cpu    │ 2017-10-28 23:06:50 │ 1509232010254 │ ('temp',68)   │
└────────┴─────────────────────┴───────────────┴───────────────┘

Get rows with the metric you want:

SELECT 
    entity, 
    dt, 
    ts, 
    arrayJoin(arrayMap((mm, vv) -> (mm, vv), m, v)) AS metrics
FROM ts1 
WHERE metrics.1 = 'load'

┌─entity─┬──────────────────dt─┬────────────ts─┬─metrics───────┐
│ cpu    │ 2017-10-28 23:06:50 │ 1509232010254 │ ('load',0.85) │
└────────┴─────────────────────┴───────────────┴───────────────┘

Get metric names and values as columns:

SELECT 
    entity, 
    dt, 
    ts, 
    arrayJoin(arrayMap((mm, vv) -> (mm, vv), m, v)) AS metric, 
    metric.1 AS metric_name, 
    metric.2 AS metric_value
FROM ts1 

┌─entity─┬──────────────────dt─┬────────────ts─┬─metric────────┬─metric_name─┬─metric_value─┐
│ cpu    │ 2017-10-28 23:06:50 │ 1509232010254 │ ('load',0.85) │ load        │         0.85 │
│ cpu    │ 2017-10-28 23:06:50 │ 1509232010254 │ ('temp',68)   │ temp        │           68 │
└────────┴─────────────────────┴───────────────┴───────────────┴─────────────┴──────────────┘

Since CH has lots of useful date and time functions, along with higher order functions and tuples, I think it's almost a natural time-series database.

It would probably be better to modify your schema to have 4 columns:

"some_entity_id", "timestamp", "metric_name", "metric_value"

You can include "metric_name" in the MergeTree index, to improve performance when searching for a specific metric of an entity. Test with and without it, to see if it's useful for the kind of queries you make.

Clickhouse as time-series storage

Tags:

Database

Time Series

Clickhouse

Related

Recent Posts