Elasticsearch aggregations on nested inner hits
Inner_hits aggregation is not supported by elasticsearch. The reason behind it is that inner_hits is a very expensive operation and applying aggregation on inner_hits is like exponential increase in complexity of operation. Here is the github link of the issue.
If you want aggregation on inner_hits you can probably use the following approach:
- Make flexible query where you only get the required hit from elastic and aggregate over it. Repeat it multiple time to get all the hits and aggregate simultaneously. This approach may lead you with multiple search query which is not advisable.
- You can make your application layer handle the aggregation logic by writing smart aggregation parser and run those parser on response from elasticsearch. This approach is a little better but you have an overhead of maintaining the parser according to changing needs.
I would personally recommend you to change your data-mapping style in elasticsearch so that you are able to run aggregation on it.
Your query is pretty complex. To be short, here is your requested query:
{
"size": 0,
"aggregations": {
"nested_A": {
"nested": {
"path": "records"
},
"aggregations": {
"bool_aggregation_A": {
"filter": {
"bool": {
"must": [
{
"term": {
"records.data.field1": "value1"
}
}
]
}
},
"aggregations": {
"reverse_aggregation": {
"reverse_nested": {},
"aggregations": {
"bool_aggregation_B": {
"filter": {
"bool": {
"must": [
{
"range": {
"first_timestamp": {
"gte": 1504548296273,
"lte": 1504549196273,
"format": "epoch_millis"
}
}
}
]
}
},
"aggregations": {
"nested_B": {
"nested": {
"path": "records"
},
"aggregations": {
"my_histogram": {
"date_histogram": {
"field": "records.timestamp",
"interval": "1s",
"min_doc_count": 1,
"extended_bounds": {
"min": 1504548296273,
"max": 1504549196273
}
}
}
}
}
}
}
}
}
}
}
}
}
}
}
Now, let me explain every step by aggregations' names:
- size: 0 -> we are not interested in hits, only aggregations
- nested_A ->
data.field1
is under records so we dive our scope to records - bool_aggregation_A -> filter by
data.field1
: value1 - reverse_aggregation ->
first_timestamp
is not in nested document, we need to scope out from records - bool_aggregation_B -> filter by
first_timestamp
range - nested_B -> now, we scope again into records for
timestamp
field (located under records) - my_histogram -> finally, aggregate date histogram by
timestamp
field