ElasticSearch - Return Unique Values
You can use the terms aggregation.
{
  "size": 0,
  "aggs": {
    "langs": {
      "terms": { "field": "language", "size": 500 }
    }
  }
}
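The "size" parameter within the aggregation specifies the maximum number of terms to include in the aggregation result. If you need all results, set this to a value that is larger than the number of unique terms in your data. The top-level "size": 0 only suppresses the regular search hits, so the response carries just the aggregation.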
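If you build the request from code, the two "size" values are easy to mix up, so it can help to name them separately. Here is a minimal Python sketch that only constructs the request body; the function name terms_request and its parameters are made up for illustration and are not part of any Elasticsearch client:

```python
import json


def terms_request(field, max_terms, agg_name="langs"):
    """Build a search body that returns only a terms aggregation.

    The top-level "size": 0 suppresses ordinary hits; the inner "size"
    caps how many distinct terms (buckets) come back.
    """
    return {
        "size": 0,
        "aggs": {
            agg_name: {
                "terms": {"field": field, "size": max_terms}
            }
        },
    }


body = terms_request("language", 500)
print(json.dumps(body, indent=2))
```

The resulting dict has the same shape as the JSON request above and can be sent as the search body by whatever HTTP or Elasticsearch client you use.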
A search will return something like:
{
  "took": 16,
  "timed_out": false,
  "_shards": {
    "total": 2,
    "successful": 2,
    "failed": 0
  },
  "hits": {
    "total": 1000000,
    "max_score": 0.0,
    "hits": [ ]
  },
  "aggregations": {
    "langs": {
      "buckets": [ {
        "key": "10",
        "doc_count": 244812
      }, {
        "key": "11",
        "doc_count": 136794
      }, {
        "key": "12",
        "doc_count": 32312
      } ]
    }
  }
}
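The unique values are the "key" entries of the buckets. A minimal Python sketch for pulling them out, assuming the response has already been parsed into a dict; the helper name unique_values is hypothetical, and the sample is a trimmed copy of the response above:

```python
def unique_values(response, agg_name="langs"):
    """Return (key, doc_count) pairs from a terms aggregation response."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return [(b["key"], b["doc_count"]) for b in buckets]


# Trimmed copy of the sample response shown above.
sample = {
    "hits": {"total": 1000000, "max_score": 0.0, "hits": []},
    "aggregations": {
        "langs": {
            "buckets": [
                {"key": "10", "doc_count": 244812},
                {"key": "11", "doc_count": 136794},
                {"key": "12", "doc_count": 32312},
            ]
        }
    },
}

# Keep only the distinct values, dropping the counts.
print([key for key, _ in unique_values(sample)])  # ['10', '11', '12']
```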
If you want the first document for each unique value of the language field, you can use field collapsing:
{
  "query": {
    "match_all": {}
  },
  "collapse": {
    "field": "language.keyword",
    "inner_hits": {
      "name": "latest",
      "size": 1
    }
  }
}
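With collapsing, each top-level hit stands for one distinct value: the value appears under "fields", and the requested inner hits appear under "inner_hits" keyed by the name given in the query ("latest" here). A rough Python sketch for reading such a response; the helper name first_doc_per_value and the sample documents are invented for illustration:

```python
def first_doc_per_value(response, field="language.keyword", inner_name="latest"):
    """Map each collapsed field value to the _source of its first document."""
    result = {}
    for hit in response["hits"]["hits"]:
        value = hit["fields"][field][0]
        inner = hit["inner_hits"][inner_name]["hits"]["hits"]
        # Fall back to the collapsed hit itself if no inner hit came back.
        result[value] = inner[0]["_source"] if inner else hit["_source"]
    return result


# Hypothetical response fragment, reduced to the keys the helper reads.
sample = {
    "hits": {
        "hits": [
            {
                "_source": {"language": "en", "title": "doc A"},
                "fields": {"language.keyword": ["en"]},
                "inner_hits": {"latest": {"hits": {"hits": [
                    {"_source": {"language": "en", "title": "doc A"}}
                ]}}},
            },
            {
                "_source": {"language": "fr", "title": "doc B"},
                "fields": {"language.keyword": ["fr"]},
                "inner_hits": {"latest": {"hits": {"hits": [
                    {"_source": {"language": "fr", "title": "doc B"}}
                ]}}},
            },
        ]
    }
}

print(sorted(first_doc_per_value(sample)))  # ['en', 'fr']
```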
I was looking for this kind of solution myself and found it in the documentation for the terms aggregation.
According to that, the following is the proper solution:
{
  "aggs": {
    "langs": {
      "terms": { "field": "language", "size": 500 }
    }
  }
}
But if you run into the following error:
"error": {
  "root_cause": [
    {
      "type": "illegal_argument_exception",
      "reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [fastest_method] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead."
    }
  ]
}
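In that case, the field is mapped as text, so aggregate on its keyword sub-field instead by appending ".keyword" to the field name in the request (this sub-field exists by default when the string field was mapped dynamically), like this: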
{
  "aggs": {
    "langs": {
      "terms": { "field": "language.keyword", "size": 500 }
    }
  }
}
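If you are not sure which form a given index needs, you can decide from its mapping. Below is a small Python sketch that assumes you already fetched the mapping and extracted its "properties" dict; the helper name aggregatable_field is hypothetical, and the sample mapping mirrors what dynamic mapping typically creates for a string field:

```python
def aggregatable_field(properties, field):
    """Pick the field name to use in a terms aggregation.

    text fields cannot be aggregated unless fielddata is enabled,
    so prefer their "keyword" sub-field when one is defined.
    """
    mapping = properties[field]
    if mapping.get("type") == "text" and not mapping.get("fielddata", False):
        if "keyword" in mapping.get("fields", {}):
            return field + ".keyword"
        raise ValueError(
            "%s is a text field without a keyword sub-field" % field
        )
    return field


# Sample "properties" section of a mapping (illustrative).
properties = {
    "language": {
        "type": "text",
        "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
    },
    "country": {"type": "keyword"},
}

print(aggregatable_field(properties, "language"))  # language.keyword
print(aggregatable_field(properties, "country"))   # country
```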