Filter elasticsearch results to contain only unique documents based on one field value
You need a top_hits aggregation. For your specific case:
{
  "query": {
    "multi_match": {
      ...
    }
  },
  "aggs": {
    "top-uids": {
      "terms": {
        "field": "uid"
      },
      "aggs": {
        "top_uids_hits": {
          "top_hits": {
            "sort": [
              {
                "_score": {
                  "order": "desc"
                }
              }
            ],
            "size": 1
          }
        }
      }
    }
  }
}
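The query above runs your multi_match query and groups the results by uid with a terms aggregation. Within each uid bucket, the documents are sorted by _score in descending order and only the top one is returned. Note that a terms aggregation returns only the top 10 buckets by default, so set its size parameter if you expect more distinct uid values.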
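To show where the deduplicated documents end up, here is a minimal sketch of reading them back out of the response. It assumes the response has already been parsed into a Python dict (for example by a client library). The helper name unique_hits and the sample response are hypothetical; only the aggregation names come from the request above.

```python
def unique_hits(response, terms_agg="top-uids", hits_agg="top_uids_hits"):
    """Return the single best-scoring hit from each uid bucket."""
    buckets = response["aggregations"][terms_agg]["buckets"]
    docs = []
    for bucket in buckets:
        # Each bucket carries its own nested hits list from the top_hits sub-aggregation.
        hits = bucket[hits_agg]["hits"]["hits"]
        if hits:
            docs.append(hits[0])
    return docs


# Hypothetical, trimmed-down response used only for illustration.
sample_response = {
    "aggregations": {
        "top-uids": {
            "buckets": [
                {
                    "key": "u1",
                    "doc_count": 3,
                    "top_uids_hits": {
                        "hits": {"hits": [{"_id": "a", "_score": 2.5, "_source": {"uid": "u1"}}]}
                    },
                },
                {
                    "key": "u2",
                    "doc_count": 1,
                    "top_uids_hits": {
                        "hits": {"hits": [{"_id": "b", "_score": 1.1, "_source": {"uid": "u2"}}]}
                    },
                },
            ]
        }
    }
}

for doc in unique_hits(sample_response):
    print(doc["_id"], doc["_source"]["uid"])
```

The regular top-level hits are still returned alongside the aggregation, so the unique documents have to be read from the buckets rather than from the usual hits list.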
Elasticsearch 5.3 added support for field collapsing, so you should be able to do something like:
GET /_search
{
  "query": {
    "multi_match": {
      "query": "this is a test",
      "fields": [ "subject", "message", "uid" ]
    }
  },
  "collapse": {
    "field": "uid"
  },
  "size": 20,
  "from": 100
}
The benefit of field collapsing over a top_hits aggregation is that it works with pagination: from and size apply to the collapsed results.
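As a rough illustration of paging through collapsed results, the sketch below builds the same request body for an arbitrary page. The function name collapsed_page_body and the zero-based page numbering are assumptions made for this example; the body it produces mirrors the collapse request above.

```python
import json


def collapsed_page_body(query_text, fields, page, page_size=20, collapse_field="uid"):
    """Build a search body that returns one hit per collapse_field value for a given page."""
    if page < 0 or page_size <= 0:
        raise ValueError("page must be >= 0 and page_size must be > 0")
    return {
        "query": {"multi_match": {"query": query_text, "fields": list(fields)}},
        "collapse": {"field": collapse_field},
        "size": page_size,
        # Zero-based page index: page 5 with 20 results per page starts at offset 100.
        "from": page * page_size,
    }


body = collapsed_page_body("this is a test", ["subject", "message", "uid"], page=5)
print(json.dumps(body, indent=2))
```

The resulting dict can be sent as the JSON body of the search request with whatever HTTP or Elasticsearch client you already use.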