ElasticSearch -- boosting relevance based on field value
With a recent version of Elasticsearch (version 1.3+) you'll want to use "function score queries":
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-function-score-query.html
A query_string search whose score is boosted by a field value looks like this:
{
  "query": {
    "function_score": {
      "query": { "query_string": { "query": "my search terms" } },
      "functions": [{ "field_value_factor": { "field": "my_boost" } }]
    }
  }
}
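The request body must be valid JSON, so every key and string needs double quotes. One way to avoid quoting mistakes is to build the body as a dictionary and serialize it. This is a minimal Python sketch; function_score_body is a hypothetical helper, and sending the body to your cluster is not shown.

```python
import json


def function_score_body(search_terms, boost_field):
    """Build a function_score request body like the one above (hypothetical helper).

    boost_field is the numeric field whose value scales each document's score.
    """
    return {
        "query": {
            "function_score": {
                "query": {"query_string": {"query": search_terms}},
                "functions": [{"field_value_factor": {"field": boost_field}}],
            }
        }
    }


body = function_score_body("my search terms", "my_boost")
# json.dumps always emits double-quoted keys and strings, i.e. valid JSON.
print(json.dumps(body, indent=2))
```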
"my_boost" is a numeric field in your search index that contains the boost factor for individual documents. May look like this:
{ "my_boost": { "type": "float", "index": "not_analyzed" } }
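To get a feel for what field_value_factor does to the ranking, here is a rough Python model of the arithmetic, assuming a single function, the default boost_mode ("multiply") and no max_boost or weight. The factor multiplies the field value before the modifier is applied; only the "none", "log1p" (common logarithm of 1 + value) and "sqrt" modifiers are sketched. field_value_factor_score is a hypothetical name, not part of any client library.

```python
import math


def field_value_factor_score(query_score, field_value, factor=1.0, modifier="none"):
    """Approximate the final score for one field_value_factor function.

    Assumes boost_mode "multiply": the text relevance score is multiplied
    by the (possibly modified) field value.
    """
    value = factor * field_value
    if modifier == "log1p":
        value = math.log10(1 + value)  # common logarithm of (1 + value)
    elif modifier == "sqrt":
        value = math.sqrt(value)
    elif modifier != "none":
        raise ValueError(f"modifier not covered by this sketch: {modifier}")
    return query_score * value


# Two documents with the same text relevance but different my_boost values.
print(field_value_factor_score(2.0, 3.0))  # 6.0
print(field_value_factor_score(2.0, 0.5))  # 1.0
```

Because the default combination is a multiplication, a document whose my_boost is 0 ends up with a score of 0 in this model.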
You can boost either at index time or at query time. I usually prefer query-time boosting even though it makes queries a little slower; otherwise I'd need to reindex every time I want to change my boosting factors, which usually need fine-tuning and have to stay flexible.
In older versions of Elasticsearch there are several ways to apply query-time boosting with the query DSL (the three Custom queries below were removed in 1.0 in favor of function_score):
- Boosting Query
- Custom Filters Score Query
- Custom Boost Factor Query
- Custom Score Query
The first three queries are useful if you want to give a specific boost to documents that match specific queries or filters, for example to boost only the documents published during the last month. You could use this approach with your boosting_field, but you'd need to manually define intervals of boosting_field values and give each interval a different boost, which isn't ideal.
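The interval approach amounts to a step function: every interval would be its own boosting clause in the query, and all values inside an interval get the same boost. This minimal Python sketch shows that effect; the interval bounds and boost values are made up for illustration, and interval_boost is a hypothetical helper.

```python
# Hypothetical intervals: (inclusive lower bound, boost), sorted ascending.
INTERVAL_BOOSTS = [(0, 1.0), (1_000, 1.5), (10_000, 2.0), (50_000, 3.0)]


def interval_boost(field_value, intervals=INTERVAL_BOOSTS):
    """Return the boost of the highest interval whose lower bound <= field_value.

    Values below the first bound get a neutral boost of 1.0.
    """
    boost = 1.0
    for lower_bound, boost_value in intervals:
        if field_value >= lower_bound:
            boost = boost_value
    return boost


# 10_000 and 49_999 land in the same interval, so they are boosted equally.
print(interval_boost(10_000), interval_boost(49_999), interval_boost(50_000))
```

Every extra level of granularity means another interval (and another clause) to maintain, which is why a continuous, script-based score is usually more convenient.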
The best solution is a Custom Score Query, which lets you run a query and customize its score with a script. It's quite powerful: the script can modify the score directly. First, I'd scale the boosting_field values to a range such as 0 to 1, so that your final score doesn't become a large number. To do that, you need to estimate roughly the minimum and maximum values the field can contain, for instance a minimum of 0 and a maximum of 100000. Once the boosting_field value is scaled to a number between 0 and 1, you can add the result to the actual score like this:
{
  "query" : {
    "custom_score" : {
      "query" : {
        "match_all" : {}
      },
      "script" : "_score + (1 * doc.boosting_field.doubleValue / 100000)"
    }
  }
}
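The script's arithmetic is a min-max scaling followed by an addition. This minimal Python sketch reproduces it so you can check the effect on scores before touching the query; with min_value = 0, max_value = 100000 and weight = 1 it matches the script above. The clamping step is an addition of this sketch (the script does not clamp), and scaled_additive_score is a hypothetical name.

```python
def scaled_additive_score(score, field_value, min_value=0.0, max_value=100_000.0, weight=1.0):
    """Return score + weight * (field_value scaled to 0..1).

    min_value and max_value are your estimated bounds for boosting_field.
    """
    scaled = (field_value - min_value) / (max_value - min_value)
    # Clamp values outside the estimated range (the original script does not do this).
    scaled = min(max(scaled, 0.0), 1.0)
    return score + weight * scaled


print(scaled_additive_score(1.5, 50_000))   # 2.0
print(scaled_additive_score(1.5, 250_000))  # 2.5, because the scaled value is clamped to 1.0
```

Clamping keeps a single outlier above your estimated maximum from dominating the text relevance score.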
You can also consider using the boosting_field as a boost factor (_score * rather than _score +), but then you'd need to scale it to an interval with a minimum value of 1 (just add 1).
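As a multiplicative factor, the scaled value is shifted by 1 so that it never shrinks the score; in script form that would be something like "_score * (1 + doc.boosting_field.doubleValue / 100000)", an adaptation of the script above rather than something tested against a cluster. This Python sketch assumes a minimum of 0 and non-negative field values; scaled_multiplicative_score is a hypothetical name.

```python
def scaled_multiplicative_score(score, field_value, max_value=100_000.0):
    """Return score * (1 + field_value / max_value).

    The factor is at least 1 for non-negative values, so a document with
    boosting_field == 0 keeps its original score instead of dropping to 0.
    """
    factor = 1.0 + field_value / max_value
    return score * factor


print(scaled_multiplicative_score(2.0, 50_000))  # 3.0
print(scaled_multiplicative_score(2.0, 0))       # 2.0
```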
You can also tune the result by adding a weight to the value you use to influence the score, which changes how much that value matters. You'll need this even more if you combine multiple boosting factors and want to give each one a different weight.
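Combining several boosting factors is the same additive scheme repeated: scale each value to 0..1 by its estimated maximum, multiply it by its weight, and add it to the score. This is a minimal Python sketch assuming every factor has a minimum of 0; the factor names and weights in the example are hypothetical.

```python
def weighted_score(score, factors):
    """Add several scaled boosting factors to the score, each with its own weight.

    factors is a list of (value, max_value, weight) tuples; each value is
    scaled to 0..1 by dividing by its estimated max_value and then clamped.
    """
    total = score
    for value, max_value, weight in factors:
        scaled = min(max(value / max_value, 0.0), 1.0)
        total += weight * scaled
    return total


# Hypothetical factors: a popularity count (max 100000) weighted twice as
# heavily as a rating on a 0..10 scale.
print(weighted_score(1.0, [(50_000, 100_000, 2.0), (5, 10, 1.0)]))  # 2.5
```

Raising or lowering a single weight is then enough to rebalance the factors, without reindexing.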
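If you want to avoid specifying the boost in every query, you might consider adding a "boost" factor directly to your mapping. Keep in mind that a mapping-level boost is a constant applied to matches on that field; unlike the script above, it does not vary with the field's value.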
So your mapping then may look like this:
{
  "_all" : {"enabled" : "true"},
  "properties" : {
    "_id": {"type" : "string", "store" : "yes", "index" : "not_analyzed"},
    "first_name": {"type" : "string", "store" : "yes", "index" : "yes"},
    "last_name": {"type" : "string", "store" : "yes", "index" : "yes"},
    "boosting_field": {"type" : "integer", "store" : "yes", "index" : "yes", "boost" : 10.0}
  }
}