Elasticsearch More Like This Query
min_term_freq and min_doc_freq are applied to the input text before the MLT query is built. Since "apple" occurs only once in your input text, it never qualified as an MLT term, because min_term_freq is set to 2. If you change your input to "apple apple", as below, things will work:
POST /test_index/_search
{
  "query": {
    "more_like_this": {
      "fields": [
        "text"
      ],
      "like_text": "apple apple",
      "min_term_freq": 2,
      "percent_terms_to_match": 1,
      "min_doc_freq": 1
    }
  }
}
The same goes for min_doc_freq. "apple" is found in at least 2 documents, so any min_doc_freq up to 2 will let "apple" from the input text qualify for MLT.
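To make the two thresholds concrete, here is a small Python sketch of the filtering described above. It is only a toy model, not what Elasticsearch actually runs: it assumes plain whitespace tokenization and lowercasing, and the function name `select_terms` and the three sample documents are made up for illustration.

```python
from collections import Counter

def select_terms(like_text, corpus, min_term_freq=2, min_doc_freq=5):
    """Toy model of MLT term selection (whitespace tokens, lowercased)."""
    term_freq = Counter(like_text.lower().split())
    selected = []
    for term, tf in term_freq.items():
        # Number of corpus documents that contain the term at least once.
        doc_freq = sum(1 for doc in corpus if term in doc.lower().split())
        # A term must pass both thresholds to be used in the query.
        if tf >= min_term_freq and doc_freq >= min_doc_freq:
            selected.append(term)
    return selected

# Hypothetical index contents, for illustration only.
corpus = ["apple pie recipe", "apple orchard", "banana bread"]

print(select_terms("apple", corpus, min_term_freq=2, min_doc_freq=1))        # []
print(select_terms("apple apple", corpus, min_term_freq=2, min_doc_freq=1))  # ['apple']
print(select_terms("apple apple", corpus, min_term_freq=2, min_doc_freq=3))  # []
```

The first call drops "apple" because it occurs once in the input, the second keeps it, and the third drops it again because only 2 of the sample documents contain it.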
As the poster of this question, I was trying to wrap my mind around the more_like_this query, too...
I struggled a bit to find good sources of information on the web, but (as in most cases) the documentation helps the most. Here are some of the more important terms from it (some are a bit harder to understand, so I added my interpretation):
max_query_terms
- The maximum number of query terms that will be selected (from each input document). Increasing this value gives greater accuracy at the expense of query execution speed. Defaults to 25.
min_term_freq
- The minimum term frequency below which the terms will be ignored from the input document. Defaults to 2.
If the term appears in the input document fewer than 2 (default) times, it will be ignored, i.e. it will not be searched for in other possible more_like_this documents.
min_doc_freq
- The minimum document frequency below which the terms will be ignored from the input document. Defaults to 5.
This one took me a second to get, so, here's my interpretation:
It is the minimum number of documents a term from the input document must appear in for it to be selected as a query term.
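How the three parameters interact can be sketched in a few lines of Python. This is a simplified picture under stated assumptions: terms that pass min_term_freq and min_doc_freq are ranked by a tf * idf score and only the best max_query_terms are kept. The idf formula, the whitespace tokenization, the function name `top_query_terms` and the sample corpus are all assumptions made for illustration, not the actual Elasticsearch/Lucene code.

```python
import math
from collections import Counter

def top_query_terms(input_text, corpus, max_query_terms=25,
                    min_term_freq=2, min_doc_freq=5):
    """Filter terms by both thresholds, rank by tf * idf, keep the top N."""
    tokenized = [set(doc.lower().split()) for doc in corpus]
    scored = []
    for term, tf in Counter(input_text.lower().split()).items():
        doc_freq = sum(1 for tokens in tokenized if term in tokens)
        if tf < min_term_freq or doc_freq < min_doc_freq:
            continue  # ignored from the input document
        idf = math.log(len(corpus) / doc_freq) + 1.0  # assumed idf formula
        scored.append((tf * idf, term))
    # Highest score first; ties broken alphabetically for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [term for _, term in scored[:max_query_terms]]

# Hypothetical index contents, for illustration only.
corpus = ["apple pie", "apple tart", "apple cider", "cider vinegar"]
text = "apple apple cider cider pie"

print(top_query_terms(text, corpus, max_query_terms=25, min_doc_freq=2))  # ['cider', 'apple']
print(top_query_terms(text, corpus, max_query_terms=1, min_doc_freq=2))   # ['cider']
```

Here "pie" is dropped by min_term_freq, and "cider" outranks "apple" because it appears in fewer documents, so with max_query_terms set to 1 it is the only term left in the query.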
There it is, I hope I saved someone a few minutes of his life. :)
Cheers!