Elasticsearch gives different scores for the same documents

The Lucene score depends on several factors. With the TF-IDF similarity (the default one), it mainly depends on:

  1. Term frequency: how often the matched terms occur within the document.
  2. Inverse document frequency: how rare the matched terms are across all the documents in the index. Rarer terms get a higher score.
  3. Field norms (including index-time boosting): shorter fields get a higher score than longer ones.
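To make the three factors concrete, here is a rough sketch modeled on Lucene's classic TF-IDF similarity. The function names are made up for illustration, and the formula is simplified: it leaves out query normalization, the coordination factor, and the lossy one-byte encoding Lucene applies to norms, so the numbers will not match real `_score` values exactly.

```python
import math

def tf(freq):
    # Term frequency factor: grows with the square root of the occurrence count.
    return math.sqrt(freq)

def idf(doc_freq, num_docs):
    # Inverse document frequency: rarer terms (low doc_freq) get a larger value.
    return 1.0 + math.log(num_docs / (doc_freq + 1))

def field_norm(num_terms, boost=1.0):
    # Length normalization: shorter fields get a larger norm.
    return boost / math.sqrt(num_terms)

def term_score(freq, doc_freq, num_docs, num_terms, boost=1.0):
    # Simplified single-term score (assumption: no queryNorm, no coord).
    # idf is squared because it enters both the query weight and the field weight.
    return tf(freq) * idf(doc_freq, num_docs) ** 2 * field_norm(num_terms, boost)

# Same term, same index statistics, but a 4-term field versus a 16-term field.
short_field = term_score(freq=1, doc_freq=1, num_docs=100, num_terms=4)
long_field = term_score(freq=1, doc_freq=1, num_docs=100, num_terms=16)
```

With these inputs the 4-term field ends up with twice the score of the 16-term one, purely because of the field norm.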

In your case, you have to take into account that your two documents come from different shards. The score is computed separately on each of them, since every shard is in fact a separate Lucene index with its own term statistics.
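A small sketch of why this matters, using the classic Lucene IDF formula and invented per-shard statistics (the shard names and counts below are hypothetical, not taken from your index). The same term gets a different IDF on each shard, and yet another value if the statistics of both shards are pooled:

```python
import math

def idf(doc_freq, num_docs):
    # Classic Lucene-style inverse document frequency.
    return 1.0 + math.log(num_docs / (doc_freq + 1))

# Hypothetical statistics for one term: (docs containing the term, docs in shard).
shard_stats = {"shard_0": (2, 10), "shard_1": (7, 10)}

# Each shard only sees its own counts, so each computes its own IDF.
local_idf = {name: idf(df, n) for name, (df, n) in shard_stats.items()}

# Pooling the counts across shards gives a single, shard-independent IDF.
global_df = sum(df for df, _ in shard_stats.values())
global_n = sum(n for _, n in shard_stats.values())
global_idf = idf(global_df, global_n)

for name, value in sorted(local_idf.items()):
    print(f"{name}: idf = {value:.3f}")
print(f"pooled: idf = {global_idf:.3f}")
```

Two otherwise identical documents would therefore score higher on `shard_0`, where the term is rarer, than on `shard_1`. The gap tends to shrink as shards grow and their term distributions even out.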

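You might want to have a look at the more expensive DFS Query Then Fetch search type (search_type=dfs_query_then_fetch) that Elasticsearch provides for more accurate scoring. It first collects term statistics from all the shards, so that every shard scores against the same global frequencies. The default one is the simpler Query Then Fetch.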
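As an illustration, the search type is passed as the `search_type` URL parameter on the `_search` endpoint. The sketch below only builds the request with the Python standard library. The host, the index name `my_index`, and the query are placeholders, and the helper `build_search_request` is invented for this example.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

def build_search_request(base_url, index, query, search_type="dfs_query_then_fetch"):
    # search_type goes in the query string, the query itself in the JSON body.
    url = f"{base_url}/{index}/_search?{urlencode({'search_type': search_type})}"
    body = json.dumps({"query": query}).encode("utf-8")
    return Request(url, data=body,
                   headers={"Content-Type": "application/json"},
                   method="POST")

# Placeholder host, index and query; adjust to your own cluster.
req = build_search_request(
    "http://localhost:9200",
    "my_index",
    {"match": {"title": "elasticsearch"}},
)
print(req.full_url)
# urllib.request.urlopen(req) would send it; omitted so the sketch runs without a cluster.
```

Running the same query once with `dfs_query_then_fetch` and once with `query_then_fetch` is a quick way to check whether shard-local statistics explain the difference you are seeing.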