Elasticsearch store field vs _source
Clinton Gormley says in the thread linked below:
https://groups.google.com/forum/#!topic/elasticsearch/j8cfbv-j73g/discussion
By default, ES stores your JSON doc in the _source field, which is set to "stored".
By default, the fields in your JSON doc are set to NOT be "stored" (i.e. stored as a separate field).
So when ES returns your doc (search or get), it just loads the _source field and returns that, i.e. a single disk seek.
Some people think that storing individual fields will be faster than loading the whole JSON doc from the _source field. What they don't realise is that each stored field requires a disk seek (10ms each seek!), and that the sum of those seeks far outweighs the cost of just sending the _source field.
In other words, it is almost always a false optimization.
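As a rough back-of-envelope check of the reasoning above, the sketch below compares the two retrieval strategies using the 10 ms per seek figure from the quote. The one-seek-per-stored-field model and the field count of 10 are simplifying assumptions for illustration, not measurements.

```python
# Back-of-envelope model: one disk seek per stored field vs. one seek for _source.
SEEK_MS = 10  # per-seek cost quoted above; real hardware varies


def stored_fields_cost_ms(num_fields: int, seek_ms: int = SEEK_MS) -> int:
    """Cost of loading each stored field separately (one seek per field)."""
    return num_fields * seek_ms


def source_cost_ms(seek_ms: int = SEEK_MS) -> int:
    """Cost of loading the single _source field (one seek)."""
    return seek_ms


if __name__ == "__main__":
    n = 10  # hypothetical number of fields requested
    print(f"{n} stored fields: {stored_fields_cost_ms(n)} ms")
    print(f"_source: {source_cost_ms()} ms")
```

Under this simple model the stored-field route costs ten times as much as reading _source once, which is the point the quote is making.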
The _source field stores the JSON you send to Elasticsearch, and you can choose to return only certain fields if needed, which is perfect for your use case. I have never heard that stored fields are faster for searches. The _source field could take more disk space, but if you have to store every field there is no need to use stored fields over the _source field. If you do disable the _source field, it will mean:
- You won’t be able to do partial updates
- You won’t be able to re-index your data from the JSON in your Elasticsearch cluster, you’ll have to re-index from the data source (which is usually a lot slower).
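To make the points above concrete, here is a minimal sketch of the request bodies involved, built as plain Python dicts so that it runs without a cluster. The field names (title, author, views) are made up for illustration. The _source filter in a search body, "enabled": false under _source in a mapping, and the doc key of a partial update are the request shapes as I understand them for recent typeless mappings, so check them against the documentation for your version.

```python
import json

# Search body that asks Elasticsearch to return only some fields from _source.
search_body = {
    "query": {"match_all": {}},
    "_source": ["title", "author"],  # hypothetical field names
}

# Mapping fragment that disables _source (with the trade-offs listed above).
mapping_without_source = {
    "mappings": {"_source": {"enabled": False}}
}

# Body of a partial update; this relies on _source being enabled.
partial_update_body = {"doc": {"views": 42}}  # hypothetical field

for name, body in [
    ("search", search_body),
    ("mapping", mapping_without_source),
    ("partial update", partial_update_body),
]:
    print(name, json.dumps(body))
```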
By default in Elasticsearch, the _source (the document that was indexed) is stored. This means that when you search, you can get the actual document source back. Moreover, Elasticsearch will automatically extract fields/objects from the _source and return them if you explicitly ask for them (and may also use it in other components, like highlighting).
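Conceptually, pulling a field out of the _source is just parsing the stored JSON and walking to the requested path. The sketch below models that idea on the client side; extract_from_source is a hypothetical helper for illustration, not how Elasticsearch implements it internally.

```python
import json
from typing import Any


def extract_from_source(source_json: str, path: str) -> Any:
    """Parse the JSON document and walk a dotted path such as 'user.name'."""
    node: Any = json.loads(source_json)
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return None  # path is absent from the document
        node = node[key]
    return node


# Hypothetical document, standing in for a stored _source.
doc = '{"title": "store vs _source", "user": {"name": "kim", "age": 30}}'
print(extract_from_source(doc, "user.name"))
print(extract_from_source(doc, "user"))
print(extract_from_source(doc, "missing"))
```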
You can specify that a specific field is also stored. This means that the data for that field will be stored on its own. So if you ask for field1 (which is stored), Elasticsearch will identify that it's stored and load it from the index instead of getting it from the _source (assuming _source is enabled).
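For reference, a field is marked as stored in the mapping and then requested by name at search time. The sketch below builds both bodies as Python dicts; field1 comes from the text above, while field2, the text type, and the typeless mapping layout are assumptions. As far as I know, the search parameter is called stored_fields in recent versions, while older releases used fields.

```python
import json

# Mapping in which field1 is stored on its own, in addition to living in _source.
mapping = {
    "mappings": {
        "properties": {
            "field1": {"type": "text", "store": True},
            "field2": {"type": "text"},  # not stored separately (the default)
        }
    }
}

# Search body asking for the stored copy of field1 rather than the _source.
search_body = {
    "query": {"match_all": {}},
    "stored_fields": ["field1"],
}

print(json.dumps(mapping, indent=2))
print(json.dumps(search_body, indent=2))
```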
When do you want to enable storing specific fields? Most times, you don't. Fetching the _source is fast, and extracting fields from it is fast as well. If you have very large documents, where the cost of storing the _source or the cost of parsing the _source is high, you can explicitly map some fields to be stored instead.
Note that there is a cost to retrieving each stored field. So, for example, if you have a JSON document with 10 reasonably sized fields, map all of them as stored, and ask for all of them, this means loading each one (more disk seeks), compared to just loading the _source (which is one field, possibly compressed).
I got this answer from a thread answered by shay.banon; reading the whole thread gives a good understanding of this.