What is the difference between an Elasticsearch index and an index in a relational database?
The word "index" is, unfortunately, used for slightly (edit: VERY) different things in ES and in relational databases, because the two are optimized for different use cases.
An "index" in a relational database is a secondary data structure that makes WHERE queries and JOINs fast, and it typically stores values exactly as they appear in the table. You can still have columns that aren't indexed, but then WHERE clauses on them require a full table scan, which is slow on large tables.
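To make the scan-versus-index difference concrete, here is a minimal sketch using Python's built-in sqlite3 module. The trips table, its columns, and the index name are made up for illustration, and the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

# Hypothetical table, used only to illustrate the point above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trips (id INTEGER PRIMARY KEY, title TEXT, country TEXT)")
conn.executemany(
    "INSERT INTO trips (title, country) VALUES (?, ?)",
    [("Vacation at Holland 2015", "NL"), ("Weekend in Paris", "FR")],
)

def query_plan(sql, params=()):
    """Return SQLite's query plan description for a statement as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)

query = "SELECT * FROM trips WHERE country = ?"

# Without an index on country, the WHERE clause needs a full table scan.
plan_without_index = query_plan(query, ("NL",))

# After adding a secondary index, the same query can use it.
conn.execute("CREATE INDEX idx_trips_country ON trips (country)")
plan_with_index = query_plan(query, ("NL",))

print(plan_without_index)
print(plan_with_index)
```

The first plan should report a scan of the table and the second a search that uses idx_trips_country. The query returns the same rows in both cases; only the access path changes.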
An "index" in ES is actually a collection of documents with a schema, similar to a database in the relational world. You can have different "types" of documents in an ES index, quite similar to tables in a database (note that mapping types were deprecated in ES 6.x and removed in later releases). For each field of a document, ES lets you define whether you want to be able to retrieve it, search by it, or both. Some details on these options can be found, for example, here; they are also related to the _source field (the original JSON that was submitted to ES).
ES uses an inverted index to find matching documents efficiently but, most importantly, it typically "normalizes" strings into tokens so that accurate free-text search can be performed. For example, sentences might be split into individual words, and words are normalized to lower case, so that searching for "holland" would match the text "Vacation at Holland 2015".
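The idea can be sketched in a few lines of Python. This is a toy model, not how ES is implemented: the analyze function is a stand-in for a real analyzer, and requiring every query token to match is an assumption made here for simplicity.

```python
import re
from collections import defaultdict

def analyze(text):
    """Split text into lowercase alphanumeric tokens (a stand-in for an ES analyzer)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_inverted_index(docs):
    """Map each token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in analyze(text):
            index[token].add(doc_id)
    return index

def search(index, query):
    """Return the ids of documents that contain every token of the query."""
    tokens = analyze(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result

docs = {1: "Vacation at Holland 2015", 2: "Weekend in Paris"}
index = build_inverted_index(docs)

hits = search(index, "holland")
print(hits)  # {1}
```

Because both the documents and the query go through the same analyze step, "holland" and "Holland" end up as the same token. A partial word such as "holl" finds nothing in this sketch, since only whole tokens are stored.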
If a field does not have an inverted index, you cannot search on it at all (unlike a database, which can fall back to a full table scan). Interestingly, you can also define fields so that you can search by them but cannot retrieve them; this is mainly beneficial when minimizing disk and RAM usage is important.
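The searchable-versus-retrievable distinction can be modelled with a toy class. Everything here is hypothetical (the TinyIndex class and its "index" and "store" flags are illustrative names, not the ES API); it only shows that the two capabilities are independent of each other.

```python
from collections import defaultdict

class TinyIndex:
    """Toy document index with per-field 'index' and 'store' flags (hypothetical)."""

    def __init__(self, field_options):
        # field_options: {field_name: {"index": bool, "store": bool}}
        self.field_options = field_options
        self.postings = defaultdict(set)  # (field, token) -> ids of matching docs
        self.stored = {}                  # doc id -> fields that can be retrieved

    def add(self, doc_id, doc):
        self.stored[doc_id] = {}
        for field, value in doc.items():
            options = self.field_options[field]
            if options["index"]:
                # Searchable: add lowercased words to the inverted index.
                for token in value.lower().split():
                    self.postings[(field, token)].add(doc_id)
            if options["store"]:
                # Retrievable: keep the original value.
                self.stored[doc_id][field] = value

    def search(self, field, word):
        if not self.field_options[field]["index"]:
            # No inverted index for this field, and no full-scan fallback.
            raise ValueError(f"field {field!r} is not indexed, so it cannot be searched")
        ids = self.postings.get((field, word.lower()), set())
        return [self.stored[doc_id] for doc_id in sorted(ids)]

idx = TinyIndex({
    "title": {"index": False, "store": True},  # retrievable only
    "body": {"index": True, "store": False},   # searchable only
})
idx.add(1, {"title": "Trip report", "body": "Vacation at Holland 2015"})

results = idx.search("body", "holland")
print(results)  # [{'title': 'Trip report'}]
```

Searching by body finds the document, but only title comes back, because the body text itself was never stored. Calling idx.search("title", "trip") raises an error, since title has no inverted index.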
Elasticsearch is by design a search engine; it is not likely to be preferred for primary storage the way SQL Server or MongoDB are.
Why is the entire collection indexed?
Elasticsearch internally uses a structure called an inverted index, which stores each field's (column's) values for searching. If the field contains a string, ES tokenizes it and applies filters such as converting to lower case or upper case.
In any case, you can find only the data that is available in the inverted index. So by default Elasticsearch indexes all fields to make them searchable.
https://www.elastic.co/guide/en/elasticsearch/guide/current/inverted-index.html
This is not like adding an index in a relational DB. In a relational DB you already have all the data available, and what you need is to index the most-used columns for quicker lookups. But a relational index is very inefficient at finding all the rows that contain a given word or part of a word (word search).
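A rough way to see this limitation, again with Python's built-in sqlite3 module: even with an index on the column, a "contains this word" query written with LIKE falls back to scanning. The notes table and index name are made up for illustration, and other databases or SQLite versions may word their plans differently.

```python
import sqlite3

# Hypothetical table with an ordinary index on the text column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE notes (id INTEGER PRIMARY KEY, title TEXT, author TEXT)")
conn.execute("CREATE INDEX idx_notes_title ON notes (title)")
conn.executemany(
    "INSERT INTO notes (title, author) VALUES (?, ?)",
    [("Vacation at Holland 2015", "ann"), ("Weekend in Paris", "bob")],
)

def plan(sql):
    """Return SQLite's query plan description for a statement as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)

# An exact match on the whole value can use the index.
exact_plan = plan("SELECT * FROM notes WHERE title = 'Weekend in Paris'")

# A word inside the value cannot: the leading wildcard forces a scan.
word_plan = plan("SELECT * FROM notes WHERE title LIKE '%holland%'")

rows = conn.execute("SELECT title FROM notes WHERE title LIKE '%holland%'").fetchall()

print(exact_plan)
print(word_plan)
print(rows)
```

The LIKE query still returns the right row, but it does so by reading every row in the table, which is the cost an inverted index of tokens avoids.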