How to Get All Results from Elasticsearch in Python
It is also possible to use the elasticsearch_dsl library:
from elasticsearch import Elasticsearch
from elasticsearch_dsl import Search
import pandas as pd
client = Elasticsearch()
s = Search(using=client, index="my_index")
df = pd.DataFrame([hit.to_dict() for hit in s.scan()])
The key here is s.scan(), which handles pagination for you and iterates over every matching document rather than stopping at the first page of results.
Note that the example above returns the entire index, because no query was passed to it. To build a query with elasticsearch_dsl, see the library's documentation on search queries.
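As a rough sketch of what adding a query might look like: the snippet below chains a match query onto the Search object before calling scan(). The function name fetch_matching and its field/text arguments are made up for illustration, and the import is placed inside the function only so the snippet can be defined without the library installed.

```python
def fetch_matching(client, index, field, text):
    """Return every document whose `field` matches `text`, as a list of dicts.

    `fetch_matching`, `field` and `text` are illustrative names, not part of
    elasticsearch_dsl itself.
    """
    # Imported lazily so that defining this function does not require the library.
    from elasticsearch_dsl import Search

    # .query("match", title="python") style: the field name is passed as a keyword.
    s = Search(using=client, index=index).query("match", **{field: text})

    # scan() pages through all matches instead of stopping at the default 10 hits.
    return [hit.to_dict() for hit in s.scan()]
```

With a client connected as in the example above, fetch_matching(client, "my_index", "title", "python") would collect all documents whose title field matches "python", and the result can be passed straight to pd.DataFrame.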
You need to pass a size parameter to your es.search() call.
From the API docs:
size – Number of hits to return (default: 10)
An example:
es.search(index=logs_index, body=my_query, size=1000)
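For reference, the documents come back nested under hits.hits in the response, each with its fields in _source. Here is a small sketch for pulling them out and comparing against the reported total; summarize_search is a hypothetical helper name, and the handling of the total field assumes the two shapes used by different Elasticsearch versions (a plain integer, or a dict with a "value" key in 7.x and later).

```python
def summarize_search(response):
    """Return (documents, reported_total) from a search response.

    `summarize_search` is an illustrative helper, not a client method.
    """
    hits = response["hits"]
    # Each hit carries the stored document under "_source".
    docs = [hit["_source"] for hit in hits["hits"]]

    total = hits["total"]
    # Assumption: newer servers report {"value": ..., "relation": ...},
    # older ones report a bare integer.
    if isinstance(total, dict):
        total = total["value"]
    return docs, total
```

If len(docs) is smaller than the reported total, the size you passed was too small to return everything in one request.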
Please note that passing a large size is not an efficient way to fetch every document in an index, or the results of a query that matches a lot of documents. For that you should use a scroll operation, which the API docs also cover under scan(), the helper abstraction over the Elasticsearch scroll operation.
You can also read about it in the Elasticsearch documentation.
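To make the scroll idea concrete, here is a minimal sketch of the loop written by hand, assuming es is an Elasticsearch client and using the same body= call style as the example above (newer client versions may prefer other keyword arguments). The name scroll_all and the page_size and keep_alive defaults are my own choices for illustration. The scan() helper in elasticsearch.helpers wraps this kind of loop, so in practice you would normally use it instead.

```python
def scroll_all(es, index, query, page_size=1000, keep_alive="2m"):
    """Yield the _source of every matching document, one page at a time.

    `scroll_all`, `page_size` and `keep_alive` are illustrative names/defaults.
    """
    # The first request opens the scroll context and returns the first page.
    resp = es.search(index=index, body=query, size=page_size, scroll=keep_alive)
    scroll_id = resp.get("_scroll_id")
    try:
        # An empty page means the scroll is exhausted.
        while resp["hits"]["hits"]:
            for hit in resp["hits"]["hits"]:
                yield hit["_source"]
            # Ask for the next page; the scroll id may change between calls.
            resp = es.scroll(scroll_id=scroll_id, scroll=keep_alive)
            scroll_id = resp.get("_scroll_id", scroll_id)
    finally:
        # Release the server-side scroll context once we are done.
        if scroll_id:
            es.clear_scroll(scroll_id=scroll_id)
```

Because this is a generator, documents are processed as they arrive instead of being held in memory all at once, for example: for doc in scroll_all(es, logs_index, my_query): ...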