When using Elasticsearch to bulk insert data, we generally write code like this:
from elasticsearch import Elasticsearch, helpers

es = Elasticsearch(hosts=[{'host': 'localhost', 'port': 9200}])

def gendata():
    mywords = ['foo', 'bar', 'baz']
    for word in mywords:
        yield {
            "_index": "mywords",
            "_type": "document",
            "_source": {"word": word}
        }

helpers.bulk(es, gendata())
But when the load on Elasticsearch is too high, this approach may raise a connection timeout exception.
To solve this problem, you can set a larger timeout when initializing the Elasticsearch connection object:
es = Elasticsearch(hosts=[{'host': 'localhost', 'port': 9200}], timeout=60)
But sometimes even a 60-second timeout is not enough, and simply raising the timeout further is not always better. The best approach is to let the client retry automatically whenever a timeout occurs.
Adding two more parameters when creating the Elasticsearch connection object makes the client retry up to three times on timeout:
es = Elasticsearch(hosts=[{'host': 'localhost', 'port': 9200}], timeout=60, max_retries=3, retry_on_timeout=True)
With the max_retries and retry_on_timeout parameters in place, requests that time out are retried automatically.
If you look at the documentation of the Elasticsearch class directly, you may not find these two parameters. That is not an error in the documentation: both parameters are hidden inside **kwargs, which the client forwards to the Transport class. Click through to Transport and you can see both parameters.
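The forwarding pattern that hides these parameters can be illustrated with a minimal sketch. The two classes below are simplified stand-ins for the library's Elasticsearch and Transport classes, not the real implementations; the point is only that keyword arguments unknown to the client pass straight through **kwargs to the transport, so they never appear in the client's own signature:

```python
class Transport:
    # Simplified stand-in for the transport layer, where max_retries
    # and retry_on_timeout are actually declared as parameters.
    def __init__(self, hosts, max_retries=3, retry_on_timeout=False, **kwargs):
        self.max_retries = max_retries
        self.retry_on_timeout = retry_on_timeout

class Elasticsearch:
    # Simplified stand-in for the client: unknown keyword arguments are
    # forwarded to Transport via **kwargs, which is why they do not show
    # up when you read the client's docstring.
    def __init__(self, hosts=None, transport_class=Transport, **kwargs):
        self.transport = transport_class(hosts, **kwargs)

es = Elasticsearch(hosts=[{'host': 'localhost', 'port': 9200}],
                   max_retries=3, retry_on_timeout=True)
print(es.transport.retry_on_timeout)  # True
```

This is a common pattern in Python libraries: reading only the top-level class signature can miss options that live one layer down.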