Automatic retry on timeout when bulk inserting data into Elasticsearch

When bulk inserting data into Elasticsearch, we generally write code like this:

from elasticsearch import Elasticsearch, helpers


es = Elasticsearch(hosts=[{'host': 'localhost', 'port': 9200}])

def gendata():
    mywords = ['foo', 'bar', 'baz']

    for word in mywords:
        yield {"_index": "mywords", "_type": "document", "doc": {"word": word}}

helpers.bulk(es, gendata())

But when the load on ES is too high, this approach may raise a connection timeout exception.
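For reference, this is roughly what that failure looks like with elasticsearch-py; a minimal sketch that catches the exception, reusing the gendata() generator from the example above:

from elasticsearch import Elasticsearch, helpers
from elasticsearch.exceptions import ConnectionTimeout

es = Elasticsearch(hosts=[{'host': 'localhost', 'port': 9200}])

try:
    helpers.bulk(es, gendata())  # gendata() as defined above
except ConnectionTimeout:
    # Raised when ES does not respond within the client's timeout
    # (10 seconds by default).
    print('bulk insert timed out')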

To solve this problem, you can set a larger timeout when initializing the ES connection object:

  es = Elasticsearch(hosts=[{'host': 'localhost', 'port': 9200}], timeout=60)

But sometimes even a 60-second timeout may still hit the timeout exception, and a larger timeout is not always better, so the best approach is to let ES retry automatically when a timeout is encountered.

When creating the ES connection object, adding two more parameters also makes it retry automatically, up to three times, on timeout:

es = Elasticsearch(hosts=[{'host': 'localhost', 'port': 9200}], timeout=60, max_retries=3, retry_on_timeout=True)

By adding the two parameters max_retries and retry_on_timeout, automatic retries on timeout are achieved.
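Putting the pieces together, a sketch of the full bulk insert with retries enabled (same example index and data as above):

from elasticsearch import Elasticsearch, helpers

# Wait up to 60 seconds per request and, if a request still times out,
# retry it up to 3 more times before giving up.
es = Elasticsearch(
    hosts=[{'host': 'localhost', 'port': 9200}],
    timeout=60,
    max_retries=3,
    retry_on_timeout=True,
)

def gendata():
    mywords = ['foo', 'bar', 'baz']
    for word in mywords:
        yield {"_index": "mywords", "_type": "document", "doc": {"word": word}}

helpers.bulk(es, gendata())

Note that if every retry also times out, the ConnectionTimeout exception is still raised in the end, so callers that must not lose data should still catch it.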

If you look at the client documentation for the Elasticsearch class directly, you may not find these two parameters in its signature.

This is not a problem with the documentation; the two parameters are hidden inside **kwargs, which the Elasticsearch class forwards to the Transport class.

Click through to the Transport class and you can see these two parameters.
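In the elasticsearch-py 7.x source, Transport defaults max_retries to 3 and retry_on_timeout to False (it also retries on HTTP 502/503/504 via retry_on_status). If you want to confirm that your keyword arguments actually reached Transport, you can inspect the transport object on the client; a quick check, assuming the 7.x attribute names:

from elasticsearch import Elasticsearch

es = Elasticsearch(
    hosts=[{'host': 'localhost', 'port': 9200}],
    timeout=60,
    max_retries=3,
    retry_on_timeout=True,
)

# The extra keyword arguments are forwarded through **kwargs to Transport,
# which stores them as plain attributes.
print(es.transport.max_retries)       # -> 3
print(es.transport.retry_on_timeout)  # -> True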

Reposted from: https://mp.weixin.qq.com/s?src=11×tamp=1579108394&ver=2098&signature=ZXtHL4GJONIJr9lN3KD*vHKfeujxkmmrWRnFl3Pfyu14Qc4lDyAdHN*UtHf6en*KeUFy7edlKqVVw5uxvGXpiaFdGNSX0LUYkAox81WQzZdgs7jLFcHd1-nfsgI3jPIq&new=1

Origin: www.cnblogs.com/tjp40922/p/12203610.html