ElasticSearch Python Client ReadTimeout

Copyright notice: this is an original post by the author and may not be reproduced without permission. https://blog.csdn.net/jacke121/article/details/86062773


When doing bulk operations through the ElasticSearch Python Client API, the client may time out if the ElasticSearch server lacks the capacity to keep up, printing an exception like this:

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "E:\git_project\CV_client\py_client\algorithm\save_thread.py", line 263, in run
    self.kafak.send(imageInfoJson)
  File "E:\git_project\CV_client\py_client\algorithm\kafka_tool.py", line 38, in send
    res = self.es.index(index=index, doc_type=doc_type, body=data, id=id, request_timeout=3)
  File "D:\Users\Administrator\Miniconda3\envs\python3.6\lib\site-packages\elasticsearch\client\utils.py", line 76, in _wrapped
    return func(*args, params=params, **kwargs)
  File "D:\Users\Administrator\Miniconda3\envs\python3.6\lib\site-packages\elasticsearch\client\__init__.py", line 319, in index
    _make_path(index, doc_type, id), params=params, body=body)
  File "D:\Users\Administrator\Miniconda3\envs\python3.6\lib\site-packages\elasticsearch\transport.py", line 318, in perform_request
    status, headers_response, data = connection.perform_request(method, url, params, body, headers=headers, ignore=ignore, timeout=timeout)
  File "D:\Users\Administrator\Miniconda3\envs\python3.6\lib\site-packages\elasticsearch\connection\http_urllib3.py", line 180, in perform_request
    raise ConnectionTimeout('TIMEOUT', str(e), e)
elasticsearch.exceptions.ConnectionTimeout: ConnectionTimeout caused by - ReadTimeoutError(HTTPConnectionPool(host='192.168.55.66', port=9200): Read timed out. (read timeout=3))
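One way to cope with this exception at the call site is to wrap the send in a manual retry loop with backoff. The sketch below is hypothetical: `send_with_retry` is not part of the elasticsearch library, and the built-in `TimeoutError` stands in for `elasticsearch.exceptions.ConnectionTimeout` so the sketch stays self-contained.

```python
import time

def send_with_retry(send_fn, max_retries=3, backoff=0.5):
    """Call send_fn(); retry on timeouts with exponential backoff.

    send_fn is any zero-argument callable, e.g. a lambda wrapping
    es.index(...). TimeoutError stands in for ConnectionTimeout here.
    """
    for attempt in range(max_retries + 1):
        try:
            return send_fn()
        except TimeoutError:
            if attempt == max_retries:
                raise  # give up after max_retries extra attempts
            time.sleep(backoff * (2 ** attempt))  # back off before retrying
```

In real code you would catch `elasticsearch.exceptions.ConnectionTimeout` instead, and pass something like `lambda: es.index(index=index, body=data, id=id)` as `send_fn`.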

 


2017-09-27 12:37:42.228 25934/MainThread W base.py:96 POST http://localhost:9200/_bulk [status:N/A request:10.011s]
Traceback (most recent call last):
  File "build/bdist.linux-x86_64/egg/elasticsearch/connection/http_urllib3.py", line 114, in perform_request
    response = self.pool.urlopen(method, url, body, retries=False, headers=self.headers, **kw)
  File "/home/fantom/share/Python-2.7/lib/site-packages/urllib3-1.21.1-py2.7.egg/urllib3/connectionpool.py", line 649, in urlopen
    _stacktrace=sys.exc_info()[2])
  File "/home/fantom/share/Python-2.7/lib/site-packages/urllib3-1.21.1-py2.7.egg/urllib3/util/retry.py", line 333, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "/home/fantom/share/Python-2.7/lib/site-packages/urllib3-1.21.1-py2.7.egg/urllib3/connectionpool.py", line 600, in urlopen
    chunked=chunked)
  File "/home/fantom/share/Python-2.7/lib/site-packages/urllib3-1.21.1-py2.7.egg/urllib3/connectionpool.py", line 388, in _make_request
    self._raise_timeout(err=e, url=url, timeout_value=read_timeout)
  File "/home/fantom/share/Python-2.7/lib/site-packages/urllib3-1.21.1-py2.7.egg/urllib3/connectionpool.py", line 308, in _raise_timeout
    raise ReadTimeoutError(self, url, "Read timed out. (read timeout=%s)" % timeout_value)
ReadTimeoutError: HTTPConnectionPool(host='localhost', port=9200): Read timed out. (read timeout=10)

The simple fix is to add timeout and retry parameters (see: https://stackoverflow.com/questions/25908484/how-to-fix-read-timed-out-in-elasticsearch):

  1. Increase the default timeout Globally when you create the ES client by passing the timeout parameter. Example in Python

     


    es = Elasticsearch(timeout=30, max_retries=10, retry_on_timeout=True)

  2. Set the timeout per request made by the client. Taken from Elasticsearch Python docs below.

     


    # only wait for 1 second, regardless of the client's default

    es.cluster.health(wait_for_status='yellow', request_timeout=1)

I set timeout=100 and max_retries=3. When ElasticSearch is handling a large volume of queries, it can consume all available read IO; a bulk POST may then have succeeded on the server while the client times out waiting for the acknowledgement. If the timeout is too short and max_retries is too high, the same data can end up inserted up to max_retries extra times.
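The duplicate-insert risk is easiest to see with a toy model. `FakeIndex` below is a hypothetical stand-in for an ES index (it is not part of the elasticsearch library): when the document id is auto-generated, every retried attempt that reached the server becomes a new document, whereas an explicit id makes retries idempotent because re-indexing the same id overwrites the existing document.

```python
import uuid

class FakeIndex:
    """Toy stand-in for an ES index, illustrating retry semantics."""
    def __init__(self):
        self.docs = {}

    def index(self, body, id=None):
        # Like ES, auto-generate a document id when none is supplied.
        doc_id = id if id is not None else uuid.uuid4().hex
        self.docs[doc_id] = body  # same id -> overwrite (idempotent)
        return doc_id

idx = FakeIndex()

# Client times out and retries 3 times WITHOUT an explicit id:
# every attempt that actually reached the server becomes a new document.
for _ in range(3):
    idx.index({"msg": "bulk item"})

# Retrying WITH an explicit id just overwrites the same document.
for _ in range(3):
    idx.index({"msg": "bulk item"}, id="doc-1")

print(len(idx.docs))  # 4: three auto-id duplicates plus "doc-1"
```

This is why supplying your own `id` (as the `es.index(..., id=id, ...)` call in the first traceback does) is safer to combine with `retry_on_timeout=True` than letting the server generate ids.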

API parameter reference

 


>>> help(elasticsearch.Elasticsearch)
Help on class Elasticsearch in module elasticsearch.client:

class Elasticsearch(__builtin__.object)
 |  Elasticsearch low-level client. Provides a straightforward mapping from
 |  Python to ES REST endpoints.
 |
 |  __init__(self, hosts=None, transport_class=<class 'elasticsearch.transport.Transport'>, **kwargs)
 |      :arg hosts: list of nodes we should connect to. Node should be a
 |          dictionary ({"host": "localhost", "port": 9200}), the entire dictionary
 |          will be passed to the :class:`~elasticsearch.Connection` class as
 |          kwargs, or a string in the format of ``host[:port]`` which will be
 |          translated to a dictionary automatically. If no value is given the
 |          :class:`~elasticsearch.Urllib3HttpConnection` class defaults will be used.
 |
 |      :arg transport_class: :class:`~elasticsearch.Transport` subclass to use.
 |
 |      :arg kwargs: any additional arguments will be passed on to the
 |          :class:`~elasticsearch.Transport` class and, subsequently, to the
 |          :class:`~elasticsearch.Connection` instances.

>>> help(elasticsearch.Transport)
Help on class Transport in module elasticsearch.transport:

class Transport(__builtin__.object)
 |  Encapsulation of transport-related to logic. Handles instantiation of the
 |  individual connections as well as creating a connection pool to hold them.
 |
 |  Main interface is the `perform_request` method.
 |
 |  Methods defined here:
 |
 |  __init__(self, hosts, connection_class=<class 'elasticsearch.connection.http_urllib3.Urllib3HttpConnection'>, connection_pool_class=<class 'elasticsearch.connection_pool.ConnectionPool'>, host_info_callback=<function get_host_info>, sniff_on_start=False, sniffer_timeout=None, sniff_timeout=0.1, sniff_on_connection_fail=False, serializer=<elasticsearch.serializer.JSONSerializer object>, serializers=None, default_mimetype='application/json', max_retries=3, retry_on_status=(502, 503, 504), retry_on_timeout=False, send_get_body_as='GET', **kwargs)
 |      :arg max_retries: maximum number of retries before an exception is propagated
 |      :arg retry_on_status: set of HTTP status codes on which we should retry
 |          on a different node. defaults to ``(502, 503, 504)``
 |      :arg retry_on_timeout: should timeout trigger a retry on different
 |          node? (default `False`)
 |
 |      Any extra keyword arguments will be passed to the `connection_class`
 |      when creating and instance unless overriden by that connection's
 |      options provided as part of the hosts parameter.

As shown here, the defaults are max_retries=3, retry_on_timeout=False, and retry_on_status=(502, 503, 504).
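The kwargs cascade described in the help text (Elasticsearch → Transport → Connection) can be sketched with simplified stand-in classes. These are hypothetical, heavily reduced versions of the real classes, kept only to show how a single `Elasticsearch(timeout=30, max_retries=10, ...)` call distributes its arguments:

```python
class Connection:
    # Mirrors Urllib3HttpConnection's defaults: timeout of 10 seconds.
    def __init__(self, host="localhost", port=9200, timeout=10, **kwargs):
        self.timeout = timeout

class Transport:
    # Consumes the retry-related kwargs; everything it does not
    # recognize flows on to the connection_class.
    def __init__(self, hosts, max_retries=3, retry_on_timeout=False, **kwargs):
        self.max_retries = max_retries
        self.retry_on_timeout = retry_on_timeout
        self.connection = Connection(**kwargs)

class Client:
    # Mirrors Elasticsearch.__init__: unrecognized kwargs go to Transport.
    def __init__(self, hosts=None, **kwargs):
        self.transport = Transport(hosts or ["localhost"], **kwargs)

es = Client(timeout=30, max_retries=10, retry_on_timeout=True)
print(es.transport.max_retries, es.transport.connection.timeout)  # 10 30
```

This is why `timeout`, `max_retries`, and `retry_on_timeout` can all be passed directly to the `Elasticsearch` constructor even though each is consumed by a different layer.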

 


>>> help(elasticsearch.connection.http_urllib3.Urllib3HttpConnection)
Help on class Urllib3HttpConnection in module elasticsearch.connection.http_urllib3:

class Urllib3HttpConnection(elasticsearch.connection.base.Connection)
 |  Default connection class using the `urllib3` library and the http protocol.
 |
 |  :arg host: hostname of the node (default: localhost)
 |  :arg port: port to use (integer, default: 9200)
 |  :arg url_prefix: optional url prefix for elasticsearch
 |  :arg timeout: default timeout in seconds (float, default: 10)

As you can see, the default timeout is only 10 seconds.
