ElasticSearch Python Client ReadTimeout

Copyright notice: this is an original post by the author and may not be reproduced without permission. https://blog.csdn.net/jacke121/article/details/86062773


When doing bulk operations through the ElasticSearch Python Client API, the client may time out if the ElasticSearch server lacks the capacity to keep up, printing an exception like this:

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "E:\git_project\CV_client\py_client\algorithm\save_thread.py", line 263, in run
    self.kafak.send(imageInfoJson)
  File "E:\git_project\CV_client\py_client\algorithm\kafka_tool.py", line 38, in send
    res = self.es.index(index=index, doc_type=doc_type, body=data, id=id, request_timeout=3)
  File "D:\Users\Administrator\Miniconda3\envs\python3.6\lib\site-packages\elasticsearch\client\utils.py", line 76, in _wrapped
    return func(*args, params=params, **kwargs)
  File "D:\Users\Administrator\Miniconda3\envs\python3.6\lib\site-packages\elasticsearch\client\__init__.py", line 319, in index
    _make_path(index, doc_type, id), params=params, body=body)
  File "D:\Users\Administrator\Miniconda3\envs\python3.6\lib\site-packages\elasticsearch\transport.py", line 318, in perform_request
    status, headers_response, data = connection.perform_request(method, url, params, body, headers=headers, ignore=ignore, timeout=timeout)
  File "D:\Users\Administrator\Miniconda3\envs\python3.6\lib\site-packages\elasticsearch\connection\http_urllib3.py", line 180, in perform_request
    raise ConnectionTimeout('TIMEOUT', str(e), e)
elasticsearch.exceptions.ConnectionTimeout: ConnectionTimeout caused by - ReadTimeoutError(HTTPConnectionPool(host='192.168.55.66', port=9200): Read timed out. (read timeout=3))
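One way to cope with this exception at the call site is to wrap the send in a manual retry loop with backoff. The sketch below is hypothetical: `send_with_retry` is not part of the elasticsearch library, and the built-in `TimeoutError` stands in for `elasticsearch.exceptions.ConnectionTimeout` so the sketch stays self-contained.

```python
import time

def send_with_retry(send_fn, max_retries=3, backoff=0.5):
    """Call send_fn(); retry on timeouts with exponential backoff.

    send_fn is any zero-argument callable, e.g. a lambda wrapping
    es.index(...). TimeoutError stands in for ConnectionTimeout here.
    """
    for attempt in range(max_retries + 1):
        try:
            return send_fn()
        except TimeoutError:
            if attempt == max_retries:
                raise  # give up after max_retries extra attempts
            time.sleep(backoff * (2 ** attempt))  # back off before retrying
```

In real code you would catch `elasticsearch.exceptions.ConnectionTimeout` instead, and pass something like `lambda: es.index(index=index, body=data, id=id)` as `send_fn`.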

 


2017-09-27 12:37:42.228 25934/MainThread W base.py:96 POST http://localhost:9200/_bulk [status:N/A request:10.011s]
Traceback (most recent call last):
  File "build/bdist.linux-x86_64/egg/elasticsearch/connection/http_urllib3.py", line 114, in perform_request
    response = self.pool.urlopen(method, url, body, retries=False, headers=self.headers, **kw)
  File "/home/fantom/share/Python-2.7/lib/site-packages/urllib3-1.21.1-py2.7.egg/urllib3/connectionpool.py", line 649, in urlopen
    _stacktrace=sys.exc_info()[2])
  File "/home/fantom/share/Python-2.7/lib/site-packages/urllib3-1.21.1-py2.7.egg/urllib3/util/retry.py", line 333, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "/home/fantom/share/Python-2.7/lib/site-packages/urllib3-1.21.1-py2.7.egg/urllib3/connectionpool.py", line 600, in urlopen
    chunked=chunked)
  File "/home/fantom/share/Python-2.7/lib/site-packages/urllib3-1.21.1-py2.7.egg/urllib3/connectionpool.py", line 388, in _make_request
    self._raise_timeout(err=e, url=url, timeout_value=read_timeout)
  File "/home/fantom/share/Python-2.7/lib/site-packages/urllib3-1.21.1-py2.7.egg/urllib3/connectionpool.py", line 308, in _raise_timeout
    raise ReadTimeoutError(self, url, "Read timed out. (read timeout=%s)" % timeout_value)
ReadTimeoutError: HTTPConnectionPool(host='localhost', port=9200): Read timed out. (read timeout=10)

The simple fix is to add timeout and retry parameters (see: https://stackoverflow.com/questions/25908484/how-to-fix-read-timed-out-in-elasticsearch):

  1. Increase the default timeout Globally when you create the ES client by passing the timeout parameter. Example in Python

     


    es = Elasticsearch(timeout=30, max_retries=10, retry_on_timeout=True)

  2. Set the timeout per request made by the client. Taken from Elasticsearch Python docs below.

     


    # only wait for 1 second, regardless of the client's default

    es.cluster.health(wait_for_status='yellow', request_timeout=1)

I set timeout=100 and max_retries=3. When ElasticSearch is handling a large volume of queries, it can consume all available read IO; a bulk POST may then have succeeded on the server while the client times out waiting for the acknowledgement. If the timeout is too short and max_retries is too high, the same data can end up inserted up to max_retries extra times.
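The duplicate-insert risk is easiest to see with a toy model. `FakeIndex` below is a hypothetical stand-in for an ES index (it is not part of the elasticsearch library): when the document id is auto-generated, every retried attempt that reached the server becomes a new document, whereas an explicit id makes retries idempotent because re-indexing the same id overwrites the existing document.

```python
import uuid

class FakeIndex:
    """Toy stand-in for an ES index, illustrating retry semantics."""
    def __init__(self):
        self.docs = {}

    def index(self, body, id=None):
        # Like ES, auto-generate a document id when none is supplied.
        doc_id = id if id is not None else uuid.uuid4().hex
        self.docs[doc_id] = body  # same id -> overwrite (idempotent)
        return doc_id

idx = FakeIndex()

# Client times out and retries 3 times WITHOUT an explicit id:
# every attempt that actually reached the server becomes a new document.
for _ in range(3):
    idx.index({"msg": "bulk item"})

# Retrying WITH an explicit id just overwrites the same document.
for _ in range(3):
    idx.index({"msg": "bulk item"}, id="doc-1")

print(len(idx.docs))  # 4: three auto-id duplicates plus "doc-1"
```

This is why supplying your own `id` (as the `es.index(..., id=id, ...)` call in the first traceback does) is safer to combine with `retry_on_timeout=True` than letting the server generate ids.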

API parameter reference

 


>>> help(elasticsearch.Elasticsearch)
Help on class Elasticsearch in module elasticsearch.client:

class Elasticsearch(__builtin__.object)
 |  Elasticsearch low-level client. Provides a straightforward mapping from
 |  Python to ES REST endpoints.
 |
 |  __init__(self, hosts=None, transport_class=<class 'elasticsearch.transport.Transport'>, **kwargs)
 |      :arg hosts: list of nodes we should connect to. Node should be a
 |          dictionary ({"host": "localhost", "port": 9200}), the entire dictionary
 |          will be passed to the :class:`~elasticsearch.Connection` class as
 |          kwargs, or a string in the format of ``host[:port]`` which will be
 |          translated to a dictionary automatically. If no value is given the
 |          :class:`~elasticsearch.Urllib3HttpConnection` class defaults will be used.
 |
 |      :arg transport_class: :class:`~elasticsearch.Transport` subclass to use.
 |
 |      :arg kwargs: any additional arguments will be passed on to the
 |          :class:`~elasticsearch.Transport` class and, subsequently, to the
 |          :class:`~elasticsearch.Connection` instances.

>>> help(elasticsearch.Transport)
Help on class Transport in module elasticsearch.transport:

class Transport(__builtin__.object)
 |  Encapsulation of transport-related to logic. Handles instantiation of the
 |  individual connections as well as creating a connection pool to hold them.
 |
 |  Main interface is the `perform_request` method.
 |
 |  Methods defined here:
 |
 |  __init__(self, hosts, connection_class=<class 'elasticsearch.connection.http_urllib3.Urllib3HttpConnection'>, connection_pool_class=<class 'elasticsearch.connection_pool.ConnectionPool'>, host_info_callback=<function get_host_info>, sniff_on_start=False, sniffer_timeout=None, sniff_timeout=0.1, sniff_on_connection_fail=False, serializer=<elasticsearch.serializer.JSONSerializer object>, serializers=None, default_mimetype='application/json', max_retries=3, retry_on_status=(502, 503, 504), retry_on_timeout=False, send_get_body_as='GET', **kwargs)
 |      :arg max_retries: maximum number of retries before an exception is propagated
 |      :arg retry_on_status: set of HTTP status codes on which we should retry
 |          on a different node. defaults to ``(502, 503, 504)``
 |      :arg retry_on_timeout: should timeout trigger a retry on different
 |          node? (default `False`)
 |
 |      Any extra keyword arguments will be passed to the `connection_class`
 |      when creating and instance unless overriden by that connection's
 |      options provided as part of the hosts parameter.

As shown here, the defaults are max_retries=3, retry_on_timeout=False, and retry_on_status=(502, 503, 504).
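The kwargs cascade described in the help text (Elasticsearch → Transport → Connection) can be sketched with simplified stand-in classes. These are hypothetical, heavily reduced versions of the real classes, kept only to show how a single `Elasticsearch(timeout=30, max_retries=10, ...)` call distributes its arguments:

```python
class Connection:
    # Mirrors Urllib3HttpConnection's defaults: timeout of 10 seconds.
    def __init__(self, host="localhost", port=9200, timeout=10, **kwargs):
        self.timeout = timeout

class Transport:
    # Consumes the retry-related kwargs; everything it does not
    # recognize flows on to the connection_class.
    def __init__(self, hosts, max_retries=3, retry_on_timeout=False, **kwargs):
        self.max_retries = max_retries
        self.retry_on_timeout = retry_on_timeout
        self.connection = Connection(**kwargs)

class Client:
    # Mirrors Elasticsearch.__init__: unrecognized kwargs go to Transport.
    def __init__(self, hosts=None, **kwargs):
        self.transport = Transport(hosts or ["localhost"], **kwargs)

es = Client(timeout=30, max_retries=10, retry_on_timeout=True)
print(es.transport.max_retries, es.transport.connection.timeout)  # 10 30
```

This is why `timeout`, `max_retries`, and `retry_on_timeout` can all be passed directly to the `Elasticsearch` constructor even though each is consumed by a different layer.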

 


>>> help(elasticsearch.connection.http_urllib3.Urllib3HttpConnection)
Help on class Urllib3HttpConnection in module elasticsearch.connection.http_urllib3:

class Urllib3HttpConnection(elasticsearch.connection.base.Connection)
 |  Default connection class using the `urllib3` library and the http protocol.
 |
 |  :arg host: hostname of the node (default: localhost)
 |  :arg port: port to use (integer, default: 9200)
 |  :arg url_prefix: optional url prefix for elasticsearch
 |  :arg timeout: default timeout in seconds (float, default: 10)

As you can see, the default timeout is only 10 seconds.
