Chapter 1.7 Elasticsearch production problems, part 1

My Elasticsearch index holds 54.33 million documents. While running frequent queries and writes against it from a Python script, I hit several exceptions. I am recording them here to see whether they can be solved.
My script:

from elasticsearch import Elasticsearch
from elasticsearch import helpers

# es, sub_index, pre_time and end_time are defined elsewhere in the script.
body = {
    "query": {
        "range": {
            "date": {
                "gte": pre_time,
                "lte": end_time
            }
        }
    }
}
results = helpers.scan(
    client=es,
    query=body,
    scroll="5m",
    index=sub_index,
    doc_type='my_type',
    timeout="10m"
)
print('Start traversing the index')
sources = set()
for result in results:
    ...  # per-document processing, not shown in this excerpt

1 elasticsearch.exceptions.NotFoundError: NotFoundError(404, 'search_phase_execution_exception', 'No search context found for id [27563069]')
The question "Elasticsearch SearchContextMissingException during 'scan & scroll' query with Spring Data Elasticsearch" describes the same problem as mine: the search context is missing because it has timed out. The solution is to set the timeout long enough.

This usually happens if your search context is not alive anymore.

The official documentation, in the section "Keeping the search context alive", says the scroll time only has to be long enough to return one batch of data.
So I adjusted my program, raising scroll from 5m to 10m and timeout from 10m to 15m. Watching the program run again, I found that the "Scroll request has only succeeded on 1 shards out of 5" error had been fixed along the way, which was an unexpected bonus.

results = helpers.scan(
        client=es,
        query=body,
        scroll="10m",
        index=sub_index,
        doc_type='my_type',
        timeout="15m"
    )

The scroll parameter (passed to the search request and to every scroll request) tells Elasticsearch how long it should keep the search context alive. Its value (e.g. 1m, see Time units) does not need to be long enough to process all data; it just needs to be long enough to process the previous batch of results.

Process Process-4:
Traceback (most recent call last):
  File "/usr/local/python3/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/local/python3/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "count_in_total_baidu.py", line 66, in sub_in_total
    for result in results:
  File "/usr/local/python3/lib/python3.6/site-packages/elasticsearch/helpers/__init__.py", line 379, in scan
    **scroll_kwargs)
  File "/usr/local/python3/lib/python3.6/site-packages/elasticsearch/client/utils.py", line 76, in _wrapped
    return func(*args, params=params, **kwargs)
  File "/usr/local/python3/lib/python3.6/site-packages/elasticsearch/client/__init__.py", line 1016, in scroll
    params=params, body=body)
  File "/usr/local/python3/lib/python3.6/site-packages/elasticsearch/transport.py", line 318, in perform_request
    status, headers_response, data = connection.perform_request(method, url, params, body, headers=headers, ignore=ignore, timeout=timeout)
  File "/usr/local/python3/lib/python3.6/site-packages/elasticsearch/connection/http_urllib3.py", line 186, in perform_request
    self._raise_error(response.status, raw_data)
  File "/usr/local/python3/lib/python3.6/site-packages/elasticsearch/connection/base.py", line 125, in _raise_error
    raise HTTP_EXCEPTIONS.get(status_code, TransportError)(status_code, error_message, additional_info)
elasticsearch.exceptions.NotFoundError: NotFoundError(404, 'search_phase_execution_exception', 'No search context found for id [27563069]')
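
To make the keep-alive behaviour concrete, here is a minimal sketch of what a scroll loop does under the hood. It assumes an elasticsearch-py client from roughly the 5.x to 7.x line, where search, scroll and clear_scroll accept the arguments shown. The function name iter_hits and the page_size default are my own choices, not part of the library.

```python
def iter_hits(client, index, query, keep_alive="10m", page_size=500):
    """Yield every hit of a scrolled search, renewing the context per batch."""
    # The first request opens the search context and returns the first batch.
    resp = client.search(index=index, body=query, scroll=keep_alive, size=page_size)
    scroll_id = resp.get("_scroll_id")
    try:
        while resp["hits"]["hits"]:
            for hit in resp["hits"]["hits"]:
                yield hit
            # Each scroll call passes the keep-alive again, so it only has to
            # cover the processing time of the batch that was just yielded.
            resp = client.scroll(scroll_id=scroll_id, scroll=keep_alive)
            scroll_id = resp.get("_scroll_id", scroll_id)
    finally:
        # Free the context on the server instead of waiting for it to expire.
        if scroll_id:
            client.clear_scroll(body={"scroll_id": [scroll_id]})
```

If processing one batch can take longer than the keep-alive, lowering page_size is an alternative to raising the timeout, because less work then has to fit between two scroll requests.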

2 Scroll request has only succeeded on 1 shards out of 5.

Process Process-2:
Traceback (most recent call last):
  File "/etc/python/python3.6/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/etc/python/python3.6/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "count_in_total.py", line 67, in sub_in_total
    for result in results:
  File "/etc/python/python3.6/lib/python3.6/site-packages/elasticsearch/helpers/__init__.py", line 394, in scan
    (resp['_shards']['successful'], resp['_shards']['total'])
elasticsearch.helpers.ScanError: Scroll request has only succeeded on 1 shards out of 5.

I found an article titled "elasticsearch.helpers.ScanError: Scroll request has only succeeded on xx shards". It says this error appears when the index argument is empty (index=''). But why would the index be empty?

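
The traceback shows helpers.scan comparing resp['_shards']['successful'] with resp['_shards']['total']. When handling responses yourself, the same comparison can be made while also reading the per-shard failure reasons, which may say more than the ScanError message does. A minimal sketch: the function name is hypothetical, and I assume the response carries the usual _shards section with an optional failures list.

```python
def shard_problems(resp):
    """Return readable shard failure reasons, or an empty list if all is well."""
    shards = resp.get("_shards", {})
    total = shards.get("total", 0)
    # Newer versions also report shards that were skipped on purpose.
    answered = shards.get("successful", 0) + shards.get("skipped", 0)
    if answered >= total:
        return []
    reasons = []
    for failure in shards.get("failures", []):
        reason = failure.get("reason", {})
        reasons.append("shard %s: %s: %s" % (
            failure.get("shard"), reason.get("type"), reason.get("reason")))
    # Fall back to a generic message when the response includes no details.
    return reasons or ["%d of %d shards answered" % (answered, total)]
```

Logging this list before giving up would show whether the missing shards failed for the same reason as problem 1, an expired search context.
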
3 rejected execution of org.elasticsearch.transport.TransportService

Process Process-1:
Traceback (most recent call last):
  File "/usr/local/python3/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/local/python3/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "count_in_total_baidu.py", line 64, in sub_in_total
    totalEnService.sub_in_total(result.get('_source'))
  File "/usr/local/python3/lib/python3.6/site-packages/en_plugin/service/en_service.py", line 200, in sub_in_total
    self.handler(total_record)
  File "/usr/local/python3/lib/python3.6/site-packages/en_plugin/service/en_service.py", line 133, in handler
    self.opt_es(es_data)
  File "/usr/local/python3/lib/python3.6/site-packages/en_plugin/service/en_service.py", line 175, in opt_es
    success, msg = helpers.bulk(self.es, self.actions)
  File "/usr/local/python3/lib/python3.6/site-packages/elasticsearch/helpers/__init__.py", line 257, in bulk
    for ok, item in streaming_bulk(client, actions, *args, **kwargs):
  File "/usr/local/python3/lib/python3.6/site-packages/elasticsearch/helpers/__init__.py", line 192, in streaming_bulk
    raise_on_error, *args, **kwargs)
  File "/usr/local/python3/lib/python3.6/site-packages/elasticsearch/helpers/__init__.py", line 137, in _process_bulk_chunk
    raise BulkIndexError('%i document(s) failed to index.' % len(errors), errors)
elasticsearch.helpers.BulkIndexError: ('1 document(s) failed to index.', [{'index': {'_index': 'invoice_title_v3', '_type': 'invoice_title', '_id': '3ae80d12abcde7d60f72ffb7fbc4696d', 'status': 429, 'error': {'type': 'es_rejected_execution_exception', 'reason': 'rejected execution of org.elasticsearch.transport.TransportService$7@6048fcf0 on EsThreadPoolExecutor[bulk, queue capacity = 200, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@58695c43[Running, pool size = 4, active threads = 4, queued tasks = 202, completed tasks = 13568507]]'}, 'data':

According to the article "ElasticSearch 5.5.x common errors roundup", the problem is that the client writes to ES faster than ES can index the data.
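
Besides enlarging the queue on the server, the client can also slow down when it receives a 429. The sketch below retries only the rejected documents with exponential backoff. It is not taken from my script: send_chunk is a hypothetical caller-supplied function, which in practice could wrap helpers.bulk(..., raise_on_error=False) and return the documents whose error items carry status 429. Newer elasticsearch-py releases appear to offer max_retries and initial_backoff on the bulk helpers for the same purpose, so check your version first.

```python
import time


def index_with_backoff(send_chunk, docs, max_retries=5,
                       initial_backoff=1.0, max_backoff=60.0, sleep=time.sleep):
    """Send docs, retrying only the rejected ones with exponential backoff.

    send_chunk(docs) is a hypothetical function that sends the documents and
    returns those rejected with HTTP 429. The return value of this function
    is the list of documents still rejected after the last attempt.
    """
    pending = list(docs)
    delay = initial_backoff
    for attempt in range(max_retries + 1):
        pending = send_chunk(pending)
        if not pending:
            return []
        if attempt < max_retries:
            # Give the bulk queue time to drain before trying again.
            sleep(delay)
            delay = min(delay * 2, max_backoff)
    return pending
```

The sleep argument is injectable only so that the function can be exercised without real waiting.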
Running GET _nodes/thread_pool shows a queue_size for index that matches the queue capacity = 200 in the exception. In version 5.x, however, I did not see a size value.
I followed the article "Elasticsearch advanced configuration (2): thread pool settings", which says size defaults to five times the number of CPU cores. I have a 4-core CPU, so I set the values a little higher:

PUT _cluster/settings
{
  "transient": {
    "threadpool.index.type": "fixed",
    "threadpool.index.size": 30,
    "threadpool.index.queue_size": 1000,
    "threadpool.index.reject_policy": "caller"
  }
}

The request fails with this error:

{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "transient setting [threadpool.index.queue_size], not dynamically updateable"
      }
    ],
    "type": "illegal_argument_exception",
    "reason": "transient setting [threadpool.index.queue_size], not dynamically updateable"
  },
  "status": 400
}

The post "transient setting [threadpool.search.queue_size], not dynamically updateable" explains that queue_size cannot be changed through the API; you have to modify the configuration file and restart the node.
I added the following settings to my elasticsearch.yml, but that produced the error shown after them, which was a bit demoralizing:

threadpool.index.type: fixed
threadpool.index.size: 40
threadpool.index.queue_size: 1000
threadpool.index.reject_policy: caller
Suppressed: java.lang.IllegalArgumentException: unknown setting [threadpool.index.size] did you mean any of [thread_pool.index.size, thread_pool.get.size, thread_pool.index.queue_size, thread_pool.listener.size, thread_pool.bulk.size]?

With no other solution in sight, I had to turn to the official Thread Pool documentation.

4 Caused by: org.elasticsearch.client.transport.NoNodeAvailableException
My initial idea was that since ES runs as a cluster, stopping one node for a while should not affect usage. So I stopped a node in the production environment, and a large part of the production business was paralyzed. A truly painful lesson. Why is the fault tolerance of the ES cluster so low? Most people recommend putting a proxy such as haproxy in front of it.

data:
    elasticsearch:
      cluster-name: xx_product
      cluster-nodes: 192.168.1.1:9300,192.168.1.2:9300,192.168.1.3:9300
      local: false
      repositories:
        enabled: true
Caused by: org.elasticsearch.client.transport.NoNodeAvailableException: None of the configured nodes are available: [{#transport#-1}{a8wHYOwIRjC2sQYilPldrg}{172.19.123.151}{172.19.123.151:9300}, {#transport#-2}{4stEpD9KQSesdbmn2Hldxw}{172.19.123.150}{172.19.123.150:9300}]
	at org.elasticsearch.client.transport.TransportClientNodesService.ensureNodesAreAvailable(TransportClientNodesService.java:347)
	at org.elasticsearch.client.transport.TransportClientNodesService.execute(TransportClientNodesService.java:245)
	at org.elasticsearch.client.transport.TransportProxyClient.execute(TransportProxyClient.java:59)
	at org.elasticsearch.client.transport.TransportClient.doExecute(TransportClient.java:363)
	at org.elasticsearch.client.support.AbstractClient.execute(AbstractClient.java:408)
	at org.elasticsearch.action.ActionRequestBuilder.execute(ActionRequestBuilder.java:80)
	at org.elasticsearch.action.ActionRequestBuilder.execute(ActionRequestBuilder.java:54)
	at org.elasticsearch.action.ActionRequestBuilder.get(ActionRequestBuilder.java:62)
	at com.bwjf.rss.service.impl.CustomerServiceImpl.add(CustomerServiceImpl.java:98)
	at com.bwjf.rss.kfk.KfkConsumer.processCustomerMessage(KfkConsumer.java:50)
	at sun.reflect.GeneratedMethodAccessor62.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.springframework.messaging.handler.invocation.InvocableHandlerMethod.doInvoke(InvocableHandlerMethod.java:180)
	at org.springframework.messaging.handler.invocation.InvocableHandlerMethod.invoke(InvocableHandlerMethod.java:112)
	at org.springframework.kafka.listener.adapter.HandlerAdapter.invoke(HandlerAdapter.java:48)
	at org.springframework.kafka.listener.adapter.MessagingMessageListenerAdapter.invokeHandler(MessagingMessageListenerAdapter.java:174)
	... 8 common frames omitted
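
One precaution before stopping a node is to check cluster health first, because a cluster that is already yellow or still moving shards has less room to lose a node. The sketch below is my own helper, not part of the original setup. It assumes the dict returned by the cluster health API (GET _cluster/health, or es.cluster.health() in the Python client) with its status, relocating_shards, initializing_shards, unassigned_shards and number_of_nodes fields. It says nothing about whether clients such as the TransportClient above can still reach the remaining nodes; that has to be verified separately.

```python
def safe_to_stop_node(health, min_nodes_after=2):
    """Return (ok, reason) for taking one node out of the cluster."""
    if health.get("status") != "green":
        return False, "cluster status is %s" % health.get("status")
    for key in ("relocating_shards", "initializing_shards", "unassigned_shards"):
        if health.get(key, 0) > 0:
            return False, "%s = %d" % (key, health[key])
    # min_nodes_after is an assumed threshold; pick one that fits your cluster.
    if health.get("number_of_nodes", 0) - 1 < min_nodes_after:
        return False, "too few nodes would remain"
    return True, "ok"
```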

3.1 index / delete
In other words, for index/delete operations the size of the thread pool depends on the number of available processors, with a maximum of 1 + the number of available processors. So the ES service does place demands on the number of CPU cores: the more cores, the faster indexing will be.

For index/delete operations. Thread pool type is fixed with a size of # of available processors, queue_size of 200. The maximum size for this pool is 1 + # of available processors.

Although CPU usage is not high, there is still a requirement on the number of cores.

3.2 search

For count/search/suggest operations. Thread pool type is fixed with a size of int((# of available_processors * 3) / 2) + 1, queue_size of 1000.
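
The two formulas quoted from the documentation are easy to evaluate for a given machine. A small sketch; the function name and dictionary keys are mine, the formulas are the ones in the excerpts.

```python
def default_pool_sizes(processors):
    """Evaluate the documented thread pool defaults for a processor count."""
    return {
        # index/delete: size = processors, maximum = 1 + processors
        "index_size": processors,
        "index_max_size": 1 + processors,
        # count/search/suggest: int((processors * 3) / 2) + 1
        "search_size": int((processors * 3) / 2) + 1,
    }


for cores in (4, 8):
    print(cores, default_pool_sizes(cores))
```

For 4 processors this gives an index pool of 4 (at most 5) and a search pool of 7; for 8 processors, 8 (at most 9) and 13.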

3.3 bulk
The call success, msg = helpers.bulk(self.es, self.actions) in my script should go through this pool; its defaults are the same as for index.
I did not find a way in the documentation to change it dynamically.
I set bulk.size: 40. With my 8 CPU cores that should be no problem in principle, but in practice startup always insists the value must be less than 9, which is really strange.
I then changed elasticsearch.yml to add the setting thread_pool.bulk.queue_size: 2000.
After the change, I restarted the nodes following Chapter 1.8, elasticsearch horizontal expansion.
The index status turned red and, after a while, yellow. If data had been lost, I would have collapsed.
Judging from the monitoring screen, the cluster is still rebalancing; the progress of the shard migration can be followed there as well.
Run GET _cat/thread_pool; the numeric columns are, in order, active, queue and rejected.
The queue reaches 200 very easily, so setting it larger should help.
You can also run GET _cat/thread_pool/bulk?v&h=id,name,active,rejected,completed to look at the bulk pool specifically.
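
The same check can be scripted. A sketch, assuming the cat API is asked for JSON (format=json), in which case each row arrives as a dict of strings; the helper name is mine. With the Python client the rows would come from something like es.cat.thread_pool(format='json', h='node_name,name,active,queue,rejected'), which I have not verified against every client version.

```python
def pools_with_rejections(rows):
    """Return {(node, pool): rejected} for every pool that has rejected work."""
    result = {}
    for row in rows:
        # The cat API returns numbers as strings, so convert before comparing.
        rejected = int(row.get("rejected", 0))
        if rejected > 0:
            result[(row.get("node_name"), row.get("name"))] = rejected
    return result
```

An empty result after a bulk load means no pool on any node had to reject work.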
Following the logs afterwards, I found no further bulk exceptions, so the problem is solved.


Origin blog.csdn.net/warrah/article/details/89354469