My ES index holds about 54.33 million documents. While running frequent queries and write operations against it from Python, I ran into several anomalies. I'm recording them here to see whether there is a way to solve each one.
My script:

from elasticsearch import Elasticsearch
from elasticsearch import helpers

body = {
    "query": {
        "range": {
            "date": {
                "gte": pre_time,
                "lte": end_time
            }
        }
    }
}
results = helpers.scan(
    client=es,
    query=body,
    scroll="5m",
    index=sub_index,
    doc_type='my_type',
    timeout="10m"
)
print('Start iterating over the index')
sources = set()
for result in results:
1 elasticsearch.exceptions.NotFoundError: NotFoundError(404, 'search_phase_execution_exception', 'No search context found for id [27563069]')
See Elasticsearch SearchContextMissingException during 'scan & scroll' query with Spring Data Elasticsearch, which asks the same question I had: the search context is lost, i.e. it has timed out. The solution suggested there is to set the timeout long enough.
This usually happens if your search context is not alive anymore.
Checking the official documentation, Keeping the search context alive, you can see that the scroll time only needs to be sufficient to return one batch of data.
So I adjusted my program, changing scroll from the original 5m to 10m and timeout from the original 10m to 15m. Observing the program again, I found that the second error below, "Scroll request has only succeeded on 1 shards out of 5", was solved along the way as well, an unexpected bonus.
results = helpers.scan(
    client=es,
    query=body,
    scroll="10m",
    index=sub_index,
    doc_type='my_type',
    timeout="15m"
)
The scroll parameter (passed to the search request and to every scroll request) tells Elasticsearch how long it should keep the search context alive. Its value (e.g. 1m, see Time units) does not need to be long enough to process all data; it just needs to be long enough to process the previous batch of results.
Process Process-4:
Traceback (most recent call last):
File "/usr/local/python3/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
self.run()
File "/usr/local/python3/lib/python3.6/multiprocessing/process.py", line 93, in run
self._target(*self._args, **self._kwargs)
File "count_in_total_baidu.py", line 66, in sub_in_total
for result in results:
File "/usr/local/python3/lib/python3.6/site-packages/elasticsearch/helpers/__init__.py", line 379, in scan
**scroll_kwargs)
File "/usr/local/python3/lib/python3.6/site-packages/elasticsearch/client/utils.py", line 76, in _wrapped
return func(*args, params=params, **kwargs)
File "/usr/local/python3/lib/python3.6/site-packages/elasticsearch/client/__init__.py", line 1016, in scroll
params=params, body=body)
File "/usr/local/python3/lib/python3.6/site-packages/elasticsearch/transport.py", line 318, in perform_request
status, headers_response, data = connection.perform_request(method, url, params, body, headers=headers, ignore=ignore, timeout=timeout)
File "/usr/local/python3/lib/python3.6/site-packages/elasticsearch/connection/http_urllib3.py", line 186, in perform_request
self._raise_error(response.status, raw_data)
File "/usr/local/python3/lib/python3.6/site-packages/elasticsearch/connection/base.py", line 125, in _raise_error
raise HTTP_EXCEPTIONS.get(status_code, TransportError)(status_code, error_message, additional_info)
elasticsearch.exceptions.NotFoundError: NotFoundError(404, 'search_phase_execution_exception', 'No search context found for id [27563069]')
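To make the keep-alive rule above concrete, here is a minimal sketch of what helpers.scan roughly does under the hood: every scroll call restarts the context timer, so the scroll window only has to outlive the processing of one batch. The function scan_batches and its parameter names are my own illustration, not part of the elasticsearch package; the client is duck-typed so the flow is visible.

```python
def scan_batches(es, index, body, scroll="5m", size=1000):
    """Yield hits batch by batch; the scroll timer restarts on every
    scroll call, so `scroll` only has to cover ONE batch's processing."""
    resp = es.search(index=index, body=body, scroll=scroll, size=size)
    scroll_id = resp["_scroll_id"]
    try:
        while resp["hits"]["hits"]:
            for hit in resp["hits"]["hits"]:
                yield hit
            # each scroll call renews the context for another `scroll` window
            resp = es.scroll(scroll_id=scroll_id, scroll=scroll)
            scroll_id = resp["_scroll_id"]
    finally:
        # free server-side resources instead of waiting for the timeout
        es.clear_scroll(scroll_id=scroll_id)
```

If processing one batch can take longer than the scroll window (heavy per-document work, as in my write-back loop), that is exactly when "No search context found" appears.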
2 Scroll request has only succeeded on 1 shards out of 5.
Process Process-2:
Traceback (most recent call last):
File "/etc/python/python3.6/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
self.run()
File "/etc/python/python3.6/lib/python3.6/multiprocessing/process.py", line 93, in run
self._target(*self._args, **self._kwargs)
File "count_in_total.py", line 67, in sub_in_total
for result in results:
File "/etc/python/python3.6/lib/python3.6/site-packages/elasticsearch/helpers/__init__.py", line 394, in scan
(resp['_shards']['successful'], resp['_shards']['total'])
elasticsearch.helpers.ScanError: Scroll request has only succeeded on 1 shards out of 5.
I found an article, elasticsearch.helpers.ScanError: Scroll request has only succeeded on xx shards, which attributes this error to passing an empty index (index=''). So it is worth checking the index value you pass in, and asking how it could end up empty.
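Two defensive options are worth knowing here. helpers.scan accepts raise_on_error=False, which logs a partial shard failure instead of raising ScanError, if losing some results is acceptable. Alternatively, since a scroll cannot be resumed once its context is gone, you can restart the whole scan from scratch when it dies mid-iteration. A sketch of such a retry wrapper follows; retry_scan and its parameters are my own names, not part of the elasticsearch package.

```python
def retry_scan(make_scan, attempts=3, is_retryable=lambda exc: True):
    """Re-run a scan from the beginning when it fails mid-stream.

    `make_scan` must return a FRESH iterator on each call, because a
    scroll context cannot be resumed once lost. Failed attempts re-yield
    documents already seen, so downstream processing should be
    idempotent (e.g. collecting into a set, as my script does).
    """
    for attempt in range(1, attempts + 1):
        try:
            for hit in make_scan():
                yield hit
            return  # completed cleanly
        except Exception as exc:
            if attempt == attempts or not is_retryable(exc):
                raise
```

In real use, make_scan would be something like `lambda: helpers.scan(client=es, query=body, index=sub_index)`.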
3 rejected execution of org.elasticsearch.transport.TransportService
Process Process-1:
Traceback (most recent call last):
File "/usr/local/python3/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
self.run()
File "/usr/local/python3/lib/python3.6/multiprocessing/process.py", line 93, in run
self._target(*self._args, **self._kwargs)
File "count_in_total_baidu.py", line 64, in sub_in_total
totalEnService.sub_in_total(result.get('_source'))
File "/usr/local/python3/lib/python3.6/site-packages/en_plugin/service/en_service.py", line 200, in sub_in_total
self.handler(total_record)
File "/usr/local/python3/lib/python3.6/site-packages/en_plugin/service/en_service.py", line 133, in handler
self.opt_es(es_data)
File "/usr/local/python3/lib/python3.6/site-packages/en_plugin/service/en_service.py", line 175, in opt_es
success, msg = helpers.bulk(self.es, self.actions)
File "/usr/local/python3/lib/python3.6/site-packages/elasticsearch/helpers/__init__.py", line 257, in bulk
for ok, item in streaming_bulk(client, actions, *args, **kwargs):
File "/usr/local/python3/lib/python3.6/site-packages/elasticsearch/helpers/__init__.py", line 192, in streaming_bulk
raise_on_error, *args, **kwargs)
File "/usr/local/python3/lib/python3.6/site-packages/elasticsearch/helpers/__init__.py", line 137, in _process_bulk_chunk
raise BulkIndexError('%i document(s) failed to index.' % len(errors), errors)
elasticsearch.helpers.BulkIndexError: ('1 document(s) failed to index.', [{'index': {'_index': 'invoice_title_v3', '_type': 'invoice_title', '_id': '3ae80d12abcde7d60f72ffb7fbc4696d', 'status': 429, 'error': {'type': 'es_rejected_execution_exception', 'reason': 'rejected execution of org.elasticsearch.transport.TransportService$7@6048fcf0 on EsThreadPoolExecutor[bulk, queue capacity = 200, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@58695c43[Running, pool size = 4, active threads = 4, queued tasks = 202, completed tasks = 13568507]]'}, 'data':
From the article Common errors in ElasticSearch 5.5.x, you can see the problem: the client is writing to ES faster than ES can index the data.
Running GET _nodes/thread_pool
shows the index thread pool's queue_size, which is consistent with the "queue capacity = 200" in the exception. (On version 5.x, though, I did not see the size value.)
Following an article on advanced Elasticsearch thread pool configuration, which says size defaults to five times the number of CPU cores (I have a 4-core CPU), I simply tried somewhat larger values:
PUT _cluster/settings
{
  "transient": {
    "threadpool.index.type": "fixed",
    "threadpool.index.size": 30,
    "threadpool.index.queue_size": 1000,
    "threadpool.index.reject_policy": "caller"
  }
}
The request failed with this exception:
{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "transient setting [threadpool.index.queue_size], not dynamically updateable"
      }
    ],
    "type": "illegal_argument_exception",
    "reason": "transient setting [threadpool.index.queue_size], not dynamically updateable"
  },
  "status": 400
}
From Transient Setting [threadpool.search.queue_size], not dynamically Updateable, you can learn that the queue_size parameter cannot be changed through the API; you have to modify the configuration file and restart the node.
So I added these settings to my elasticsearch.yml, only to hit the following error, which was a little crushing:
threadpool.index.type: fixed
threadpool.index.size: 40
threadpool.index.queue_size: 1000
threadpool.index.reject_policy: caller
Suppressed: java.lang.IllegalArgumentException: unknown setting [threadpool.index.size] did you mean any of [thread_pool.index.size, thread_pool.get.size, thread_pool.index.queue_size, thread_pool.listener.size, thread_pool.bulk.size]?
The error's "did you mean" hint shows the modern spelling is thread_pool.* rather than threadpool.*; in the end the only reliable reference was the official Thread Pool documentation.
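Going by the "did you mean" hint in the error, only size and queue_size appear among the suggested keys. A corrected elasticsearch.yml fragment might therefore look like the following; the value 5 is my own assumption, taken as the 1 + number-of-available-processors cap (quoted from the docs further down) for a 4-core machine, not something from the original configuration:

```yaml
# setting names per the error's suggestions: thread_pool.index.*, not threadpool.index.*
thread_pool.index.size: 5          # cannot exceed 1 + # of available processors
thread_pool.index.queue_size: 1000
```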
4 Caused by: org.elasticsearch.client.transport.NoNodeAvailableException
My initial thinking was that since ES is a cluster, stopping one node at a time should not affect its availability. So I stopped a node in a production environment, and a large part of the production business was paralyzed; a truly painful lesson. Why is the cluster's fault tolerance so low? The prevailing view is to put the nodes behind a proxy such as haproxy.
data:
  elasticsearch:
    cluster-name: xx_product
    cluster-nodes: 192.168.1.1:9300,192.168.1.2:9300,192.168.1.3:9300
    local: false
    repositories:
      enabled: true
Caused by: org.elasticsearch.client.transport.NoNodeAvailableException: None of the configured nodes are available: [{#transport#-1}{a8wHYOwIRjC2sQYilPldrg}{172.19.123.151}{172.19.123.151:9300}, {#transport#-2}{4stEpD9KQSesdbmn2Hldxw}{172.19.123.150}{172.19.123.150:9300}]
at org.elasticsearch.client.transport.TransportClientNodesService.ensureNodesAreAvailable(TransportClientNodesService.java:347)
at org.elasticsearch.client.transport.TransportClientNodesService.execute(TransportClientNodesService.java:245)
at org.elasticsearch.client.transport.TransportProxyClient.execute(TransportProxyClient.java:59)
at org.elasticsearch.client.transport.TransportClient.doExecute(TransportClient.java:363)
at org.elasticsearch.client.support.AbstractClient.execute(AbstractClient.java:408)
at org.elasticsearch.action.ActionRequestBuilder.execute(ActionRequestBuilder.java:80)
at org.elasticsearch.action.ActionRequestBuilder.execute(ActionRequestBuilder.java:54)
at org.elasticsearch.action.ActionRequestBuilder.get(ActionRequestBuilder.java:62)
at com.bwjf.rss.service.impl.CustomerServiceImpl.add(CustomerServiceImpl.java:98)
at com.bwjf.rss.kfk.KfkConsumer.processCustomerMessage(KfkConsumer.java:50)
at sun.reflect.GeneratedMethodAccessor62.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.springframework.messaging.handler.invocation.InvocableHandlerMethod.doInvoke(InvocableHandlerMethod.java:180)
at org.springframework.messaging.handler.invocation.InvocableHandlerMethod.invoke(InvocableHandlerMethod.java:112)
at org.springframework.kafka.listener.adapter.HandlerAdapter.invoke(HandlerAdapter.java:48)
at org.springframework.kafka.listener.adapter.MessagingMessageListenerAdapter.invokeHandler(MessagingMessageListenerAdapter.java:174)
... 8 common frames omitted
3.1 index / delete
In other words, for index/delete operations the thread pool size is tied to the number of available processors, with a maximum of 1 + the number of available processors. So the ES service really does need CPU cores: the more cores, the faster my indexing can go.
For index/delete operations. Thread pool type is fixed with a size of # of available processors, queue_size of 200. The maximum size for this pool is 1 + # of available processors.
Although CPU usage is not high, there is still a requirement on the number of cores.
3.2 search
For count/search/suggest operations. Thread pool type is fixed with a size of int((# of available_processors * 3) / 2) + 1, queue_size of 1000.
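Plugging a core count into the two quoted formulas makes the defaults concrete; default_pool_sizes is just a helper name of my own for illustration:

```python
def default_pool_sizes(cores):
    """Default fixed thread pool sizes per the ES docs quoted above."""
    return {
        "index": cores,                    # max allowed: 1 + cores
        "search": int(cores * 3 / 2) + 1,
    }

# e.g. a 4-core node gets an index pool of 4 threads and a search pool of 7
```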
3.3 bulk
The success, msg = helpers.bulk(self.es, self.actions) call in my script should be using this pool; its defaults are the same as the index pool's.
I did not find any article describing a way to modify it dynamically.
Setting the bulk size to 40: with my 8-core CPU that should by rights be no problem, but in practice the node would only start with a value below 9, which seemed strange, though it is presumably related to the 1 + number-of-available-processors cap quoted above.
I changed elasticsearch.yml, adding the setting thread_pool.bulk.queue_size: 2000. After the change I restarted the node following the earlier procedure from Chapter 1.8, elasticsearch horizontal expansion.
The index status turned red, then after a while yellow; if the data had been lost I would have collapsed.
From the monitoring view it looked like the cluster was rebalancing, and you can watch the progress of the shard migration there.
Running the command GET _cat/thread_pool shows, per pool, the numeric columns active, queue and rejected in that order.
The bulk queue easily reaches 200, so setting it larger should pay off.
You can also run GET _cat/thread_pool/bulk?v&h=id,name,active,rejected,completed to look specifically at the bulk pool.
Tracing the logs afterwards showed no more bulk anomalies; problem solved.
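Besides enlarging the server-side queue, the root cause (the client indexing faster than ES can drain the bulk pool) can also be attacked from the client. Newer versions of elasticsearch-py let helpers.streaming_bulk retry rejected chunks itself via max_retries and initial_backoff, and the same idea is easy to hand-roll. A sketch follows; bulk_with_backoff, the status attribute check, and all parameters are my own illustration, and in real use you would map your client library's 429 rejection exception onto that check.

```python
import time

def bulk_with_backoff(do_bulk, actions, max_retries=5, initial_backoff=2.0,
                      sleep=time.sleep):
    """Call `do_bulk(actions)`, backing off exponentially when ES answers
    429 (es_rejected_execution_exception) instead of failing the batch."""
    for attempt in range(max_retries + 1):
        try:
            return do_bulk(actions)
        except Exception as exc:
            # retry only queue-full rejections; anything else is a real error
            if getattr(exc, "status", None) != 429 or attempt == max_retries:
                raise
            sleep(initial_backoff * (2 ** attempt))  # 2s, 4s, 8s, ...
```

Here do_bulk would be something like `lambda a: helpers.bulk(es, a)`; slowing the writer this way avoids having to tune thread pools on every node.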