Example of using elasticdump to import and export Elasticsearch data (continuously updated)

Elasticdump is a command-line tool that exports data from Elasticsearch to JSON files and imports JSON files into Elasticsearch. The following simple examples show how to use it to import and export data:

1. Install Elasticdump

You can install Elasticdump from the command line with npm (install npm first if you do not have it). For example, the following command installs the latest version globally:

npm install elasticdump -g

Go to the bin directory of the installed package (the exact path depends on where Node.js is installed):

cd /opt/module/node16/lib/node_modules/elasticdump/bin

There are two commands here: elasticdump backs up a single index, and multielasticdump backs up multiple indexes in parallel.

-rwxr-xr-x. 1 1001 1001  4026 Apr  9 14:38 elasticdump
-rwxr-xr-x. 1 1001 1001 14598 Oct 26  1985 multielasticdump
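
Before running any of the commands in the following sections, it can help to confirm that both tools are reachable on PATH. The sketch below is one possible check; the helper name require_cmd is hypothetical and is not part of Elasticdump.

```shell
# Hypothetical helper: succeeds only if every named command is on PATH.
require_cmd() {
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing command: $cmd" >&2
      return 1
    fi
  done
}

# Usage (after the global npm install):
#   require_cmd elasticdump multielasticdump && echo "elasticdump is ready"
```

If the check fails right after installation, the global npm bin directory is probably not on PATH.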

2. Export data

To export an index, dump its mapping and its data with the following commands:

elasticdump \
  --input=http://192.168.2.227:9200/es_table_index \
  --output=/opt/module/data/data_es_table/es_table_index_mapping.json \
  --type=mapping

elasticdump \
  --input=http://192.168.2.227:9200/es_table_index \
  --output=/opt/module/data/data_es_table/es_table_index_data.json \
  --type=data
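
The two export commands above differ only in the type and the output file name, so they could be wrapped in a small function. This is a minimal sketch, not part of Elasticdump: the function name dump_index and the ELASTICDUMP override variable are assumptions made for illustration (setting ELASTICDUMP=echo gives a dry run that only prints the arguments).

```shell
# Sketch: export the mapping first, then the data, for one index.
# ELASTICDUMP is a hypothetical override; it defaults to elasticdump.
dump_index() (
  base_url=$1   # e.g. http://192.168.2.227:9200
  index=$2      # e.g. es_table_index
  out_dir=$3    # e.g. /opt/module/data/data_es_table
  for type in mapping data; do
    "${ELASTICDUMP:-elasticdump}" \
      --input="$base_url/$index" \
      --output="$out_dir/${index}_${type}.json" \
      --type="$type" || exit 1   # stop at the first failed export
  done
)

# Usage:
#   dump_index http://192.168.2.227:9200 es_table_index /opt/module/data/data_es_table
```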

2.1 The mapping can also be copied directly into another Elasticsearch cluster:

elasticdump \
--input=http://127.0.0.1:9200/test_event   \
--output=http://127.0.0.2:9200/test_event \
--type=mapping

In the same way, the data can be copied directly into another Elasticsearch cluster:

elasticdump \
--input=http://127.0.0.1:9200/test_event   \
--output=http://127.0.0.2:9200/test_event \
--type=data
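
After a cluster-to-cluster copy, one way to sanity-check the result is to compare document counts on both sides using Elasticsearch's _count endpoint. The sketch below assumes curl is installed; the helper names extract_count and doc_count are hypothetical.

```shell
# Reads a _count response on stdin and prints the number in its "count" field.
extract_count() {
  sed -n 's/.*"count"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}

# Fetches the document count of one index URL.
doc_count() {
  curl -s "$1/_count" | extract_count
}

# Usage:
#   src=$(doc_count http://127.0.0.1:9200/test_event)
#   dst=$(doc_count http://127.0.0.2:9200/test_event)
#   [ "$src" = "$dst" ] && echo "counts match: $src"
```

The target count may lag briefly, since newly written documents are only counted after the index has been refreshed.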

3. Import data

To import data, load the mapping first and then the data with the following commands:

elasticdump \
  --input=/opt/module/data/data_es_table/es_table_index_mapping.json \
  --output=http://localhost:9200/my_index \
  --type=mapping

elasticdump \
  --input=/opt/module/data/data_es_table/es_table_index_data.json \
  --output=http://localhost:9200/my_index \
  --type=data
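
Elasticdump writes data files as line-delimited JSON, one document per line, so counting the non-empty lines of a dump file gives the number of documents an import should produce. A minimal sketch, with the hypothetical helper name count_dump_docs:

```shell
# Prints the number of non-empty lines (documents) in a data dump file.
count_dump_docs() {
  grep -c . "$1"
}

# Usage:
#   count_dump_docs /opt/module/data/data_es_table/es_table_index_data.json
```

This applies to files written with --type=data; mapping files are not laid out one document per line.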

4. Use multielasticdump to back up multiple indexes:

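# Back up the ES indexes and all of their types to the es_backup folder
multielasticdump --direction=dump --match='^.*$' --input=http://127.0.0.1:9200 --output=/tmp/es_backup
# Back up only the ES indexes whose names end with "-index" (matched by regular expression).
# Only the index data is backed up; all other types are ignored.
# Note: the analyzer and alias types are ignored by default.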
multielasticdump --direction=dump --match='^.*-index$' --input=http://127.0.0.1:9200 --ignoreType='mapping,settings,template'  --output=/tmp/es_backup
multielasticdump --direction=load --input=/tmp/es_backup --output=http://127.0.0.1:9200

Compared with elasticdump, multielasticdump differs in its --direction and --ignoreType parameters.
When backing up, --direction=dump (the default) is used; --input must be the base URL of the Elasticsearch server (e.g. http://localhost:9200), and --output must be a directory. A data, mapping and analyzer file is created for each matching index.

When restoring files dumped by multielasticdump, set --direction to load; --input must be the directory that multielasticdump wrote to, and --output must be the Elasticsearch server URL.

--match filters which indexes are dumped/loaded (regular expression).

--ignoreType allows types to be ignored in the dump/load. Six options are supported: data, mapping, analyzer, alias, settings, template. Multiple types can be given, separated by commas. The interval option controls the interval at which dumps/loads are started for new indexes.

--includeType allows types to be included in dump/load. Supports six options - data, mapping, analyzer, alias, settings, template.
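
Since --match is a regular expression, it may be worth previewing which index names a pattern would select before starting a backup. The sketch below is an approximation: it uses grep -E (POSIX extended regular expressions) in place of the JavaScript regular expressions that multielasticdump evaluates, which behave the same for simple anchored patterns like the ones above. The helper name preview_match is hypothetical.

```shell
# Prints the index names (given as arguments) that match the pattern.
preview_match() {
  pattern=$1
  shift
  printf '%s\n' "$@" | grep -E -- "$pattern"
}

# Usage with a fixed list of names:
#   preview_match '^.*-index$' logs-index users-index .kibana
# Usage against a live cluster (assumes curl; _cat/indices lists index names):
#   preview_match '^.*-index$' $(curl -s 'http://127.0.0.1:9200/_cat/indices?h=index')
```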

5. Export the analyzer (word segmenter). Take special care here: analyzers can only be exported one index at a time, not for all indexes at once. Exporting everything, as in the following command, fails with an error saying that the index does not exist:

elasticdump --input=http://ip:9200 --output=http://127.0.0.1:9200/ --type=analyzer --all=true
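
Given that restriction, one workaround is to loop over the index names and copy the analyzer for each index separately. This is a sketch under assumptions: the function name copy_analyzers and the ELASTICDUMP override variable are hypothetical (ELASTICDUMP=echo prints the arguments instead of running the tool), and the list of index names has to be supplied by the caller.

```shell
# Sketch: copy the analyzer of each named index from one cluster to another.
copy_analyzers() (
  src=$1   # source base URL, e.g. http://ip:9200
  dst=$2   # target base URL, e.g. http://127.0.0.1:9200
  shift 2
  for index in "$@"; do
    "${ELASTICDUMP:-elasticdump}" \
      --input="$src/$index" \
      --output="$dst/$index" \
      --type=analyzer || exit 1   # stop at the first failed index
  done
)

# Usage:
#   copy_analyzers http://ip:9200 http://127.0.0.1:9200 test_event es_table_index
```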

6. Description of related parameters

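# --input        the source
# --output       the export path (if it ends with csv, the output is automatically saved in CSV format)
# --quiet        do not print any log messages
# --scrollTime   the time interval for each scroll query (when retrieving a large amount of data from Elasticsearch, scroll queries are usually needed to avoid the memory overflow or performance degradation caused by retrieving too much data at once)
# --limit        the number of documents per scroll query
# --noRefresh    disable automatic index refresh (when performing many index operations, refreshing the index after every operation degrades performance)
# --maxRows      the maximum number of documents per scroll query
# --concurrency  the number of concurrent executions
# --transform    transform the results (e.g. extract specific fields, convert data formats)
# --searchBody   the query statement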
elasticdump --input="$endpoint/$index" \
  --output="$outputpath" \
  --quiet --scrollTime=30m --limit=10000 --noRefresh --maxRows=1000000 \
  --concurrency=2 \
  --transform @transform.js \
  --searchBody "$searchbody"
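
The command above relies on four shell variables that are not defined in the text. The sketch below shows one way they might be set. All values are assumptions for illustration: the endpoint, index and output path are placeholders, the term_query helper is hypothetical, and the username field is an invented example.

```shell
# Hypothetical helper: builds a term query body for --searchBody.
# Assumes the field and value contain no double quotes or backslashes.
term_query() {
  printf '{"query":{"term":{"%s":"%s"}}}' "$1" "$2"
}

endpoint=http://127.0.0.1:9200            # assumed Elasticsearch base URL
index=test_event                          # assumed index name
outputpath=/tmp/test_event_data.json      # assumed output file
searchbody=$(term_query username admin)   # "username" is an invented field
```

Quoting "$searchbody" on the command line, as the command above does, keeps the JSON intact as a single argument.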

Origin blog.csdn.net/Liu__sir__/article/details/129793951