Saving 60% on Elasticsearch storage costs: a technical share from Gaoding Technology

Background

Gaoding Design, a product of Gaoding Technology, is a multi-scenario online design platform focused on commercial design. It breaks down the technical barriers between software and hardware, integrates creative content with design tools, and provides high-quality solutions for design needs in different scenarios. It covers all media types, such as images and videos, and makes design simpler.

Gaoding Technology uses Elasticsearch (hereinafter referred to as ES) as its log retrieval component. As business volume has grown, about 2 TB of new data arrives every day and must be retained for 15 to 30 days, which puts considerable pressure on the disks and the system. To guarantee log write and query performance, ES mostly runs on high-performance cloud disks, which have a higher unit storage cost. In actual business scenarios, however, data older than 7 days is accessed only infrequently, so keeping all of it on high-performance cloud disks inevitably leads to excessive cost and wasted space.

Solution

Elasticsearch provides index lifecycle management (ILM), and since version 7.10 it has supported tiered data storage. Different nodes can be assigned different disk media to separate hot data from cold data. For example, storing warm and cold data on HDDs gives more space at a lower cost. This feature is well suited to log indexing scenarios.

Using JuiceFS instead of HDDs as the storage medium for warm and cold data effectively provides unlimited storage space. With ES index lifecycle management, the whole lifecycle of an index, from creation through migration to deletion, is handled automatically without manual intervention.

In our practice, we first upgraded the ES cluster to version 7.13, the latest at the time. We then split the nodes into hot and cold: hot nodes prioritize performance, and cold nodes prioritize storage capacity and cost. At the same time, we adjusted the index and template scheme, configuring the data lifecycle, index templates, and data streams to handle index data writes.

The flow of an index after the adjustment is shown in the following figure:

When an index is created, the node filter index.routing.allocation.require.box_type: hot places it on the hot nodes. When the index enters the warm phase, the filter is changed to index.routing.allocation.require.box_type: warm and the index migrates to the warm nodes; the data then lives on the cold data nodes and is actually stored in JuiceFS. When the index enters the delete phase, ES deletes the index data automatically.
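
The flow above could be expressed as an ILM policy plus an index template. The following is a minimal sketch, not the configuration used in production: the policy name, template name, index pattern, rollover thresholds, and the 7-day and 30-day ages are assumptions chosen to match the retention figures in the text, and box_type is the custom node attribute named in the routing settings above.

```python
import json

POLICY_NAME = "logs-policy"      # hypothetical name
TEMPLATE_NAME = "logs-template"  # hypothetical name

# Request body for: PUT _ilm/policy/<POLICY_NAME>
ilm_policy = {
    "policy": {
        "phases": {
            "hot": {
                "min_age": "0ms",
                # Rollover thresholds are assumptions, not values from the text.
                "actions": {"rollover": {"max_age": "1d", "max_size": "50gb"}},
            },
            "warm": {
                # Assumed from "data older than 7 days is accessed infrequently".
                "min_age": "7d",
                # The allocate action rewrites the routing filter to box_type: warm.
                "actions": {"allocate": {"require": {"box_type": "warm"}}},
            },
            "delete": {
                # Assumed from the 15 to 30 day retention window.
                "min_age": "30d",
                "actions": {"delete": {}},
            },
        }
    }
}

# Request body for: PUT _index_template/<TEMPLATE_NAME>
index_template = {
    "index_patterns": ["logs-*"],  # hypothetical pattern
    "data_stream": {},
    "template": {
        "settings": {
            "index.lifecycle.name": POLICY_NAME,
            # New backing indices start on the hot nodes.
            "index.routing.allocation.require.box_type": "hot",
        }
    },
}

print(json.dumps(ilm_policy, indent=2))
print(json.dumps(index_template, indent=2))
```

The two printed bodies can be sent with any HTTP client or pasted into the Kibana console. Note that after a rollover, the min_age of later phases is measured from the rollover time rather than from index creation.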

Customer benefits

What is the JuiceFS used in this scenario?

JuiceFS is an enterprise-grade distributed file system designed for cloud environments. It offers full POSIX compatibility and gives applications a low-cost shared file system with unlimited space. Data stored in JuiceFS is persisted in object storage (for example, Amazon S3 or Alibaba Cloud OSS) and, combined with the JuiceFS metadata service, is served as high-performance file storage. JuiceFS is available as a fully managed service on public clouds worldwide and can be set up in about ten minutes with a few clicks. JuiceFS was also open sourced on GitHub in early 2021, has attracted attention and contributions from developers around the world, and has gained 3700+ stars.

In this solution, once the warm nodes of the ES cluster use JuiceFS for storage, we no longer need to do capacity planning or expansion for these nodes, and no data migration is needed when a node fails. This reduces costs and makes operation and maintenance much more convenient.

JuiceFS uses object storage as its persistence layer, which is billed elastically and has a lower TCO than ordinary cloud disks. In the ES cluster of this solution, the cloud disks used by the hot nodes cost 1,000 yuan/TB/month, while the fully managed JuiceFS service plus the object storage overhead costs about 250 yuan/TB/month. The total capacity of the ES cluster is over 60 TB. With hot and cold tiering, 75% of the data is stored in JuiceFS, which cuts the storage cost alone by nearly 60%. Counting the time and effort saved for the operation and maintenance team, this solution reduces the TCO of the customer's data storage by at least 70%.
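
The storage saving can be checked with a short calculation. This is a sketch using only the prices, capacity, and 75% share quoted above; it assumes the same total capacity before and after tiering and ignores any change in replica count.

```python
HOT_PRICE = 1000.0   # yuan/TB/month, cloud disk on hot nodes (from the text)
JFS_PRICE = 250.0    # yuan/TB/month, JuiceFS plus object storage (from the text)
TOTAL_TB = 60.0      # total cluster capacity (from the text, "60TB+")
COLD_SHARE = 0.75    # fraction of data stored in JuiceFS (from the text)


def monthly_cost(total_tb, cold_share, hot_price, cold_price):
    """Monthly storage cost when cold_share of the data sits on the cheaper tier."""
    hot_tb = total_tb * (1 - cold_share)
    cold_tb = total_tb * cold_share
    return hot_tb * hot_price + cold_tb * cold_price


baseline = monthly_cost(TOTAL_TB, 0.0, HOT_PRICE, JFS_PRICE)       # everything on cloud disks
tiered = monthly_cost(TOTAL_TB, COLD_SHARE, HOT_PRICE, JFS_PRICE)  # 75% in JuiceFS
saving = 1 - tiered / baseline

print(f"baseline: {baseline:.0f} yuan/month")
print(f"tiered:   {tiered:.0f} yuan/month")
print(f"saving:   {saving:.2%}")
```

Under these assumptions the bill drops from 60,000 to 26,250 yuan per month, a saving of 56.25%, which matches the "nearly 60%" figure.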

Practice

Cluster configuration

The cluster has 9 nodes in total: one dedicated master node (elastic_001) and 8 data nodes, of which 5 are hot data nodes (elastic_002 ~ elastic_006) and 3 are cold data nodes (elastic_007 ~ elastic_009).
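
For the box_type routing filter to work, each data node has to carry a matching custom attribute. The sketch below generates plausible elasticsearch.yml lines for the nine nodes. The node names come from the text; the role assignments and the choice of box_type: warm for the cold data nodes are assumptions inferred from the routing settings, not the actual production files.

```python
# Node names are from the text; the kind assigned to each is an assumption.
NODES = {"elastic_001": "master"}
NODES.update({f"elastic_{i:03d}": "hot" for i in range(2, 7)})
NODES.update({f"elastic_{i:03d}": "warm" for i in range(7, 10)})


def node_settings(name, kind):
    """Return elasticsearch.yml lines for one node of the given kind."""
    lines = [f"node.name: {name}"]
    if kind == "master":
        lines.append("node.roles: [ master ]")
    else:
        lines.append("node.roles: [ data ]")
        # Custom attribute matched by index.routing.allocation.require.box_type.
        lines.append(f"node.attr.box_type: {kind}")
    return lines


for name, kind in NODES.items():
    print("\n".join(node_settings(name, kind)))
    print("---")
```

Keeping the attribute value aligned with the value used in the lifecycle policy matters more than the value itself: an index that requires a box_type no node advertises stays unassigned.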

Directory Mounting and Configuration

JuiceFS is mounted on the ES cold data nodes and serves as the ES data directory.

Each node is configured with a 2 TB data disk mounted at /data. The ES process runs in a container, and the container mounts the host directory /data/elastic as its data directory. Because the container mounts a host directory, the ES data directory (/data/elastic) cannot be pointed at a JuiceFS subdirectory through a symbolic link. Instead, use a Linux bind mount to mount the JuiceFS subdirectory onto /data/elastic. For example, on node 007:

# ./juicefs mount gd-elasticsearch-jfs \
--cache-dir=/data/jfsCache --cache-size=307200 \
--upload-limit=800 /jfs
# mount -o bind /jfs/data-elastic-pro-007 /data/elastic

After this, what you see in /data/elastic is the content of the /jfs/data-elastic-pro-007 directory.

Similar mount operations are also performed on nodes 008 and 009.

If you are not familiar with basic operations such as initializing and mounting JuiceFS, please refer to the official JuiceFS documentation.

ES performs many random writes when indexes roll over. To guarantee write performance, the writeback parameter is added when JuiceFS is mounted, so that data is written to the local disk first and uploaded to object storage asynchronously in the background. The local staging directory is /data/jfsCache/gd-elasticsearch-jfs/rawstaging/. Be careful not to delete any files in this directory, otherwise data may be lost.

cache-size and upload-limit limit the local read cache to 300 GiB and the upload bandwidth to object storage to no more than 800 Mbps. attrcacheto and entrycacheto set the timeouts, in seconds, of the kernel's attr cache and entry cache, respectively.
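
The units of the two mount parameters are easy to misread, so here is a small worked conversion. It assumes cache-size is given in MiB and upload-limit in megabits per second, consistent with the 300 GiB and 800 Mbps figures above; the helper upload_seconds is a hypothetical name introduced for illustration.

```python
CACHE_SIZE_MIB = 307200   # --cache-size from the mount command
UPLOAD_LIMIT_MBPS = 800   # --upload-limit from the mount command

cache_size_gib = CACHE_SIZE_MIB / 1024    # MiB -> GiB
upload_mb_per_s = UPLOAD_LIMIT_MBPS / 8   # megabits/s -> megabytes/s


def upload_seconds(num_bytes, limit_mbps):
    """Lower bound on the time to upload num_bytes at limit_mbps megabits/s."""
    return num_bytes * 8 / (limit_mbps * 1_000_000)


one_tb = 10**12  # one decimal terabyte, in bytes
print(cache_size_gib, upload_mb_per_s, upload_seconds(one_tb, UPLOAD_LIMIT_MBPS))
```

At this limit a single node needs at least 10,000 seconds, a little under three hours, to push one terabyte of staged data to object storage. That is worth comparing against the daily volume migrating to the warm nodes when sizing the local staging disk.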

Performance optimization

Reduce node load

Before JuiceFS was adopted, Force Merge was configured in the ES cluster lifecycle through the setting warm.actions.forcemerge.max_num_segments: 1. It causes the data to be merged again during Rollover, which puts heavy pressure on the CPU. In this scenario the step is unnecessary, so turning off the Force Merge configuration avoids needless performance overhead and reduces node load.

Rollover parameter configuration optimization

Since data written to JuiceFS in the warm phase is ultimately persisted to object storage, the application layer does not need to keep multiple copies. You can set the number of replicas to 0 during index Rollover, i.e. warm.actions.number_of_replicas: 0.

In addition, once index data has migrated to the warm phase it is no longer written to, so you can make the index read-only in the warm phase, i.e. warm.actions.readonly: {}. Closing the index to writes reduces memory usage.
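
The three warm-phase changes described in this section can be sketched as one transformation of an existing ILM policy body. This is an illustration under assumptions: the starting policy is a made-up example, optimize_warm_phase is a hypothetical helper, and in the ILM policy API number_of_replicas is a field of the allocate action, so it is set there.

```python
import copy
import json


def optimize_warm_phase(policy):
    """Return a copy of an ILM policy body with the warm-phase tuning applied."""
    tuned = copy.deepcopy(policy)
    actions = tuned["policy"]["phases"]["warm"].setdefault("actions", {})
    # 1. Drop Force Merge to avoid re-merging segments on the warm nodes.
    actions.pop("forcemerge", None)
    # 2. Object storage already persists the data, so keep no replicas.
    actions.setdefault("allocate", {})["number_of_replicas"] = 0
    # 3. Warm indices receive no more writes, so mark them read-only.
    actions["readonly"] = {}
    return tuned


# Hypothetical starting point resembling the pre-JuiceFS configuration.
before = {
    "policy": {
        "phases": {
            "warm": {
                "min_age": "7d",
                "actions": {
                    "forcemerge": {"max_num_segments": 1},
                    "allocate": {"require": {"box_type": "warm"}},
                },
            }
        }
    }
}

after = optimize_warm_phase(before)
print(json.dumps(after, indent=2))
```

The function works on a deep copy, so the original body stays available for comparison or rollback before the new one is sent with PUT _ilm/policy/<name>.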

Summary

As time passes and business volume grows, enterprises inevitably face the dual challenge of storing and managing data at an ever larger scale. In this case, Gaoding Technology made full use of Elasticsearch's lifecycle management to store log data in tiers according to business needs. Frequently used hot data is kept on SSD, and data older than 7 days, which is used infrequently, is stored in the more cost-effective JuiceFS, saving the customer 60% of storage costs. JuiceFS also provides nearly unlimited elastic space for applications, eliminating operation and maintenance work such as capacity planning, capacity expansion, and data migration, and improving the efficiency of the enterprise IT architecture.

