Elastic China Developers Conference 2023 highlights: new features of Elasticsearch 7 and 8 in one place


As Elasticsearch grows rapidly worldwide, its features and use cases keep expanding. At today's Elastic China Developers Conference 2023, we learned about a series of new features in the Elasticsearch 7 and 8 series. This article walks through these features and how they are applied, to help you understand and use Elasticsearch better.


1. New cluster balancing strategy


Strategy 1: Rebalance disk usage according to shard size. The system monitors disk usage on each node in the cluster. If a node's disk usage exceeds a preset threshold, the system automatically triggers shard relocation and moves some of that node's shards to nodes with lower usage. This shard-size-based rebalancing evens out disk usage across the cluster and thereby improves overall performance.
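The threshold-triggered relocation described above can be illustrated with a toy simulation. This is only a sketch of the idea, not Elasticsearch's allocator: the function name, the 85% threshold, the node names, and the "move the smallest shard first" rule are all assumptions made for illustration.

```python
def rebalance_by_disk(nodes, capacity, threshold=0.85):
    """Toy model. nodes maps node name -> {shard name: shard size in GB}.

    Moves shards off any node whose disk usage exceeds `threshold`
    (a hypothetical setting) and returns the list of moves performed.
    """
    moves = []

    def usage(name):
        return sum(nodes[name].values()) / capacity

    while True:
        source = max(nodes, key=usage)  # fullest node
        target = min(nodes, key=usage)  # emptiest node
        if usage(source) <= threshold or not nodes[source]:
            break
        # Assumption: relocate the smallest shard so each move is cheap.
        shard = min(nodes[source], key=nodes[source].get)
        size = nodes[source][shard]
        # Stop if the move would not leave the target better off than the source.
        if usage(target) + size / capacity >= usage(source):
            break
        nodes[target][shard] = nodes[source].pop(shard)
        moves.append((shard, source, target))
    return moves


cluster = {
    "node-a": {"s1": 40, "s2": 30, "s3": 20},
    "node-b": {"s4": 10},
    "node-c": {"s5": 20},
}
moves = rebalance_by_disk(cluster, capacity=100)
print(moves)
```

Here only node-a is above the threshold, so one shard is moved to the least-used node and the loop stops.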

Strategy 2: Rebalance indices according to ingest (write) load. To handle read and write request load, the system monitors the load from incoming data on each node in the cluster and automatically adjusts how index shards are distributed, so that heavily loaded nodes hold fewer shards and lightly loaded nodes hold more. This evens out the ingest load across the cluster while preserving system performance.

The new cluster balancing strategy has the following advantages. By balancing both disk usage and ingest load, it makes full use of each node's resources and improves overall performance. Balancing load across nodes also reduces the impact of a single node failure on the cluster and makes the system more stable.

In addition, the strategy adjusts resource allocation automatically according to actual demand, which avoids waste and improves resource utilization. Automatic adjustment also lightens the workload of operations staff, lowers the risk that comes with manual intervention, and helps reduce operating costs.

2. Kibana supports ARM architecture


3. Centralized collection platform and security scenario features


The Elastic Stack introduces a centralized data collection platform, together with a series of integrations and a unified management platform.

For security scenarios, the Elastic Stack provides EQL (Event Query Language), whose sequence queries suit use cases that need to match an ordered series of events.
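As a rough illustration of sequence matching, the sketch below builds the body of an EQL search request in Python. The index name, field names, and the two event conditions are hypothetical; the request is only constructed and printed, not sent.

```python
import json


def eql_sequence_request(index, by_field, steps, maxspan="5m"):
    """Build a request description for the EQL search API (GET /<index>/_eql/search).

    Each entry in `steps` is one event condition; a sequence matches only
    when the events occur in this order, share the same `by_field` value,
    and fall within `maxspan`.
    """
    stages = " ".join(f"[{step}]" for step in steps)
    query = f"sequence by {by_field} with maxspan={maxspan} {stages}"
    return {
        "method": "GET",
        "path": f"/{index}/_eql/search",
        "body": {"query": query},
    }


# Hypothetical example: a shell process followed by an outbound HTTPS connection
request = eql_sequence_request(
    index="my-security-logs",
    by_field="host.id",
    steps=[
        'process where process.name == "cmd.exe"',
        "network where destination.port == 443",
    ],
)
print(json.dumps(request, indent=2))
```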


4. Storage-compute separation architecture and the new query language ESQL

The future direction of the Elastic Stack centers on service-based delivery and an architecture that separates storage from compute. In a cloud-native architecture, using object storage as the storage medium reduces the cost of moving data around and improves automatic scaling.


In addition, the Elastic Stack will introduce a new query language, ESQL (ES|QL), to provide more flexible and faster data processing. ESQL chains commands together with pipes, so a single query can carry out several steps in turn, such as transforming and filtering data.
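To make the pipe-based style concrete, here is a small sketch that assembles an ES|QL query string step by step. It assumes the syntax as later documented for Elasticsearch 8.11 and newer (the language was only announced at the time of the talk); the index name and field names are hypothetical, and the request is only built, not sent.

```python
def esql_pipeline(source, *stages):
    """Join a FROM source and processing stages with the ES|QL pipe character."""
    return " | ".join([f"FROM {source}", *stages])


# Each stage consumes the output of the previous one.
query = esql_pipeline(
    "web-logs",                                    # hypothetical index
    "WHERE status >= 500",                         # filter
    "EVAL duration_s = duration_ms / 1000.0",      # transform
    "STATS errors = COUNT(*), avg_s = AVG(duration_s) BY host",  # aggregate
    "SORT errors DESC",
    "LIMIT 10",
)

# Assumed endpoint for ES|QL queries in releases that ship it: POST /_query
request = {"method": "POST", "path": "/_query", "body": {"query": query}}
print(query)
```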


5. Full observability solution

The Elastic Stack provides a full observability solution covering logs, metrics, APM, real user monitoring (RUM), synthetic monitoring, universal profiling, and more. These features help users understand and monitor the running state of their systems more comprehensively.


6. Security solutions

The Elastic Stack also provides security solutions that cover collecting security-related data, analyzing and detecting abnormal behavior, and responding automatically. Elastic Security offers a one-stop solution that brings SIEM, endpoint security, and threat hunting together on one platform to help enterprises protect themselves more efficiently.


7. Machine Learning Integration

Elasticsearch has built-in machine learning features that can be used for tasks such as anomaly detection and time series forecasting. New versions of Elasticsearch will further optimize these features, improve model training and inference performance, and offer more machine learning algorithms to choose from.

On this topic, I highly recommend the "GPT4 VS Elasticsearch" talk given by Mr. Li Jie in the second session. It is excellent and well worth revisiting.


Faster than faster, Elasticsearch 8.0 is officially released!

8. Geospatial Search and Visualization

The Elasticsearch 7 and 8 series further enhance geospatial search and visualization. New features include support for GeoJSON data, optimizations for processing geospatial data, and more geospatial aggregation and visualization tools. These features help users process and analyze geospatial data more conveniently.


Visualization of IP address distribution map based on Elasticsearch + kibana

9. Elastic compute resource scheduling and cost optimization

Elasticsearch introduces elastic scheduling of compute resources, which allocates them dynamically according to actual business needs. The new versions also provide cost optimization tools that help users evaluate and reduce the operating costs of their Elasticsearch clusters.

10. More powerful API and client library support

The Elasticsearch 7 and 8 series provide more powerful APIs and client libraries to meet the needs of various programming languages and platforms. This makes it easier for developers to integrate and use Elasticsearch.

11. Search-level optimizations

The Elasticsearch 7 and 8 series also bring many significant improvements at the search level. Here are some of the key search optimization features:


In-depth explanation of Elasticsearch retrieval classification - basic articles

11.1. Point In Time (PIT)

Point In Time (PIT) is a feature introduced in Elasticsearch 7.10. It lets users open a lightweight view of the index state that stays consistent for as long as it is kept alive. Search requests that use the same PIT see the same data, which avoids inconsistent results caused by index updates between requests, for example while paging through results.
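A minimal sketch of how PIT is typically combined with search_after for consistent paging is shown below. The request bodies are only built, not sent; the index name, the `@timestamp` sort field, and the placeholder PIT id are assumptions.

```python
def open_pit_request(index, keep_alive="1m"):
    """Describe the request that opens a point in time on an index."""
    return {
        "method": "POST",
        "path": f"/{index}/_pit",
        "params": {"keep_alive": keep_alive},
    }


def pit_search_body(pit_id, size=100, search_after=None, keep_alive="1m"):
    """Build a search body that reads from an open PIT.

    The search itself is sent to /_search without an index name,
    because the PIT id already identifies the target indices.
    """
    body = {
        "size": size,
        "query": {"match_all": {}},
        "pit": {"id": pit_id, "keep_alive": keep_alive},
        "sort": [{"@timestamp": "asc"}],  # hypothetical sort field
    }
    if search_after is not None:
        # Sort values of the last hit on the previous page
        body["search_after"] = search_after
    return body


open_request = open_pit_request("my-logs")
first_page = pit_search_body("<id returned by the open request>")
next_page = pit_search_body("<id returned by the open request>",
                            search_after=["2023-04-08T10:00:00.000Z"])
```

Every page is read from the same PIT, so documents indexed while paging do not shift the results.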


Dry goods | Comprehensive and in-depth interpretation of Elasticsearch pagination query

11.2. Wildcard Field Types

The wildcard field type is a new field type designed for efficient wildcard and regular expression queries. It speeds up complex queries that contain wildcards or regular expressions, such as matching a fragment in the middle of a long string.
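As an illustration, the sketch below builds a mapping that uses the wildcard field type and two queries against it. The field name and patterns are hypothetical, and the bodies are only constructed, not sent.

```python
def wildcard_mapping(field):
    """Index-creation body that maps `field` as a wildcard field."""
    return {"mappings": {"properties": {field: {"type": "wildcard"}}}}


def wildcard_query(field, pattern):
    """Wildcard query: * matches any run of characters, ? matches one."""
    return {"query": {"wildcard": {field: {"value": pattern}}}}


def regexp_query(field, pattern):
    """Regular expression query against the same field."""
    return {"query": {"regexp": {field: {"value": pattern}}}}


mapping = wildcard_mapping("message")  # hypothetical field name
by_wildcard = wildcard_query("message", "*timeout*node-?")
by_regexp = regexp_query("message", ".*error [0-9]+.*")
```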


Dry goods | Elasticsearch search type selection guide

11.3. Runtime Fields

Runtime fields are a new kind of field whose values are computed dynamically at query time. Users do not need to compute and store these fields at index time, which saves storage space and improves indexing performance. Runtime fields are defined with the Painless scripting language, so users can flexibly define the calculation logic.
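A short sketch of a runtime field defined inside a search request follows. The `price` and `quantity` fields and the derived `total_price` field are hypothetical; the body is only constructed, not sent.

```python
def runtime_field_search(name, field_type, painless_source, query):
    """Build a search body that defines one runtime field and queries on it."""
    return {
        "runtime_mappings": {
            name: {"type": field_type, "script": {"source": painless_source}}
        },
        "query": query,
        "fields": [name],  # return the computed value with each hit
    }


# total_price is never stored; it is computed per document at query time.
search_body = runtime_field_search(
    name="total_price",
    field_type="double",
    painless_source="emit(doc['price'].value * doc['quantity'].value)",
    query={"range": {"total_price": {"gte": 100}}},
)
```

The trade-off is that the script runs for each candidate document, so queries on runtime fields are usually slower than queries on indexed fields.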


In-depth explanation of Elasticsearch runtime type Runtime fields

11.4. Searchable snapshots

The Elasticsearch 7 and 8 series support searchable snapshots, which let users search data held in a snapshot by mounting it as an index. This is useful for scenarios that need to query historical data or analyze how data has changed: users can query the data as it stood when the snapshot was taken.
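As a sketch, mounting a snapshot so that it can be searched looks roughly like the request built below. The repository, snapshot, and index names are hypothetical; the request is only constructed, not sent.

```python
def mount_snapshot_request(repository, snapshot, index, renamed_index=None):
    """Describe a request to the searchable snapshot mount API."""
    body = {"index": index}  # index inside the snapshot to mount
    if renamed_index:
        body["renamed_index"] = renamed_index  # name of the mounted index
    return {
        "method": "POST",
        "path": f"/_snapshot/{repository}/{snapshot}/_mount",
        "params": {"wait_for_completion": "true"},
        "body": body,
    }


mount = mount_snapshot_request(
    repository="my-repo",
    snapshot="snapshot-2023-04-01",
    index="orders-2022",
    renamed_index="orders-2022-mounted",
)
```

Once mounted, the renamed index can be queried with the normal search API.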


Dry goods | Elasticsearch searchable snapshot in-depth explanation

11.5. Enrich Pipeline

The enrich processor is a new ingest pipeline processor that looks up related data and adds it to documents as they are indexed. This resembles a lookup (join) in a database: related data is merged into a single document for later search and analysis. Enrich policies support several matching strategies, namely exact matching (match), range matching (range), and geospatial matching (geo_match), to cover different scenarios.
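The usual setup has three steps: define an enrich policy, execute it to build the lookup index, and reference it from an ingest pipeline. The sketch below only builds these requests; the policy, index, pipeline, and field names are hypothetical.

```python
def enrich_setup(policy, source_index, match_field, enrich_fields,
                 pipeline, target_field):
    """Return the three requests (method, path, body) needed for enrichment."""
    return [
        # 1. Policy: where the lookup data lives and which fields to copy
        ("PUT", f"/_enrich/policy/{policy}", {
            "match": {
                "indices": source_index,
                "match_field": match_field,
                "enrich_fields": enrich_fields,
            }
        }),
        # 2. Execute the policy to build the internal enrich index
        ("PUT", f"/_enrich/policy/{policy}/_execute", None),
        # 3. Ingest pipeline that applies the policy to incoming documents
        ("PUT", f"/_ingest/pipeline/{pipeline}", {
            "processors": [{
                "enrich": {
                    "policy_name": policy,
                    "field": match_field,
                    "target_field": target_field,
                }
            }]
        }),
    ]


requests_to_send = enrich_setup(
    policy="users-policy",
    source_index="users",
    match_field="email",
    enrich_fields=["first_name", "last_name", "city"],
    pipeline="user-lookup",
    target_field="user",
)
```

Documents indexed through the pipeline then carry the matched user fields under `user`.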


Enrich Processor - a new way for Elasticsearch to link data across indexes

11.6. Search sorting optimization

Block Max WAND is an efficient top-k retrieval algorithm built on the inverted index. It is designed to quickly identify and skip documents that are not competitive, which improves query efficiency.


The Block Max WAND algorithm splits the postings of the inverted index into blocks and records the maximum score any document in a block can reach. During retrieval it tracks the lowest score among the top documents found so far. Any block whose maximum score is lower than that threshold cannot contain a competitive document and is skipped. This process repeats until every block has been either scored or skipped.
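The skipping idea can be shown with a deliberately simplified sketch. It handles a single list of scored documents already split into blocks, whereas the real algorithm works across several query terms and reads block maxima precomputed at index time. The function name and data are made up for illustration.

```python
import heapq


def top_k_with_block_skipping(blocks, k):
    """blocks: non-empty lists of (doc_id, score) pairs.

    Returns (top-k hits sorted by score descending, number of skipped blocks).
    """
    heap = []      # min-heap of (score, doc_id); heap[0] is the weakest hit
    skipped = 0
    for block in blocks:
        # A real engine stores this maximum at index time; here we compute it.
        block_max = max(score for _, score in block)
        if len(heap) == k and block_max <= heap[0][0]:
            skipped += 1  # no document in this block can enter the top k
            continue
        for doc_id, score in block:
            if len(heap) < k:
                heapq.heappush(heap, (score, doc_id))
            elif score > heap[0][0]:
                heapq.heapreplace(heap, (score, doc_id))
    hits = sorted(heap, reverse=True)
    return [(doc_id, score) for score, doc_id in hits], skipped


blocks = [
    [(1, 0.9), (2, 0.4), (3, 0.7)],
    [(4, 0.3), (5, 0.5), (6, 0.2)],   # max 0.5: skipped once the threshold is 0.7
    [(7, 0.95), (8, 0.1), (9, 0.6)],
]
hits, skipped = top_k_with_block_skipping(blocks, k=2)
print(hits, skipped)
```

The second block is never scored document by document, which is where the savings come from on long postings lists.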

11.7. match_only_text

match_only_text is a field type, a space-efficient variant of the text field that gives up relevance scoring in exchange for a smaller index. It suits full-text matching on text fields, for example keyword searches entered by users, or word matching on unstructured and semi-structured data such as log data and social media data. Note that it is not suited to scenarios that need exact matches or range queries. For those, choose other field and query types, such as "term" queries or "range" queries.
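A minimal sketch of a log index that uses match_only_text for the message body is shown below. The field names and the search phrase are hypothetical, and the bodies are only constructed, not sent.

```python
def log_index_mapping(text_field="message"):
    """Mapping for a log index whose message field is match_only_text."""
    return {
        "mappings": {
            "properties": {
                "@timestamp": {"type": "date"},
                text_field: {"type": "match_only_text"},
            }
        }
    }


def match_query(field, text):
    """Full-text match query; on match_only_text every hit gets a constant score."""
    return {"query": {"match": {field: text}}}


mapping = log_index_mapping()
search = match_query("message", "connection timeout")
```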


With the search-level optimizations above, the Elasticsearch 7 and 8 series deliver significant improvements in query performance, data storage, real-time computation, and data processing, giving users more powerful and flexible search capabilities.

11.8. Doc-value-only fields

Elasticsearch can keep only doc values for a field. Doc values are an on-disk columnar storage format that lets Elasticsearch run queries and aggregations more efficiently. The benefits of keeping only doc values include:

Saving disk space: keeping only doc values reduces the disk space the index needs, because only the data actually required for queries and aggregations is stored.

Improving query performance: because doc values are stored in columns, Elasticsearch processes data more efficiently during operations such as aggregation and sorting, which helps speed up query response times.

Reducing memory usage: doc values live on disk rather than in memory, so memory usage drops, especially during heavy aggregations.

Cache friendliness: because doc values are stored in columns, CPU cache lines are used more effectively, which also helps query performance.

In-depth interpretation of Elasticsearch internal data structure

Note that keeping only doc values limits some functionality. For example, if the document source (_source) is not stored, the original document content cannot be retrieved and partial document updates are not possible. Weigh these limitations against the benefits above before keeping only doc values.
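As a sketch, a field becomes doc-value-only when indexing is turned off in the mapping while doc values stay enabled. The field names below are hypothetical, and how such fields can be queried depends on the Elasticsearch version, so treat this as an assumption to check against the documentation for your release.

```python
def doc_value_only_mapping(fields):
    """fields maps field name -> field type (for example long, keyword, date).

    "index": False drops the inverted index (or points structure) for the field;
    "doc_values": True keeps the columnar data used for sorting and aggregations.
    """
    return {
        "mappings": {
            "properties": {
                name: {"type": field_type, "index": False, "doc_values": True}
                for name, field_type in fields.items()
            }
        }
    }


mapping = doc_value_only_mapping({
    "response_time_ms": "long",
    "status_code": "keyword",
})
```

This layout fits metric-like fields that are mostly aggregated or sorted on and rarely filtered.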

12. Summary

The Elastic China Developers Conference 2023 presented many exciting new features of the Elasticsearch 7 and 8 series. These capabilities help increase data processing power, reduce storage costs, make real-time computation more flexible, and improve security and observability. As a mature search and analytics engine, Elasticsearch keeps being optimized and improved to give users a better experience.

Note: This article is based on a talk by Mr. Zhu Jie, a senior architect at Elastic.


Origin blog.csdn.net/wojiushiwo987/article/details/130037089