Elasticsearch 7.3.0 released, based on Lucene 8.1.0

Today we are pleased to announce the release of Elasticsearch 7.3.0, based on Lucene 8.1.0. Version 7.3 is the latest stable release and is already available for deployment through our Elasticsearch Service.

Data frames: transform and pivot your streaming data

Data frame transforms are a new core feature in Elasticsearch that let you transform an existing index into a secondary, summarized index. They let you pivot your data into entity-centric indices that summarize the behavior of each entity, which organizes the data into a format that is easier to analyze.

Data frame transforms first became available in 7.2. In 7.3, they can run either as a single batch transform or continuously, incorporating new data as it arrives.

Imagine you are streaming audit log events from a number of different hosts in your data center, and you want to analyze user behavior to find anything suspicious. Using data frames, you can group the log events by user, host, and day of the week. For each user, you then have a count of requests for each type of server interaction on each host. Organizing the data by entity and summarizing many events makes it easier to run different numerical models and to spot outlier behavior.
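To make the pivot idea concrete, here is a minimal sketch in plain Python rather than the transform API itself. The event tuples, field order, and the pivot function are hypothetical and only illustrate the grouping by user, host, and day of the week described above.

```python
from collections import Counter
from datetime import date

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

# Hypothetical audit events: (user, host, ISO date, interaction type).
events = [
    ("alice", "web-01", "2019-07-29", "login"),
    ("alice", "web-01", "2019-07-29", "login"),
    ("alice", "db-02", "2019-07-30", "query"),
    ("bob", "web-01", "2019-07-29", "login"),
]

def pivot(events):
    """Count events per (user, host, weekday, interaction type)."""
    counts = Counter()
    for user, host, day, kind in events:
        weekday = WEEKDAYS[date.fromisoformat(day).weekday()]
        counts[(user, host, weekday, kind)] += 1
    return counts

summary = pivot(events)
for key, count in sorted(summary.items()):
    print(key, count)
```

Each key in the summary plays the role of one document in the entity-centric destination index.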

Data frames open up new possibilities for machine learning analysis (for example outlier detection, which may be a perfect match for the security example above), but they are also useful for visualizations and other kinds of custom analysis. We look forward to learning what users do with this feature.

Data frame transforms are released under the free Basic license as a beta feature.

A wealth of search improvements

Elasticsearch is all about search, and the 7.3 release brings some exciting new search features.

Find your least frequent values
We have added a new rare_terms aggregation, which uses a resource-efficient algorithm with predictable results. It is an aggregation designed to identify the long tail of keywords, that is, terms with low document counts. From a technical perspective, the rare terms aggregation works by maintaining a map of terms, each with an associated counter. The counter is incremented each time the term is seen. If the counter exceeds a predefined threshold, the term is removed from the map and inserted into a cuckoo filter. If a term is later found in the cuckoo filter, we assume that it was previously removed from the map and is "common". This approach is more memory-efficient than the alternatives: a terms aggregation with size: MAX_LONG, or a terms aggregation ordered by ascending count (where the error is unbounded).
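The map-plus-filter bookkeeping described above can be sketched as follows. This is an illustration under stated assumptions, not the Elasticsearch implementation: a plain Python set stands in for the cuckoo filter (so it is exact and has no false positives), and the threshold argument is named max_doc_count here for illustration.

```python
def rare_terms(terms, max_doc_count=1):
    """Return {term: count} for terms seen at most max_doc_count times."""
    counts = {}     # candidate rare terms and their counters
    common = set()  # stand-in for the cuckoo filter of evicted terms
    for term in terms:
        if term in common:
            continue  # previously evicted, so treated as "common"
        counts[term] = counts.get(term, 0) + 1
        if counts[term] > max_doc_count:
            del counts[term]  # over the threshold: drop from the map
            common.add(term)  # and remember it in the filter
    return counts

stream = ["a", "b", "a", "c", "b", "d", "a"]
print(rare_terms(stream))
```

A real cuckoo filter trades the exactness of the set for a much smaller, bounded memory footprint, at the cost of occasionally reporting a rare term as common.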

The rare terms aggregation has several use cases; for example, SIEM users are often interested in infrequent events, which are sometimes suspected to be security incidents.

Built-in vector similarity functions for document script scoring
Many popular algorithms represent documents as vectors (e.g. word2vec and convolutional neural networks), which makes it possible to use vector similarity as a measure of document similarity. In this release, we added two predefined functions for calculating the similarity between a given query vector and document vectors:

  • Cosine similarity
  • Dot product similarity

These are the two most commonly used distance functions for comparing vectors. We expose them as Painless script functions, so users have full flexibility to combine them with other fields in relevance ranking. The functions can be used for scoring through a script_score query in Painless. We plan to release additional vector similarity functions in future versions, such as Euclidean distance and Manhattan distance, because each of these functions has proved superior in particular scenarios.
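For readers who want the arithmetic behind the two functions, here is a minimal Python sketch of dot product and cosine similarity. It only illustrates the math; the function names are chosen for this example and are not the Painless functions themselves.

```python
import math

def dot_product(query, doc):
    """Sum of element-wise products of two equal-length vectors."""
    if len(query) != len(doc):
        raise ValueError("vectors must have the same number of dimensions")
    return sum(q * d for q, d in zip(query, doc))

def cosine_similarity(query, doc):
    """Dot product divided by the product of the two vector lengths."""
    norms = math.sqrt(dot_product(query, query)) * math.sqrt(dot_product(doc, doc))
    if norms == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot_product(query, doc) / norms

print(dot_product([1, 2, 3], [4, 5, 6]))
print(cosine_similarity([3, 4], [6, 8]))
```

Cosine similarity ignores vector length and compares direction only, while the dot product also rewards larger magnitudes.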

Some of our users use Elasticsearch as a data source for machine learning algorithms and have been requesting features like this. We are excited to see what new uses for Elasticsearch the community will find with this feature.

This experimental feature is released under the free Basic license.

Improved intervals queries
In 7.0, we introduced intervals queries. Use this query when you want to find documents in which words or phrases occur within a certain distance of each other. It offers advanced search options with an easy-to-define syntax and produces accurate results.

Intervals queries are ideal for legal and patent search, among other use cases. Version 7.3 includes two important additions to intervals queries:

  • A wildcard rule, which lets an interval use wildcards (* and ?) to define a set of related terms and lets you choose the analyzer to apply.
  • A prefix rule, which lets an interval match terms that begin with specific letters. It can use a dedicated prefix index on the field; otherwise the query expansion is limited to 128 terms.
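As a rough sketch of how the two new rules might be combined, the snippet below builds an intervals query body as a Python dictionary and prints it as JSON. The field name body, the search terms, and the max_gaps value are hypothetical; check the intervals query reference for your version before relying on any parameter.

```python
import json

# Request body for a search: a term matching "infring*" followed,
# within at most 5 intervening positions, by a term starting with "pat".
intervals_query = {
    "query": {
        "intervals": {
            "body": {  # hypothetical text field name
                "all_of": {
                    "ordered": True,
                    "max_gaps": 5,
                    "intervals": [
                        {"wildcard": {"pattern": "infring*"}},
                        {"prefix": {"prefix": "pat"}},
                    ],
                }
            }
        }
    }
}

print(json.dumps(intervals_query, indent=2))
```

The printed JSON is what would be sent as the body of a search request against an index that has the assumed text field.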

Intervals queries are now close to feature parity with span queries, so users can switch to intervals queries for legal and patent search, among other uses.

Handle documents with large numbers of dynamic fields efficiently
The new flattened object field type allows an entire JSON object to be indexed into a single field. This is useful when documents contain a large number of fields (e.g., HTTP headers or image metadata). The subfields of a flattened object behave almost identically to keyword fields, so only basic queries and aggregations are supported (numeric range queries, full-text search, and highlighting are not). Prior to 7.3, documents with many fields had to be indexed with each field mapped separately, which can greatly increase the size of the mappings, make them harder to manage, and increase the size of the cluster state.
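The following sketch illustrates the idea of treating every leaf of a JSON object as a keyword-like string under one field. It is a conceptual model only, not how Elasticsearch stores flattened fields internally; the flatten helper and the sample headers are hypothetical.

```python
def flatten(obj, prefix=""):
    """Yield (dotted_key, string_value) pairs for every leaf of a JSON-like object."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            yield from flatten(value, f"{prefix}.{key}" if prefix else key)
    elif isinstance(obj, list):
        for item in obj:
            yield from flatten(item, prefix)
    else:
        yield prefix, str(obj)  # every leaf is kept as a string, like a keyword

headers = {
    "host": "example.org",
    "accept": ["text/html", "application/json"],
    "content": {"length": 512},
}

leaves = list(flatten(headers))
for key, value in leaves:
    print(key, "=", value)
```

Because the leaf 512 is kept as the string "512", it sorts before "64" lexicographically, which gives an intuition for why numeric range queries are not supported on flattened subfields.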

Support for the flattened object type is released under the free Basic license.

Update synonym lists without index downtime
With this feature, the synonym filters used by search analyzers can be updated quickly and flexibly. For example, an e-commerce business can add new synonyms to make sure that a new product is associated with users' queries and that searches do not return an empty result set. With the new Reload Search Analyzers API, you only need to update the synonym files on the nodes and then call the API; the search analyzers are reloaded without restarting the index shards. This lets users update synonyms per index and keep searching without the downtime of closing and reopening the index.
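A hedged sketch of what such a setup could look like is shown below as Python dictionaries, without sending any request. The index name, analyzer and filter names, synonyms file path, and field name are all hypothetical, and the settings should be checked against the Reload Search Analyzers documentation for your version.

```python
import json

index_name = "products"  # hypothetical index name

# Index creation body: a search-time analyzer with an updateable synonym filter.
create_index_body = {
    "settings": {
        "index": {
            "analysis": {
                "analyzer": {
                    "my_synonyms": {
                        "tokenizer": "whitespace",
                        "filter": ["synonym"],
                    }
                },
                "filter": {
                    "synonym": {
                        "type": "synonym_graph",
                        "synonyms_path": "analysis/synonym.txt",  # assumed path
                        "updateable": True,
                    }
                },
            }
        }
    },
    "mappings": {
        "properties": {
            "title": {  # hypothetical field
                "type": "text",
                "analyzer": "standard",
                "search_analyzer": "my_synonyms",
            }
        }
    },
}

# After editing the synonyms file on the nodes, POST to this path to reload.
reload_path = f"/{index_name}/_reload_search_analyzers"

print("PUT /" + index_name)
print(json.dumps(create_index_body, indent=2))
print("POST " + reload_path)
```

The synonym filter is used only in the search analyzer here, which is what allows it to be reloaded without reindexing.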

This experimental feature is released under the free Basic license.

Stick around, there is more
7.3 is not only about search improvements; it also includes plenty of other new features.

Voting-only master-eligible nodes
Elasticsearch 7.0 introduced a new cluster coordination layer with a number of improvements, including faster master elections, removal of the minimum_master_nodes setting, and the use of formal methods to verify the design. The new coordination layer in 7.0 has another important advantage: it serves as a foundation for further important improvements to Elasticsearch, such as voting-only master-eligible nodes. A voting-only master-eligible node can take part in master elections but will never act as the cluster's master node (it only votes). Because such a node only votes in elections, it can run on a smaller machine, so the cluster needs fewer hardware resources. See the Elasticsearch documentation for instructions on setting up voting-only master-eligible nodes in Elasticsearch 7.3.

Voting-only master-eligible nodes are available under the free Basic license.

Alias replication across clusters
Cross-cluster replication (CCR) became generally available (GA) in Elasticsearch 6.7. CCR has a variety of use cases, including replication across data centers and regions, replicating data closer to users and application servers, and maintaining a centralized reporting cluster replicated from a number of smaller clusters. In Elasticsearch 7.3, CCR gains the ability to replicate alias operations on the leader index to the follower index. Note: write aliases are ignored by this process; because follower indices do not receive direct writes, a write alias would be useless.

SQL query support for frozen indices (API, client, and JDBC/ODBC drivers)
This feature adds a dedicated extension to the SQL query syntax for frozen indices. Frozen indices are a very low-cost way to keep older data that is rarely searched. Because users usually do not want their "normal" queries to include frozen indices, SQL requires you to ask for them explicitly. This can be done with the FROZEN reserved word, for example: SELECT * FROM FROZEN myIndex LIMIT 10;

GUI support for restoring and deleting snapshots
The Elasticsearch management UI (Kibana > Management > Elasticsearch) continues to evolve. In this release, we have enhanced the previously released "Snapshot Repositories" section, now called "Snapshot and Restore", so that you can restore from an existing snapshot. The Snapshot Restore wizard guides you through defining the restore task. You can track the progress of currently running restores in the "Restore Status" view. You can now also delete snapshots from the UI. For more information about these enhancements, see the Snapshot and Restore documentation.

This UI feature is released under the free Basic license.

Find the most unusual data with outlier detection
The goal of outlier detection is to find the most unusual data points in an index. We analyze the numeric fields of each data point (a document in the index) and annotate it with how unusual it is.

We use unsupervised outlier detection, which means there is no need to provide a training data set to teach outlier detection what an outlier is. In practice, this is achieved with an ensemble of distance-based and density-based techniques that identify the data points that differ most from the bulk of the data in the index. We assign an outlier score to each analyzed data point, which captures how different that entity is from the other entities in the index.
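To give a feel for what a distance-based technique does, here is a minimal sketch that scores each point by its mean distance to its k nearest neighbours. It shows one simple technique under assumed sample data; it is not the ensemble Elasticsearch uses, and the scores are not normalized.

```python
import math

def knn_outlier_scores(points, k=2):
    """Score each point by its mean distance to its k nearest neighbours.

    Assumes len(points) > k. Higher scores mean more unusual points.
    """
    scores = []
    for i, p in enumerate(points):
        distances = sorted(
            math.dist(p, q) for j, q in enumerate(points) if j != i
        )
        scores.append(sum(distances[:k]) / k)
    return scores

# Hypothetical two-field data points; the last one sits far from the rest.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
scores = knn_outlier_scores(points)
most_unusual = points[scores.index(max(scores))]
print(most_unusual)
```

No labels are needed at any step, which is what makes the approach unsupervised.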

In addition to the new outlier detection, we are also introducing the Evaluate API, which lets users compute a range of performance metrics such as the confusion matrix, precision, recall, the receiver operating characteristic (ROC) curve, and the area under the ROC curve. If your source index already labels which points are truly outliers and which are normal, you can use the Evaluate API to assess how well the outlier detection analysis performs on that data set.
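The metrics named above are simple to compute by hand, which helps when interpreting them. The sketch below is plain Python rather than the Evaluate API; the labels, scores, and the 0.5 threshold are hypothetical.

```python
def confusion_matrix(labels, scores, threshold=0.5):
    """Count true/false positives/negatives for outlier scores at a threshold."""
    cm = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for is_outlier, score in zip(labels, scores):
        predicted = score >= threshold
        if predicted and is_outlier:
            cm["tp"] += 1
        elif predicted:
            cm["fp"] += 1
        elif is_outlier:
            cm["fn"] += 1
        else:
            cm["tn"] += 1
    return cm

def precision(cm):
    """Fraction of predicted outliers that are real outliers."""
    predicted = cm["tp"] + cm["fp"]
    return cm["tp"] / predicted if predicted else 0.0

def recall(cm):
    """Fraction of real outliers that were predicted as outliers."""
    actual = cm["tp"] + cm["fn"]
    return cm["tp"] / actual if actual else 0.0

labels = [True, False, True, False, False]  # ground truth: is it an outlier?
scores = [0.9, 0.2, 0.4, 0.7, 0.1]          # outlier scores between 0 and 1
cm = confusion_matrix(labels, scores)
print(cm, precision(cm), recall(cm))
```

Sweeping the threshold from 0 to 1 and recording the true and false positive rates at each step is what traces out the ROC curve.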

Origin www.oschina.net/news/108730/elasticsearch-7-3-0-released