Learning Elasticsearch: a translation of several core concepts

Main text

Elasticsearch has a few basic but essential concepts. Understanding them before you start learning in earnest will help a great deal with everything that follows.

Near Real-Time (NRT)

Elasticsearch is a near real-time search platform. This means there is a brief delay (usually about one second) between the time a document is indexed and the time it becomes searchable.
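The delay can be pictured with a toy model. The sketch below is not Elasticsearch code: `ToyIndex` and its methods are hypothetical names, and it only assumes that newly indexed documents become visible to search after a periodic "refresh" step.

```python
class ToyIndex:
    """Toy model of near real-time search: writes are searchable only after a refresh."""

    def __init__(self):
        self._buffer = []      # indexed, but not yet visible to search
        self._searchable = []  # visible to search

    def index(self, doc):
        self._buffer.append(doc)

    def refresh(self):
        # Assumption: in Elasticsearch this step runs periodically (about once per second).
        self._searchable.extend(self._buffer)
        self._buffer.clear()

    def search(self, field, value):
        return [d for d in self._searchable if d.get(field) == value]


idx = ToyIndex()
idx.index({"name": "Alice"})
before = idx.search("name", "Alice")  # searched before the refresh
idx.refresh()
after = idx.search("name", "Alice")   # searched after the refresh
print(len(before), len(after))
```

The first search misses the document and the second finds it, which is the short gap described above.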

Cluster

A cluster is a set of one or more nodes (servers) that together hold all of your data and provide indexing and search capabilities across all nodes. Each cluster is identified by a unique name, which defaults to "elasticsearch". This name is important: a node can belong to only one cluster, and the cluster it joins is determined by this name.

Make sure cluster names differ between environments; otherwise a node may join the wrong cluster. For example, you could use logging-dev, logging-stage, and logging-production for the development, staging, and production clusters.

A cluster with only one node works perfectly well. You can also run multiple independent clusters by giving each one a different cluster name.
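As a small illustration of the naming convention, the sketch below derives one cluster name per environment and prints the matching `cluster.name` line for elasticsearch.yml. The helper names and the "logging" prefix are only for illustration.

```python
ENVIRONMENTS = ("dev", "stage", "production")


def cluster_name(app, env):
    """Build a per-environment cluster name such as 'logging-dev'."""
    if env not in ENVIRONMENTS:
        raise ValueError(f"unknown environment: {env!r}")
    return f"{app}-{env}"


def config_line(app, env):
    # cluster.name is the elasticsearch.yml setting that decides which cluster a node joins.
    return f"cluster.name: {cluster_name(app, env)}"


for env in ENVIRONMENTS:
    print(config_line("logging", env))
```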

Node

A node is a single server that is part of a cluster. It stores data and participates in the cluster's indexing and search capabilities. Like a cluster, each node has its own name. By default the name is a randomly generated UUID assigned when the node starts. If you do not want the default name, you can define your own. The name matters for cluster management: it is how you identify which server in your network corresponds to which node in the cluster.

A node joins a specific cluster by being configured with that cluster's name. By default, a node joins the cluster named "elasticsearch". If you start several nodes on one network and they can discover each other, they will automatically form a single cluster named "elasticsearch".

A cluster can contain any number of nodes. If no other Elasticsearch node is running on the network, starting a node creates a new single-node cluster, named "elasticsearch" by default.
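The joining rule can be sketched as a simple grouping by cluster name. This is only a model: it assumes every node can discover every other node, and the node names are made up.

```python
from collections import defaultdict

DEFAULT_CLUSTER = "elasticsearch"


def form_clusters(nodes):
    """Group nodes by cluster name.

    nodes: list of (node_name, cluster_name or None); None means "not configured".
    """
    clusters = defaultdict(list)
    for node_name, cluster in nodes:
        clusters[cluster or DEFAULT_CLUSTER].append(node_name)
    return dict(clusters)


nodes = [
    ("node-1", None),           # no cluster name configured
    ("node-2", None),           # no cluster name configured
    ("node-3", "logging-dev"),  # explicitly configured
]
clusters = form_clusters(nodes)
print(clusters)
```

The two unconfigured nodes end up together in the default cluster, while the third forms a single-node cluster of its own.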

Index

An index is a collection of documents with similar characteristics. For example, you can have one index for customer data, another for product information, and another for order data. Each index is identified by a name (which must be all lowercase), and that name is used to refer to the index when indexing, searching, updating, and deleting its documents.

A cluster can define any number of indexes.
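A minimal check for the lowercase rule might look like this. It covers only the rule stated above; Elasticsearch applies further naming restrictions that this hypothetical helper does not test.

```python
def is_lowercase_index_name(name):
    """Return True if the name is non-empty and contains no uppercase characters."""
    # Only the lowercase rule is checked here; other naming restrictions are out of scope.
    return bool(name) and name == name.lower()


for candidate in ("customer", "Customer", "order-2019"):
    print(candidate, is_lowercase_index_name(candidate))
```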

Type

Note: types are deprecated as of Elasticsearch 6.0, and an index can no longer contain multiple types.

A type was a logical category/partition of an index that let you store different kinds of documents in the same index, for example a user type and a blog type. Since 6.0, however, creating more than one type in an index is no longer supported, and the concept of types is to be removed entirely in a later version.

Document

A document is the basic unit of information that can be indexed in Elasticsearch. For example, you can have one document for a single customer, another for a single product, and another for a single order. Documents are expressed in JSON, a very common data interchange format.

Within an index/type, you can store any number of documents. Note that although a document physically resides in an index, in the current version (6.5) a document must be indexed into a specific index/type.
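To make the index/type addressing concrete, the sketch below only builds the HTTP method, path, and JSON body for indexing one document; it does not contact a server. The index name "customer", the type "_doc", and the document fields are example values, not taken from the text above.

```python
import json


def index_request(index, doc_type, doc_id, document):
    """Return (method, path, body) for storing one document under index/type/id."""
    path = f"/{index}/{doc_type}/{doc_id}"
    return "PUT", path, json.dumps(document)


method, path, body = index_request("customer", "_doc", 1, {"name": "John Doe"})
print(method, path)
print(body)
```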

Shards & Replicas

An index can potentially store more data than the hardware of a single node can handle. For example, an index of a billion documents taking up 1TB of disk space may not fit on the disk of a single node. Even if it fits, a single node may be too slow to serve the requests on its own.

To solve this problem, Elasticsearch lets you split an index into multiple pieces called shards. When you create an index, you can specify the number of shards. Each shard is itself a fully functional, independent "index" that can be hosted on any node in the cluster.

Sharding is important for two reasons:

  • It lets you scale your data volume horizontally.
  • It lets you distribute and parallelize operations across shards (potentially on multiple nodes), which improves performance and throughput.

How are shards distributed across the cluster, and how are results from different shards combined into one search response? Elasticsearch handles all of this for you, and the process is transparent to you as a user.
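One way to picture how documents spread across shards is a hash of the document ID taken modulo the number of primary shards. The sketch below uses `zlib.crc32` purely as a stand-in hash from the standard library; it is not the hash function Elasticsearch uses internally, and `pick_shard` is a hypothetical name.

```python
import zlib
from collections import Counter


def pick_shard(doc_id, num_primary_shards):
    """Map a document ID to a shard number in [0, num_primary_shards)."""
    # Stand-in hash: crc32 is chosen only because it is stable across runs.
    return zlib.crc32(str(doc_id).encode("utf-8")) % num_primary_shards


NUM_SHARDS = 5
placement = Counter(pick_shard(i, NUM_SHARDS) for i in range(1000))
print(sorted(placement.items()))
```

Because the shard number depends on the shard count, this model also hints at why changing the number of shards of an existing index is not a trivial operation.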

In a network/cloud environment, failures can happen at any time. For example, a shard or node may suddenly stop working or disappear from the cluster altogether. A failover mechanism is therefore essential. To this end, Elasticsearch lets you create one or more copies of each shard, called replica shards, or replicas for short.

Replication is important for two reasons:

  • High availability: the cluster keeps working if a node or shard fails. For this reason, a replica shard is never allocated on the same node as the primary shard it was copied from.
  • High performance: searches can run in parallel on all replicas, which improves search performance and throughput.
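The placement rule in the first point can be sketched with a simple round-robin allocator. This is a toy model, not Elasticsearch's actual allocation algorithm; `allocate` and the node names are hypothetical. Copies that cannot be placed without sharing a node are left unassigned.

```python
def allocate(num_primaries, num_replicas, nodes):
    """Place shard copies so that no node holds two copies of the same shard.

    Returns (layout, unassigned), where layout maps node -> [(shard, role), ...].
    """
    layout = {node: [] for node in nodes}
    unassigned = []
    for shard in range(num_primaries):
        copies = ["primary"] + ["replica"] * num_replicas
        for offset, role in enumerate(copies):
            if offset < len(nodes):
                # Each copy of this shard lands on a different node.
                node = nodes[(shard + offset) % len(nodes)]
                layout[node].append((shard, role))
            else:
                # Not enough nodes to keep the copies apart.
                unassigned.append((shard, role))
    return layout, unassigned


layout, unassigned = allocate(5, 1, ["node-1", "node-2"])
for node, shards in layout.items():
    print(node, shards)
print("unassigned:", unassigned)
```

With a single node, every replica in this model stays unassigned, because placing it would put it next to its own primary.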

To summarize, an index can be split into multiple shards, and it can be replicated zero (meaning no replicas) or more times. Once replicated, each index has primary shards (the original shards that the copies are made from) and replica shards (the copies of the primary shards).

The number of shards and replicas can be defined per index when the index is created. After the index is created, you can still change the number of replicas dynamically at any time. You can also change the number of shards of an existing index through the _shrink and _split APIs, but this is not a trivial task, so it is best to plan the correct number of shards when you create the index.
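As a sketch of what these two operations look like as requests, the helpers below build the method, path, and JSON body without sending anything. The index name "customer" and the counts are example values; `number_of_shards` and `number_of_replicas` are the index settings involved.

```python
import json


def create_index_request(index, shards, replicas):
    """Request that creates an index with explicit shard and replica counts."""
    body = {"settings": {"number_of_shards": shards, "number_of_replicas": replicas}}
    return "PUT", f"/{index}", json.dumps(body)


def update_replicas_request(index, replicas):
    """Request that changes the replica count of an existing index."""
    body = {"index": {"number_of_replicas": replicas}}
    return "PUT", f"/{index}/_settings", json.dumps(body)


print(create_index_request("customer", 3, 2))
print(update_replicas_request("customer", 1))
```

Only the replica count appears in the update request, since the shard count cannot be changed through a plain settings update.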

By default, Elasticsearch allocates 5 primary shards and 1 replica to each index. This means that if your cluster has at least two nodes, each index will by default have 5 primary shards and 5 replica shards (one complete copy), for a total of 10 shards.
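The total follows from simple arithmetic: every primary shard exists once, plus once more per replica. A minimal sketch, with `total_shards` as a hypothetical helper name:

```python
def total_shards(primaries, replicas):
    """Total shard copies for one index: each primary plus its replicas."""
    return primaries * (1 + replicas)


print(total_shards(5, 1))  # 5 primaries with 1 replica each
print(total_shards(5, 0))  # 5 primaries without replicas
```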

Reproduced from: https://juejin.im/post/5cfcbedce51d45775a7002e7

Origin blog.csdn.net/weixin_33694172/article/details/91417209