Can you explain the principles of es's distributed architecture (how does es implement distribution)?

What the interviewer is thinking

In the search field, lucene is the most popular search library. A few years ago, interviewers would typically ask: do you know lucene? Do you understand how an inverted index works? Those questions are now out of date, because many projects directly use ElasticSearch, a distributed search engine built on lucene, referred to as es.

Distributed search has become a basic, standard component of Java systems in the Internet industry, and es is by far the most popular choice. A few years ago, before es took off, people generally used solr. In recent years, though, most companies and projects have switched to es.

So in an Internet industry interview, the topic of distributed search engines will almost certainly come up, and that means talking about es. If you know nothing about it, you really are behind the times.

If the interviewer asks about this, the first question is usually: can you explain the distributed architecture design of es? The goal is to see whether you have a basic understanding of how a distributed search engine is architected.

Analysis of the interview question

ElasticSearch is designed as a distributed search engine, but underneath it is still based on lucene. The core idea is to start multiple es process instances on multiple machines, which together form an es cluster.

The basic unit in which es stores data is the index. For example, to store order data in es, you would create an index called order_idx and write all order data into it. An index is roughly the equivalent of a table in mysql.

index -> type -> mapping -> document -> field.

To make this easier to follow, I will use an analogy here. But remember that the analogy is only an aid to understanding; the concepts are not exactly equivalent.

An index corresponds to a table in mysql. A type has no direct counterpart in mysql: one index can contain multiple types, whose fields are mostly the same with some slight differences. Suppose there is an order index dedicated to order data. It is like one orders table in mysql where some orders are for physical goods, such as a piece of clothing or a pair of shoes, and others are for virtual goods, such as game cards or prepaid top-ups. Most fields of these two kinds of orders are the same, but a few may differ slightly.

So you would create two types in the order index: one for physical goods orders and one for virtual goods orders. Most fields of the two types are the same, and a small number differ.

In many cases an index has only one type. If an index does have multiple types (note: mapping types were deprecated in ElasticSearch 7.X and removed in 8.0, and indices created since 6.0 can hold only one type; see the official documentation for details), you can think of the index as a category of tables and each type as one specific table in mysql. Each type has a mapping. If a type is a specific table and the index is the category those types belong to, then the mapping is the table structure definition of that type. When you create a table in mysql you must define its structure: which fields it has and what type each field is. Each piece of data you actually write into a type of an index is called a document. A document corresponds to a row in a mysql table. Each document has multiple fields, and each field holds one value of that document, like a column value in a row.

[Figure: es-index-type-mapping-document-field]
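As a rough illustration of how index, mapping, document, and field fit together, here is a minimal sketch of a create-index request body for the order_idx example, written as a plain Python dict. It assumes the typeless mapping format used from ElasticSearch 7.X onward. The field names (order_id, goods_kind, amount, created_at) and the helper unmapped_fields are made up for this example, and using a goods_kind field in place of two types is just one possible way to model it.

```python
import json

# Index name from the example above; every field name below is an assumption.
INDEX_NAME = "order_idx"

# Body of a create-index request: shard settings plus the mapping
# (the "table structure definition" of the index).
create_index_body = {
    "settings": {
        "number_of_shards": 3,
        "number_of_replicas": 1,
    },
    "mappings": {
        "properties": {
            "order_id": {"type": "keyword"},
            "goods_kind": {"type": "keyword"},  # "physical" or "virtual"
            "amount": {"type": "double"},
            "created_at": {"type": "date"},
        }
    },
}

# One document, comparable to one row in a mysql table.
order_document = {
    "order_id": "o-1001",
    "goods_kind": "virtual",
    "amount": 30.0,
    "created_at": "2020-02-10T12:00:00Z",
}


def unmapped_fields(document, body):
    """Return the document fields that the mapping does not declare."""
    declared = body["mappings"]["properties"]
    return sorted(name for name in document if name not in declared)


if __name__ == "__main__":
    print(f"PUT /{INDEX_NAME}")
    print(json.dumps(create_index_body, indent=2))
    print("fields missing from the mapping:", unmapped_fields(order_document, create_index_body))
```

The sketch only builds and inspects the request body locally; it does not talk to a cluster.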

An index can be split into multiple shards, each storing part of the data. Splitting into shards has two benefits. The first is horizontal scaling: say you have 3 TB of data in 3 shards, 1 TB per shard. If the data grows to 4 TB, how do you scale? Simple: create a new index with 4 shards and import the data into it. The second is performance: the data is spread over multiple shards, that is, over multiple servers, so all operations run in parallel across machines, which improves throughput and performance.

Next, the data of each shard actually has multiple copies. Every shard has one primary shard, which is responsible for writes, plus several replica shards. After the primary shard writes the data, it synchronizes the data to the replica shards.
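The way documents end up on different shards can be sketched roughly like this: hash a routing key (by default the document id) and take it modulo the number of primary shards. This is a simplified sketch. The function names pick_shard and distribute are invented for the example, and zlib.crc32 is only a stand-in for the hash function es uses internally.

```python
import zlib


def pick_shard(routing_key, num_primary_shards):
    """Map a routing key (by default the document id) to a shard number."""
    # crc32 is a stand-in hash for this sketch, not the hash es really uses.
    return zlib.crc32(routing_key.encode("utf-8")) % num_primary_shards


def distribute(doc_ids, num_primary_shards):
    """Group document ids by the shard they would be routed to."""
    shards = {n: [] for n in range(num_primary_shards)}
    for doc_id in doc_ids:
        shards[pick_shard(doc_id, num_primary_shards)].append(doc_id)
    return shards


if __name__ == "__main__":
    ids = [f"order-{i}" for i in range(12)]
    for shard, docs in distribute(ids, 3).items():
        print(shard, docs)
```

Because the shard number depends on the shard count, changing the count would move existing documents around. That fits with the approach described above of creating a new 4-shard index and importing the data into it.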

[Figure: es-cluster]

With this replica scheme, each shard's data has multiple copies. If one machine goes down, it does not matter, because other machines hold copies of the same data. That is how high availability is achieved.

Among the nodes of an es cluster, one is automatically elected as the master node. The master node handles administrative work, such as maintaining index metadata and switching shards between primary and replica status. If the master node goes down, a new master node is elected from the remaining nodes.

If a non-master node goes down, the master node transfers the primary role of each primary shard on the failed node to one of its replica shards on another machine. Once you repair the failed machine and restart it, the master node assigns the missing replica shards to it and syncs over the changes made in the meantime, and the cluster returns to normal.

Put more simply: if a non-master node goes down, the primary shards on that node are gone. The master then promotes the corresponding replica shard (on another machine) to primary shard. After the failed machine is repaired, its shard is no longer a primary shard but a replica shard.

That is the basic architecture of ElasticSearch as a distributed search engine.
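The primary and replica write path can be sketched with a toy in-memory model. It is not how es is implemented internally. The class names ShardCopy and ReplicationGroup and the node names are hypothetical, and the copy to the replicas is shown as a simple loop.

```python
class ShardCopy:
    """One copy of a shard on one node (hypothetical names for this sketch)."""

    def __init__(self, node):
        self.node = node
        self.docs = {}


class ReplicationGroup:
    """One primary shard together with its replica shards."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self.replicas = list(replicas)

    def index(self, doc_id, source):
        # The write is applied on the primary shard first ...
        self.primary.docs[doc_id] = dict(source)
        # ... and is then copied to every replica shard.
        for replica in self.replicas:
            replica.docs[doc_id] = dict(source)

    def surviving_copies(self, failed_node):
        """Return the copies that are still reachable if one node fails."""
        return [c for c in [self.primary, *self.replicas] if c.node != failed_node]


if __name__ == "__main__":
    group = ReplicationGroup(ShardCopy("node-1"), [ShardCopy("node-2"), ShardCopy("node-3")])
    group.index("o-1001", {"amount": 30.0})
    for copy in group.surviving_copies("node-1"):
        print(copy.node, copy.docs)
```

Even with the node holding the primary gone, the remaining copies still contain the document, which is the availability argument made above.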
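The failover behaviour just described can be sketched as a small bookkeeping model of what the master node tracks. This is an illustrative toy under assumed names (Cluster, node_down, node_rejoin, node-1 and so on), not the actual es allocation logic, and it leaves out master election and data synchronization.

```python
class Cluster:
    """Tracks which node holds the primary and replica copies of each shard."""

    def __init__(self, placement):
        # placement: shard id -> {"primary": node name, "replicas": [node names]}
        self.placement = {
            shard: {"primary": copies["primary"], "replicas": list(copies["replicas"])}
            for shard, copies in placement.items()
        }

    def node_down(self, node):
        """Apply the master's reaction to a failed node; return shards that lost a copy."""
        lost = []
        for shard, copies in self.placement.items():
            if copies["primary"] == node:
                # Promote a replica on another machine to primary
                # (None if no replica exists, meaning the shard is unavailable).
                copies["primary"] = copies["replicas"].pop(0) if copies["replicas"] else None
                lost.append(shard)
            elif node in copies["replicas"]:
                copies["replicas"].remove(node)
                lost.append(shard)
        return lost

    def node_rejoin(self, node, shards):
        """The repaired node comes back holding replica shards only."""
        for shard in shards:
            self.placement[shard]["replicas"].append(node)


if __name__ == "__main__":
    cluster = Cluster({
        0: {"primary": "node-1", "replicas": ["node-2"]},
        1: {"primary": "node-2", "replicas": ["node-3"]},
        2: {"primary": "node-3", "replicas": ["node-1"]},
    })
    lost = cluster.node_down("node-1")
    print("after failure:", cluster.placement)
    cluster.node_rejoin("node-1", lost)
    print("after repair:", cluster.placement)
```

In this run, shard 0's primary moves from node-1 to node-2, and after the repair node-1 holds only replicas of shards 0 and 2.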

 

Source: doocs open source

Original: https://github.com/doocs/advanced-java/blob/master/docs/high-concurrency/es-architecture.md


Origin blog.csdn.net/qq_41723615/article/details/104252702