Search Engine 2

Interview questions

Can you explain the principles of the ES distributed architecture (how does ES achieve distribution)?

What the interviewer is looking for

In search, Lucene is the most popular search library. A few years ago the industry would commonly ask: do you know Lucene? Do you know how an inverted index works? Those questions are now dated, because many projects now directly use Elasticsearch (ES for short), the distributed search engine built on Lucene.

Distributed search has become a basic part of the standard Java technology stack in the Internet industry, and ES is especially popular. A few years ago ES had not yet taken off and we generally used Solr, but in recent years the great majority of companies and projects have been moving to ES.

So in an Internet-company interview you will definitely be asked about distributed search engines, which means ES. If you don't know it, you are really behind the times.

If the interviewer opens with an ES question, it is usually: can you describe the distributed architecture of ES? This checks whether you have a basic understanding of how a distributed search engine is built.

Question analysis

Elasticsearch is designed as a distributed search engine, but underneath it is still built on Lucene. The core idea is to start multiple ES process instances on multiple machines, which together form an ES cluster.

The basic unit in which ES stores data is the index. For example, if you want to store order data in ES, you create an index in ES called order_idx and write all the order data into it. An index is roughly the equivalent of a table in MySQL.

index -> type -> mapping -> document -> field

To make this easier to explain, I'll use an analogy. But remember: the analogy is only there to aid understanding; don't treat the two as equivalent.

An index is equivalent to a table in MySQL. A type has no direct MySQL counterpart: one index can contain multiple types, and the fields of each type are mostly the same but differ slightly. Suppose there is an order index dedicated to storing order data. It is like building an orders table in MySQL where some orders are for physical goods, such as a piece of clothing or a pair of shoes, and others are for virtual goods, such as game cards or phone top-ups. Most fields of these rows are the same, but a small number of fields differ.

So within the order index you would create two types: one for physical goods orders and one for virtual goods orders. Most of their fields are the same, and a small number are different.

In many cases an index has only one type. But when an index really does have multiple types (note: Elasticsearch has since removed mapping types; they were deprecated in 7.x and removed in 8.0, see the official documentation for details), you can think of the index as a category of tables, with each type corresponding to one MySQL table. Each type has a mapping. If a type is a concrete table and the index is the category that several types belong to, then the mapping is that table's schema definition: when you create a table in MySQL you must define its structure, that is, which fields it has and the data type of each field. A piece of data you actually write into a type within an index is called a document; one document corresponds to one row in a MySQL table. Each document has multiple fields, and each field holds the value of one column of that document.
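
To make the index -> type -> mapping -> document -> field chain concrete, here is a minimal Python sketch of the pre-7.x model described above. Everything in it is hypothetical: the class names Index and DocType, the method index_document, and the order fields are invented for illustration. A mapping is reduced to "field name -> Python type", whereas a real ES mapping is a JSON definition with much richer field types. The sketch also rejects unknown fields, which is closer to a strict mapping than to ES's default dynamic mapping (which would add new fields automatically).

```python
from dataclasses import dataclass, field

# Hypothetical toy model of the pre-7.x chain: index -> type -> mapping -> document -> field.


@dataclass
class DocType:
    name: str
    mapping: dict[str, type]  # field name -> expected Python type (the "table structure")
    documents: list[dict] = field(default_factory=list)  # each dict is one document (a "row")

    def index_document(self, doc: dict) -> None:
        # Check every field of the document against this type's mapping before storing it.
        for fname, value in doc.items():
            expected = self.mapping.get(fname)
            if expected is None:
                raise KeyError(f"field {fname!r} is not in the mapping of type {self.name!r}")
            if not isinstance(value, expected):
                raise TypeError(f"field {fname!r} expects {expected.__name__}")
        self.documents.append(doc)


@dataclass
class Index:
    name: str
    types: dict[str, DocType] = field(default_factory=dict)  # type name -> DocType

    def add_type(self, doc_type: DocType) -> None:
        self.types[doc_type.name] = doc_type


# Usage: one order index with two types that share most of their fields.
common = {"order_id": str, "user_id": str, "amount": float}
order_idx = Index("order_idx")
order_idx.add_type(DocType("physical_order", {**common, "shipping_address": str}))
order_idx.add_type(DocType("virtual_order", {**common, "recharge_account": str}))
order_idx.types["physical_order"].index_document(
    {"order_id": "o1", "user_id": "u1", "amount": 59.0, "shipping_address": "Room 101"}
)
```

The two types differ only in their last field, which is exactly the "mostly the same, slightly different" situation the analogy describes.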

 

An index can be split into multiple shards, and each shard stores part of the data. Splitting into multiple shards has two benefits. First, it supports horizontal scaling: say you have 3 TB of data in 3 shards, 1 TB per shard; if the data grows to 4 TB, scaling out is simple: create a new index with 4 shards and import the data into it. Second, it improves performance: the data is spread across multiple shards, that is, across multiple servers, so operations run in parallel on multiple machines, which increases throughput and performance.
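
Sharding raises the question of which shard a given document lands on. The ES documentation describes the basic routing rule as shard = hash(_routing) % number_of_primary_shards, where _routing defaults to the document's _id. The sketch below assumes that simplified formula and uses MD5 only as a stable stand-in hash (ES actually uses Murmur3, and newer versions refine the formula with a routing-shards factor); shard_for and moved_fraction are hypothetical helpers. It also suggests why the text creates a new 4-shard index and imports the data instead of just adding a fourth shard: changing the modulus sends most documents to a different shard.

```python
import hashlib


def shard_for(routing: str, num_primary_shards: int) -> int:
    # Simplified routing: shard = hash(_routing) % number_of_primary_shards.
    # MD5 is only a stable stand-in here; ES uses Murmur3 for this hash.
    digest = hashlib.md5(routing.encode("utf-8")).digest()
    h = int.from_bytes(digest[:4], "big")
    return h % num_primary_shards


def moved_fraction(doc_ids: list[str], old_shards: int, new_shards: int) -> float:
    # Share of documents whose target shard changes when the primary shard count changes.
    moved = sum(shard_for(d, old_shards) != shard_for(d, new_shards) for d in doc_ids)
    return moved / len(doc_ids)


ids = [f"order-{i}" for i in range(10_000)]  # _routing defaults to the document _id
print(moved_fraction(ids, 3, 4))  # roughly 0.75: only h % 12 in {0, 1, 2} keeps the same shard
```

With a well-mixed hash, going from 3 to 4 primary shards relocates about three quarters of the documents, so the data effectively has to be rewritten anyway.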

 

Each shard's data also has multiple copies: every shard has one primary shard, which is responsible for writes, plus several replica shards. After the primary shard writes the data, it synchronizes the data to the replica shards.
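
The write path described above (primary first, then replicas) can be sketched as a toy model. The names ShardCopy and ReplicationGroup are hypothetical, there is no network, and replicas are updated one after another with no failures. In real ES the primary forwards the operation to its in-sync replicas in parallel and acknowledges the client only after they have responded.

```python
class ShardCopy:
    """One copy (primary or replica) of a shard, storing documents by id."""

    def __init__(self, node: str):
        self.node = node
        self.docs: dict[str, dict] = {}

    def apply(self, doc_id: str, doc: dict) -> None:
        self.docs[doc_id] = doc


class ReplicationGroup:
    """Hypothetical sketch: writes hit the primary first, then every replica."""

    def __init__(self, primary: ShardCopy, replicas: list[ShardCopy]):
        self.primary = primary
        self.replicas = replicas

    def write(self, doc_id: str, doc: dict) -> int:
        self.primary.apply(doc_id, doc)  # 1. the primary applies the write
        copies = 1
        for replica in self.replicas:  # 2. the primary forwards it to each replica
            replica.apply(doc_id, doc)
            copies += 1
        return copies  # 3. number of copies that now hold the document


group = ReplicationGroup(ShardCopy("node-1"), [ShardCopy("node-2"), ShardCopy("node-3")])
group.write("order-1", {"amount": 59.0})
```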

 

With this replica scheme, every shard's data has multiple copies, so if one machine goes down it doesn't matter: copies of its data still exist on other machines. That is what gives the cluster high availability.
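
Replicas only help availability if a shard's copies live on different machines, which is why ES never places a primary and its replica on the same node. The hypothetical allocate helper below assumes a simple round-robin placement (ES's real allocator weighs many more factors), and surviving shows that with one replica per shard, losing any single node still leaves at least one copy of every shard.

```python
def allocate(num_shards: int, num_replicas: int, nodes: list[str]) -> dict[int, list[str]]:
    # Place each shard's copies (primary first, then replicas) on distinct nodes, round-robin.
    copies = num_replicas + 1
    if copies > len(nodes):
        # ES would leave the extra replicas unassigned; this sketch simply refuses.
        raise ValueError("not enough nodes to keep every copy on a separate node")
    return {s: [nodes[(s + c) % len(nodes)] for c in range(copies)] for s in range(num_shards)}


def surviving(allocation: dict[int, list[str]], dead_node: str) -> dict[int, list[str]]:
    # Copies of each shard that are still reachable after one node goes down.
    return {s: [n for n in owners if n != dead_node] for s, owners in allocation.items()}


layout = allocate(num_shards=3, num_replicas=1, nodes=["node-1", "node-2", "node-3"])
after_crash = surviving(layout, "node-2")
```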

 

An ES cluster has multiple nodes, and one of them is automatically elected as the master node. The master node does administrative work, such as maintaining index metadata and switching shards between the primary and replica roles. If the master node goes down, another node is elected as the new master.
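
Master election is a consensus problem, and the sketch below is deliberately simplified: it is not ES's real algorithm. The hypothetical elect_master function picks the live master-eligible node with the smallest id, but only if a majority of master-eligible nodes are alive. The majority requirement is the part that matches ES in spirit: it prevents two halves of a partitioned cluster from each electing their own master (split brain).

```python
from typing import Optional


def elect_master(master_eligible: list[str], alive: set[str]) -> Optional[str]:
    # Hypothetical rule: the smallest live node id wins, but only with a majority alive.
    live = sorted(n for n in master_eligible if n in alive)
    if len(live) <= len(master_eligible) // 2:
        return None  # no quorum: refuse to elect rather than risk split brain
    return live[0]


nodes = ["node-1", "node-2", "node-3"]
master = elect_master(nodes, alive=set(nodes))  # "node-1"
new_master = elect_master(nodes, alive={"node-2", "node-3"})  # "node-2" after node-1 dies
```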

 

If a non-master node goes down, the master node transfers the primary role of the primary shards on that node to their replica shards on other machines. Later, once you repair the failed machine and restart it, the master node allocates the missing replica shards to it and syncs over the modifications made in the meantime, and the cluster returns to normal.

 

Put more simply: if a non-master node goes down, the primary shards on that node are lost. The master then promotes the corresponding replica shards (on other machines) to primary shards. When the failed machine is repaired and rejoins, the shards on it are no longer primary shards but replica shards.
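
The failover steps above can be sketched from the master's point of view. ShardState, handle_node_failure and handle_node_rejoin are hypothetical names, and the model follows only the simple flow the text describes: a lost primary is replaced by promoting a replica, and the repaired node comes back only as a replica holder. Real ES also recovers the data onto the returning copy from the primary and, after a delay, may allocate missing replicas to other live nodes without waiting for the failed one; none of that is modeled here.

```python
from dataclasses import dataclass, field


@dataclass
class ShardState:
    primary: str
    replicas: list[str] = field(default_factory=list)


def handle_node_failure(shards: dict[int, ShardState], dead: str) -> None:
    # Drop copies on the dead node; if a primary was lost, promote one of its replicas.
    for state in shards.values():
        state.replicas = [n for n in state.replicas if n != dead]
        if state.primary == dead:
            if not state.replicas:
                # ES would mark such a shard as unavailable (red); the sketch just raises.
                raise RuntimeError("no replica left to promote")
            state.primary = state.replicas.pop(0)  # the replica becomes the new primary


def handle_node_rejoin(shards: dict[int, ShardState], node: str, copies_per_shard: int) -> None:
    # The repaired node only receives replica copies for shards that are missing one.
    for state in shards.values():
        has_copy = node == state.primary or node in state.replicas
        if 1 + len(state.replicas) < copies_per_shard and not has_copy:
            state.replicas.append(node)  # data would be recovered from the primary first


shards = {
    0: ShardState("node-1", ["node-2"]),
    1: ShardState("node-2", ["node-3"]),
    2: ShardState("node-3", ["node-1"]),
}
handle_node_failure(shards, "node-1")  # shard 0's replica on node-2 becomes primary
handle_node_rejoin(shards, "node-1", copies_per_shard=2)  # node-1 returns as a replica holder
```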

 

That, in essence, is the basic architecture of Elasticsearch as a distributed search engine.

 

 

Origin www.cnblogs.com/lingboweifu/p/11897388.html