Cracking the Java Interview (14): Distributed Search Engine Architecture

Throughout this article, ES stands for Elasticsearch.

1 Interview question

Explain how the distributed architecture of ES works.

2 What the question tests

In the search field, Lucene used to be the most popular search library.
A few years ago interviewers would typically ask: do you know Lucene? Do you understand how an inverted index works?
That question is rare now, because projects are almost always built on Elasticsearch, a distributed search engine based on Lucene.

Distributed search is now a standard part of Internet systems. ES is by far the most popular choice. A few years ago Solr was the usual one, but more recently the large majority of companies and projects have moved to ES.

So in an Internet-company interview, distributed search engines will certainly come up, and that means ES will come up!

The interviewer's first question is usually: can you explain the distributed architecture of ES? The point is to see whether you have a basic understanding of how a distributed search engine is built.

3 Detailed explanation

ES is designed as a distributed search engine, but underneath it is still built on Lucene.
The core idea is to start multiple ES process instances on multiple machines and have them form an ES cluster.

3.1 The basic unit

The basic unit of data storage in ES is the index.
For example, to store order data in ES you would create an index named order_idx, and all order data would be written to that index.
Conceptually, an index is roughly equivalent to a table in MySQL.

index -> type -> mapping -> document -> field

3.2 Examples

To make this easier to follow, here is an analogy. Remember, it is only an analogy! The two are never exactly equal!

An index roughly corresponds to a table in MySQL.
A type has no direct counterpart in MySQL.
One index can hold multiple types. The fields of each type are mostly the same, with slight differences.

Suppose there is an order index dedicated to storing order data.
It is like creating an order table in MySQL, where:

  • Some orders are for physical goods, such as a piece of clothing or a pair of shoes
  • Some orders are for virtual goods, such as game cards or prepaid top-ups

Most fields of the two kinds of order are the same, but a small number differ slightly.

Likewise, ES would create two types inside the order index:

  • A type for physical goods orders
  • A type for virtual goods orders

Most fields of the two types are the same. Only a small number differ.

In many cases an index has just one type, but the discussion here covers the case where an index has multiple types.

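Note that the concept of mapping types is being removed: indices created in Elasticsearch 6.x allow only one type per index, specifying types is deprecated in 7.x, and types are removed entirely in 8.0. See the official documentation for details.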

You can think of an index as a category of tables, with each type standing for one concrete table in MySQL.
Each type has a mapping. If a type is a concrete table, then the index is the category those types belong to, and the mapping is the definition of that table's structure.
When you create a table in MySQL you have to define its structure: which fields it has and the data type of each field.
Each piece of data you write to a type in an index is called a document.
A document is similar to a row in a MySQL table.
Each document has multiple fields.
Each field holds one value in the document.
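
To make the terms mapping, document, and field concrete, here is a minimal sketch using plain Python dictionaries. The field names and values are hypothetical, and the mapping body assumes the typeless layout used by recent Elasticsearch versions. The helper undeclared_fields is an illustration only, not an ES API.

```python
import json

# Hypothetical mapping for the order_idx index: the "table definition".
# Field names are illustrative assumptions, not taken from a real schema.
order_mapping = {
    "mappings": {
        "properties": {
            "order_id": {"type": "keyword"},
            "goods_name": {"type": "text"},
            "amount": {"type": "double"},
            "created_at": {"type": "date"},
        }
    }
}

# One document: the rough counterpart of a row in a MySQL table.
# Each key is a field, each value is that field's value in this document.
order_doc = {
    "order_id": "A1001",
    "goods_name": "running shoes",
    "amount": 59.9,
    "created_at": "2019-06-01T10:00:00Z",
}


def undeclared_fields(doc, mapping):
    """Return the fields in doc that the mapping does not declare."""
    declared = mapping["mappings"]["properties"]
    return sorted(name for name in doc if name not in declared)


print(json.dumps(order_doc, indent=2))
print(undeclared_fields(order_doc, order_mapping))
```

The analogy holds only loosely here as well: the mapping plays the role of a table definition, and each dictionary like order_doc plays the role of a row.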

3.3 Architecture principles

When you create an index, it is split into multiple shards, and each shard stores part of the data.
Splitting an index into multiple shards has two advantages:

  • Support horizontal scaling
    For example, say you have 3 TB of data in 3 shards, 1 TB per shard. If the data grows to 4 TB, how do you scale?
    Easy! Create a new index with 4 shards and import the data into it.

  • Improve performance
    Data is spread across multiple shards, that is, across multiple servers, so operations run in parallel on multiple machines, which raises the throughput and performance of the system.
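
One way to see why growing from 3 to 4 shards means a new index and a re-import: a document's shard is chosen by hashing a routing value (the document id by default) modulo the number of primary shards. The sketch below is a simplified illustration under that assumption. ES uses a Murmur3 hash; CRC32 from the Python standard library is swapped in here, and pick_shard is a hypothetical helper name.

```python
import zlib
from collections import Counter


def pick_shard(doc_id: str, num_primary_shards: int) -> int:
    """Simplified hash-modulo routing.

    CRC32 is a stand-in for the Murmur3 hash that ES uses, so the
    concrete shard numbers will not match a real cluster.
    """
    return zlib.crc32(doc_id.encode("utf-8")) % num_primary_shards


doc_ids = [f"order-{i}" for i in range(1000)]

# How the documents spread over 3 shards.
spread = Counter(pick_shard(d, 3) for d in doc_ids)

# How many documents would land on a different shard with 4 shards.
moved = sum(1 for d in doc_ids if pick_shard(d, 3) != pick_shard(d, 4))

print("documents per shard with 3 shards:", dict(spread))
print("documents routed differently with 4 shards:", moved)
```

Because changing the divisor changes where many documents land, the data has to be written again into an index created with the new shard count.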

Each shard's data also has more than one copy: every shard has a primary shard, which is responsible for writes, plus several replica shards.
After the primary shard writes data, it synchronizes that data to its replica shards.

With the replica scheme, each shard has multiple copies of its data.
Even if one node goes down, copies of the data remain on other nodes, which gives high availability.
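
The availability argument can be checked with a small sketch. It relies on one fact, that a replica is never placed on the same node as its primary, and on one assumption: the round-robin placement below is a simplification for illustration, not the allocator ES actually uses. The node names and both function names are hypothetical.

```python
def place_shards(nodes, num_shards, num_replicas):
    """Return {shard_id: [primary_node, replica_node, ...]}.

    Round-robin placement (an assumption for illustration): copy k of a
    shard goes k nodes after the primary, so copies never share a node.
    """
    if num_replicas >= len(nodes):
        raise ValueError("need more nodes than replicas to keep copies apart")
    return {
        shard: [nodes[(shard + k) % len(nodes)] for k in range(num_replicas + 1)]
        for shard in range(num_shards)
    }


def survives(layout, failed_node):
    """True if every shard still has at least one copy off the failed node."""
    return all(
        any(node != failed_node for node in copies)
        for copies in layout.values()
    )


nodes = ["node-1", "node-2", "node-3"]
with_replica = place_shards(nodes, num_shards=3, num_replicas=1)
without_replica = place_shards(nodes, num_shards=3, num_replicas=0)

for node in nodes:
    print(node, "fails:",
          "1 replica ok =", survives(with_replica, node),
          "| 0 replicas ok =", survives(without_replica, node))
```

With one replica per shard, losing any single node still leaves a copy of every shard. With no replicas, losing a node loses the shard it held.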

3.4 Master and non-master nodes

In an ES cluster with multiple nodes, one node is automatically elected as the master node.
The master node handles administrative duties such as maintaining index metadata and switching shards between the primary and replica roles.
If the master node goes down, the cluster elects another node as master.

If a non-master node goes down, the master node transfers the primary role of that node's primary shards to their replica shards on other available nodes.
Once you repair and restart the failed node, the master node assigns the missing replica shards back to it and synchronizes the data modified in the meantime, so the cluster returns to normal.

Put more simply: when a non-master node goes down, the primary shards on that node are gone.
The master then promotes the corresponding replica shards (on other nodes) to primary shards. After the failed node is repaired, its shards are no longer primary shards but replica shards.
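
The failover sequence described above can be sketched as a small in-memory simulation. This is a toy model under stated assumptions, not how ES implements it: ShardCopy, fail_node, and rejoin_node are hypothetical names, and data synchronization is left out.

```python
from dataclasses import dataclass


@dataclass
class ShardCopy:
    shard: int
    node: str
    role: str  # "primary" or "replica"


def fail_node(copies, failed):
    """Drop copies on the failed node and promote one replica per lost primary."""
    orphaned = {c.shard for c in copies if c.node == failed and c.role == "primary"}
    survivors = []
    for c in copies:
        if c.node == failed:
            continue  # this copy went down with the node
        if c.role == "replica" and c.shard in orphaned:
            # The master's job in the text: switch a replica to primary.
            survivors.append(ShardCopy(c.shard, c.node, "primary"))
            orphaned.discard(c.shard)
        else:
            survivors.append(c)
    return survivors


def rejoin_node(copies, node, shards):
    """A repaired node comes back holding replica copies only."""
    return copies + [ShardCopy(s, node, "replica") for s in shards]


cluster = [
    ShardCopy(0, "node-1", "primary"), ShardCopy(0, "node-2", "replica"),
    ShardCopy(1, "node-2", "primary"), ShardCopy(1, "node-1", "replica"),
]

after_failure = fail_node(cluster, "node-1")
after_repair = rejoin_node(after_failure, "node-1", shards=[0, 1])

for c in after_repair:
    print(c)
```

In this run node-2 ends up holding the primary for both shards, and node-1 returns with replicas only, which matches the behavior described in the text.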

That is the basic architecture of Elasticsearch as a distributed search engine.

Reference

"Java Engineer Interview Crash Course, Season 1" by teacher Zhonghua Shishan (中华石杉)


Origin yq.aliyun.com/articles/706494