In the text below, ES stands for Elasticsearch.
1 Interview question
Describe how the distributed architecture of ES works.
2 What the interviewer is testing
In the search field, Lucene was once the most popular search library.
A few years ago, interviewers would typically ask: do you know Lucene? Do you understand how an inverted index works?
These questions are rarely asked now, because most projects today use Elasticsearch, a distributed search engine built on Lucene.
Distributed search has become a standard component of Internet systems. ES is the most popular choice; a few years ago Solr was more common, but recently most companies and projects have been moving to ES.
So in an interview at an Internet company, distributed search engines are very likely to come up, and that means ES.
The interviewer will usually open with: can you describe the distributed architecture of ES? The aim is to see whether you have a basic understanding of how a distributed search engine is structured.
3 Explanation
ES is designed as a distributed search engine, but underneath it is still built on Lucene. The core idea is to start multiple ES process instances on multiple machines, which together form an ES cluster.
3.1 The basic unit
The basic unit of data storage in ES is the index.
For example, to store order data in ES, you create an index named order_idx, and all order data is written to that index.
Conceptually, an index is roughly equivalent to a table in MySQL.
The hierarchy is: index -> type -> mapping -> document -> field.
3.2 Examples
To make this easier to understand, here is an analogy. Remember, it is only an analogy, not an equivalence!
An index corresponds to a table in MySQL;
a type has no direct counterpart in MySQL;
an index can contain multiple types, whose fields are mostly the same with slight differences.
Suppose there is an order index dedicated to storing order data.
Compare this with the tables you would build in MySQL:
- Some orders are for physical goods, such as a piece of clothing or a pair of shoes
- Some orders are for virtual goods, such as game cards or prepaid top-ups
Most fields of the two kinds of order are the same, but a small number of fields differ.
Similarly, in ES the order index would contain two types:
- a type for physical goods orders
- a type for virtual goods orders
Most fields of the two types are the same; a small number differ.
In many cases an index has only one type, but an index with multiple types was possible.
Note that mapping types were deprecated in Elasticsearch 7.x and removed in 8.0; see "Removal of mapping types" in the official documentation for details.
You can think of an index as a category of tables, with each type representing one specific table in MySQL.
Each type has a mapping. If a type is a specific table, then the index is the category that several types belong to, and the mapping is the definition of that table's structure.
When you create a table in MySQL, you must define its structure: which fields it has and what data type each field is.
A piece of data that you write into a type within an index is called a document;
a document is similar to a row in a MySQL table;
each document has multiple fields;
each field holds the value of one field in that document.
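The relationship between mapping, document and field can be sketched with plain Python data. This is only a toy model: the field names and the `conforms` helper are hypothetical, and real ES mappings declare ES field types (such as keyword or double) rather than Python types.

```python
# Hypothetical mapping for an order index: field name -> expected Python type.
# The field names are illustrative and do not come from any real schema.
ORDER_MAPPING = {
    "order_id": str,
    "product_name": str,
    "amount": float,
}


def conforms(document: dict, mapping: dict) -> bool:
    """Return True if every field present in the document is declared in
    the mapping and its value has the declared type."""
    return all(
        name in mapping and isinstance(value, mapping[name])
        for name, value in document.items()
    )


# A document is like one row; each key/value pair is one field.
doc = {"order_id": "o-1001", "product_name": "shoes", "amount": 59.9}

# The index (modelled here as a dict) holds documents keyed by document id.
order_idx = {"1": doc}
```

In this model the mapping plays the role of the table definition, and `conforms(doc, ORDER_MAPPING)` checks a document against it the way a table schema constrains a row.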
3.3 Shards and replicas
When you create an index, it is split into multiple shards, and each shard stores part of the data.
Splitting an index into multiple shards has two advantages:
- Supports horizontal scaling
For example, suppose you have 3 TB of data in 3 shards, 1 TB per shard. If the data volume grows to 4 TB, how do you scale? Simple: create a new index with 4 shards and import the data into it.
- Improves performance
Data is spread across multiple shards, that is, across multiple servers, so operations run in parallel on multiple machines, which improves the throughput and performance of the system.
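A minimal sketch of why going from 3 to 4 shards means building a new index and re-importing the data. It assumes documents are assigned to shards by hashing the document id modulo the number of primary shards, which is broadly how ES routes documents; `zlib.crc32` is a stand-in hash chosen for this sketch, and `shard_for` is a hypothetical helper, not an ES API.

```python
import zlib


def shard_for(doc_id: str, num_primary_shards: int) -> int:
    """Pick a shard for a document by hashing its id.

    zlib.crc32 is a stand-in for this sketch; ES uses its own hash function.
    """
    return zlib.crc32(doc_id.encode("utf-8")) % num_primary_shards


doc_ids = [f"order-{i}" for i in range(1000)]

# Shard assignment with 3 primary shards and with 4 primary shards.
with_3 = {d: shard_for(d, 3) for d in doc_ids}
with_4 = {d: shard_for(d, 4) for d in doc_ids}

# Count the documents whose shard changes when the shard count changes.
moved = sum(1 for d in doc_ids if with_3[d] != with_4[d])
```

Because the shard number depends on the shard count, many documents would land on a different shard after the change (`moved` is well above zero here), so the data has to be written again into an index created with the new shard count.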
Each shard's data also has more than one copy. Each shard has a primary shard, which handles writes, and several replica shards. After the primary shard writes the data, it synchronizes the data to the replica shards.
With this replica scheme, every shard has multiple copies of its data.
Even if a node goes down, the data is still available on other nodes, which provides high availability.
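The write path described above can be sketched as a toy model: the primary copy takes the write, then the same document is copied to every replica, so a read still succeeds when the primary's node is down. The class names, node names and the `read` helper are hypothetical and only illustrate the idea; this is not how ES is implemented internally.

```python
class ShardCopy:
    """One copy of a shard's data, held on one node (toy model)."""

    def __init__(self, node: str):
        self.node = node
        self.docs = {}


class ReplicationGroup:
    """A primary shard together with its replica shards."""

    def __init__(self, primary: ShardCopy, replicas: list):
        self.primary = primary
        self.replicas = replicas

    def write(self, doc_id: str, doc: dict) -> None:
        # The primary takes the write first ...
        self.primary.docs[doc_id] = doc
        # ... then the data is synchronized to every replica.
        for replica in self.replicas:
            replica.docs[doc_id] = dict(doc)

    def read(self, doc_id: str, down_nodes=frozenset()):
        # Any copy on a node that is still up can serve the read.
        for copy in [self.primary, *self.replicas]:
            if copy.node not in down_nodes and doc_id in copy.docs:
                return copy.docs[doc_id]
        return None


group = ReplicationGroup(
    ShardCopy("node-1"), [ShardCopy("node-2"), ShardCopy("node-3")]
)
group.write("o-1001", {"product_name": "shoes"})

# node-1 (holding the primary) is down, yet the document is still readable.
survivor = group.read("o-1001", down_nodes={"node-1"})
```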
3.4 Master and non-master nodes
In an ES cluster with multiple nodes, one node is automatically elected as the master node;
the master node handles administrative duties, such as maintaining index metadata and switching the roles of primary shards and replica shards;
if the master node goes down, a new master node is elected.
If a non-master node goes down, the master node transfers the primary shard role of the shards on the failed node to replica shards on other available nodes.
Then, once you repair the failed node and restart it, the master node assigns the missing replica shards back to it and synchronizes the data that changed in the meantime, so the cluster returns to normal.
Put more simply, if a non-master node goes down, the primary shards on that node are gone.
The master then promotes the corresponding replica shards (on other nodes) to primary shards.
Once the failed node has been repaired, the shards it hosts are no longer primary shards but replica shards.
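The failover sequence above can be sketched as a small simulation. It is a simplified model under stated assumptions: the `Cluster` class stands in for the bookkeeping the master node does, and all class, method and node names are hypothetical rather than ES APIs.

```python
from dataclasses import dataclass


@dataclass
class ShardAssignment:
    shard_id: int
    node: str
    role: str  # "primary" or "replica"


class Cluster:
    """Toy model of the shard roles the master node keeps track of."""

    def __init__(self, assignments):
        self.assignments = list(assignments)
        self.down = set()

    def role_of(self, shard_id: int, node: str):
        for a in self.assignments:
            if a.shard_id == shard_id and a.node == node:
                return a.role
        return None

    def primary_node(self, shard_id: int):
        for a in self.assignments:
            if (a.shard_id == shard_id and a.role == "primary"
                    and a.node not in self.down):
                return a.node
        return None

    def node_failed(self, node: str) -> None:
        self.down.add(node)
        for lost in self.assignments:
            if lost.node == node and lost.role == "primary":
                # The copy on the failed node will come back as a replica.
                lost.role = "replica"
                # Promote one surviving replica of the same shard.
                for a in self.assignments:
                    if (a.shard_id == lost.shard_id and a.role == "replica"
                            and a.node not in self.down):
                        a.role = "primary"
                        break

    def node_recovered(self, node: str) -> None:
        # The node rejoins; its shard copies keep the replica role.
        self.down.discard(node)


cluster = Cluster([
    ShardAssignment(0, "node-1", "primary"),
    ShardAssignment(0, "node-2", "replica"),
    ShardAssignment(1, "node-2", "primary"),
    ShardAssignment(1, "node-1", "replica"),
])

cluster.node_failed("node-1")
primary_after_failure = cluster.primary_node(0)

cluster.node_recovered("node-1")
role_after_recovery = cluster.role_of(0, "node-1")
```

After `node-1` fails, the replica of shard 0 on `node-2` becomes the primary; when `node-1` comes back, its copy of shard 0 is a replica, matching the description above. The sketch leaves out the data resynchronization step.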
That is the basic architecture of Elasticsearch as a distributed search engine.
Reference
"Java Engineer Interview Crash Course, Season 1", by instructor Zhonghua Shishan