Understanding Elasticsearch concepts

Official documentation links

Filebeat:

https://www.elastic.co/cn/products/beats/filebeat

https://www.elastic.co/guide/en/beats/filebeat/7.1/index.html

Logstash:

https://www.elastic.co/cn/products/logstash

https://www.elastic.co/guide/en/logstash/7.1/index.html

Kibana:

https://www.elastic.co/cn/products/kibana

https://www.elastic.co/guide/en/kibana/7.1/index.html

Elasticsearch:

https://www.elastic.co/cn/products/elasticsearch

https://www.elastic.co/guide/en/elasticsearch/reference/7.1/index.html

Elastic Chinese community:

https://elasticsearch.cn/

 

Client node

When both the master and data settings of a node are set to false, the node acts as a client node: it routes requests, handles the search phase, and distributes index operations. In essence, a client node behaves as a smart load balancer.

Dedicated client nodes are useful in relatively large clusters, where they coordinate between the master nodes and the data nodes. A client node joins the cluster and receives the cluster state, so it can route each request directly to the right node.
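
As a rough illustration of the rule above, the sketch below classifies a node from its two role flags. The function name node_role and the returned labels are hypothetical and chosen only for this example; the two flags stand for the node.master and node.data settings found in a 7.x elasticsearch.yml.

```python
def node_role(master: bool, data: bool) -> str:
    """Classify a node from its master/data flags (labels are illustrative)."""
    if master and data:
        return "master-eligible data node"
    if master:
        return "dedicated master-eligible node"
    if data:
        return "dedicated data node"
    # Both flags are false: the node only routes and coordinates requests.
    return "client node"


print(node_role(master=False, data=False))
```

A node with both flags left at true takes every role, which is why larger clusters often switch the flags off selectively.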

 

Data node

Data nodes store the index data and handle document CRUD operations as well as aggregations. They place heavy demands on CPU, memory, and I/O.

Monitor the state of your data nodes as part of tuning, and add new nodes to the cluster when resources run short.

 

Master-eligible node

A master-eligible node is mainly responsible for cluster-wide operations: creating or deleting indexes, tracking which nodes are part of the cluster, and deciding which shards are allocated to which nodes.

A stable master node is very important for the health of the cluster. By default, any node in the cluster may be elected master.

Indexing, searching, and similar operations consume a lot of CPU, memory, and I/O, so separating the master nodes from the data nodes is the better choice for cluster stability.

 

Index (index)

In ES, an index is a collection of documents (for example, log entries) that share similar characteristics. An index corresponds to a database in a relational database system.

 

Shard (shard)

Because ES is a distributed search engine, an index is usually split into several parts, called shards, which are distributed across different nodes. ES manages and organizes shards automatically and rebalances them when necessary. Indexing and queries are carried out by specific shards. A shard is either a primary shard or a replica shard: a write goes to the primary shard first and is then synchronized to the replica shards, while a query can be served equally well by a primary or a replica.

A primary shard can have several replica shards or none. Replica shards serve two purposes. The first is disaster recovery: if a primary shard goes down, no data is lost and the cluster keeps working normally. The second is performance: both replica and primary shards can process queries.

When the data volume is large, an index can exceed the disk capacity of a single node, or a single node can become too slow at processing it. To solve these problems, Elasticsearch splits the data of an index into multiple shards, each storing part of the index's data, distributed over different nodes. When you query the index, Elasticsearch sends the query to each relevant shard and then merges the results. This process is transparent to the application: the user is not aware that the shards exist.
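
The query-then-merge flow described above can be sketched in a few lines. This is a toy model, not how Elasticsearch is implemented: the shards are plain dictionaries, the scoring is a simple term count, and the names search_shard and scatter_gather are hypothetical.

```python
import heapq
from typing import Dict, List, Tuple

Hit = Tuple[float, str]  # (score, document id)


def search_shard(shard: Dict[str, str], term: str) -> List[Hit]:
    """Score each document in one shard by how often the term occurs (toy scoring)."""
    hits = []
    for doc_id, text in shard.items():
        score = float(text.lower().split().count(term.lower()))
        if score > 0:
            hits.append((score, doc_id))
    return hits


def scatter_gather(shards: List[Dict[str, str]], term: str, size: int) -> List[Hit]:
    """Send the query to every shard, then merge the partial results into a global top N."""
    partial_results = [search_shard(shard, term) for shard in shards]
    all_hits = [hit for hits in partial_results for hit in hits]
    return heapq.nlargest(size, all_hits)


# Two shards holding different parts of the same index.
shards = [
    {"doc1": "error disk full", "doc2": "login ok"},
    {"doc3": "error error timeout"},
]
print(scatter_gather(shards, "error", size=2))
```

The caller only sees the merged list, which mirrors the point that sharding stays invisible to the application.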

 

Type (type)

Within an index you can define one or more types. A type is a logical category or partition of the index, and its meaning is entirely up to you. Normally, a type is defined for a group of documents that share a set of common fields. For example, ttlsa puts all the data generated in a given period into a single index called logstash-ttlsa. A type corresponds to a table in a relational database. An index can define multiple types, but in practice usually only one type is used. (Types are deprecated in 7.x.)

 

Replica (replica)

Before version 7.0, ES by default created five primary shards per index and one replica of each; that is, each index consisted of five primary shards, each with one replica. Since 7.0 the default is one primary shard with one replica. In practice, the data stored in an index may exceed the hardware limits of a single node. For example, one billion documents that need 1 TB of space may not fit on the disk of a single node, or searches served by a single node may be too slow. To solve this problem, Elasticsearch can split an index into multiple shards. When creating an index, you can define the desired number of shards. Each shard is a fully functional, independent index that can be located on any node in the cluster.

There are two main reasons for sharding:

  1. It lets you split and scale horizontally, increasing storage capacity.
  2. It lets you distribute and parallelize operations across shards, improving performance and throughput.

How shards are distributed, and how their documents are aggregated for a search request, is controlled entirely by Elasticsearch and is transparent to the user.
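
To make the shard count concrete, the following sketch builds the settings body you might send when creating an index and counts the shards that result. number_of_shards and number_of_replicas are the index setting names; the helper names index_settings and total_shards are hypothetical, and no request is actually sent.

```python
import json


def index_settings(primaries: int, replicas: int) -> dict:
    """Build the settings body for an index-creation request."""
    return {"settings": {"number_of_shards": primaries, "number_of_replicas": replicas}}


def total_shards(primaries: int, replicas: int) -> int:
    """Each primary shard has `replicas` copies, so count the primaries plus all copies."""
    return primaries * (1 + replicas)


body = index_settings(primaries=5, replicas=1)
print(json.dumps(body))
print(total_shards(5, 1))
```

With five primaries and one replica each, the cluster has ten shards to place, which is worth keeping in mind when sizing nodes.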

Network problems and other failures can occur unexpectedly at any time. For robustness, a failover mechanism is strongly recommended in case a shard or a node becomes unavailable for any reason.

To this end, Elasticsearch lets you make one or more copies of an index's shards, called replica shards, or replicas for short.

Strictly speaking, the full name of a shard is primary shard, usually shortened to shard. A replica is a copy of a primary shard, and a primary shard can have one or more of them; a replica can be called a replica shard or simply a replica. When a primary shard is lost, the cluster can promote a replica to be the new primary shard.
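
The promotion step can be pictured with a small simulation. This is a simplified sketch under stated assumptions: the layout structure, the node names, and the function promote_replicas are all hypothetical, and the real cluster chooses which replica to promote by its own rules rather than taking the first survivor.

```python
from typing import Dict, List


def promote_replicas(shard_copies: Dict[int, List[dict]], failed_node: str) -> Dict[int, List[dict]]:
    """Drop the copies on the failed node and promote a surviving replica where the primary was lost."""
    result = {}
    for shard_id, copies in shard_copies.items():
        survivors = [dict(copy) for copy in copies if copy["node"] != failed_node]
        if survivors and not any(copy["primary"] for copy in survivors):
            survivors[0]["primary"] = True  # promote the first surviving replica
        result[shard_id] = survivors
    return result


# Two shards, each with a primary on one node and a replica on the other.
layout = {
    0: [{"node": "node-a", "primary": True}, {"node": "node-b", "primary": False}],
    1: [{"node": "node-b", "primary": True}, {"node": "node-a", "primary": False}],
}
after = promote_replicas(layout, failed_node="node-a")
print(after)
```

After node-a fails, shard 0 keeps working because its replica on node-b takes over as primary, while shard 1 simply loses its replica.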

 

Document (document)

A document is the basic unit of information that can be indexed. It is expressed in JSON, a ubiquitous data-interchange format on the Internet. By analogy with a row (record) in a database, an Elasticsearch document is a JSON object containing zero or more fields.

Within a type, you can store as many documents as you need.

Although a document physically resides in an index, it must be indexed into, and assigned to, a type inside that index.

A document corresponds to a row in a relational database.
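
For a sense of what such a document looks like, here is a minimal sketch of a log event serialized to JSON. The field names and values are invented for illustration and do not come from any particular Beats or Logstash output.

```python
import json

# Hypothetical log event; every field name here is made up for the example.
document = {
    "timestamp": "2019-12-22T10:15:00Z",
    "host": "web-01",
    "level": "error",
    "message": "disk full",
    "response_time_ms": 153,
}

serialized = json.dumps(document)  # the JSON text that would be sent for indexing
restored = json.loads(serialized)  # parsing it back yields the same fields
print(serialized)
print(sorted(restored))
```

Each key of the JSON object plays the role of a field, in the same way that a column holds one attribute of a row.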

 

Mapping (mapping)

By analogy with a schema in a relational database, a mapping defines the fields of a type in an index and how they are typed. A mapping can be defined explicitly, or it can be generated automatically when a document is indexed: if a document contains a new field, Elasticsearch automatically infers its type and adds it to the mapping of the type.
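
The automatic case can be approximated as follows. This is a deliberately simplified sketch: the functions infer_field_type and update_mapping are hypothetical, and the rules cover only a few JSON value kinds (real dynamic mapping also handles dates, arrays, and nulls, and maps strings to a text field with a keyword sub-field).

```python
from typing import Any, Dict


def infer_field_type(value: Any) -> str:
    """Guess a mapping type from a JSON value (simplified dynamic-mapping rules)."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int in Python
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    if isinstance(value, dict):
        return "object"
    return "text"


def update_mapping(mapping: Dict[str, str], document: Dict[str, Any]) -> Dict[str, str]:
    """Add any field not yet in the mapping; existing fields keep their type."""
    updated = dict(mapping)
    for field, value in document.items():
        updated.setdefault(field, infer_field_type(value))
    return updated


mapping = {"message": "text"}
mapping = update_mapping(mapping, {"message": "disk full", "status": 500, "ok": False})
print(mapping)
```

Once a field has a type, later documents do not change it, which is why an explicit mapping is usually preferred for fields whose type matters.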

Comparison of Elasticsearch and relational database concepts

Elasticsearch          RDBMS
---------------------  ------------------------
Index (index)          Database (database)
Type (type)            Table (table)
Document (document)    Row (row)
Field (field)          Column (column)
Mapping (mapping)      Table structure (schema)
Full-text index        Index
Query DSL              SQL
GET                    SELECT
PUT/POST               UPDATE
DELETE                 DELETE
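
To illustrate the Query DSL versus SQL pairing, the sketch below builds a term query body next to a comparable SQL string. The helper names term_query and sql_equivalent, the table name logs, and the field level are assumptions made for the example; nothing is sent to a cluster or a database.

```python
import json


def term_query(field: str, value: str) -> dict:
    """Build a Query DSL body that matches documents whose field holds exactly this value."""
    return {"query": {"term": {field: value}}}


def sql_equivalent(table: str, field: str, value: str) -> str:
    """Render the comparable SQL statement (for reading only, not for execution)."""
    return f"SELECT * FROM {table} WHERE {field} = '{value}'"


print(json.dumps(term_query("level", "error")))
print(sql_equivalent("logs", "level", "error"))
```

The query body would be sent with a GET search request, which is the counterpart of running the SELECT.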

Origin www.cnblogs.com/ghl1024/p/12080598.html