Concepts of ElasticSearch (referred to as ES)

1. Getting to know ElasticSearch

1.1. ElasticSearch

  • ES was created to address the shortcomings of using native Lucene directly: it streamlines how Lucene is called and provides a distributed, highly available search cluster. Its first version appeared on GitHub in February 2010, and it quickly became one of the most popular projects there.
  • ES is also developed in Java and uses Lucene at its core for all indexing and search functionality, but its purpose is to hide the complexity of Lucene behind a simple RESTful API, making full-text search easier.
  • ElasticSearch simplifies the use of Lucene for full-text search while adding distributed capabilities, which makes building large-scale distributed full-text search very easy.

1.2. What is ES?

  • ES solves the problem that Lucene is cumbersome to call, and it adds support for distribution and clustering
  • ES implements distributed storage of documents
  • ES implements distributed search
  • It can typically handle data at the petabyte (PB) scale
  • It is called through a simple RESTful-style API
  • It is easy to get started with and works out of the box
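
To make the RESTful calling style above concrete, here is a minimal sketch that builds (but does not send) a full-text search request using only the Python standard library. The node address `http://localhost:9200`, the index name `products`, and the field name `name` are assumptions made for illustration; they do not come from the original text.

```python
import json
from urllib.request import Request

# Assumption: an ES node is listening on localhost:9200.
ES_URL = "http://localhost:9200"


def build_search_request(index: str, field: str, text: str) -> Request:
    """Build (but do not send) a full-text search request for one field."""
    body = {"query": {"match": {field: text}}}
    return Request(
        url=f"{ES_URL}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# "products" and "name" are hypothetical index and field names.
req = build_search_request("products", "name", "wireless keyboard")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# To send it against a running node: urllib.request.urlopen(req)
```

The request is only constructed here so that the sketch runs without a cluster; the whole interaction is an HTTP call with a JSON body.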

Does ES have any competitors?

Solr is its main competitor. The differences between Solr and ES:

  • Solr can do full-text search, but it is a heavyweight framework that does many other things besides full-text search
  • Solr cannot match ES in real-time search
  • Solr supports distributed deployment, as does ES
  • Solr can also support NoSQL-style usage

1.3. The difference between ES and Lucene

Lucene:

  • Only supports Java
  • Not distributed; the index lives only in a local directory
  • Very complex to use
  • Suited to small projects

1.4. ES-related concepts

1.4.1. Near Realtime (NRT)

Near real time has two meanings: there is a small delay (about 1 second) between writing data and that data becoming searchable, and search and analysis based on ES can return results within seconds.
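
The write-to-search delay can be pictured with a toy model. This is a simplified sketch, not how ES is implemented: writes land in a buffer and only become visible to search after a refresh step, which ES runs periodically (about once per second by default). The class and method names are hypothetical.

```python
class ToyNearRealTimeIndex:
    """Toy model only: writes become searchable after refresh() is called."""

    def __init__(self):
        self._buffer = []      # written but not yet searchable
        self._searchable = []  # visible to search

    def index(self, doc: dict) -> None:
        self._buffer.append(doc)

    def refresh(self) -> None:
        # Move everything written so far into the searchable set.
        self._searchable.extend(self._buffer)
        self._buffer.clear()

    def search(self, field: str, value) -> list:
        return [d for d in self._searchable if d.get(field) == value]


idx = ToyNearRealTimeIndex()
idx.index({"id": 1, "name": "keyboard"})
before = idx.search("name", "keyboard")  # not visible yet
idx.refresh()
after = idx.search("name", "keyboard")   # visible after the refresh
print(len(before), len(after))  # prints "0 1"
```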

1.4.2. Index

An index holds a collection of documents with similar data structures. For example, there can be a customer index, a product category index, and an order index. Each index has a name. An index contains many documents and represents one class of similar or identical documents. For example, a product index would store all product data, that is, all of the product documents.

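1.4.3. Type

Each index can have one or more types. A type is a logical classification of the data within an index, and the documents under one type share the same fields. For example, in a blog system, a single index could define a user data type, a blog data type, and a comment data type. (This applies to ES versions before 6.0; from 6.0 an index can hold only one type, and later versions remove types.)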

1.4.4. Document and field

A document is the smallest unit of data in ES. A document can be one customer record, one product category record, or one order record, and is usually represented as a JSON data structure. Each type within an index can store multiple documents. A document has multiple fields, and each field is one data field.
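
As an illustration of a document and its fields, the sketch below builds one order document as a JSON structure. Every index, type, and field name in it (`orders`, `order`, `order_id`, and so on) is made up for this example, and the `/<index>/<type>/<id>` path assumes the typed layout described in this section.

```python
import json

# Hypothetical order document; each key is one field.
order_doc = {
    "order_id": 1001,
    "customer": "Alice",
    "items": [{"sku": "KB-01", "qty": 2}],
    "total": 59.8,
}

# Assumed addressing scheme: /<index>/<type>/<id>
path = "/orders/order/1001"
payload = json.dumps(order_doc)  # the JSON body that would be stored
fields = sorted(order_doc.keys())

print(path)
print(fields)
```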

1.4.5. Cluster

A cluster contains multiple nodes. Which cluster a node belongs to is determined by configuration (the cluster name, which defaults to elasticsearch). For small and medium-sized applications, starting with a single-node cluster is perfectly normal.

1.4.6. Node

A node is a member of the cluster. Each node also has a name (randomly assigned by default), and node names are very important when carrying out operations and maintenance tasks. By default a node joins the cluster named "elasticsearch", so if you simply start a group of nodes, they will automatically form one elasticsearch cluster. A single node can of course also form an elasticsearch cluster on its own.
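
The rule that nodes sharing a cluster name end up in the same cluster can be sketched as a toy grouping function. This is a simplified model that ignores network discovery entirely; the node names and the `logging` cluster are hypothetical, while `cluster.name` and `node.name` are the setting names ES uses for these two values.

```python
from collections import defaultdict

DEFAULT_CLUSTER_NAME = "elasticsearch"


def group_into_clusters(nodes: list) -> dict:
    """Group nodes by cluster name, falling back to the default name."""
    clusters = defaultdict(list)
    for node in nodes:
        name = node.get("cluster.name", DEFAULT_CLUSTER_NAME)
        clusters[name].append(node["node.name"])
    return dict(clusters)


nodes = [
    {"node.name": "node-1"},                             # no cluster name set
    {"node.name": "node-2"},                             # no cluster name set
    {"node.name": "logs-1", "cluster.name": "logging"},  # explicit cluster name
]
clusters = group_into_clusters(nodes)
print(clusters)
```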

1.4.7. Shard

A single machine cannot store very large amounts of data, so ES can split the data of an index into multiple shards distributed across multiple servers. Shards let you scale out to store more data and spread operations such as search and analysis across multiple servers, improving throughput and performance. Each shard is a Lucene index.
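
How a document ends up on one particular shard can be sketched with the usual routing idea: hash a routing value (by default the document ID) and take it modulo the number of primary shards. The sketch below uses `zlib.crc32` purely as a stand-in hash, which is an assumption of this example; ES uses its own hash function, so real shard numbers will differ.

```python
import zlib


def pick_shard(routing_key: str, num_primary_shards: int) -> int:
    """shard = hash(routing) % number_of_primary_shards.

    zlib.crc32 is a stand-in hash for this sketch only.
    """
    return zlib.crc32(routing_key.encode("utf-8")) % num_primary_shards


NUM_PRIMARY_SHARDS = 5
placement = {}
for doc_id in ["1001", "1002", "1003", "1004", "1005", "1006"]:
    shard = pick_shard(doc_id, NUM_PRIMARY_SHARDS)
    placement.setdefault(shard, []).append(doc_id)

for shard, ids in sorted(placement.items()):
    print(f"shard {shard}: {ids}")
```

Because the result depends on the shard count, the same ID would map to a different shard if the count changed, which is why the number of primary shards is fixed for an existing index.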

1.4.8. Replica

Any server can fail or go down, and the shards on it might then be lost, so you can create multiple replica copies of each shard. A replica provides a backup when a shard fails, ensuring that data is not lost, and multiple replicas also improve the throughput and performance of search operations. The number of primary shards is set when the index is created and cannot be modified afterwards (default 5 in versions before 7.0); the number of replica shards can be changed at any time (default 1). With those defaults each index has 10 shards: 5 primary shards and 5 replica shards. The smallest highly available configuration is 2 servers.
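
The arithmetic and the two-server minimum can be checked with a small sketch. The allocation function is a toy round-robin scheme, not the ES allocator; it only enforces the rule that a replica never sits on the same node as its primary. The node names and the `P`/`R` labels are hypothetical.

```python
def total_shards(primaries: int, replicas_per_primary: int) -> int:
    """Each primary shard plus its replicas."""
    return primaries * (1 + replicas_per_primary)


def allocate(primaries: int, nodes: list) -> dict:
    """Toy allocation with 1 replica per primary, kept on a different node."""
    if len(nodes) < 2:
        raise ValueError("need at least 2 nodes to separate a primary from its replica")
    layout = {n: [] for n in nodes}
    for s in range(primaries):
        primary_node = nodes[s % len(nodes)]
        replica_node = nodes[(s + 1) % len(nodes)]
        layout[primary_node].append(f"P{s}")
        layout[replica_node].append(f"R{s}")
    return layout


print(total_shards(5, 1))  # prints 10
layout = allocate(5, ["node-1", "node-2"])
for node, shards in layout.items():
    print(node, shards)
```

With two nodes, each node ends up holding one copy of every shard, so losing either server still leaves a complete copy of the index.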

Origin blog.csdn.net/weixin_45737653/article/details/104824442