Elasticsearch for Beginners

Preface:

The best material for getting started is none other than Elasticsearch: The Definitive Guide and the official developer documentation. Based on these two sources, I record here some key points along with my own understanding of them.

Why do we use Elasticsearch, and what problems does it solve?

  1. MySQL LIKE queries
  2. Full-text search
  3. Too many database fields: queries are too slow and indexes cannot be optimized any further

Take the performance bottleneck of a video site as an example. Video resources are retrieved by director, actors, description, title and other fields. The MySQL solution is fuzzy matching with LIKE, but once the number of videos grows too large, retrieval performance drops sharply. That is when Elasticsearch is needed to defuse the crisis.
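To make the contrast concrete, here is a minimal sketch in plain Python (no Elasticsearch involved). It compares a LIKE-style scan, which inspects every row, with a lookup in an inverted index, the word-to-documents structure that Lucene-based engines such as Elasticsearch rely on. The video titles and the function names are made up for illustration, and real analyzers do far more than lowercasing and splitting on spaces.

```python
from collections import defaultdict

# Hypothetical sample data; these titles are invented for the sketch.
videos = {
    1: "The Rock Climbing Documentary",
    2: "Climbing the Ranks",
    3: "A Quiet Documentary About Music",
}

def like_scan(term):
    """Emulates WHERE title LIKE '%term%': every row is inspected."""
    return sorted(vid for vid, title in videos.items()
                  if term.lower() in title.lower())

def build_inverted_index(docs):
    """Maps each lowercase word to the set of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

inverted = build_inverted_index(videos)

def term_lookup(term):
    """One dictionary lookup instead of a scan over all rows."""
    return sorted(inverted.get(term.lower(), set()))

print(like_scan("climbing"))
print(term_lookup("climbing"))
```

Both calls find the same videos here, but the scan's cost grows with the number of rows, while the lookup's cost depends mainly on how many documents contain the word.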


Basic concepts

Understand the basic concepts of Elasticsearch: cluster, node, index, type, document, and shards and replicas.

Comparing them with a traditional relational database gives a more intuitive understanding.

Relational DB -> Databases -> Tables -> Rows -> Columns
Elasticsearch -> Indices   -> Types  -> Documents -> Fields

Do not think of an Elasticsearch index in terms of a traditional database index; an Elasticsearch index is more like a MySQL database.

Let's focus on shards and replicas

An index can hold an amount of data that exceeds the hardware limits of a single node. For example, an index with one billion documents occupying 1 TB of disk space may not fit on the disk of any single node, or a single node may be too slow to serve search requests on its own.

To solve this problem, Elasticsearch lets you split an index into multiple pieces, called shards. When you create an index, you can specify the number of shards you want. Each shard is itself a fully functional and independent "index" that can be placed on any node in the cluster.

Sharding is important for two reasons:

  1. It allows you to split horizontally and scale your content volume
  2. It allows you to distribute operations across shards (possibly on multiple nodes) and run them in parallel, thereby improving performance and throughput

How a shard is distributed, and how its documents are aggregated back into the response to a search request, is managed entirely by Elasticsearch and is transparent to you as a user.
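Although this is handled for you, it helps to see the idea. The Definitive Guide describes routing as shard = hash(routing) % number_of_primary_shards, where the routing value defaults to the document ID. The sketch below imitates that rule; zlib.crc32 is only a stand-in hash chosen for this example (Elasticsearch uses its own hash function), and pick_shard is a hypothetical helper name.

```python
import zlib

def pick_shard(routing: str, number_of_primary_shards: int) -> int:
    # zlib.crc32 is a stand-in hash for this sketch, not the one Elasticsearch uses.
    return zlib.crc32(routing.encode("utf-8")) % number_of_primary_shards

doc_ids = [str(i) for i in range(1, 11)]

# Where each document lands with 5 primary shards, and where it would land with 6.
placement_5 = {d: pick_shard(d, 5) for d in doc_ids}
placement_6 = {d: pick_shard(d, 6) for d in doc_ids}

# Documents whose shard would differ if the shard count changed.
moved = [d for d in doc_ids if placement_5[d] != placement_6[d]]
print(placement_5)
print("would move:", moved)
```

Because the shard number depends on the number of primary shards, changing that number would send lookups for existing documents to the wrong shard, which is one way to see why the shard count is fixed when the index is created.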

In a network or cloud environment, failures can happen at any time. When a shard or node goes offline or disappears for whatever reason, a failover mechanism is very useful and highly recommended. To this end, Elasticsearch lets you make one or more copies of each shard; these copies are called replica shards, or simply replicas.

Replication is important for two main reasons:

  1. It provides high availability if a shard or node fails. For this reason, it is important that a replica shard is never placed on the same node as the original (primary) shard it was copied from.
  2. It lets you scale your search volume and throughput, because searches can run on all replicas in parallel
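The placement rule in point 1 can be illustrated with a toy allocator. This is only a sketch under simplifying assumptions (round-robin placement, one replica per primary, hypothetical node names); the real allocation logic in Elasticsearch weighs many more factors.

```python
def allocate(num_primaries, nodes):
    """Toy allocator: primary i goes to one node, its replica to the next node."""
    if len(nodes) < 2:
        raise ValueError("a replica needs a node other than its primary's")
    layout = {node: [] for node in nodes}
    for shard in range(num_primaries):
        primary_node = nodes[shard % len(nodes)]
        replica_node = nodes[(shard + 1) % len(nodes)]
        layout[primary_node].append(f"P{shard}")  # primary copy
        layout[replica_node].append(f"R{shard}")  # replica copy, on a different node
    return layout

layout = allocate(5, ["node-1", "node-2"])
for node, shards in layout.items():
    print(node, shards)
```

Since no node holds both copies of the same shard, losing either node still leaves one copy of every shard available.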

In summary, each index can be split into multiple shards. An index can also be replicated zero times (that is, no replicas) or more. Once replicated, each index has primary shards (the originals that serve as the copy source) and replica shards (copies of the primaries). The number of shards and replicas can be specified when the index is created. After the index is created, you can change the number of replicas dynamically at any time, but you cannot change the number of shards.
By default, Elasticsearch allocates five primary shards and one replica to each index (this is the default in versions before 7.0; from 7.0 on, the default is one primary shard). This means that if your cluster has at least two nodes, an index will have five primary shards and five replica shards (one complete replica), for a total of 10 shards per index.
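As a small worked example, the sketch below builds the settings body you could send when creating an index and checks the shard arithmetic. number_of_shards and number_of_replicas are the actual index setting names; the helper functions are hypothetical and only assemble data, they do not talk to a cluster.

```python
import json

def index_settings(number_of_shards: int, number_of_replicas: int) -> dict:
    """Request body for creating an index, e.g. with PUT /megacorp."""
    return {
        "settings": {
            "number_of_shards": number_of_shards,      # fixed once the index exists
            "number_of_replicas": number_of_replicas,  # can be changed later
        }
    }

def total_shards(number_of_shards: int, number_of_replicas: int) -> int:
    """Each primary shard plus one copy of it per replica."""
    return number_of_shards * (1 + number_of_replicas)

body = index_settings(5, 1)
print(json.dumps(body, indent=2))
print(total_shards(5, 1))
```

With five primaries and one replica this gives the 10 shards mentioned above. Changing the replica count later would go through PUT /megacorp/_settings with a body containing only number_of_replicas.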


Basic Operations

Here I think you only need to remember a few commonly used keywords.

# Check cluster health; the v parameter displays the column headers
GET /_cat/health?v
# List the nodes
GET /_cat/nodes?v
# List the indices
GET /_cat/indices?v
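
The console shorthand above maps directly to HTTP requests. As a rough sketch, the helper below builds the corresponding URLs; BASE_URL assumes a node listening on localhost at the default HTTP port 9200, and cat_url is a hypothetical name, not part of any client library.

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:9200"  # assumption: local node, default HTTP port

def cat_url(resource: str, verbose: bool = True) -> str:
    """Builds the URL for a _cat endpoint; v=true asks for column headers."""
    query = "?" + urlencode({"v": "true"}) if verbose else ""
    return f"{BASE_URL}/_cat/{resource}{query}"

for resource in ("health", "nodes", "indices"):
    print(cat_url(resource))
```

With a running node, each URL can be fetched with any HTTP client, for example urllib.request.urlopen(cat_url("health")).read().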

As for operating on documents: because Elasticsearch follows RESTful API conventions, GET, PUT, POST and DELETE correspond to retrieving, updating, creating and deleting a resource. Be careful, though: documents in Elasticsearch are immutable, so an update in effect overwrites the existing document with a new one.
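
To illustrate what "an update is an overwrite" means, here is a toy in-memory model. ToyIndex is a hypothetical class written for this sketch, not the Elasticsearch client; it only mimics two behaviours: a PUT to an existing ID replaces the whole document, and the _version counter goes up.

```python
class ToyIndex:
    """In-memory stand-in for an index; not the Elasticsearch client."""

    def __init__(self):
        self._docs = {}

    def put(self, doc_id, source):
        previous = self._docs.get(doc_id)
        version = previous["_version"] + 1 if previous else 1
        # The stored document is replaced as a whole, never patched in place.
        self._docs[doc_id] = {"_version": version, "_source": dict(source)}
        return version

    def get(self, doc_id):
        return self._docs[doc_id]

index = ToyIndex()
v1 = index.put("1", {"first_name": "John", "last_name": "Smith"})
v2 = index.put("1", {"first_name": "John"})  # last_name is not carried over
print(v1, v2, index.get("1"))
```

After the second put, the document no longer has a last_name field, because the new body replaced the old one instead of being merged into it.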

A small example

Let's build an employee directory

PUT /megacorp/employee/1
{
    "first_name" : "John",
    "last_name" :  "Smith",
    "age" :        25,
    "about" :      "I love to go rock climbing",
    "interests": [ "sports", "music" ]
}

Name      Explanation
megacorp  Index name
employee  Type name
1         The employee's ID

Simple search

GET /megacorp/employee/1
GET /megacorp/employee/_search
# The hits array in the response contains all of our documents. By default, a search returns the first 10 results
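
As a sketch of how such a response can be consumed, the snippet below walks the hits array of a trimmed, hand-written sample. The second employee is invented for illustration, and a real response carries more fields (took, _score, total and so on) than shown here.

```python
# Trimmed, hand-written sample of a search response; not real server output.
sample_response = {
    "hits": {
        "hits": [
            {"_index": "megacorp", "_type": "employee", "_id": "1",
             "_source": {"first_name": "John", "last_name": "Smith", "age": 25}},
            {"_index": "megacorp", "_type": "employee", "_id": "2",
             "_source": {"first_name": "Jane", "last_name": "Doe", "age": 32}},
        ]
    }
}

def full_names(response):
    """Each hit keeps the original document under _source."""
    return [
        f'{hit["_source"]["first_name"]} {hit["_source"]["last_name"]}'
        for hit in response["hits"]["hits"]
    ]

print(full_names(sample_response))
```

To get more than the default 10 hits, the size parameter can be added to the search request, for example GET /megacorp/employee/_search?size=20.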

More sophisticated operations are not demonstrated here, to keep this article simple; a complete account of them would fill an article of its own.

Origin www.cnblogs.com/zenan/p/10954187.html