Elasticsearch Introductory Tutorial Series: Elasticsearch

Disclaimer: This is an original article by the blogger, released under the CC 4.0 BY-SA license. When reproducing it, please attach the original source link and this statement.
Original link: https://blog.csdn.net/projim_tao/article/details/102470079

Introduction

Elasticsearch is a well-known open-source distributed search and data processing platform: a distributed, real-time, full-text search system built on Lucene. Its stability, reliability, high availability, and massive scalability have given it a wide range of applications. In particular, combined with Logstash and Kibana to form the ELK stack, it is deployed at scale for log collection and visualization.

This article starts from scratch and introduces Elasticsearch's core concepts, installation, and basic usage. The goal is that after reading it, you can get started with Elasticsearch quickly.

1. Core Concepts

Index (index)

An index is a collection of documents with similar characteristics, such as an index of user information or an index of student grades. In Elasticsearch an index is identified by a name, which must be in lowercase letters.


Type (type)

Within an index, one or more types may be defined. A type is a logical classification within an index; generally, a set of documents sharing common fields is defined as a type. For example, in an index storing user data, you might create one type for member users and another for ordinary users. Note that types have been removed in the latest 7.x versions of Elasticsearch.


Document (document)

A document is the basic unit of information that Elasticsearch can index. A document is represented in JSON, a common data-exchange format, and is stored in an index; in theory, an index can store any number of documents.


Shard (shards)

An index can theoretically store any number of documents, but in practice the capacity of a single server is limited and cannot hold all of the data. For example, 100 million documents may be more than a single server can store. To address this, Elasticsearch provides the ability to split the data of one index into multiple pieces stored across multiple servers; each piece is called a shard.
The number of shards can be specified when an index is created; the default is five. In general it cannot be changed afterwards (the cost of changing it is too high), so the index capacity needs to be planned in advance.
Sharding, on the one hand, gives Elasticsearch the ability to scale horizontally; on the other hand, multiple shards can serve queries and indexing in parallel, greatly improving system performance.
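The routing of a document to a shard can be pictured with a short sketch. This is illustrative only: the real Elasticsearch implementation hashes the routing value (the document ID by default) with Murmur3, while Python's built-in hash() stands in for it here.

```python
# Illustrative sketch of how Elasticsearch picks the primary shard for a
# document: hash the routing value, then take it modulo the shard count.
# (Real Elasticsearch uses a Murmur3 hash, not Python's hash().)

def route_to_shard(doc_id: str, num_primary_shards: int) -> int:
    """Return the primary shard a document with this ID would land on."""
    return hash(doc_id) % num_primary_shards

# With the 6.x default of 5 primary shards, every document maps to shard 0-4.
for doc_id in ["1", "2", "42"]:
    shard = route_to_shard(doc_id, 5)
    assert 0 <= shard < 5
```

This modulo also illustrates why the shard count cannot be changed after index creation: a different `num_primary_shards` would change the result for existing documents, invalidating their placement.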


Replica (replicas)

A robust system must be highly available, and replicas are how Elasticsearch achieves high availability. When a shard drops out of the cluster, there must be a "backup" that can take over; this backup is the replica shard. Elasticsearch allows one primary shard to have multiple replicas, with one replica per primary by default. Note in particular that a replica shard cannot be placed on the same node as its primary shard; otherwise, high availability would be lost.

In summary, replica shards:

  • provide Elasticsearch with high availability;
  • allow searches to run on multiple replica shards in parallel, improving Elasticsearch's search throughput.
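The arithmetic behind these settings is simple enough to sketch (the function name below is illustrative, not an Elasticsearch API):

```python
# Total shard copies the cluster must place for one index:
# each primary shard plus its replicas.

def total_shard_copies(primaries: int, replicas_per_primary: int) -> int:
    return primaries * (1 + replicas_per_primary)

# The 6.x defaults: 5 primary shards, each with 1 replica.
print(total_shard_copies(5, 1))  # prints 10
```

With the defaults, a single-node cluster can place only the 5 primaries; the 5 replicas stay unassigned (they may not share a node with their primary), which is why a fresh one-node index reports "yellow" health.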

Cluster (cluster)

An Elasticsearch cluster consists of one or more nodes, which together hold all of the data and provide indexing and search across it. A cluster is identified by a unique name, which defaults to "elasticsearch"; nodes join a cluster through this name.


Node (node)

A node is a single Elasticsearch instance within a cluster, and each node also has a unique name. As long as multiple nodes are on the same network, a node can join a cluster by specifying the cluster name, and the nodes discover each other through it.


Near real-time (near real-time)

There can be a short delay between the time a document is stored in Elasticsearch and the time it becomes searchable. The delay is generally less than one second, hence "near real-time".

2. Installation

2.1 Download the Elasticsearch installation package

Download the Elasticsearch installation package and extract it. This article uses version 6.5.2 as an example.

$ wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-6.5.2.tar.gz
$ tar -zxvf elasticsearch-6.5.2.tar.gz

2.2 Modify the configuration file (optional)

Modifying the configuration file is an optional step, but in order to show some of Elasticsearch's basic configuration items, we set a few of them here; you can also skip this step and use the defaults.

# Cluster name; a node joins a cluster by this name. Defaults to elasticsearch.
cluster.name: my-application
# Node name
node.name: node-1
node.attr.rack: r1
# Data path
path.data: /home/elastic/data
# Log path
path.logs: /home/elastic/logs
# Publish address (the IP other hosts use to reach this node)
network.host: 192.168.56.3
# HTTP port
http.port: 9200
# Whether to enable X-Pack security
xpack.security.enabled: false
# Cross-origin (CORS) settings
http.cors.enabled: true
http.cors.allow-origin: "*"

2.3 Modify system parameters

2.3.1 Increase the maximum number of file descriptors and threads

Switch to the root user, edit the /etc/security/limits.conf configuration file, add the following content, and save.

# Add the following content and save
* soft nofile 65536
* hard nofile 131072
* soft nproc 4096
* hard nproc 4096

The configuration above is needed because Elasticsearch has minimum requirements for the maximum number of file descriptors and threads; the defaults of 4096 and 2048 are too small. Without it, the startup process reports errors such as:

max file descriptors [4096] for elasticsearch process is too low, increase to at least [65536]
[1]: max number of threads [2048] for user [elastic] is too low, increase to at least [4096]

2.3.2 Modify the max_map_count parameter
Open the /etc/sysctl.conf configuration file, add the following line, save it, and run the sysctl -p command to apply it.

vm.max_map_count=262144

This setting is likewise required by Elasticsearch; if it is too small, startup reports the following error:

max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]

2.3.3 Shut down the system firewall (optional)

$ systemctl stop firewalld.service
$ systemctl status firewalld.service

2.4 Start Elasticsearch

After completing the configuration above, enter the Elasticsearch root directory and run the following command to start it:

$ bin/elasticsearch

If you see log lines like the following, it has started normally:

[2019-01-13T08:41:29,796][INFO ][c.f.s.h.SearchGuardHttpServerTransport] [node-1] publish_address {10.0.2.15:9200}, bound_addresses {[::]:9200}
[2019-01-13T08:41:29,796][INFO ][o.e.n.Node               ] [node-1] started

2.5 Verify Elasticsearch

Run the curl command, or enter the URL below in a browser; if the Elasticsearch cluster information is printed normally, it proves the service is running.

$ curl http://192.168.56.3:9200
or
$ curl http://localhost:9200
{
  "name" : "node-1",
  "cluster_name" : "my-application",
  "cluster_uuid" : "C2ILS_NVRM-S-JPFFsHhUg",
  "version" : {
    "number" : "6.5.2",
    "build_flavor" : "default",
    "build_type" : "zip",
    "build_hash" : "424e937",
    "build_date" : "2018-06-11T23:38:03.357887Z",
    "build_snapshot" : false,
    "lucene_version" : "7.3.1",
    "minimum_wire_compatibility_version" : "5.6.0",
    "minimum_index_compatibility_version" : "5.0.0"
  },
  "tagline" : "You Know, for Search"
}

3. Index Operations

Elasticsearch provides a complete set of RESTful APIs to support operations on indices, documents, search, and more. Here we use creating, querying, and deleting an index as simple examples to learn how to operate on Elasticsearch indices.

Create an index

Use the HTTP PUT method to create a new index; here the index is named customer. The pretty parameter asks for the JSON response to be formatted for easy reading.

$ curl -X PUT "localhost:9200/customer?pretty"

A return value like the following indicates the index was created successfully.

{
  "acknowledged":true,
  "shards_acknowledged":true
}

Query indices

$ curl http://localhost:9200/_cat/indices?v
health status index    uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   customer IpQSnki7S1eQfYH6BgDd0Q   5   1          2            0      7.7kb          7.7kb

This accesses the indices endpoint of the _cat API with the HTTP GET method to query the index created in the previous step. You can see that the customer index has five primary shards, each with one replica.
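The _cat APIs return plain, whitespace-separated text, so output like the above is easy to post-process. A minimal sketch, using the sample output shown:

```python
# Parse the _cat/indices output into one dict per index row.
cat_output = """\
health status index    uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   customer IpQSnki7S1eQfYH6BgDd0Q   5   1          2            0      7.7kb          7.7kb"""

header, *rows = cat_output.splitlines()
columns = header.split()                     # column names from the header line
indices = [dict(zip(columns, row.split()))   # one dict per index row
           for row in rows]

print(indices[0]["index"], indices[0]["pri"], indices[0]["rep"])  # customer 5 1
```

Note the "yellow" health: on a single-node cluster the replica shards cannot be assigned, since a replica may not live on the same node as its primary.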


Delete an index

Use the HTTP DELETE method to delete an index.

$ curl -X DELETE "localhost:9200/customer?pretty"

A return value like the following indicates the deletion succeeded; you can rerun the query above to confirm that the index is gone.

{
  "acknowledged": true
}

4. Document Operations

Elasticsearch also provides a set of _doc APIs for operating on documents. Here we likewise use document CRUD as an example to explain how to work with documents (if you deleted the index in the previous step, create it again first).

Create a document

Use the HTTP PUT method to add a new document; the following creates a document with ID 1:

$ curl -X PUT "localhost:9200/customer/_doc/1?pretty" -H 'Content-Type: application/json' -d'
{
  "name": "Liu Jintao"
}
'

The return value is as follows. Note that in this sample response, result is "updated" and _version is 2 because the document already existed; the first time a document with this ID is created, result is "created" and _version is 1.

{
  "_index": "customer",
  "_type": "_doc",
  "_id": "1",
  "_version": 2,
  "result": "updated",
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  },
  "_seq_no": 1,
  "_primary_term": 2
}

Query a document

Use the HTTP GET method to query a document by its ID:

$ curl -X GET "localhost:9200/customer/_doc/1?pretty"

The return value is as follows; note that the actual stored content is in the _source field.

{
  "_index": "customer",
  "_type": "_doc",
  "_id": "1",
  "_version": 2,
  "found": true,
  "_source": {
    "name": "Liu Jintao"
  }
}

Update a document

Use an HTTP POST request to modify a document; for example, change the name field of the document created above to "Test Name":

$ curl -X POST "localhost:9200/customer/_doc/1/_update?pretty" -H 'Content-Type: application/json' -d'
{
  "doc": { "name": "Test Name" }
}
'

The return value is as follows. The _version field has changed, which shows the update succeeded; you can also rerun the query API to confirm.

{
  "_index": "customer",
  "_type": "_doc",
  "_id": "1",
  "_version": 3,
  "result": "updated",
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  },
  "_seq_no": 2,
  "_primary_term": 2
}

Delete a document

Use the HTTP DELETE method to delete a document:

$ curl -X DELETE "localhost:9200/customer/_doc/1?pretty"

A return value like the following indicates the deletion succeeded:

{
  "_index": "customer",
  "_type": "_doc",
  "_id": "1",
  "_version": 4,
  "result": "deleted",
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  },
  "_seq_no": 3,
  "_primary_term": 2
}

5. Searching Data

Searching data is where Elasticsearch shines. Just as with documents and indices, Elasticsearch has a dedicated _search API to support searching. Basic searches use the HTTP GET method, and there are two ways to pass parameters:

  • Request URI: put the query parameters in the URI

  • Request Body: put the query parameters in the request body ( recommended )
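The two styles express the same search; a small sketch of what each sends, assuming the customer index from earlier:

```python
import json

# URI style: the whole query lives in the query string.
uri_style = "localhost:9200/customer/_search?q=*&pretty"

# Request-body style: a Query DSL document, sent with curl's -d flag.
body_style = {
    "query": {"match_all": {}}  # match_all is the Query DSL equivalent of q=*
}

print(json.dumps(body_style))
```

The body style is recommended because the Query DSL is far more expressive than what fits in a query string.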


Method one

Use the _search API; q=* means query all documents in the customer index:

$ curl -X GET "localhost:9200/customer/_search?q=*&pretty"

The return value is as follows. _shards.total indicates there are five shards in total, and _shards.successful of 5 means the query succeeded on all five shards; the hits section contains the query results, and hits.total of 1 means one document matched.

{
  "took": 17,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 1.0,
    "hits": [
      {
        "_index": "customer",
        "_type": "_doc",
        "_id": "2",
        "_score": 1.0,
        "_source": {
          "name": "Liu jintao"
        }
      }
    ]
  }
}
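The fields just discussed can be pulled out of such a response programmatically; a minimal sketch using the sample response above (abbreviated to the relevant parts):

```python
import json

# Parse the sample search response and extract the fields discussed above.
response = json.loads("""
{
  "took": 17,
  "timed_out": false,
  "_shards": {"total": 5, "successful": 5, "skipped": 0, "failed": 0},
  "hits": {
    "total": 1,
    "max_score": 1.0,
    "hits": [
      {"_index": "customer", "_type": "_doc", "_id": "2", "_score": 1.0,
       "_source": {"name": "Liu jintao"}}
    ]
  }
}
""")

# Every shard answered, and exactly one document matched.
assert response["_shards"]["successful"] == response["_shards"]["total"]
print(response["hits"]["total"], response["hits"]["hits"][0]["_source"]["name"])
```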

Method two

Query using the request body, passing the match_all condition as the query parameter. The result is the same as in the previous step, so it is not repeated here.
This query is written in Elasticsearch's Query DSL, a syntax that a follow-up article will explain in detail.

$ curl -X GET "localhost:9200/customer/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": { "match_all": {} }
}
'
