Elasticsearch

Introduction

Elasticsearch (ES) is an open-source, distributed, RESTful full-text search engine built on Lucene. It is also a distributed document database in which every field can be indexed and searched. ES can scale out to hundreds of servers and can store and process petabytes of data. It can store, search, and analyze large volumes of data in a very short time, and it often serves as the core engine behind complex search scenarios.

What Elasticsearch can do

  1. When you run an online store, you can let customers search the goods you sell. In this case, you can use Elasticsearch to store your entire product catalog and inventory, give customers accurate search results, and recommend related products.
  2. When you collect log or transaction data, you need to analyze and mine it for trends, statistics, summaries, or anomalies. In this case, you can use Logstash or other tools to collect the data and store it in Elasticsearch. You can then search and aggregate the data to find whatever information interests you.
  3. For programmers, a well-known example is GitHub, whose search is built on Elasticsearch. On the github.com/search page you can search for projects, users, issues, pull requests, and code. It uses 40 to 50 indices in total, one for each kind of data the site needs to track. Although only the main branch (master) of each project is indexed, that is still a huge amount of data: 2 billion indexed documents and 30 TB of index files.

Elasticsearch basic concepts

  Near Realtime (NRT)

Elasticsearch is a near-real-time search platform. This means there is only a slight delay between indexing a document and that document becoming searchable, normally about one second.

  Cluster

A cluster is a collection of one or more nodes (servers) that together hold all of your data and provide federated indexing and search capabilities across all nodes. A cluster is identified by a unique name, which defaults to "elasticsearch". The name matters because a node joins a cluster by its cluster name, and a node can belong to only one cluster.

Do not reuse the same cluster name across environments, otherwise nodes may join the wrong cluster. For example, you could use logging-dev, logging-stage, and logging-prod for the development, staging, and production clusters respectively.

  Node

A node is a single server that is part of a cluster. It stores data and takes part in the cluster's indexing and search. Like a cluster, a node is identified by a name, which defaults to a random Universally Unique Identifier (UUID) assigned at startup. If you do not want the default, you can define any node name. The name matters for administration, where you need to identify which servers on your network correspond to which nodes in your Elasticsearch cluster.

A node can be configured to join a specific cluster by cluster name. By default, every node is set to join a cluster named "elasticsearch". This means that if you start several nodes on your network and they can discover each other, they will automatically form and join a single cluster named "elasticsearch".

A single cluster can contain as many nodes as you want. If no other Elasticsearch node is running on the network, starting a single node will by default form a new single-node cluster named "elasticsearch".

  Index

An index is a collection of documents with similar characteristics. For example, you can have one index for customer data, another for the product catalog, and another for order data. An index is identified by a name (which must be all lowercase), and this name is used to refer to the index when indexing, searching, updating, and deleting its documents. A single cluster can define as many indices as you want.

  Type

Within an index, you can define one or more types. A type is a logical category or partition of the index whose semantics are entirely up to you. In general, a type is defined for documents that share a common set of fields. For example, suppose you run a blogging platform and store all of its data in one index. In that index, you could define one type for user data, another for blog posts, and another for comments. Note that mapping types are deprecated in Elasticsearch 7.x and removed in 8.0, where each index holds a single type named _doc.

  Document

A document is the basic unit of information that can be indexed. For example, you can have one document for a single customer, another for a single product, and another for a single order. A document is expressed in JSON (JavaScript Object Notation), a very common data interchange format on the Internet.

Within an index/type, you can store as many documents as you want. Note that although a document physically resides in an index, it must actually be indexed into, and assigned to, a type inside that index.
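
To make the "document is JSON" point concrete, here is a minimal sketch using only the Go standard library. The Customer type, its fields, and the toDocument helper are hypothetical names chosen for illustration; they are not part of Elasticsearch or of any client library.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Customer is a hypothetical document type used only for illustration.
type Customer struct {
	Name  string `json:"name"`
	Email string `json:"email"`
}

// toDocument serializes a value into the JSON text that would be sent
// to Elasticsearch as the document body.
func toDocument(v interface{}) (string, error) {
	b, err := json.Marshal(v)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	doc, err := toDocument(Customer{Name: "Alice", Email: "alice@example.com"})
	if err != nil {
		panic(err)
	}
	fmt.Println(doc) // {"name":"Alice","email":"alice@example.com"}
}
```

The struct tags decide the field names that end up in the document, so they are also the names you would later search on.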

  Shards & Replicas

An index can store an amount of data that exceeds the hardware limits of a single node. For example, a single index of a billion documents taking up 1 TB of disk space may not fit on the disk of a single node, or may be too slow to serve search requests from a single node alone.

To solve this problem, Elasticsearch lets you subdivide an index into multiple pieces called shards. When you create an index, you simply define the number of shards you want. Each shard is itself a fully functional, independent "index" that can be hosted on any node in the cluster.

  Sharding is important for two main reasons:

  1. It lets you split or scale your content volume horizontally.
  2. It lets you distribute and parallelize operations across shards (possibly on multiple nodes), which improves performance and throughput.

How shards are distributed, and how their documents are aggregated back into the results of a search request, is managed entirely by Elasticsearch and is transparent to you as a user.
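
As a rough illustration of how a document could be assigned to one of the shards, the sketch below hashes a routing value (by default the document ID) and takes it modulo the number of primary shards. This is a simplification: the shardFor function is a hypothetical helper, and FNV-1a from the Go standard library stands in for whatever hash Elasticsearch uses internally, so the shard numbers it prints will not match a real cluster.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// shardFor maps a routing value (by default the document ID) to a
// primary shard number in the range [0, primaryShards).
// Assumption: FNV-1a is only a stand-in for Elasticsearch's internal hash.
func shardFor(routing string, primaryShards uint32) uint32 {
	h := fnv.New32a()
	h.Write([]byte(routing)) // Write on a hash.Hash never returns an error
	return h.Sum32() % primaryShards
}

func main() {
	for _, id := range []string{"1", "2", "3", "4"} {
		fmt.Printf("document %s -> shard %d\n", id, shardFor(id, 3))
	}
}
```

Because the result depends on the number of primary shards, changing that number would send existing documents to different shards, which is one way to see why the primary shard count is chosen when the index is created.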

In a network or cloud environment, where failures can happen at any time, it is very useful and highly recommended to have a failover mechanism in case a shard or node goes offline or disappears for whatever reason. To this end, Elasticsearch lets you make one or more copies of your index's shards, called replica shards, or replicas for short.

  Replication is important for two main reasons:

  1. It provides high availability when a shard or node fails. For this reason, a replica shard is never allocated on the same node as the original (primary) shard it was copied from.
  2. It lets you scale out search volume and throughput, because searches can be executed on all replicas in parallel.
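
The number of shards and replicas is passed as index settings when an index is created. The sketch below only builds the JSON request body with the standard library and computes how many shard copies result. The settings keys number_of_shards and number_of_replicas are Elasticsearch's own; the Go type and function names are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// indexSettings mirrors the two settings discussed above.
type indexSettings struct {
	NumberOfShards   int `json:"number_of_shards"`
	NumberOfReplicas int `json:"number_of_replicas"`
}

// createIndexBody is the request body for creating an index.
type createIndexBody struct {
	Settings indexSettings `json:"settings"`
}

// buildCreateIndexBody returns the JSON body for an index with the given
// number of primary shards and replicas per primary.
func buildCreateIndexBody(shards, replicas int) (string, error) {
	body := createIndexBody{Settings: indexSettings{
		NumberOfShards:   shards,
		NumberOfReplicas: replicas,
	}}
	b, err := json.Marshal(body)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

// totalShardCopies counts primaries plus all of their replicas.
func totalShardCopies(shards, replicas int) int {
	return shards * (1 + replicas)
}

func main() {
	body, err := buildCreateIndexBody(3, 1)
	if err != nil {
		panic(err)
	}
	fmt.Println(body)                   // {"settings":{"number_of_shards":3,"number_of_replicas":1}}
	fmt.Println(totalShardCopies(3, 1)) // 6
}
```

With 3 primary shards and 1 replica each, the cluster has to place 6 shard copies, and a replica can only be placed when a second node is available.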

ES concepts compared with relational databases

ES concept                                                        Relational database
Index (supports full-text search)                                 Database
Type                                                              Table
Document (different documents can have different sets of fields)  Row
Field                                                             Column
Mapping                                                           Schema

ES API

The following examples use curl for demonstration.

Health check

curl -X GET "127.0.0.1:9200/_cat/health?v"

Output:

epoch      timestamp cluster       status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
1564726309 06:11:49  elasticsearch yellow          1         1      3   3    0    0        1             0                  -                 75.0%
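
If you want to read this output from a program, one simple approach is to pair each column name with its value. The sketch below does that for the two lines shown above. parseCatLine is a hypothetical helper, and splitting on whitespace assumes that no value is empty or contains spaces, which holds for this sample but is not guaranteed in general.

```go
package main

import (
	"fmt"
	"strings"
)

// parseCatLine pairs the column names in header with the values in row.
// Assumption: columns are separated by whitespace and no value is empty.
func parseCatLine(header, row string) (map[string]string, error) {
	names := strings.Fields(header)
	values := strings.Fields(row)
	if len(names) != len(values) {
		return nil, fmt.Errorf("got %d column names but %d values", len(names), len(values))
	}
	out := make(map[string]string, len(names))
	for i, name := range names {
		out[name] = values[i]
	}
	return out, nil
}

func main() {
	header := "epoch timestamp cluster status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent"
	row := "1564726309 06:11:49 elasticsearch yellow 1 1 3 3 0 0 1 0 - 75.0%"

	health, err := parseCatLine(header, row)
	if err != nil {
		panic(err)
	}
	fmt.Println(health["cluster"], health["status"], health["active_shards_percent"])
	// elasticsearch yellow 75.0%
}
```

A yellow status means all primary shards are allocated but at least one replica is not, which matches the unassign value of 1 in the output of this single-node cluster.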

List all indices in the current ES cluster

curl -X GET "127.0.0.1:9200/_cat/indices?v"

Output:

health status index                uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   .kibana_task_manager LUo-IxjDQdWeAbR-SYuYvQ   1   0          2            0     45.5kb         45.5kb
green  open   .kibana_1            PLvyZV1bRDWex05xkOrNNg   1   0          4            1     23.9kb         23.9kb
yellow open   user                 o42mIpDeSgSWZ6eARWUfKw   1   1          0            0       283b           283b

Creating an index

curl -X PUT 127.0.0.1:9200/www

Output:

{"acknowledged":true,"shards_acknowledged":true,"index":"www"}

Deleting an index

curl -X DELETE 127.0.0.1:9200/www

Output:

{"acknowledged":true}

Inserting a record

curl -H "Content-Type: application/json" -X POST 127.0.0.1:9200/user/person -d '
{
	"name": "dsb",
	"age": 9000,
	"married": true
}'

Output:

{
    "_index": "user",
    "_type": "person",
    "_id": "MLcwUWwBvEa8j5UrLZj4",
    "_version": 1,
    "result": "created",
    "_shards": {
        "total": 2,
        "successful": 1,
        "failed": 0
    },
    "_seq_no": 3,
    "_primary_term": 1
}
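
The response can be decoded into a Go struct to check the outcome of the request. The following sketch uses only the standard library. The indexResponse type and decodeIndexResponse function are hypothetical names, and the struct covers only some of the fields shown above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// indexResponse holds a subset of the fields returned after indexing a document.
type indexResponse struct {
	Index   string `json:"_index"`
	Type    string `json:"_type"`
	ID      string `json:"_id"`
	Version int    `json:"_version"`
	Result  string `json:"result"`
	Shards  struct {
		Total      int `json:"total"`
		Successful int `json:"successful"`
		Failed     int `json:"failed"`
	} `json:"_shards"`
}

// sampleResponse is the output shown above, shortened to the decoded fields.
const sampleResponse = `{
    "_index": "user",
    "_type": "person",
    "_id": "MLcwUWwBvEa8j5UrLZj4",
    "_version": 1,
    "result": "created",
    "_shards": {"total": 2, "successful": 1, "failed": 0}
}`

// decodeIndexResponse parses the JSON body of an index response.
func decodeIndexResponse(body string) (indexResponse, error) {
	var r indexResponse
	err := json.Unmarshal([]byte(body), &r)
	return r, err
}

func main() {
	r, err := decodeIndexResponse(sampleResponse)
	if err != nil {
		panic(err)
	}
	fmt.Println(r.ID, r.Result, r.Shards.Successful, r.Shards.Total)
	// MLcwUWwBvEa8j5UrLZj4 created 1 2
}
```

Here total is 2 because the write targets a primary and one replica, while successful is 1, most likely because the replica cannot be allocated on a single-node cluster.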

You can also use the PUT method, but then you must supply the document ID:

curl -H "Content-Type: application/json" -X PUT 127.0.0.1:9200/user/person/4 -d '
{
	"name": "sb",
	"age": 9,
	"married": false
}'

Retrieval

Elasticsearch's search syntax is somewhat unusual: it uses the GET method and carries the query as JSON in the request body.

Full Search:

curl -X GET 127.0.0.1:9200/user/person/_search

Conditional Search:

curl -H "Content-Type: application/json" -X GET 127.0.0.1:9200/user/person/_search -d '
{
	"query":{
		"match": {"name": "sb"}
	}
}'

By default Elasticsearch returns at most 10 results. You can set the number of results returned with the size field, as in the following example.

curl -H "Content-Type: application/json" -X GET 127.0.0.1:9200/user/person/_search -d '
{
	"query":{
		"match": {"name": "sb"}
	},
	"size": 2
}'
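
The same request body can be built in Go rather than written by hand. The sketch below uses only encoding/json, and buildSearchBody is a hypothetical helper name. Note that size is a top-level field of the search body, next to query rather than inside it.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// buildSearchBody returns a search body with a match query on one field
// and a limit on the number of hits returned.
func buildSearchBody(field, text string, size int) (string, error) {
	body := map[string]interface{}{
		"query": map[string]interface{}{
			"match": map[string]interface{}{field: text},
		},
		"size": size, // top-level field, a sibling of "query"
	}
	b, err := json.Marshal(body)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	body, err := buildSearchBody("name", "sb", 2)
	if err != nil {
		panic(err)
	}
	fmt.Println(body) // {"query":{"match":{"name":"sb"}},"size":2}
}
```

The resulting string can be sent as the body of a request to the _search endpoint, for example with curl -d or from an HTTP client.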

Using Elasticsearch from Go

elastic client

We use the third-party library https://github.com/olivere/elastic to connect to ES and operate on it.

Note that the major version of the client must match your ES version. For example, we use ES 7.2.1 here, so we need the corresponding client, github.com/olivere/elastic/v7.

Use go.mod to manage the dependency:

require (
    github.com/olivere/elastic/v7 v7.0.4
)

A simple example:

package main

import (
	"context"
	"fmt"

	"github.com/olivere/elastic/v7"
)

// Elasticsearch demo

// Person is the document that gets indexed into the "user" index.
type Person struct {
	Name    string `json:"name"`
	Age     int    `json:"age"`
	Married bool   `json:"married"`
}

func main() {
	// Connect to the ES node; NewClient returns an error if no node is available.
	client, err := elastic.NewClient(elastic.SetURL("http://192.168.1.7:9200"))
	if err != nil {
		panic(err)
	}

	fmt.Println("connect to es success")

	// Index one document into "user". No ID is set, so ES generates one.
	p1 := Person{Name: "rion", Age: 22, Married: false}
	put1, err := client.Index().
		Index("user").
		BodyJson(p1).
		Do(context.Background())
	if err != nil {
		panic(err)
	}
	fmt.Printf("Indexed user %s to index %s, type %s\n", put1.Id, put1.Index, put1.Type)
}