ELK (2): index operations (create, view, delete), installing and using the IK tokenizer, mappings (create, view, modify), and document add/delete/modify/query with partial updates

Managing indexes

Section 1 Index Operations (Create, View, Delete)

1. Create an index

Elasticsearch exposes a RESTful API: every operation is an HTTP request, so you can use any HTTP client to issue it.

syntax

PUT /{index_name}
{
	"settings": {
		"setting_name": "setting_value"
	}
}


settings: index-level settings that define attributes of the index such as the number of shards and the number of replicas. For now we leave them out and use the defaults.
example

PUT /lagou-company-index

  You can see that the index was created successfully.

2. Determine whether the index exists

syntax

HEAD /{index_name}

example

HEAD /lagou-company-index
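HEAD returns no response body; the status code carries the answer: 200 - OK if the index exists, 404 - Not Found if it does not.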

 

3. View an index

A GET request lets us view the relevant attribute information of an index.

Format:

View a single index

syntax

 GET /{index_name}

example

GET /lagou-company-index

 

View multiple indexes

syntax

 GET /{index_name_1},{index_name_2},{index_name_3},...

example

 GET /lagou-company-index,lagou-employee-index

 

View all indexes

Method one

 GET _all

Method two

 GET /_cat/indices?v

 

Green: all shards of the index are allocated normally.

Yellow: at least one replica shard has not been allocated.

Red: at least one primary shard has not been allocated.
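The v parameter adds a header row to the _cat output. The result looks roughly like this (the uuid and sizes will differ on your cluster; a single-index example for illustration):

health status index               uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   lagou-company-index 0Kxk3cXcQjyDLSqrhTmLsg   1   1          0            0       566b           283b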

 

4. Open the index

syntax

 POST /{index_name}/_open

 

5. Close the index

syntax

 POST /{index_name}/_close
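For example, to take lagou-company-index offline and then bring it back (a closed index rejects read and write requests but keeps its data on disk):

 POST /lagou-company-index/_close
 
 POST /lagou-company-index/_open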

6. Delete an index

Delete an index using a DELETE request

syntax

 DELETE /{index_name_1},{index_name_2},{index_name_3}...

example
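 DELETE /lagou-company-index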

Query the index again and the response shows that it no longer exists.

 

Section 2 Install the IK Tokenizer

2.1 Installation

Perform these operations as the root user!

Every node must be configured. After the configuration is complete, restart the Elasticsearch service.

1 Create a new analysis-ik directory under the plugins directory of the elasticsearch installation directory

# Create the analysis-ik directory
mkdir analysis-ik
 
# Change into analysis-ik
cd analysis-ik
 
# Upload elasticsearch-analysis-ik-7.3.0.zip from the course materials
# Unzip it
unzip elasticsearch-analysis-ik-7.3.0.zip
 
# Delete the zip after extraction
rm -rf elasticsearch-analysis-ik-7.3.0.zip
 
# Distribute the plugin to the other nodes
cd ..
scp -r analysis-ik/ linux122:$PWD
scp -r analysis-ik/ linux123:$PWD

Note: if unzip is not installed you will see -bash: unzip: command not found; install it first:

yum install -y unzip

2 Restart Elasticsearch and Kibana

# Kill the es process
ps -ef|grep elasticsearch|grep bootstrap |awk '{print $2}' |xargs kill -9
 
# Start es
nohup /opt/lagou/servers/es/elasticsearch/bin/elasticsearch >/dev/null 2>&1 &
 
# Restart kibana
cd /opt/lagou/servers/kibana/
bin/kibana

2.2 Testing

The IK tokenizer has two segmentation modes: ik_max_word and ik_smart modes.

  • ik_max_word (most commonly used) splits the text at the finest granularity
  • ik_smart splits the text at the coarsest granularity

Before worrying about the syntax, enter the following request in Kibana to try it out:

POST _analyze
{
	"analyzer": "ik_max_word",
	"text": "南京市长江大桥"
}

Running it in ik_max_word mode yields:

{
	"tokens": [{
		"token": "南京市",
		"start_offset": 0,
		"end_offset": 3,
		"type": "CN_WORD",
		"position": 0
	}, {
		"token": "南京",
		"start_offset": 0,
		"end_offset": 2,
		"type": "CN_WORD",
		"position": 1
	}, {
		"token": "市长",
		"start_offset": 2,
		"end_offset": 4,
		"type": "CN_WORD",
		"position": 2
	}, {
		"token": "长江大桥",
		"start_offset": 3,
		"end_offset": 7,
		"type": "CN_WORD",
		"position": 3
	}, {
		"token": "长江",
		"start_offset": 3,
		"end_offset": 5,
		"type": "CN_WORD",
		"position": 4
	}, {
		"token": "大桥",
		"start_offset": 5,
		"end_offset": 7,
		"type": "CN_WORD",
		"position": 5
	}]
}

Now run the same text through ik_smart:

POST _analyze
{
	"analyzer": "ik_smart",
	"text": "南京市长江大桥"
}

Running it in ik_smart mode yields:

{
	"tokens": [{
		"token": "南京市",
		"start_offset": 0,
		"end_offset": 3,
		"type": "CN_WORD",
		"position": 0
	}, {
		"token": "长江大桥",
		"start_offset": 3,
		"end_offset": 7,
		"type": "CN_WORD",
		"position": 1
	}]
}

Now suppose 江大桥 (Jiang Daqiao) is a person's name and he is the mayor of Nanjing: the sentence can also be read as 南京市长 / 江大桥 ("Nanjing mayor Jiang Daqiao"), so the segmentation above is clearly unreasonable. What can we do about it?

 

2.3 Dictionary usage

Extension words: words that you do not want split apart; they should be kept as a single token, such as 江大桥 above.

Stop words: words that appear very frequently in text but contribute little to its meaning, such as a, an, the, of in English, or 的, 了, 呀 in Chinese. Such words are called stop words. They are usually filtered out and not indexed, and if a user's query contains stop words, the system filters them out automatically. Removing stop words speeds up indexing and reduces the size of the index files.
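As a quick illustration using only built-in functionality (independent of IK), Elasticsearch's built-in stop analyzer drops common English stop words:

POST _analyze
{
	"analyzer": "stop",
	"text": "The quick brown fox"
}

Only quick, brown, and fox come back as tokens; the is filtered out.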

The extension and stop-word dictionaries are stored centrally on the linux123 server and served by a web server, which avoids every node maintaining its own copy of the dictionaries.

Deploy Tomcat on linux123

The following operations use the es user

1 Upload the Tomcat installation package to the linux123 server

To avoid permission issues, upload it to this directory: /opt/lagou/servers/es/
 

cd /opt/lagou/servers/es/

2 Extract the archive

tar -zxvf apache-tomcat-8.5.59.tar.gz
 
mv apache-tomcat-8.5.59/ tomcat/

3 Configure the custom dictionary files

  • Custom extension dictionary
cd /opt/lagou/servers/es/tomcat/webapps/ROOT
 
vim ext_dict.dic
  • Add: 江大桥

  • Custom stop-word dictionary

cd /opt/lagou/servers/es/tomcat/webapps/ROOT
 
vim stop_dict.dic

Add:

的
了
啊

4 Start Tomcat

/opt/lagou/servers/es/tomcat/bin/startup.sh

Verify in a browser:

http://linux123:8080/ext_dict.dic

 

5 Configure the IK tokenizer

Add the custom extension and stop-word dictionaries to the IK configuration.

Make the changes as the root user, or change the ownership of the entire folder to the es user!

# Modify on all three nodes
cd /opt/lagou/servers/es/elasticsearch/plugins/analysis-ik/config
 
vim IKAnalyzer.cfg.xml
<?xml version="1.0" encoding="UTF-8"?>
 
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
 
<properties>
 
    <comment>IK Analyzer extension configuration</comment>
 
    <!-- Local extension dictionaries can be configured here -->
    <entry key="ext_dict"></entry>
 
    <!-- Local stop-word dictionaries can be configured here -->
    <entry key="ext_stopwords"></entry>
 
    <!-- Remote extension dictionary -->
    <entry key="remote_ext_dict">http://linux123:8080/ext_dict.dic</entry>
 
    <!-- Remote stop-word dictionary -->
    <entry key="remote_ext_stopwords">http://linux123:8080/stop_dict.dic</entry>
 
</properties>

6 Restart the service

# Kill the es process
ps -ef|grep elasticsearch|grep bootstrap |awk '{print $2}' |xargs kill -9
 
# Start es
nohup /opt/lagou/servers/es/elasticsearch/bin/elasticsearch >/dev/null 2>&1 &
 
# Restart kibana
cd /opt/lagou/servers/kibana/
bin/kibana
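Once the services are back up (and assuming IK has fetched the remote dictionaries, which it polls periodically), re-run the segmentation test from section 2.2:

POST _analyze
{
	"analyzer": "ik_max_word",
	"text": "南京市长江大桥"
}

With 江大桥 in the extension dictionary, it should now appear in the token list as a single word.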

Section 3 Mapping Operations

After the index is created, it is equivalent to having a database in a relational database. Elasticsearch 7.x removed the index type setting: you can no longer specify a type, and it defaults to _doc, although the field still exists. We still need to set the constraint information of each field, which is called the field mapping.

Field constraints include but are not limited to:

  • the data type of the field
  • whether to store the field separately
  • whether to index the field
  • which analyzer (tokenizer) to use

Let's look at the creation syntax.

1. Create a mapping field

syntax

PUT /{index_name}/_mapping
{
	"properties": {
		"field_name": {
			"type": "data_type",
			"index": true, // whether to index; a field that is not indexed cannot be queried
			"store": false, // whether to store separately (default false). _source already stores the entire document and every field can be read from it, though you must parse it yourself; if a field is stored separately, returning it directly at query time adds extra I/O overhead.
			"analyzer": "analyzer_name"
		}
	}
}

https://www.elastic.co/guide/en/elasticsearch/reference/7.3/mapping-params.html

Field name: choose freely. Below it you specify a number of attributes, for example:

  • type: type, which can be text, long, short, date, integer, object, etc.
  • index: whether to index, the default is true
  • store: whether to store, the default is false
  • analyzer: specify the tokenizer

example

 PUT /lagou-company-index

 Then set the mapping:

PUT /lagou-company-index/_mapping/
{
	"properties": {
		"name": {
			"type": "text",
			"analyzer": "ik_max_word"
		},
		"job": {
			"type": "text",
			"analyzer": "ik_max_word"
		},
		"logo": {
			"type": "keyword",
			"index": "false"
		},
		"payment": {
			"type": "float"
		}
	}
}

Response result:

 

In the case above, four fields are defined for the lagou-company-index index:

  • name: company name
  • job: demand position
  • logo: URL of the logo image
  • payment: salary

We also set some attributes for these fields; their meaning is introduced in detail below.

2. Detailed Mapping Attributes

1 type

The data types supported in Elasticsearch are very rich:

https://www.elastic.co/guide/en/elasticsearch/reference/7.3/mapping-types.html

A few key ones:

The string type comes in two flavors:

  • text: analyzed into separate tokens; cannot be used in aggregations
  • keyword: not analyzed; the value is matched as a complete whole and can be used in aggregations

Numeric: numeric types, in two groups

  • Basic numeric types: long, integer, short, byte, double, float, half_float
  • Scaled floating point: scaled_float. You must specify a scaling factor, such as 10 or 100; Elasticsearch multiplies the real value by this factor, stores the result as a long, and restores it when read back.
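A minimal mapping sketch (the index and field names here are made up for illustration):

PUT /price-demo-index
{
	"mappings": {
		"properties": {
			"price": {
				"type": "scaled_float",
				"scaling_factor": 100
			}
		}
	}
}

With scaling_factor: 100, a price of 19.99 is stored internally as the long value 1999.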

Date: date type

Elasticsearch can store dates as formatted strings, but it is recommended to store them as millisecond values in a long field to save space.
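If you do store formatted date strings, the accepted formats can be declared in the mapping; a sketch with a hypothetical created_at field:

PUT /date-demo-index
{
	"mappings": {
		"properties": {
			"created_at": {
				"type": "date",
				"format": "yyyy-MM-dd HH:mm:ss||epoch_millis"
			}
		}
	}
}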

Array: array type

  • For matching, the document matches if any element of the array matches
  • For sorting, ascending order uses the minimum value in the array and descending order uses the maximum value
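Note that Elasticsearch has no dedicated array mapping type: any field may simply hold one or more values of its mapped type. A sketch with a hypothetical tags field:

POST /lagou-company-index/_doc
{
	"tags": ["Java", "Elasticsearch", "大数据"]
}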

Object: object

{
	"name": "Jack",
	"age": 21,
	"girl": {
		"name": "Rose",
		"age": 21
	}
}

If an object like girl above is stored in the index, it is flattened into two fields: girl.name and girl.age.
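Queries then use dot notation. A sketch, assuming a hypothetical person-demo-index containing the document above:

GET /person-demo-index/_search
{
	"query": {
		"match": {
			"girl.name": "Rose"
		}
	}
}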

2 index

The index attribute controls whether the field is indexed.

  • true: the field is indexed and can be used for searching. This is the default
  • false: the field is not indexed and cannot be used for searching

Since index defaults to true, every field is indexed without any configuration. For fields we don't want searchable, such as the company's logo image address, we manually set index to false.

3 store

Whether to store the field independently. The original document is stored in _source; by default, individual fields are not stored independently but are extracted from _source. You can store a field independently by setting store: true. Reading an independently stored field is much faster than parsing it out of _source, but it also takes up more space, so set it according to actual business needs. The default is false.
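For reference, this is how a stored field would be fetched at search time (a sketch; it assumes name had been mapped with store: true, which the earlier example did not do):

GET /lagou-company-index/_search
{
	"stored_fields": ["name"],
	"query": {
		"match_all": {}
	}
}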

4 analyzer: specify the analyzer

When handling Chinese text we generally choose the IK analyzer: ik_max_word or ik_smart.

3. View the mapping relationship

View a single index mapping relationship

syntax:

 GET /{index_name}/_mapping

 

View all index mappings

Method one

GET _mapping

Method two

GET _all/_mapping

 

Modify the index mapping relationship

syntax

PUT /{index_name}/_mapping
{
	"properties": {
		"field_name": {
			"type": "type",
			"index": true,
			"store": true,
			"analyzer": "analyzer_name"
		}
	}
}

Note: modifying a mapping can only add new fields; for any other change you must delete the index and re-establish the mapping.
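For example, adding a new (hypothetical) address field to the existing mapping is allowed:

PUT /lagou-company-index/_mapping
{
	"properties": {
		"address": {
			"type": "text",
			"analyzer": "ik_max_word"
		}
	}
}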

 

4. Create the index and mapping in one step

In the previous examples we created the index and the mapping separately. We can also define the mapping directly while creating the index.

Basic syntax:

PUT /{index_name}
{
	"settings": {
		"setting_name": "setting_value"
	},
	"mappings": {
		"properties": {
			"field_name": {
				"mapping_attribute": "attribute_value"
			}
		}
	}
}

Example

PUT /lagou-employee-index
{
	"settings": {},
	"mappings": {
		"properties": {
			"name": {
				"type": "text",
				"analyzer": "ik_max_word"
			}
		}
	}
}

Section 4 Document Addition, Deletion, Modification and Partial Update

Documents are the data in an index. They are indexed according to the mapping rules and used for searching later; a document is comparable to a row of data in a database.

1. Add new document

When adding a document, the id can either be specified manually or generated automatically.

Add new document (manually specify id)

syntax

POST /{index_name}/_doc/{id}

 example

POST /lagou-company-index/_doc/1
{
	"name": "百度",
	"job": "小度用户运营经理",
	"payment": "30000",
	"logo": "http://www.lgstatic.com/thubnail_120x120/i/image/M00/21/3E/CgpFT1kVdzeAJNbUAABJB7x9sm8374.png"
}
 
POST /lagou-company-index/_doc/2
{
	"name": "百度",
	"job": "小度用户运营经理",
	"payment": "30000",
	"logo": "http://www.lgstatic.com/thubnail_120x120/i/image/M00/21/3E/CgpFT1kVdzeAJNbUAABJB7x9sm83  74.png",
	"address": "北京市昌平区"
}
 
POST /lagou-company-index/_doc/3
{
	"name1": "百度",
	"job1": "小度用户运营经理",
	"payment1": "30000",
	"logo1": "http://www.lgstatic.com/thubnail_120x120/i/image/M00/21/3E/CgpFT1kVdzeAJNbUAABJB7x9sm83  74.png",
	"address1": "北京市昌平区"
}

 

Add new document (automatically generate id)

syntax

POST /{index_name}/_doc
{
	"field": "value"
}

example

POST /lagou-company-index/_doc/
{
    "name": "百度",
    "job": "小度用户运营经理",
    "payment": "30000",
    "logo": "http://www.lgstatic.com/thubnail_120x120/i/image/M00/21/3E/CgpFT1kVdzeAJNbUAABJB7x9sm83  74.png"
}

You can see that the result shows created, which means the creation succeeded. Also note the _id field in the response: it is the unique identifier of this document, and here Elasticsearch generated it randomly for us.

2. View a single document

syntax

GET /{index_name}/_doc/{id}

example

GET /lagou-company-index/_doc/1

 

Interpretation of document metadata

 

3. View all documents

syntax

POST /{index_name}/_search
{
	"query": {
		"match_all": {}
	}
}

example

POST /lagou-company-index/_search
{
	"query": {
		"match_all": {}
	}
}

4. _source: customize the returned result

In some business scenarios we don't need the search engine to return all the fields in _source. We can select fields with the _source parameter, separating multiple fields with commas:

GET /lagou-company-index/_doc/1?_source=name,job
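The same field selection works in a search request body:

GET /lagou-company-index/_search
{
	"_source": ["name", "job"],
	"query": {
		"match_all": {}
	}
}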

 

5. Update a document (full update)

Changing the request method of the add operation to PUT makes it a modification, and the modification must specify the id

  • If a document with that id exists, it is modified
  • If no document with that id exists, it is added

For example, we use id 5 here; since it does not exist yet, the document should be added

example

PUT /lagou-company-index/_doc/5
{
	"name": "百度",
	"job": "大数据工程师",
	"payment": "300000",
	"logo": "http://www.lgstatic.com/thubnail_120x120/i/image/M00/21/3E/CgpFT1kVdzeAJNbUAABJB7x9sm83  74.png"
}

Response:

{
	"_index": "lagou-company-index",
	"_type": "_doc",
	"_id": "5",
	"_version": 1,
	"result": "created",
	"_shards": {
		"total": 2,
		"successful": 1,
		"failed": 0
	},
	"_seq_no": 2,
	"_primary_term": 1
}

You can see that it is created, which is a new addition.

We execute the previous request again, but change the data

You can see that the result is now updated: the data was modified.

 

6. Update a document (partial update)

Elasticsearch uses PUT or POST for full updates; if a document with the specified id already exists, the update operation is performed.

Note:

When Elasticsearch performs an update, it first marks the old document as deleted and then indexes a new document. The old document does not disappear immediately, but it can no longer be accessed; as Elasticsearch continues to ingest data, documents marked as deleted are cleaned up in the background.

A full update (PUT or POST) marks the old document as deleted and adds a new one; a partial update modifies only the specified fields (using POST with the _update endpoint)

syntax

POST /{index_name}/_update/{id}
{
	"doc": {
		"field": "value"
	}
}

example


POST /lagou-company-index/_update/3
{
	"doc": {
		"name": "淘宝"
	}
}

 

7. Delete documents

Delete according to id:

syntax

DELETE /{index_name}/_doc/{id}

example

DELETE /lagou-company-index/_doc/3

You can see that the result is deleted: the data was removed.

 

Delete according to query conditions

syntax

POST /{index_name}/_delete_by_query
{
	"query": {
		"match": {
			"field_name": "search keyword"
		}
	}
}

Example:

# Find the docs whose name field matches 百度
POST /lagou-company-index/_search
{
    "query": {
        "match": {
            "name": "百度"
        }
    }
}
 
# Delete the docs whose name field matches 百度
POST /lagou-company-index/_delete_by_query
{
    "query": {
        "match": {
            "name": "百度"
        }
    }
}


Result:

{
	"took": 14,
	"timed_out": false,
	"total": 1,
	"deleted": 1,
	"batches": 1,
	"version_conflicts": 0,
	"noops": 0,
	"retries": {
		"bulk": 0,
		"search": 0
	},
	"throttled_millis": 0,
	"requests_per_second": -1.0,
	"throttled_until_millis": 0,
	"failures": []
}

Delete all documents

POST /{index_name}/_delete_by_query
{
	"query": {
		"match_all": {}
	}
}
