Elasticsearch notes (5)

Notes for ES 6 and later:

1. elasticsearch-head reports a 406 error when connecting to ES 6.x or later:

Content-Type header [application/x-www-form-urlencoded] is not supported

Reason:

ES 6.x strictly checks the Content-Type header. A request with a JSON body must be sent as application/json; application/x-www-form-urlencoded is no longer accepted for it, so the plugin has to be modified.
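
As a rough illustration of what ES 6.x expects from any client, the sketch below builds (but does not send) a search request with a JSON body and an explicit Content-Type: application/json header. The address localhost:9200 and the index name twitter are assumptions made for the example.

```python
import json
import urllib.request

# Assumed local node and index name; adjust for your own cluster.
ES_URL = "http://localhost:9200/twitter/_search"

query = {"query": {"match": {"user_name": "kimchy"}}}

# ES 6.x answers 406 when a JSON body is sent as
# application/x-www-form-urlencoded, so the header is set explicitly.
request = urllib.request.Request(
    ES_URL,
    data=json.dumps(query).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)

# urllib stores header names capitalized, hence "Content-type" here.
print(request.get_header("Content-type"))
```

Passing the request to urllib.request.urlopen would actually send it; that step is left out so the snippet runs without a cluster.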

Solution:

Go to the head plugin installation directory -> elasticsearch-head folder -> vendor.js file

Change two places:

1. Line 6886: contentType: "application/x-www-form-urlencoded"
    change to: contentType: "application/json;charset=UTF-8"
2. Line 7574: var inspectData = s.contentType === "application/x-www-form-urlencoded" &&
    change to: var inspectData = s.contentType === "application/json;charset=UTF-8" &&
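
If you would rather not edit by hand, the two changes can be sketched as a plain string replacement. This is only a sketch: patch_vendor_js is a hypothetical helper name, and it assumes the two snippets appear in your copy of vendor.js exactly as quoted above (line numbers and spacing may differ between versions of the plugin).

```python
OLD = "application/x-www-form-urlencoded"
NEW = "application/json;charset=UTF-8"

# Only the two snippets listed above are rewritten.
REPLACEMENTS = [
    ('contentType: "' + OLD + '"', 'contentType: "' + NEW + '"'),
    ('s.contentType === "' + OLD + '"', 's.contentType === "' + NEW + '"'),
]


def patch_vendor_js(source: str) -> str:
    """Return the source text with both content-type snippets replaced."""
    for old, new in REPLACEMENTS:
        source = source.replace(old, new)
    return source


# Two-line stand-in for the relevant parts of vendor.js.
sample = (
    'contentType: "application/x-www-form-urlencoded",\n'
    'var inspectData = s.contentType === "application/x-www-form-urlencoded" &&\n'
)
patched = patch_vendor_js(sample)
print(patched)
```

To apply it for real, read vendor.js into a string, pass it through the function and write the result back, after making a backup copy of the file.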

 

2. type

Since the first release of Elasticsearch, each document has been stored in a single index and assigned a single mapping type. A mapping type represented the type of document or entity being indexed; for example, a twitter index might have a user type and a tweet type.

Each mapping type could have its own fields, so a user type might have a full_name field, a user_name field, and an email field, while a tweet type might have a content field, a tweet_at field, and, like the user type, a user_name field.

Each document had a _type meta field containing the type name, and a search could be limited to one or more types by specifying the type names in the URL:

GET twitter/user,tweet/_search
{
  "query": {
    "match": {
      "user_name": "kimchy"
    }
  }
}

The _type field was combined with the document's _id to generate the _uid field, so documents of different types with the same _id could exist in the same index. Types were also used to establish parent-child relationships between documents, so a document of type question could be the parent of a document of type answer.
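
A minimal sketch of why two documents with the same _id did not collide: before 6.x the _uid value had the form type#id. The make_uid helper and the dictionary standing in for an index are purely illustrative.

```python
def make_uid(doc_type: str, doc_id: str) -> str:
    """Compose a _uid-style key in the pre-6.x "type#id" form."""
    return f"{doc_type}#{doc_id}"


# A plain dict stands in for the index; keys play the role of _uid.
index = {}
index[make_uid("user", "1")] = {"user_name": "kimchy"}
index[make_uid("tweet", "1")] = {"content": "hello", "user_name": "kimchy"}

# Both documents have _id "1" but are stored under different keys.
print(sorted(index))
```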

At first, an "index" was described as similar to a "database" in a relational database, and a "type" as equivalent to a "table".
This was a bad analogy that led to incorrect assumptions. In a relational database, "tables" are independent of each other: the columns in one "table" have no relationship with columns of the same name in another "table" and do not affect them. This is not the case for fields in mapping types.

In an Elasticsearch index, fields that have the same name in different types are backed by the same Lucene field internally. In the example above, the user_name field of the user type and the user_name field of the tweet type are stored in exactly the same field, so user_name must have the same definition (mapping) in both types.

This causes problems when, for example, you want the deleted field to hold a date value in one type and a boolean value in another type of the same index.
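
The shared field namespace can be pictured with a small simulation. This is not how Elasticsearch is implemented; merge_type_mappings and MappingConflict are hypothetical names, and the field types are only labels. It shows that identical definitions merge cleanly while date versus boolean for deleted is rejected.

```python
class MappingConflict(Exception):
    """Raised when one field name gets two different definitions."""


def merge_type_mappings(type_mappings):
    """Merge per-type field definitions into one index-wide field table."""
    fields = {}
    for type_name, properties in type_mappings.items():
        for field, field_type in properties.items():
            # First definition wins; later ones must agree with it.
            existing = fields.setdefault(field, field_type)
            if existing != field_type:
                raise MappingConflict(
                    f"field [{field}] is already {existing}, "
                    f"but type [{type_name}] defines it as {field_type}"
                )
    return fields


# user_name has the same definition in both types, so this merges fine.
ok = merge_type_mappings({
    "user": {"user_name": "keyword", "email": "keyword"},
    "tweet": {"user_name": "keyword", "content": "text"},
})
print(ok)

# deleted is a date in one type and a boolean in the other: conflict.
try:
    merge_type_mappings({
        "user": {"deleted": "date"},
        "tweet": {"deleted": "boolean"},
    })
except MappingConflict as exc:
    print("conflict:", exc)
```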

Finally, storing different entities that have few or no fields in common in the same index leads to sparse data and interferes with Lucene's ability to compress documents efficiently.

This is why 6.x allows only one type per index, 7.x deprecates types in the APIs, and 8.x removes them completely.

But this raises a problem. In ES 5.x, parent-child documents were used to associate multiple "tables", similar to a join in a database, and the implementation relied on ES 5.x supporting multiple types in one index. ES 6.x supports only a single type per index, so how can a multi-table association similar to MySQL be implemented? The answer is the join field type, new in Elasticsearch 6.x.

For details, please refer to the official document: https://www.elastic.co/guide/en/elasticsearch/reference/current/parent-join.html
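
For orientation, here is a minimal sketch of the request bodies involved, written as Python dictionaries and printed as JSON. The type name _doc, the field name my_join_field, and the question/answer relation are example names; the mapping is shown in the 6.x form, with the type name as a level under mappings.

```python
import json

# Mapping body: one join field declaring question as parent of answer.
mapping = {
    "mappings": {
        "_doc": {
            "properties": {
                "my_join_field": {
                    "type": "join",
                    "relations": {"question": "answer"},
                }
            }
        }
    }
}

# A parent document only names its side of the relation.
question_doc = {
    "text": "This is a question",
    "my_join_field": {"name": "question"},
}

# A child document names its side plus the _id of its parent. It has to be
# indexed with routing set to the parent id so both land on the same shard.
answer_doc = {
    "text": "This is an answer",
    "my_join_field": {"name": "answer", "parent": "1"},
}

for body in (mapping, question_doc, answer_doc):
    print(json.dumps(body, indent=2))
```

Parents and children live in the same index and the same type; the relation is carried entirely by the join field.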

3. Special characters and keyword queries

In ES 5 you could query non-analyzed fields directly with term, wildcard, and similar queries, but after moving to ES 6 you may find that these queries return nothing. Values containing special characters such as "+" do not match the search condition, and escaping them does not help.

{
	"query": {
		"bool": {
			"filter": [{
				"term": {
					"user_id":"user-x"
				}
			}]
		}
	}
}

The query above does not find documents whose user_id field is "user-x".

{
	"query": {
		"bool": {
			"filter": [{
				"term": {
					"user_id":"user"
				}
			}]
		}
	}
}

This query, by contrast, does find user-x.

Reason:

Since ES 5.0, the string type has been split into two types: text and keyword. A text field is analyzed (split into tokens) before it is indexed, while a keyword field is not: the complete string is indexed as a single term.
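
The effect on the two queries above can be reproduced with a toy inverted index. This is only a sketch: analyze is a crude stand-in for the standard analyzer (lowercase, split on anything that is not a letter or digit), and all function names are made up for the illustration.

```python
import re
from collections import defaultdict


def analyze(value: str) -> list:
    """Crude stand-in for the standard analyzer."""
    return re.findall(r"[a-z0-9]+", value.lower())


def build_index(docs, analyzed):
    """Map each indexed term to the set of document ids containing it."""
    inverted = defaultdict(set)
    for doc_id, value in docs.items():
        terms = analyze(value) if analyzed else [value]
        for term in terms:
            inverted[term].add(doc_id)
    return inverted


def term_query(inverted, term):
    """A term query looks the term up as-is, without analyzing it."""
    return inverted.get(term, set())


docs = {1: "user-x", 2: "admin-y"}
text_index = build_index(docs, analyzed=True)      # like a text field
keyword_index = build_index(docs, analyzed=False)  # like a keyword field

print(term_query(text_index, "user-x"))     # no match: only "user" and "x" exist
print(term_query(text_index, "user"))       # matches document 1
print(term_query(keyword_index, "user-x"))  # matches document 1
```

In the text-style index the value "user-x" is stored as the two terms "user" and "x", which is why the exact string finds nothing while "user" does.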

Official description:

Keyword datatype
A field to index structured content such as email addresses, hostnames, status codes, zip codes or tags.
They are typically used for filtering (Find me all blog posts where status is published), for sorting, and for aggregations. Keyword fields are only searchable by their exact value.
If you need to index full text content such as email bodies or product descriptions, it is likely that you should rather use a text field.

Keyword fields still behave this way in 6.x. The surprise comes from dynamic mapping: when a string field has no explicit mapping, ES maps it as a text field, so the original value is still analyzed, and adds a sub-field named keyword that keeps the complete string (by default only for values up to 256 characters).

To search on the original, unanalyzed value, append .keyword to the field name and query field.keyword to get an exact match.

{
	"query": {
		"bool": {
			"filter": [{
				"term": {
					"user_id.keyword":"user-x"
				}
			}]
		}
	}
}
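
If a field is only ever needed for exact matching, one alternative is to declare it as keyword in an explicit mapping when the index is created, so the term filter can target the field itself. A minimal sketch of the two request bodies, assuming the 6.x mapping layout with _doc as the type name:

```python
import json

# Mapping body: user_id is indexed as a single, unanalyzed term.
mapping = {
    "mappings": {
        "_doc": {
            "properties": {
                "user_id": {"type": "keyword"}
            }
        }
    }
}

# With that mapping, the filter can use user_id without the .keyword suffix.
query = {
    "query": {
        "bool": {
            "filter": [{"term": {"user_id": "user-x"}}]
        }
    }
}

print(json.dumps(mapping, indent=2))
print(json.dumps(query, indent=2))
```

The type of an existing field cannot be changed in place, so this has to be set up before data is indexed, or the data has to be reindexed into a new index.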

This default feels counterintuitive to me, and I cannot see what advantage it has. It also produces a lot of analyzed data that is never used and only wastes resources. So is it worth using ES at all if you do not need word segmentation? That is a question worth thinking hard about.

Origin blog.csdn.net/sm9sun/article/details/109070081