1. Overview
Let's start with a brief introduction to some commonly used mapping parameters to get a general idea of what they mean, and then look at the more important ones in detail.
1.1 Parameters affecting performance
Parameter | Description |
---|---|
index | Default true; whether the field is indexed and therefore searchable |
enabled | Default true; whether the content of the field is parsed and indexed. Disabling it reduces CPU usage, but the field can no longer be searched |
store | Default false; if only a few small fields of a document need to be retrieved, storing them separately can reduce IO |
doc_values | Default true; optimizes sorting, aggregations, and script access on the field, at the cost of disk space |
fielddata | Default false; enables sorting, aggregations, and script access on text fields. Avoid it where possible, because it is expensive |
norms | Default true; if the field does not need to participate in scoring, set it to false to reduce disk usage |
norms can be set to false on many fields. For example, date fields are rarely used for full-text search, so they rarely need scoring information. The right choice still depends on the specific use case.
Structured fields such as email addresses, host names, status codes, and tags, which can be mapped as the keyword type, generally do not need norms either.
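As a minimal sketch of the point above, the following index-creation request body disables norms on a text field that is searched but does not need relevance scoring. The field name title is only an illustrative assumption.

```json
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "norms": false
      }
    }
  }
}
```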
1.2 Other
Parameter | Description |
---|---|
boost | Default 1; score weighting factor |
analyzer | Analyzer used for the field |
similarity | Scoring algorithm used for the field |
fields | Index the same field in multiple ways (multi-fields) |
null_value | Value to index in place of an explicit null |
search_analyzer | Analyzer used at search time |
ignore_above | Maximum string length; longer strings are neither indexed nor stored |
copy_to | Copy the value of the field into the specified field, so that several fields can be searched through one field |
ignore_malformed | Default false; if true, malformed values of this field are ignored when documents are added |
index_options | What the inverted index records: docs, freqs, positions, or offsets |
coerce | Default true; whether type coercion is allowed, such as string to number or floating point to integer |
dynamic | Default true; whether new fields found in documents are added to the mapping dynamically: true, false, or strict |
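To make a few of the parameters in the table concrete, here is a hedged sketch of an index-creation request body that uses copy_to, ignore_above, and coerce. All field names (first_name, last_name, full_name, tag, quantity) and the length limit of 256 are assumptions chosen for illustration.

```json
{
  "mappings": {
    "properties": {
      "first_name": { "type": "text", "copy_to": "full_name" },
      "last_name": { "type": "text", "copy_to": "full_name" },
      "full_name": { "type": "text" },
      "tag": { "type": "keyword", "ignore_above": 256 },
      "quantity": { "type": "integer", "coerce": false }
    }
  }
}
```

With a mapping like this, a search on full_name would match values copied from either name field, tag values longer than 256 characters would be skipped, and a string such as "10" in quantity would be rejected instead of being converted to a number.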
2. ignore_malformed
A very useful parameter. As the name suggests, it ignores malformed values in a field.
For example, if a document is added in which a field of type date holds an email string, or any other value that cannot be converted to a date, ES throws an exception and rejects the document.
If ignore_malformed is set to true, the malformed field is ignored and the other fields are processed normally.
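The behavior described above could be configured roughly as follows. This is a sketch of an index-creation request body; the field name created is an assumption.

```json
{
  "mappings": {
    "properties": {
      "created": {
        "type": "date",
        "ignore_malformed": true
      }
    }
  }
}
```

With this mapping, a document whose created value is an email string should still be added; only the created field is left out of the index.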
3. enabled
Simply put, enabled controls whether a field is indexed. The default is true, because the purpose of using ES is usually to make content searchable.
What if we only need to retrieve the data of some fields, and do not want them to be indexed or to affect the score of the document?
Just set enabled to false.
Note: enabled only takes effect on the top-level mapping definition and on fields of type object.
If enabled is set to false, the field cannot be searched, but its data can still be obtained through _source.
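A minimal sketch of disabling an object field, assuming a hypothetical field named session_data whose content only needs to be returned, never searched:

```json
{
  "mappings": {
    "properties": {
      "session_data": {
        "type": "object",
        "enabled": false
      }
    }
  }
}
```

Any JSON placed under session_data would then be kept in _source without being parsed or indexed.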
4. store
store controls how field data is stored. When it is set to true, a separate copy of the field is stored. If a query only needs to retrieve a few small fields from many documents, those fields can be stored separately by setting store to true.
This is more efficient than loading the whole _source and then filtering out the unwanted fields, because it reduces IO.
Query:
{
  "_source": false,
  "stored_fields": [ "title", "date" ]
}
When stored_fields is used, _source is not returned by default.
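The query above only returns values for fields that were mapped with store set to true. A mapping it could run against might look like the sketch below; the large content field is an assumption added to show why storing the small fields separately can pay off.

```json
{
  "mappings": {
    "properties": {
      "title": { "type": "text", "store": true },
      "date": { "type": "date", "store": true },
      "content": { "type": "text" }
    }
  }
}
```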
5. dynamic
The dynamic setting controls whether ES detects new fields in documents and dynamically adds them, with a corresponding type, to the mapping.
dynamic has 3 values:
- true: new fields are detected dynamically and added to the mapping
- false: the mapping is not modified dynamically; a document containing fields that are not in the mapping can still be added, but those fields are not indexed
- strict: a document containing fields that are not in the mapping is rejected
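The three values can be combined at different levels of a mapping. The sketch below assumes hypothetical fields user, name, and social_networks: unknown fields are rejected at the top level, while new fields are still accepted dynamically inside social_networks.

```json
{
  "mappings": {
    "dynamic": "strict",
    "properties": {
      "user": {
        "properties": {
          "name": { "type": "text" },
          "social_networks": {
            "dynamic": true,
            "properties": {}
          }
        }
      }
    }
  }
}
```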
6. doc_values
The default is true. If you do not need to sort or aggregate on a field, or access it in scripts, you can set doc_values to false to save disk space.
The text type does not support doc_values.
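As a small sketch, assuming two hypothetical keyword fields: status_code keeps the default because it is aggregated on, while session_id is only ever looked up by value, so its doc_values are disabled.

```json
{
  "mappings": {
    "properties": {
      "status_code": { "type": "keyword" },
      "session_id": { "type": "keyword", "doc_values": false }
    }
  }
}
```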
7. fielddata
Because doc_values does not support the text type, fielddata exists as the text counterpart of doc_values. It also serves sorting, aggregations, and script access on the field.
Unlike doc_values, fielddata uses memory instead of disk.
Loading fielddata is an expensive process, so the default is false.
It is strongly recommended not to use fielddata; try to design mappings so that it is not needed.
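For completeness, if fielddata really cannot be avoided, it is switched on per text field. The sketch below assumes a hypothetical field named my_field; a keyword sub-field, as described in the fields section of this article, is usually the cheaper design.

```json
{
  "mappings": {
    "properties": {
      "my_field": {
        "type": "text",
        "fielddata": true
      }
    }
  }
}
```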
8. index_options
Controls which information is stored in the inverted index. There are 4 options:
- docs: document number only
- freqs: document number + term frequency
- positions: document number + term frequency + position, usually used for proximity and phrase queries
- offsets: document number + term frequency + position + offset, usually used for highlighting
Analyzed (text) fields default to positions; other fields default to docs.
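A minimal sketch of choosing a non-default option, assuming a hypothetical text field named body that is frequently highlighted and therefore records offsets:

```json
{
  "mappings": {
    "properties": {
      "body": {
        "type": "text",
        "index_options": "offsets"
      }
    }
  }
}
```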
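9. format
The format parameter sets the pattern used to parse the values of a date field. The mapping below accepts dates such as 2015-01-01.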
{
  "mappings": {
    "properties": {
      "date": {
        "type": "date",
        "format": "yyyy-MM-dd"
      }
    }
  }
}
10. normalizer
Sets the normalizer for the field. A normalizer processes the values of a keyword field before they are indexed.
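As an illustration, the sketch below defines a custom normalizer that lowercases values and folds accented characters to ASCII, then applies it to a keyword field. The names my_normalizer and username are assumptions.

```json
{
  "settings": {
    "analysis": {
      "normalizer": {
        "my_normalizer": {
          "type": "custom",
          "char_filter": [],
          "filter": ["lowercase", "asciifolding"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "username": {
        "type": "keyword",
        "normalizer": "my_normalizer"
      }
    }
  }
}
```

With this in place, values that differ only in letter case or accents should be indexed as the same term.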
11. null_value
Null values cannot be indexed or searched. With null_value, you can set a default string that is indexed in place of an explicit null, so that it can be searched.
{
  "mappings": {
    "properties": {
      "status_code": {
        "type": "keyword",
        "null_value": "NULL"
      }
    }
  }
}
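Given the mapping above, documents whose status_code was an explicit null could then be found with an ordinary term query on the placeholder string. The following search body is a sketch of that lookup:

```json
{
  "query": {
    "term": {
      "status_code": "NULL"
    }
  }
}
```

The placeholder only affects what is indexed; the _source of the document still contains the original null.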
12. search_analyzer
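One common reason to set search_analyzer is autocomplete: terms are split into edge n-grams at index time, while the search text is analyzed normally. The sketch below assumes hypothetical names (autocomplete_filter, autocomplete, title) and gram sizes chosen only for illustration.

```json
{
  "settings": {
    "analysis": {
      "filter": {
        "autocomplete_filter": {
          "type": "edge_ngram",
          "min_gram": 1,
          "max_gram": 20
        }
      },
      "analyzer": {
        "autocomplete": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "autocomplete_filter"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "autocomplete",
        "search_analyzer": "standard"
      }
    }
  }
}
```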
13. fields
Use fields when one field needs to be processed in several ways, for example a text field that needs full-text search as well as sorting and aggregations.
{
  "mappings": {
    "properties": {
      "city": {
        "type": "text",
        "fields": {
          "raw": {
            "type": "keyword"
          }
        }
      }
    }
  }
}
In this way, query operations can be performed on city, and sort and aggs operations can be performed on city.raw.
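A search body that uses both forms of the field might look like the following sketch. The search term york and the aggregation name cities are assumptions.

```json
{
  "query": {
    "match": {
      "city": "york"
    }
  },
  "sort": {
    "city.raw": "asc"
  },
  "aggs": {
    "cities": {
      "terms": {
        "field": "city.raw"
      }
    }
  }
}
```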