Elasticsearch: Search and Index Analyzer

In my previous article "Elasticsearch: analyzer", I covered analyzers in Elasticsearch in detail. An analyzer is used when documents are indexed, and it is also used at search time to tokenize the text being searched for. In today's article, let's take a closer look at how analyzers are used for indexing and searching.

Analyzers can be specified at several levels: index, field, and query level. Declaring an analyzer at the index level provides an index-wide default, a catch-all analyzer for all text fields. If further customization is required, a different analyzer can be set on an individual field. In addition, we can provide a different analyzer at search time than the one used at index time. Let us review these options one by one in this section.

index analyzer

Sometimes we may need to have different analyzers for different fields -- for example, a name field may be associated with a simple analyzer, while a credit card number field may be associated with a pattern analyzer. Fortunately, Elasticsearch allows us to set different analyzers on individual fields as needed; similarly, we can also set a default analyzer for each index, so that any fields not explicitly associated with a specific analyzer during the mapping process will inherit the index-level analyzer. Let us examine these two mechanisms in this section.

field-level analyzer

We can specify the required analyzers at the field level when creating the mapping definition for the index. For example, the code below shows how we can take advantage of it during index creation:

PUT books_with_field_level_analyzers
{
  "mappings": {
    "properties": {
      "name":{
        "type": "text" #A Standard analyzer is being used here
      },
      "about":{
        "type": "text",
        "analyzer": "english" #B Set explicitly with an english analyzer
      },
      "description":{
        "type": "text",
        "fields": {
          "my":{
            "type": "text",
            "analyzer": "fingerprint" #C Fingerprint analyzer on a multi-field
          }
        }
      }
    }
  }
}

As the code shows, the about field and the description.my sub-field are given explicit analyzers (english and fingerprint), while the name field, like the description field itself, implicitly uses the standard analyzer.
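To get a feel for why the analyzer chosen for a field matters, here is a rough Python sketch that approximates the tokens the standard and fingerprint analyzers would produce for the same text. The functions standard_tokens, ascii_fold, and fingerprint_tokens are hypothetical simplifications written for this illustration, not Elasticsearch code: the real standard analyzer follows Unicode text segmentation rules, and the regular expression below is only an approximation of that.

```python
import re
import unicodedata


def standard_tokens(text):
    """Rough stand-in for the standard analyzer: split on non-word characters, lowercase."""
    return [t.lower() for t in re.findall(r"\w+", text)]


def ascii_fold(token):
    """Approximate ASCII folding by stripping combining accents."""
    decomposed = unicodedata.normalize("NFKD", token)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def fingerprint_tokens(text):
    """Rough stand-in for the fingerprint analyzer:
    lowercase, fold, remove duplicates, sort, and join into a single token."""
    folded = {ascii_fold(t) for t in standard_tokens(text)}
    return [" ".join(sorted(folded))]


sample = "Search engines, search ENGINES and more"
print(standard_tokens(sample))     # one token per word, duplicates kept
print(fingerprint_tokens(sample))  # a single sorted, de-duplicated token
```

Under these assumptions the same text yields six tokens on a standard-analyzed field but only one token on the fingerprint sub-field, which is why a multi-field such as description.my can serve a different purpose (for example duplicate detection) than its parent field.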

Index Level Analyzer

We can also set the default analyzer of our choice at the index level, as the following code listing demonstrates:

Create an index with the default analyzer

PUT books_with_default_analyzer
{
  "settings": {
    "analysis": {
      "analyzer": {
        "default":{ #A Setting this property sets index’s default analyzer
          "type":"keyword"
        }
      }
    }
  }
}

In this code listing, we replace the default standard analyzer with the keyword analyzer. You can test the analyzer by calling the _analyze endpoint on the index, as shown in the code listing below:

POST books_with_default_analyzer/_analyze
{
  "text":"Elasticsearch books" 
}

The above command returns:

{
  "tokens": [
    {
      "token": "Elasticsearch books",
      "start_offset": 0,
      "end_offset": 19,
      "type": "word",
      "position": 0
    }
  ]
}

We can write a document using the following command:

PUT books_with_default_analyzer/_doc/1
{
  "title": "Elasticsearch books"
}

Because we did not define an explicit mapping, dynamic mapping creates title as a text field with a keyword sub-field. Its mapping looks like this:

      "properties": {
        "title": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        }
      }

Note: If no default analyzer is defined, the standard analyzer is used as the default.

When we created the books_with_default_analyzer index, we specified keyword as the default analyzer. That is to say, any text field that has no analyzer of its own will automatically use the keyword analyzer. Suppose we search with the following command:

GET books_with_default_analyzer/_search
{
  "query": {
    "match": {
      "title": "Elasticsearch"
    }
  }
}

We get 0 results. This is because the title field is analyzed with the keyword analyzer, so the whole string "Elasticsearch books" is indexed as a single token, and the text Elasticsearch we searched for is only a part of it.
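The reason for the empty result can be sketched in a few lines of Python. This is a simplified model, not how Elasticsearch is implemented: keyword_tokens, standard_tokens, and match_hits are hypothetical helpers, and match_hits only mimics a match query with its default OR behaviour by checking whether any query token equals an indexed token.

```python
import re


def keyword_tokens(text):
    """The keyword analyzer emits the whole input as one token."""
    return [text]


def standard_tokens(text):
    """Rough stand-in for the standard analyzer: split on non-word characters, lowercase."""
    return [t.lower() for t in re.findall(r"\w+", text)]


def match_hits(analyze, indexed_text, query_text):
    """Simplified match query: the same analyzer is applied to the document
    and to the query, and any shared token counts as a hit."""
    indexed = set(analyze(indexed_text))
    return any(token in indexed for token in analyze(query_text))


# With the keyword analyzer only the complete string matches.
print(match_hits(keyword_tokens, "Elasticsearch books", "Elasticsearch"))        # False
print(match_hits(keyword_tokens, "Elasticsearch books", "Elasticsearch books"))  # True

# With a standard-like analyzer the single word is enough.
print(match_hits(standard_tokens, "Elasticsearch books", "Elasticsearch"))       # True
```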

If we want to see the mappings of books_with_default_analyzer, we can call GET books_with_default_analyzer/_mapping, which returns:

{
  "books_with_default_analyzer": {
    "mappings": {
      "properties": {
        "title": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        }
      }
    }
  }
}

From the above results, we can see that title is a multi-field. Since the default analyzer is keyword, the title field, although its type is text, is also analyzed with the keyword analyzer. The end result is that neither title nor title.keyword is tokenized: both index the whole value as a single term.

search analyzer

Elasticsearch allows us to specify a different analyzer at query time instead of the one used during indexing. It also allows us to set a default search analyzer on the index, which can be done during index creation. Let's take a look at these approaches below, along with the rules that Elasticsearch follows when choosing between analyzers defined at different levels.

Analyzers in queries

We haven't covered search in detail yet, so don't worry if the code below confuses you a bit:

GET books_index_for_search_analyzer/_search
{
  "query": {
    "match": { #A A match query on the author_name field
      "author_name": {
        "query": "M Konda",
        "analyzer": "simple" #B The analyzer applied to the query text
      }
    }
  }
}

As shown in the code above, we explicitly specify the analyzer when searching for authors (it is likely that the author_name field will be indexed with a different type of analyzer!).

Setting analyzers at the field level

The second mechanism for setting search-specific analyzers is at the field level. Just as we set analyzers on fields for indexing purposes, we can specify a search analyzer by adding an additional property called search_analyzer to the field. The following code demonstrates this approach:

PUT books_index_with_both_analyzers_field_level
{
  "mappings": {
    "properties": {
      "author_name":{
        "type": "text",
        "analyzer": "stop",
        "search_analyzer": "simple"
      }
    }
  }
}

As shown in the code above, author_name uses the stop analyzer for indexing and the simple analyzer at search time.
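The effect of using different analyzers at index time and search time can be illustrated with a small Python sketch. Everything here is an assumption made for illustration: simple_tokens and stop_tokens are hypothetical approximations of the simple and stop analyzers, the stop word set is assumed to mirror the default English list, and the author name "Will Konda" is an invented sample value.

```python
import re

# Assumed to mirror the default English stop word list.
ENGLISH_STOP_WORDS = {
    "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if",
    "in", "into", "is", "it", "no", "not", "of", "on", "or", "such", "that",
    "the", "their", "then", "there", "these", "they", "this", "to", "was",
    "will", "with",
}


def simple_tokens(text):
    """Rough stand-in for the simple analyzer: split on non-letters, lowercase."""
    return [t.lower() for t in re.findall(r"[^\W\d_]+", text)]


def stop_tokens(text):
    """Rough stand-in for the stop analyzer: simple analyzer plus stop word removal."""
    return [t for t in simple_tokens(text) if t not in ENGLISH_STOP_WORDS]


indexed = stop_tokens("Will Konda")     # tokens stored at index time
searched = simple_tokens("Will Konda")  # tokens produced at search time

print(indexed)
print(searched)
print(sorted(set(indexed) & set(searched)))  # tokens the query can actually hit
```

In this sketch the index only contains konda, because will is treated as a stop word, while the search side still produces both will and konda. Mismatches like this are the reason to choose the index and search analyzers for a field as a pair.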

The default analyzer at the index level

We can also set a default analyzer for search queries, like we did for index time, by setting the desired analyzer on the index at index creation time. The following code listing demonstrates the setup:

PUT books_index_with_default_analyzer
{
  "settings": {
    "analysis": {
      "analyzer": {
        "default_search":{ #A The default analyzer used at search time
          "type":"simple"
        },
        "default":{ #B The default analyzer used at index time
          "type":"standard"
        }
      }
    }
  }
}

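In the above code listing, in addition to the default search analyzer (default_search), we also set the default index-time analyzer (default). You might be wondering whether we can set the search analyzer at the field level during index creation instead of at runtime in the query. The following code demonstrates exactly this, setting different analyzers for indexing and searching on a field when the index is created. Note that creating an index fails if an index with the same name already exists, so delete the earlier one first or pick a different name: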

PUT books_index_with_both_analyzers_field_level
{
  "mappings": {
    "properties": {
      "author_name":{
        "type": "text",
        "analyzer": "standard",
        "search_analyzer": "simple"
      }
    }
  }
}

As you can see from the code above, author_name will be indexed using the standard analyzer, while the simple analyzer will be used during the search.

order of priority

When analyzers are defined at several levels, the engine picks one according to a fixed order. Here is the order of precedence the engine uses to choose the analyzer at search time:

  • An analyzer specified in the query has the highest priority.
  • Next comes the analyzer defined by setting the search_analyzer attribute on a field in the index mapping.
  • Next comes the default search analyzer (default_search) defined at the index level.
  • If none of the above is set, the Elasticsearch engine falls back to the index-time analyzer set on the field or on the index.
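The precedence rules above can be expressed as a small lookup function. This is a minimal sketch, not Elasticsearch's implementation: resolve_search_analyzer and its parameter names are hypothetical, and it assumes that in the final fallback the field's analyzer is consulted before the index default, with standard as the last resort when nothing is configured.

```python
def resolve_search_analyzer(query_analyzer=None,
                            field_search_analyzer=None,
                            index_default_search=None,
                            field_analyzer=None,
                            index_default=None):
    """Return the first analyzer that is set, walking the precedence list
    from the query level down to the index-time settings."""
    candidates = [
        query_analyzer,         # "analyzer" given in the query
        field_search_analyzer,  # "search_analyzer" on the field mapping
        index_default_search,   # "default_search" in the index settings
        field_analyzer,         # "analyzer" on the field mapping (assumed order)
        index_default,          # "default" in the index settings (assumed order)
    ]
    for name in candidates:
        if name is not None:
            return name
    return "standard"  # nothing configured anywhere


# The query-level analyzer wins over everything else.
print(resolve_search_analyzer(query_analyzer="simple", field_search_analyzer="stop"))
# Without a query-level analyzer, the field's search_analyzer is used.
print(resolve_search_analyzer(field_search_analyzer="simple", field_analyzer="stop"))
# Only an index-time default is configured.
print(resolve_search_analyzer(index_default="keyword"))
# Nothing is configured at all.
print(resolve_search_analyzer())
```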

Origin blog.csdn.net/UbuntuTouch/article/details/130334582