Elasticsearch Null Value Handling Practical Guide

1. Introduction

In real business scenarios, you often need to define how null values are stored and to retrieve the documents whose field is null.

If we look at the null_value section of the official documentation, we find the following description:

Accepts a string value which is substituted for any explicit null values. Defaults to null, which means the field is treated as missing.

A null value cannot be indexed or searched. When a field is set to null, (or an empty array or an array of null values) it is treated as though that field has no values.

Taken literally, this is not easy to understand, is it?

Fine, let's get hands-on and find out:

DELETE my-index-000001
PUT my-index-000001
{
  "mappings": {
    "properties": {
      "status_code": {
        "type": "keyword"
      },
      "title": {
        "type": "text"
      }
    }
  }
}

PUT my-index-000001/_bulk
{"index":{"_id":1}}
{"status_code":null,"title":"just test"}
{"index":{"_id":2}}
{"status_code":"","title":"just test"}
{"index":{"_id":3}}
{"status_code":[],"title":"just test"}

POST my-index-000001/_search

POST my-index-000001/_search
{
  "query": {
    "term": {
      "status_code": null
    }
  }
}

The first search returns all three documents. The term query with a null value returns the following error:

{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "field name is null or empty"
      }
    ],
    "type": "illegal_argument_exception",
    "reason": "field name is null or empty"
  },
  "status": 400
}
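
A term query cannot look for null, but the documents with a missing value can still be found. As a minimal sketch against the index above, wrapping an exists query in must_not should return the documents with _id = 1 (null) and _id = 3 (empty array); the document with _id = 2 is left out because an empty string counts as a value.

POST my-index-000001/_search
{
  "query": {
    "bool": {
      "must_not": {
        "exists": {
          "field": "status_code"
        }
      }
    }
  }
}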

2. The meaning of null_value

The null_value parameter allows you to replace explicit null values with a specified value so that they can be indexed and searched. For example:

DELETE my-index-000001
PUT my-index-000001
{
  "mappings": {
    "properties": {
      "status_code": {
        "type":       "keyword",
        "null_value": "NULL"
      }
    }
  }
}

PUT my-index-000001/_bulk
{"index":{"_id":1}}
{"status_code":null}
{"index":{"_id":2}}
{"status_code":[]}
{"index":{"_id":3}}
{"status_code":"NULL"}

GET my-index-000001/_search
{
  "query": {
    "term": {
      "status_code": "NULL"
    }
  }
}

Note the result: the documents with _id = 1 and _id = 3 are returned, but the document with _id = 2 is not. An empty array contains no explicit null, so it is not replaced with the null_value.

Explanation:

"null_value": "NULL" means: replace every explicit null with the specified value. The string "NULL" is arbitrary; in a business system you could just as well define it as "Unknown".

In plain terms:

  • It is equivalent to declaring, at mapping time, a default value that stands in for null, here "NULL". The advantage: a document like the one with _id = 1 above, whose field is null, can still be indexed and retrieved.

  • The "field name is null or empty" error no longer occurs, because you search for the placeholder "NULL" instead of a literal null.
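
The same substitution can be seen from the other side with an exists query. As a sketch, assuming the index and the three documents above are still in place: the official exists documentation counts a custom null_value as an existing value, so the following request should return only the document with _id = 2, whose empty array holds no explicit null.

GET my-index-000001/_search
{
  "query": {
    "bool": {
      "must_not": {
        "exists": {
          "field": "status_code"
        }
      }
    }
  }
}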

3. Notes on using null_value

  • The null_value must match the field's data type. For example, a long field cannot have a string null_value.

The following definition will report an error:

DELETE my-index-000001
PUT my-index-000001
{
  "mappings": {
    "properties": {
      "status_code": {
        "type": "keyword"
      },
      "title": {
        "type": "long",
        "null_value": "NULL"
      }
    }
  }
}

The error is as follows:

{
  "error": {
    "root_cause": [
      {
        "type": "mapper_parsing_exception",
        "reason": "Failed to parse mapping [_doc]: For input string: \"NULL\""
      }
    ],
    "type": "mapper_parsing_exception",
    "reason": "Failed to parse mapping [_doc]: For input string: \"NULL\"",
    "caused_by": {
      "type": "number_format_exception",
      "reason": "For input string: \"NULL\""
    }
  },
  "status": 400
}

Explanation: the error is clearly caused by a type mismatch; the string "NULL" cannot be parsed as a long.

  • The null_value only affects how the data is indexed; it does not modify the _source document.
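
This can be checked directly. A sketch, assuming the index and documents from section 2 have been recreated: request both the _source and the doc values of the document with _id = 1. The _source part should still show "status_code": null, while the docvalue_fields part should show the indexed replacement, "NULL".

GET my-index-000001/_search
{
  "query": {
    "ids": {
      "values": ["1"]
    }
  },
  "docvalue_fields": ["status_code"]
}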

4. Which field types support null_value, and which do not?

The following commonly used field types support null_value:

  • Arrays

  • Boolean

  • Date

  • geo_point

  • IP

  • Keyword

  • Numeric

  • point

Don't ask how I know: I checked the official documentation for each type, one by one.
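
For the non-keyword types, the null_value has to be written in the field's own type. A sketch of such a mapping follows; the index name my-index-000002 and all field names are made up for illustration.

PUT my-index-000002
{
  "mappings": {
    "properties": {
      "retry_count": {
        "type": "long",
        "null_value": -1
      },
      "is_active": {
        "type": "boolean",
        "null_value": false
      },
      "created_at": {
        "type": "date",
        "null_value": "1970-01-01"
      },
      "client_ip": {
        "type": "ip",
        "null_value": "0.0.0.0"
      }
    }
  }
}

Pick placeholder values that cannot collide with real data, since a search for the placeholder returns both the documents that were null and the documents that really hold that value.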

The most frequently asked questions:

4.1 Question 1: Does the text type support null_value?

No, it does not.

Let's try it for real:

DELETE my-index-000001
PUT my-index-000001
{
  "mappings": {
    "properties": {
      "status_code": {
        "type": "keyword"
      },
      "title": {
        "type": "text",
        "null_value": "NULL"
      }
    }
  }
}

The response is the following error:

{
  "error": {
    "root_cause": [
      {
        "type": "mapper_parsing_exception",
        "reason": "Mapping definition for [title] has unsupported parameters:  [null_value : NULL]"
      }
    ],
    "type": "mapper_parsing_exception",
    "reason": "Failed to parse mapping [_doc]: Mapping definition for [title] has unsupported parameters:  [null_value : NULL]",
    "caused_by": {
      "type": "mapper_parsing_exception",
      "reason": "Mapping definition for [title] has unsupported parameters:  [null_value : NULL]"
    }
  },
  "status": 400
}

4.2 Question 2: What if a text field also needs a null value?

Use multi-fields: combining a text field with a keyword sub-field meets this business need.

A reference definition:

DELETE my-index-000001
PUT my-index-000001
{
  "mappings": {
    "properties": {
      "status_code": {
        "type": "keyword"
      },
      "title": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "null_value": "NULL"
          }
        }
      }
    }
  }
}

In real business scenarios, text fields are often defined as multi-fields (the fields mapping parameter) that combine text and keyword.

The text field is used for full-text search; the keyword sub-field is used for exact matching, aggregation, and sorting.

Multi-fields are also one of the core exam topics for the Elastic Certified Engineer exam, so everyone should master them.
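
A sketch of how the two halves of the multi-field are typically used, assuming the mapping above has been created and a few documents indexed (the aggregation name title_values is made up). The first request runs a full-text match on title; the second aggregates on title.keyword, where documents whose title was an explicit null would be expected to fall into the "NULL" bucket.

POST my-index-000001/_search
{
  "query": {
    "match": {
      "title": "test"
    }
  }
}

POST my-index-000001/_search
{
  "size": 0,
  "aggs": {
    "title_values": {
      "terms": {
        "field": "title.keyword"
      }
    }
  }
}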

5. Discussion of online issues

Dear friends, a question: my data has a content field, and I want to query for documents where this field is not an empty string. must_not does not work for me. I have posted my SQL.

Source: the Elasticsearch Technology Exchange Group

My interpretation is as follows:

Let's look at the correct way to write this query and at why the earlier attempt was wrong.

Checking whether a field is empty is in essence an exact-match problem, not a full-text (relevance-based) search problem. Choosing match_phrase is therefore what causes the error; term should be used instead. The following query matches the documents whose content is an empty string:

POST test_001/_search
{
  "query": {
    "bool": {
      "filter": {
        "bool": {
          "must": [
            {
              "exists": {
                "field": "content"
              }
            },
            {
              "term": {
                "content.keyword": {
                  "value": ""
                }
              }
            }
          ]
        }
      }
    }
  }
}

Note: an exists query checks whether the field has any indexed value. Combining it with the term query is more robust.
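
For the reader's original question (content is not an empty string), the same building blocks can be rearranged. A minimal sketch, assuming content has a keyword sub-field as in the query above: keep exists as a filter and move the term clause into must_not.

POST test_001/_search
{
  "query": {
    "bool": {
      "filter": {
        "exists": {
          "field": "content"
        }
      },
      "must_not": {
        "term": {
          "content.keyword": {
            "value": ""
          }
        }
      }
    }
  }
}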

A script query such as the following can also check for an empty string, but because of its performance cost it is not recommended in production.

POST test_001/_search
{
  "query": {
    "bool": {
      "filter": {
        "script": {
          "script": {
            "source": "doc['content.keyword'].size() > 0 && doc['content.keyword'].value.length() == 0",
            "lang": "painless"
          }
        }
      }
    }
  }
}

Consider this: if, at the data-modeling stage, you define the field in the mapping as a combination of text and keyword and set null_value on the keyword sub-field, this kind of problem becomes much easier to solve.
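
A sketch of what that modeling could look like for the index in the question. The mapping below is an assumption for illustration, not the reader's actual mapping, and the placeholder "NULL" is arbitrary. With it, one terms query can collect both the empty strings and the explicit nulls.

PUT test_001
{
  "mappings": {
    "properties": {
      "content": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "null_value": "NULL"
          }
        }
      }
    }
  }
}

POST test_001/_search
{
  "query": {
    "bool": {
      "filter": {
        "terms": {
          "content.keyword": ["", "NULL"]
        }
      }
    }
  }
}

Documents that omit the field entirely, or that send an empty array, still have no value and would need a must_not exists clause to be found.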

6. Summary

As Luo Pang put it: no matter how obvious a truth is, at least 100 million people in China do not know it.

In the same way, I think that however obvious a technical point in Elasticsearch is, at least N people in China's Elastic community do not know it.

What to do? Get hands-on and find out!

How do you handle null values in your business scenario? Feel free to leave a comment and discuss.

7. Side note: discussion

A reader sent me a private message:

Actually, you could learn from other accounts and repost high-traffic articles from big names or big companies, full of pictures and text. Readers may not understand them, but they look impressive, and they would bring your official account more followers. How great would that be! With only one article a week, everyone drifts away.

My reply:

Every official account has its own mission and reason to exist. Compare them carefully and you will see that each author has their own character; why should they all converge? I think the current approach is fine (a bit harder and a bit poorer, that is all), and over the long term (ten years or more) the value of persistence will show.

Origin blog.csdn.net/wojiushiwo987/article/details/109712672