Elasticsearch: Why were mapping types removed in Elasticsearch 7.0.0 and later?

Indexes created in Elasticsearch 7.0.0 or later no longer accept a _default_ mapping. Indexes created in 6.x will continue to function as before in Elasticsearch 6.x. Types are deprecated in the APIs in 7.0, with breaking changes to the index creation, put mapping, get mapping, put template, get templates, and get field mappings APIs.

What are mapping types?

Elasticsearch is a document database, and a mapping type indicated the kind of document or entity being indexed. For example, a youtube index might have a user type and a video type. You can roughly think of a type as a table in a relational database.

Each mapping type could have its own fields, so a user type might have a full_name field, a user_name field, and an email field, while a video type could have a video_url field, an uploaded_at field, and, like the user type, a user_name field.

Every document had a _type metadata field containing the type name, and a search could be restricted to one or more types by specifying the type names in the URL:

GET youtube/user,video/_search
{
  "query": {
    "match": {
      "user_name": "kimchy"
    }
  }
}

A document's _type and _id fields were combined to generate a _uid field, so documents of different types could share the same _id within a single index.
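To make the role of _uid concrete, here is a minimal Python sketch. It assumes the type#id layout that older Elasticsearch versions used for _uid. The function name make_uid is hypothetical and only illustrates the idea; it is not an Elasticsearch API.

```python
def make_uid(doc_type: str, doc_id: str) -> str:
    """Combine a mapping type and a document _id into one identifier (assumed 'type#id' layout)."""
    return f"{doc_type}#{doc_id}"


# Two documents share the _id "1" but belong to different types,
# so their combined identifiers stay distinct within one index.
user_uid = make_uid("user", "1")
video_uid = make_uid("video", "1")
print(user_uid, video_uid)
print("distinct:", user_uid != video_uid)
```

Without the type component, the two documents above would collide on the same _id.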

The code snippet below shows how documents of different types were previously stored in the same index (note that it only applies to versions prior to 7.0.0):

PUT youtube
{
  "mappings": {
    "user": {
      "properties": {
        "name": { "type": "text" },
        "user_name": { "type": "keyword" },
        "email": { "type": "keyword" }
      }
    },
    "video": {
      "properties": {
        "video_url": { "type": "text" },
        "user_name": { "type": "keyword" },
        "uploaded_at": { "type": "date" }
      }
    }
  }
}

The code above creates a mapping for the youtube index with two types: user and video.

PUT youtube/user/debraj
{
  "name": "Debraj Bhal",
  "user_name": "debraj",
  "email": "[email protected]"
}

PUT youtube/video/1
{
  "user_name": "debraj",
  "uploaded_at": "2017-10-24T09:00:00Z",
  "video_url": "https://myvideo.com"
}

The snippets above create or update a document of type user with _id debraj and a document of type video with _id 1, respectively.

Documents of a specific type could then be retrieved by using the type name in the request URL, as shown in the following code snippet:

GET youtube/video/_search
{
  "query": {
    "match": {
      "user_name": "debraj"
    }
  }
}

Why were mapping types removed, even though they provided such useful functionality?

In Elasticsearch, an index was often compared to a SQL database and a type to a table. But the analogy isn't quite right: tables in a SQL database are independent of each other, so fields with the same name in different tables are completely unrelated.

But in an Elasticsearch index, fields with the same name in different mapping types are internally backed by the same Lucene field. In other words, using the example above, the user_name field in the user type is stored in exactly the same field as the user_name field in the video type, and both user_name fields must have the same mapping (that is, the same definition, including the data type) in both types.

This can cause failures, for example, when you want a deleted field to be a date field in one type and a boolean field in another type in the same index.
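The shared-field constraint can be illustrated with a toy model in Python. This is only a sketch of the rule described above, not Elasticsearch's actual implementation; the names merge_type_mappings and MappingConflict are hypothetical, and field definitions are reduced to a single data type string.

```python
class MappingConflict(Exception):
    """Raised when one field name is given two different data types."""


def merge_type_mappings(mappings: dict) -> dict:
    """Flatten per-type field definitions into one index-wide field table.

    `mappings` maps a type name to {field name: data type}. Because every
    type shares the same underlying field per name, a field may only have
    one data type across the whole index.
    """
    index_fields = {}
    for type_name, fields in mappings.items():
        for field, data_type in fields.items():
            # Keep the first definition seen; later ones must agree with it.
            existing = index_fields.setdefault(field, data_type)
            if existing != data_type:
                raise MappingConflict(
                    f"field '{field}' is '{existing}' elsewhere but "
                    f"'{data_type}' in type '{type_name}'"
                )
    return index_fields


# Same name, same data type: both types can share the field.
shared = merge_type_mappings({
    "user": {"user_name": "keyword", "email": "keyword"},
    "video": {"user_name": "keyword", "uploaded_at": "date"},
})
print(sorted(shared))

# Same name, different data types: the index cannot hold both definitions.
try:
    merge_type_mappings({
        "user": {"deleted": "date"},
        "video": {"deleted": "boolean"},
    })
except MappingConflict as err:
    print("conflict:", err)
```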

Additionally, storing types that have few or no fields in common in the same index leads to sparse data and interferes with Lucene's ability to compress documents efficiently. In the example above, only the user_name field is common, so video documents never populate the name and email fields, and user documents never populate video_url and uploaded_at.
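As a rough illustration of sparseness, the Python sketch below counts how many field slots would stay empty if every document in the index had a slot for every field. The sparsity function is a hypothetical helper for this article, the email value is a placeholder, and the number it prints says nothing about how Lucene actually lays out or compresses data.

```python
def sparsity(documents: list) -> float:
    """Return the share of empty slots if every document had every field."""
    all_fields = set()
    for doc in documents:
        all_fields.update(doc)
    total_slots = len(all_fields) * len(documents)
    if total_slots == 0:
        return 0.0
    filled_slots = sum(len(doc) for doc in documents)
    return 1 - filled_slots / total_slots


docs = [
    # A user document ("user@example.com" is a placeholder value).
    {"name": "Debraj Bhal", "user_name": "debraj", "email": "user@example.com"},
    # A video document: shares only user_name with the user document.
    {"user_name": "debraj", "uploaded_at": "2017-10-24T09:00:00Z",
     "video_url": "https://myvideo.com"},
]
print(f"{sparsity(docs):.0%} of the field slots are empty")
```

The more types with disjoint fields share one index, the larger this share becomes.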

Alternatives to Mapping Types

Although mapping types had these disadvantages, they helped organize related data. So even with mapping types removed from Elasticsearch, we can still achieve similar functionality in two ways:

Indexes for each document type:

The first option is to have one index per document type. Instead of storing videos and users in a single youtube index, you could store videos in a video index and users in a user index.
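As a sketch of this first option, the Python snippet below takes the old multi-type mapping from earlier in this article and derives one typeless create-index body per type. The helper name split_into_indices is hypothetical, and using the old type names as index names is an assumption taken from the example above; the snippet only prints the requests and does not talk to a cluster.

```python
import json


def split_into_indices(old_mappings: dict) -> dict:
    """Turn {type: {"properties": ...}} into {index name: create-index body}."""
    return {
        type_name: {"mappings": {"properties": dict(type_mapping["properties"])}}
        for type_name, type_mapping in old_mappings.items()
    }


# The pre-7.0 mapping of the youtube index, keyed by mapping type.
old_mappings = {
    "user": {
        "properties": {
            "name": {"type": "text"},
            "user_name": {"type": "keyword"},
            "email": {"type": "keyword"},
        }
    },
    "video": {
        "properties": {
            "video_url": {"type": "text"},
            "user_name": {"type": "keyword"},
            "uploaded_at": {"type": "date"},
        }
    },
}

# Print one create-index request per former type.
for index_name, body in split_into_indices(old_mappings).items():
    print(f"PUT {index_name}")
    print(json.dumps(body, indent=2))
```

Because each index now holds a single kind of document, the user_name field in the user index no longer has to agree with the one in the video index.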

Custom type field:

Of course, there is a limit to the number of primary shards that can exist in a cluster, so you probably don't want to waste an entire shard for a collection containing only a few thousand documents. In this case, you can implement your own custom type field, which works similarly to the old _type.

PUT youtube
{
  "mappings": {
    "properties": {
      "type": { "type": "keyword" },
      "name": { "type": "text" },
      "user_name": { "type": "keyword" },
      "email": { "type": "keyword" },
      "video_url": { "type": "text" },
      "uploaded_at": { "type": "date" }
    }
  }
}

In the mapping definition, we add an extra field named type. This field distinguishes between the different kinds of documents stored in the same index. We can index and query these documents as shown in the code snippets below.

PUT youtube/_doc/user-debraj
{
  "type": "user", 
  "name": "Debraj Bhal",
  "user_name": "debraj",
  "email": "[email protected]"
}

PUT youtube/_doc/video-1
{
  "type": "video", 
  "user_name": "debraj",
  "uploaded_at": "2017-10-24T09:00:00Z",
  "video_url": "https://myvideo.com"
}

GET youtube/_search
{
  "query": {
    "bool": {
      "must": {
        "match": {
          "user_name": "debraj"
        }
      },
      "filter": {
        "match": {
          "type": "video" 
        }
      }
    }
  }
}
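The same convention can be applied programmatically when migrating documents. The Python sketch below is one possible way to do it, under the assumptions used in the requests above: the new _id is the old type and _id joined by a hyphen, and the custom field is called type. The helpers to_typeless and search_by_type are hypothetical names, and the snippet only builds request data without contacting a cluster.

```python
def to_typeless(doc_type: str, doc_id: str, source: dict) -> tuple:
    """Return (new _id, new source) for a document that used to have a mapping type."""
    new_id = f"{doc_type}-{doc_id}"
    # Writing "type" last ensures the custom field always holds the old type name.
    new_source = {**source, "type": doc_type}
    return new_id, new_source


def search_by_type(doc_type: str, field: str, value: str) -> dict:
    """Build a bool query: match on `field`, filtered by the custom type field."""
    return {
        "query": {
            "bool": {
                "must": {"match": {field: value}},
                "filter": {"match": {"type": doc_type}},
            }
        }
    }


new_id, new_source = to_typeless(
    "video",
    "1",
    {
        "user_name": "debraj",
        "uploaded_at": "2017-10-24T09:00:00Z",
        "video_url": "https://myvideo.com",
    },
)
print(f"PUT youtube/_doc/{new_id}")
print(new_source)
print(search_by_type("video", "user_name", "debraj"))
```

Prefixing the _id with the old type name keeps identifiers unique when a user and a video previously shared the same _id.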

Hope this article helped you learn more about mapping types in Elasticsearch.

Origin blog.csdn.net/UbuntuTouch/article/details/132553611