Elasticsearch with canal: adding fields to an index, and synchronizing and searching array fields


Foreword

The service we built has been live for half a year, and its Elasticsearch search has been running normally. Recently the client (Party A) asked for a batch of feature upgrades. I looked through the product document and the news was not good: I would need to add quite a few fields. Previously, whenever I had to add fields, I was unfamiliar with the tools, so I simply deleted the index for both Elasticsearch and canal and then re-synchronized all the data from MySQL with the full-update command.

After reading the Elasticsearch documentation carefully, I found that simply adding fields can be done through the API.

Adding fields to an existing index

First, as I understand the documentation, Elasticsearch does not let you change a field's type or delete a field directly; for that you have to rebuild the index. Adding fields to an index is supported, though, which is one advantage of a document database. Fields can be added freely, and even within a single index different documents can have different fields.

Elasticsearch lets us declare fields when creating an index or add them to an existing one. For example, if we already have an index with data in it, we can add fields with the _mapping API:

curl --location --request PUT 'http://127.0.0.1:9200/test/_mapping' \
--header 'Authorization: Basic ZWxhc3RpYzplbGFzdGlj' \
--header 'Content-Type: application/json' \
--data-raw '{
    "properties": {
        "visible": {
            "type": "short"
        },
        "post_type": {
            "type": "short"
        },
        "post_status": {
            "type": "short"
        },
        "check_state": {
            "type": "short"
        },
        "gmt_checked": {
            "type": "keyword"
        },
        "official": {
            "type": "boolean"
        },
        "labels": {
            "type": "text"
        }
    }
}'

In properties, declare the name and type of each field to add.
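For scripted use, the same request can be assembled in Python. This is a minimal sketch using only the standard library; the helper name build_add_fields_request is hypothetical, and the request is built but not sent, so it runs without a cluster. The host and index name are taken from the curl example above.

```python
import json
import urllib.request


def build_add_fields_request(base_url, index, new_fields):
    """Build (but do not send) a PUT <index>/_mapping request that adds fields.

    new_fields maps each field name to its Elasticsearch type, e.g. {"visible": "short"}.
    """
    body = {
        "properties": {name: {"type": es_type} for name, es_type in new_fields.items()}
    }
    return urllib.request.Request(
        url=f"{base_url}/{index}/_mapping",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


request = build_add_fields_request(
    "http://127.0.0.1:9200",
    "test",
    {"visible": "short", "official": "boolean", "labels": "text"},
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))

# To actually send it, a reachable cluster is required (plus an Authorization
# header if security is enabled):
# urllib.request.urlopen(request)
```

Keeping the body construction separate from the network call makes it easy to review the exact mapping change before applying it to a live index.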

Elasticsearch array fields

In MySQL, an array-valued attribute is usually modeled in one of two ways. One is to join the array's values with some separator and store them in a single column. The other is a one-to-many relationship: a second table stores each array value in its own row, with another column linking it back to the parent row, so that the two tables together represent the array.

Elasticsearch works differently. Document storage is very flexible, and so are the fields stored in a document. Elasticsearch has no dedicated array field type: any field can hold zero or more values. In other words, a field that holds multiple values is an array. The one requirement is that all values in the field have the same type. A mixture of numbers and strings such as [1,'abc'] is not supported by Elasticsearch.

Searching an array field in Elasticsearch looks like this:
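The same-type rule can be checked on the client side before a document is indexed. The following is a rough sketch, not Elasticsearch's own validation logic: the helper names array_value_kind and is_homogeneous are hypothetical, the type buckets are simplified, and skipping None values is an assumption made for this example.

```python
def array_value_kind(value):
    """Classify a JSON value into a coarse type bucket (simplified for illustration)."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int in Python
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, dict):
        return "object"
    raise TypeError(f"unsupported value: {value!r}")


def is_homogeneous(values):
    """Return True if all non-null values fall into the same type bucket."""
    # Assumption for this sketch: None entries are ignored rather than typed.
    kinds = {array_value_kind(v) for v in values if v is not None}
    return len(kinds) <= 1


print(is_homogeneous(["elasticsearch", "wow"]))
print(is_homogeneous([1, "abc"]))
```

The first call reports a uniform array of strings, while the second flags the mixed number and string case described above.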

PUT my-index-000001/_doc/1
{
  "message": "some arrays in this document...",
  "tags":  [ "elasticsearch", "wow" ], 
  "lists": [ 
    {
      "name": "prog_list",
      "description": "programming list"
    },
    {
      "name": "cool_list",
      "description": "cool stuff list"
    }
  ]
}

GET my-index-000001/_search
{
  "query": {
    "match": {
      "tags": "elasticsearch" 
    }
  }
}

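This returns the documents whose tags contain elasticsearch. Presumably, because the field is an array, it is broken down element by element, so an array can be searched just like an ordinary field.

Synchronizing array fields with canal

Our current architecture synchronizes data from MySQL to Elasticsearch through canal. Given how flexible and varied Elasticsearch field types are, sooner or later we have to synchronize array fields or object fields. How do we do that? Suppose we have a product entity with a label attribute that may hold multiple values. As covered earlier, canal runs a SQL query to fetch the data to synchronize, projects the fields it needs, and maps them to Elasticsearch. For complex fields, it offers the following solution: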
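The idea that each array element is searchable on its own can be illustrated with a toy term index. This is a deliberately simplified model for intuition only, not how Elasticsearch is implemented: the functions build_term_index and match are hypothetical, and the "analysis" step is just lowercasing and splitting on whitespace.

```python
from collections import defaultdict


def build_term_index(docs, field):
    """Map each term found in `field` to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        values = doc.get(field, [])
        if not isinstance(values, list):
            values = [values]  # a single value is treated like a one-element array
        for value in values:
            for term in str(value).lower().split():
                index[term].add(doc_id)
    return index


def match(index, text):
    """Return sorted ids of documents containing any term of `text`."""
    hits = set()
    for term in text.lower().split():
        hits |= index.get(term, set())
    return sorted(hits)


docs = {
    "1": {"tags": ["elasticsearch", "wow"]},
    "2": {"tags": ["canal", "mysql"]},
}
index = build_term_index(docs, "tags")
print(match(index, "elasticsearch"))  # ['1']
```

Because every element contributes its own terms to the index, a query does not need to know whether the field held one value or many.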

image.png

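As shown above, if you want some of the columns returned by your SQL to be mapped to an array field, the first step is to enable objFields, which sits at the same level as the sql property. This property configures array fields and JSON object fields. For an array, use the format field name: array:separator, which means the value returned by the SQL query must contain that separator.

MySQL provides an aggregate concatenation function, group_concat, which joins a series of non-NULL values into one string. The default separator is a comma, and we can specify another one explicitly, such as a semicolon. Back to our example, where one product has multiple labels, we can write this query:

select product_id, group_concat(label order by label_id asc separator ';') as labels from product_labels group by product_id;

This first groups the labels of each product with group by and then joins them with group_concat. Note that the concatenated result has a default maximum length of 1024; the limit can be raised through the MySQL global variable group_concat_max_len.

We then modify canal's yml file, restart canal, and test the synchronization: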

image.png

After synchronization, the field is created as an array. In other words, canal can synchronize a one-to-many relationship in MySQL to Elasticsearch as an array field.
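To make the effect of the array rule concrete, here is a small sketch of the transformation from a group_concat result to an array value. It only illustrates the outcome described in this article and is not canal's actual code: the function apply_obj_field and the sample row are hypothetical, and turning NULL or an empty string into an empty list is an assumption of this sketch.

```python
def apply_obj_field(raw_value, rule):
    """Split a joined column value into a list according to an 'array:<separator>' rule."""
    kind, _, separator = rule.partition(":")
    if kind != "array" or not separator:
        raise ValueError(f"unsupported rule: {rule!r}")
    if not raw_value:
        # Assumption of this sketch: NULL or empty input becomes an empty array.
        return []
    return raw_value.split(separator)


# Hypothetical row as returned by the group_concat query for one product.
row = {"product_id": 1, "labels": "cotton;summer;unisex"}

# Replace the joined string with a list, as the array field ends up in the document.
document = dict(row, labels=apply_obj_field(row["labels"], "array:;"))
print(document)  # {'product_id': 1, 'labels': ['cotton', 'summer', 'unisex']}
```

The separator must not occur inside any label value; otherwise one label would be split into several array elements.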


Origin juejin.im/post/7086382152744435743