Search in E-commerce Scenarios: Exploring ES Search Based on ik Word Segmentation

Foreword

In the previous article (day four of this series) we covered how to use Kibana to debug Elasticsearch (ES) queries. Today we explore common searches in e-commerce scenarios and query ES data using ik word segmentation.

Data modeling

Before development we need to pin down the requirements. Which data must be searchable? Beyond search itself, what else needs attention? Taking Taobao as an example, a search has two main parts: matching the query terms and sorting the results. From a typical results page we can work out which fields need to be searched and which need to be sorted on:

  • Search fields
  1. Specifications (specification queries are more complicated, so we will not cover them for now)
  2. Name
  3. Keywords or introduction
  • Sort fields
  1. Ad placement (paid placements rank first)
  2. Sales
  3. Price
  • Other
  1. Paginated search

Based on the above, we model a product with the following fields: id (unique identifier), name (product name), keyWords (keywords), sellNum (sales), price (price), and sort (sort order). Note: this is a deliberately shallow analysis. A real product would carry more information, such as whether it is recommended.
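As a minimal sketch, the field list can be written down as a small typed record. The class name `Goods` and the Python types are assumptions made for illustration; the field names follow the index mapping used in this article.

```python
from dataclasses import dataclass, asdict

@dataclass
class Goods:
    """One product document (hypothetical helper, not part of ES)."""
    id: str        # unique identifier
    name: str      # product name, full-text searched
    keyWords: str  # keywords or short introduction, full-text searched
    sellNum: int   # units sold, used for sorting
    price: float   # price, used for sorting
    sort: int      # default rank; the ad placements in the sample data use the smallest values

item = Goods(
    id="100004",
    name="苹果手机001",
    keyWords="手机 智能手机 IOS 苹果手机",
    sellNum=4,
    price=9.0,
    sort=4,
)

# asdict() yields a plain dict that can be serialized as a document body.
doc = asdict(item)
print(doc)
```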

Data initialization

The steps for initializing the product index in Kibana are as follows.

Create the index

PUT goods
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0
  },
  "mappings": {
    "properties": {
      "id": {
        "type": "keyword"
      },
      "name": {
        "type": "text",
        "analyzer": "ik_max_word",
        "search_analyzer": "ik_smart"
      },
      "keyWords": {
        "type": "text",
        "analyzer": "ik_max_word",
        "search_analyzer": "ik_smart"
      },
      "sellNum": {
        "type": "integer"
      },
      "price": {
        "type": "double"
      },
      "sort": {
        "type": "integer"
      }
    }
  }
}

For the data types commonly used in ES, see the blog post "es common data structures". Here are the points that deserve attention in this mapping.

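  1. Shards and replicas: the example uses a single shard and no replicas. In production we recommend multiple shards and multiple replicas to prevent data loss.
  2. Why id is a keyword: a keyword field is not analyzed, so it is stored as a single term and matches only on the exact value. Suppose we have one id of 123456 and another of 12345. With a keyword field, deleting the data for 12345 can only hit that exact id. If id were a text field it would go through an analyzer, and depending on how the value is split, an operation aimed at 12345 could also hit other documents that share a token with it.
  3. Choosing analyzer and search_analyzer: when a document is inserted, its text fields are split as finely as possible (ik_max_word) and written to the inverted index. At query time, the input is split as coarsely as possible (ik_smart) before the inverted index is searched. For example, "苹果手机" is indexed as as many terms as possible, such as "苹果" and "手机". The coarser split on the query side produces fewer tokens, which is meant to keep a search for "小米手机" from pulling in the "苹果手机" document. Note, however, that a match query combines tokens with OR by default, so if the query is still split into "小米" and "手机", the "苹果手机" document will match on "手机".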
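The effect of fine index-time tokens and coarse query-time tokens can be illustrated with a toy inverted index. This is only a sketch: the token lists are hand-written stand-ins for analyzer output (the real ik_max_word and ik_smart results depend on the ik dictionary), and `build_inverted_index` and `search` are hypothetical helpers, not ES APIs.

```python
from collections import defaultdict

# Assumption: hand-written fine-grained tokens standing in for ik_max_word output.
FINE_TOKENS = {
    "苹果手机": ["苹果手机", "苹果", "手机"],
    "小米手机": ["小米手机", "小米", "手机"],
}

def build_inverted_index(docs):
    """Map each index-time token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in FINE_TOKENS[text]:
            index[token].add(doc_id)
    return index

def search(index, query_tokens):
    """Return ids matching ANY query token (OR semantics, like a default match query)."""
    hits = set()
    for token in query_tokens:
        hits |= index.get(token, set())
    return hits

index = build_inverted_index({"1": "苹果手机", "2": "小米手机"})

# Query kept as one coarse token: only the 小米手机 document matches.
coarse_hits = search(index, ["小米手机"])
# Query split into small tokens: 手机 also matches the 苹果手机 document.
fine_hits = search(index, ["小米", "手机"])

print(sorted(coarse_hits), sorted(fine_hits))
```

Whether a real query stays as one coarse token depends on the dictionary, which is why it is worth checking the actual analyzer output before relying on this behavior.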

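Insert the initial data

We insert 9 documents to verify our queries: 3 ad placements, 3 Apple phones, and 3 Huawei phones. The requests are as follows.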

PUT goods/_doc/100001
{
  "id":"100001",
  "name":"广告位001",
  "keyWords":"手机 智能手机 5G手机 热搜预定手机",
  "sellNum":1,
  "price":12,
  "sort": 1
}
PUT goods/_doc/100002
{
  "id":"100002",
  "name":"广告位002",
  "keyWords":"手机 智能手机 5G手机 热搜预定手机",
  "sellNum":2,
  "price":11,
  "sort": 2
}
PUT goods/_doc/100003
{
  "id":"100003",
  "name":"广告位003",
  "keyWords":"手机 智能手机 5G手机 热搜预定手机",
  "sellNum":3,
  "price":10,
  "sort": 3
}
PUT goods/_doc/100004
{
  "id":"100004",
  "name":"苹果手机001",
  "keyWords":"手机 智能手机 IOS 苹果手机",
  "sellNum":4,
  "price":9,
  "sort": 4
}
PUT goods/_doc/100005
{
  "id":"100005",
  "name":"苹果手机002",
  "keyWords":"手机 智能手机 IOS 苹果手机",
  "sellNum":5,
  "price":8,
  "sort": 5
}
PUT goods/_doc/100006
{
  "id":"100006",
  "name":"苹果手机003",
  "keyWords":"手机 智能手机 IOS 苹果手机",
  "sellNum":6,
  "price":7,
  "sort": 6
}
PUT goods/_doc/100007
{
  "id":"100007",
  "name":"华为001",
  "keyWords":"手机 智能手机 麒麟 国产手机",
  "sellNum":7,
  "price":6,
  "sort": 7
}
PUT goods/_doc/100008
{
  "id":"100008",
  "name":"华为002",
  "keyWords":"手机 智能手机 麒麟 国产手机",
  "sellNum":8,
  "price":5,
  "sort": 8
}
PUT goods/_doc/100009
{
  "id":"100009",
  "name":"华为003",
  "keyWords":"手机 智能手机 麒麟 国产手机",
  "sellNum":9,
  "price":4,
  "sort": 9
}
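Sending one PUT per document works for a handful of records. For larger seed sets, the ES `_bulk` endpoint accepts newline-delimited JSON in which each document is preceded by an action line, and the body must end with a newline. The sketch below only builds such a body as a string; `to_bulk_body` is a hypothetical helper, and only two of the sample documents are included.

```python
import json

def to_bulk_body(index_name, docs):
    """Build an NDJSON body for POST _bulk: one action line plus one source line per document."""
    lines = []
    for doc in docs:
        action = {"index": {"_index": index_name, "_id": doc["id"]}}
        lines.append(json.dumps(action, ensure_ascii=False))
        lines.append(json.dumps(doc, ensure_ascii=False))
    # The bulk body must be terminated by a final newline.
    return "\n".join(lines) + "\n"

sample_docs = [
    {"id": "100001", "name": "广告位001", "keyWords": "手机 智能手机 5G手机 热搜预定手机",
     "sellNum": 1, "price": 12, "sort": 1},
    {"id": "100004", "name": "苹果手机001", "keyWords": "手机 智能手机 IOS 苹果手机",
     "sellNum": 4, "price": 9, "sort": 4},
]

bulk_body = to_bulk_body("goods", sample_docs)
print(bulk_body)
```

The printed text can be pasted after a `POST _bulk` line in Kibana.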

Exploring each scenario

All of the following requests were run in Kibana.

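The simplest search: match everything, sort by sales in descending order, and return the first 5 results. The request is as follows

POST goods/_search
{
  "query":{"match_all": {}},
  "from":0,
  "size":5,
  "sort":{
    "sellNum":"desc"
  }
}

A few key properties are worth explaining

  • query: wraps the query itself, for example a match or a match-all
  • from: the offset of the first hit to return; note that it starts at 0
  • size: the number of hits per page
  • sort: the sort clause; the key inside it names the field to sort on (here sellNum), and the value desc requests descending order

In the response, the total is 9 while hits contains only 5 documents, which shows that pagination works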
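User interfaces usually work with page numbers, while ES expects an offset in `from`. A minimal sketch of the conversion, assuming 1-based page numbers; `page_request` is a hypothetical helper that only builds the request body.

```python
def page_request(page, size, sort_field, order="desc"):
    """Build a match_all search body for a 1-based page number."""
    if page < 1 or size < 1:
        raise ValueError("page and size must be positive")
    return {
        "query": {"match_all": {}},
        "from": (page - 1) * size,  # page 1 starts at offset 0
        "size": size,
        "sort": {sort_field: order},
    }

first_page = page_request(1, 5, "sellNum")
second_page = page_request(2, 5, "sellNum")
print(first_page["from"], second_page["from"])
```

With 9 documents and a page size of 5, the second page would hold the remaining 4. Keep in mind that by default ES rejects requests where from + size exceeds 10,000 (the index.max_result_window setting), so this style of paging suits shallow result pages.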

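Searching for a phone with the defaults: default sort order, matching on name or keywords

For example, searching for 智能手机 (smartphone) should return all documents, because every one of them matches on keyWords. Under the default sort (ascending on the sort field), the first results should be the ad placements.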

POST goods/_search
{
	"query": {
		"bool": {
			"should": [{
					"match": {
						"name": "智能手机"
					}
				},
				{
					"match": {
						"keyWords": "智能手机"
					}
				}
			]
		}
	},
	"from": 0,
	"size": 5,
	"sort": {
		"sort": "asc"
	}
}

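This request introduces a few new elements:

  • bool: a compound query that combines other query clauses. Besides should it supports must, must_not, and filter, which are described in the official ES documentation
  • should: the clauses a document "should" match. When a bool query contains only should clauses, a document has to match at least one of them. In this example the search for 智能手机 matches through the should clause on keyWords
  • match: a full-text query. The input is analyzed and the resulting tokens are matched against the field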
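The same bool/should structure can be generated for any list of text fields. This is a sketch under stated assumptions: `should_match` is a hypothetical helper, and it sets `minimum_should_match` explicitly to 1, which spells out the behavior a should-only bool query already has by default.

```python
def should_match(text, fields, minimum_should_match=1):
    """Build a bool query that matches `text` in at least N of the given fields."""
    return {
        "bool": {
            "should": [{"match": {field: text}} for field in fields],
            "minimum_should_match": minimum_should_match,
        }
    }

query = should_match("智能手机", ["name", "keyWords"])
print(query)
```

Making the minimum explicit protects the query if a must or filter clause is added later, since in that case should clauses become optional unless a minimum is set.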

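Finding a Kirin (麒麟) phone, sorted by price or by sales in descending order

Three documents should match. The page size is set to 2 and from is set to 2, so each request skips the first two hits and returns the second page.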

  • Products containing 麒麟, sorted by price in descending order
POST goods/_search
{
	"query": {
		"bool": {
			"should": [{
					"match": {
						"name": "麒麟"
					}
				},
				{
					"match": {
						"keyWords": "麒麟"
					}
				}
			]
		}
	},
	"from": 2,
	"size": 2,
	"sort": {
		"price": "desc"
	}
}
  • Products containing 麒麟, sorted by sales in descending order
POST goods/_search
{
	"query": {
		"bool": {
			"should": [{
					"match": {
						"name": "麒麟"
					}
				},
				{
					"match": {
						"keyWords": "麒麟"
					}
				}
			]
		}
	},
	"from": 2,
	"size": 2,
	"sort": {
		"sellNum": "desc"
	}
}
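The two requests differ only in the sort key, so in application code the sort field is a natural parameter. The sketch below restricts it to the numeric fields of the goods mapping, because by default ES rejects sorting on a text field such as name. `keyword_search` and `SORTABLE_FIELDS` are hypothetical names introduced for illustration.

```python
# Assumption: only the numeric fields of the goods mapping are offered for sorting.
SORTABLE_FIELDS = {"sellNum", "price", "sort"}

def keyword_search(text, sort_field, order="desc", offset=0, size=10):
    """Build a search body matching `text` in name or keyWords, sorted by one field."""
    if sort_field not in SORTABLE_FIELDS:
        raise ValueError(f"cannot sort on {sort_field!r}")
    return {
        "query": {
            "bool": {
                "should": [
                    {"match": {"name": text}},
                    {"match": {"keyWords": text}},
                ]
            }
        },
        "from": offset,
        "size": size,
        "sort": {sort_field: order},
    }

by_price = keyword_search("麒麟", "price", offset=2, size=2)
by_sales = keyword_search("麒麟", "sellNum", offset=2, size=2)
print(by_price["sort"], by_sales["sort"])
```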

Conclusion

In this article we walked through a simple ES search implementation for an e-commerce scenario based on the ik tokenizer. This more or less wraps up the introductory ES articles. In the next article we will look at how to meet today's requirements with Spring Boot.
Next article: implementing ik word segmentation search for e-commerce scenarios with Spring Boot, so stay tuned!

Origin juejin.im/post/7083041913787875341