ElasticSearch Chinese word segmentation and fuzzy query

Foreword

ElasticSearch is a distributed, real-time document store in which every field can be indexed and searched, and it can scale to petabytes of structured or unstructured data. Our global search was originally a simple SQL fuzzy query. To take that load off the database, we moved search to ES. Beyond the points above, we chose it because it exposes a simple API that can be called from any programming language. The rest of this article improves the search feature in the context of a PHP business scenario.

Environment

ThinkPHP 5.1

ElasticSearch 7.8

PHP 7.3

Overview of feature iterations

The initial switch to ES solved the search-speed problem, and adding the ik tokenizer later solved the problem of search terms being too limited. That limitation stems from Chinese word segmentation, which breaks each sentence into words of a specified granularity. For English words, the tokenizer generally only splits a sentence into whole words, whereas users expect to type the first few letters and get the entries that continue from those letters, that is, a fuzzy query. After digging through the documentation, I found instant search.

Instant search, or search-as-you-type, displays results while the user is still typing the query. Users get results sooner, and they are guided toward results that actually exist in the index. For example, typing dvd r might instantly return dvd r9s, dvd r9sk, and so on. The following sections demonstrate the effect with a complete example.

Configuring the index mapping

Setting up the ElasticSearch environment and the basic operations were covered in the previous article, so here we assume the index has already been created. The following is the mapping of the index. How documents get added is up to your own needs: a scheduled task, a trigger at a business node, or synchronization through a collection tool.

{
  "mappings": {
    "properties": {
      "class_id": {
        "type": "long"
      },
      "goods_name": {
        "type": "text",
        "analyzer": "ik_smart"
      },
      "goods_sort": {
        "type": "keyword"
      },
      "id": {
        "type": "keyword"
      },
      "price": {
        "type": "long"
      },
      "single_goods_name": {
        "type": "text",
        "analyzer": "ik_max_word"
      },
      "state": {
        "type": "keyword"
      },
      "v": {
        "type": "long"
      }
    }
  }
}
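
The mapping assigns ik_smart to goods_name and ik_max_word to single_goods_name. One way to compare the granularity of the two analyzers is the _analyze API. The sketch below only builds the request parameters. The helper name buildAnalyzeParams and the sample text are made up for illustration, and the commented-out call assumes an elasticsearch-php client in $client and an installed ik plugin.

```php
<?php
// Hypothetical helper: builds the parameter array for an _analyze request.
// "analyzer" and "text" are fields of the _analyze request body.
function buildAnalyzeParams(string $text, string $analyzer): array
{
    return [
        'body' => [
            'analyzer' => $analyzer,
            'text'     => $text,
        ],
    ];
}

// Sample text chosen only for illustration.
$text = '中华人民共和国国歌';

$smartParams   = buildAnalyzeParams($text, 'ik_smart');    // coarser-grained split
$maxWordParams = buildAnalyzeParams($text, 'ik_max_word'); // finer-grained split

// With a configured client (assumed to exist elsewhere), the tokens could be listed like this:
// $tokens = $client->indices()->analyze($smartParams)['tokens'];
// foreach ($tokens as $t) {
//     echo $t['token'], PHP_EOL;
// }
```

Running both parameter sets against the same text and comparing the token lists shows which words each field will actually match on.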

Front-end search

1. Instant search example

{
    "match_phrase_prefix" : {
        "brand" : {
            "query": "walker johnnie bl", 
            "slop":  10
        }
    }
}
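
In a match_phrase_prefix query the last term of the input is treated as a prefix, slop controls how far apart and out of order the terms may be, and max_expansions (default 50) caps how many terms the prefix may expand to. The same query can be built as a PHP array, roughly as sketched here. The function name buildInstantSearchQuery is hypothetical, and the field and input are taken from the example above.

```php
<?php
// Hypothetical helper: builds a match_phrase_prefix clause for one field.
function buildInstantSearchQuery(
    string $field,
    string $input,
    int $slop = 0,
    int $maxExpansions = 50
): array {
    return [
        'match_phrase_prefix' => [
            $field => [
                'query'          => $input,         // last term is matched as a prefix
                'slop'           => $slop,          // allowed positional distance between terms
                'max_expansions' => $maxExpansions, // upper bound on prefix expansions
            ],
        ],
    ];
}

$query = buildInstantSearchQuery('brand', 'walker johnnie bl', 10);

// Print the clause as JSON to compare it with the request body above.
echo json_encode($query, JSON_PRETTY_PRINT), PHP_EOL;
```

Lowering max_expansions keeps a short prefix from expanding to many terms, at the cost of possibly missing some matches.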

2. Business code

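/**
 * Search goods by keyword, combining full-text, wildcard, fuzzy and prefix matching.
 *
 * @param string $key  keyword typed by the user
 * @param mixed  $sort kept for the caller's signature; not used by this query
 * @param int    $from zero-based page number
 * @param int    $size number of results per page
 * @return array raw search response
 */
public function queryData($key, $sort, $from = 0, $size = 10)
{
	// $from arrives as a page number; convert it to a document offset.
	$from = $from * $size;
	$indexName = Env::get('elasticsearch.goods_index') ?? 'products';

	$params = [
		'index' => $indexName,
		'client' => [
			'timeout' => 10,
			'connect_timeout' => 10
		],
		'body' => [
			'from' => $from,
			'size' => $size,
			'query' => [
				'bool' => [
					// A document matching any of the clauses is returned;
					// matching more clauses raises its score.
					'should' => [
						[
							// Analyzed full-text match; goods_name counts double.
							'multi_match' => [
								'query' => $key,
								'fields' => [
									'goods_name^2',
									'single_goods_name'
								],
							],
						],
						[
							// Term-level: tokens that start with the raw keyword.
							'wildcard' => [
								'single_goods_name' => $key . '*'
							]
						],
						[
							// Term-level: tokens within a small edit distance of the keyword.
							'fuzzy' => [
								'single_goods_name' => [
									'value' => $key
								]
							]
						],
						[
							// Instant search: the last term of the keyword is a prefix.
							'match_phrase_prefix' => [
								'single_goods_name' => $key
							]
						],
					],
				],
			],
			// Most relevant first; ties are broken by goods_sort.
			'sort' => [
				['_score' => 'desc'],
				['goods_sort' => 'desc']
			],
		]
	];

	return $this->es->search($params);
}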
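
The value returned by queryData is the raw search response, so the caller still has to pull the documents out of it. A minimal sketch of that step follows, assuming the ES 7.x response shape in which hits.total is an object with a value field. The function name extractGoods and the sample response are hypothetical and hand-written, not real output.

```php
<?php
// Hypothetical helper: reduces a raw search response to a total and a flat item list.
function extractGoods(array $response): array
{
    // ES 7.x reports the total as ['value' => n, 'relation' => 'eq' or 'gte'].
    $total = $response['hits']['total']['value'] ?? 0;

    $items = [];
    foreach ($response['hits']['hits'] ?? [] as $hit) {
        $items[] = [
            'id'         => $hit['_id'] ?? null,
            'score'      => $hit['_score'] ?? null,
            'goods_name' => $hit['_source']['goods_name'] ?? null,
        ];
    }

    return ['total' => $total, 'items' => $items];
}

// Hand-written sample in the shape of a search response, for illustration only.
$sample = [
    'hits' => [
        'total' => ['value' => 2, 'relation' => 'eq'],
        'hits'  => [
            ['_id' => '1', '_score' => 3.2, '_source' => ['goods_name' => 'sample goods A']],
            ['_id' => '2', '_score' => 1.1, '_source' => ['goods_name' => 'sample goods B']],
        ],
    ],
];

$result = extractGoods($sample);
echo $result['total'], PHP_EOL;
```

Keeping this mapping in one place means the controller only deals with plain arrays and does not depend on the response layout.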

3. Effect demonstration

Origin blog.csdn.net/qq_35704550/article/details/130575506