Elasticsearch word segmentation search

I implemented word segmentation search on ES a while ago and found very little information about it online, so I had to work it out bit by bit from the API documentation. In the end it worked well. The scenario covered here is word segmentation matching against specific fields, with a different priority for each field, all done directly through HTTP requests. My actual implementation wraps these requests in Java; I will not post the Java code, since you can wrap the requests to suit your own needs.

A word segmentation query takes roughly two steps: 1. set the mapping on an ES index; 2. run the query against the mapped fields.

The prerequisite is that the ES word segmentation plug-in is installed first. Reference address: http://ludizhang.iteye.com/blog/2323939

1. Set the index mapping
Send an HTTP PUT request to ES,
url: http://ip:port/indexName
postBody:
{
    "mappings": {
        "testBase2": {
            "properties": {
                "field1": {
                    "type": "string",
                    "index": "analyzed",
                    "analyzer": "ik",
                    "search_analyzer": "ik",
                    "store": "yes"
                },
                "field2": {
                    "type": "string",
                    "index": "analyzed",
                    "analyzer": "ik",
                    "search_analyzer": "ik",
                    "store": "yes"
                }
            }
        }
    }
}

Here analyzer and search_analyzer specify the tokenizer (ik) used when indexing and when searching.
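As a rough illustration of wrapping this step in code, the sketch below builds the same mapping body and prepares the PUT request with Python's standard library. The helper names, the base URL http://localhost:9200 and the index name are assumptions made for the example, not part of the original setup. The last helper assumes the response of GET /indexName/_mapping has the shape indexName -> mappings -> typeName -> properties, and reports fields whose analyzer is not the expected one.

```python
import json
import urllib.request


def build_ik_mapping(type_name, field_names, analyzer="ik"):
    """Build a mapping body where every listed field uses the same analyzer."""
    field_spec = {
        "type": "string",
        "index": "analyzed",
        "analyzer": analyzer,
        "search_analyzer": analyzer,
        "store": "yes",
    }
    properties = {name: dict(field_spec) for name in field_names}
    return {"mappings": {type_name: {"properties": properties}}}


def build_put_request(base_url, index_name, body):
    """Prepare (but do not send) the PUT request that creates the index."""
    return urllib.request.Request(
        url=f"{base_url}/{index_name}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def fields_missing_analyzer(mapping_response, index_name, type_name, analyzer="ik"):
    """List fields in a parsed GET /indexName/_mapping response that lack the analyzer."""
    properties = mapping_response[index_name]["mappings"][type_name]["properties"]
    return [name for name, spec in properties.items() if spec.get("analyzer") != analyzer]


# Hypothetical host and index name, for illustration only.
mapping = build_ik_mapping("testBase2", ["field1", "field2"])
request = build_put_request("http://localhost:9200", "indexName", mapping)
# urllib.request.urlopen(request) would send it; omitted because it needs a live cluster.
```

Keeping the body construction separate from the sending makes it easy to inspect or log the JSON before it reaches the cluster.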


How do you check that the setting succeeded? Refer to the screenshot: [img][/img]

The black box in the screenshot contains the mappings information. Note that it is a top-level mappings node, not a mapping entry under settings. The page in the screenshot is the ES head plugin (_plugin/head/).

2. Word segmentation query
Send an HTTP POST request with the query to ES,
url: http://ip:port/indexName/typeName/_search
postBody:
{
    "query": {
        "bool": {
            "must": [
                {
                    "query_string": {
                        "analyzer": "ik",
                        "default_field": "field1",
                        "query": "China and the US",
                        "boost": 6
                    }
                },
                {
                    "query_string": {
                        "analyzer": "ik",
                        "default_field": "field2",
                        "query": "China and America",
                        "boost": 4
                    }
                }
            ]
        }
    },
    "size": 10,
    "from": 0
}

In each query_string clause, analyzer selects the analyzer (IK word segmentation), default_field is the field to query, query is the content to match, and boost is the weight of that clause. size and from are the paging settings: the number of entries per page and the start index.
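The request body above can also be generated rather than written by hand. The following is a minimal sketch, assuming one query_string clause per field and a 1-based page number; the function names build_segmented_query and page_offset are made up for this example.

```python
import json


def build_segmented_query(text, field_boosts, analyzer="ik", size=10, start=0):
    """Build a bool/must query with one query_string clause per field."""
    clauses = [
        {
            "query_string": {
                "analyzer": analyzer,
                "default_field": field,
                "query": text,
                "boost": boost,
            }
        }
        for field, boost in field_boosts.items()
    ]
    return {"query": {"bool": {"must": clauses}}, "size": size, "from": start}


def page_offset(page, size):
    """Translate a 1-based page number into the 'from' start index."""
    return (page - 1) * size


body = build_segmented_query(
    "China and the US",
    {"field1": 6, "field2": 4},
    size=10,
    start=page_offset(1, 10),
)
print(json.dumps(body, indent=4))
```

With must, every clause has to match. If a hit in either field should be enough, the clauses could go under should instead; which of the two fits is a design choice for your own data.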


Query result (hits.total is the total number of matching entries):
{
    "took": 7,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "failed": 0
    },
    "hits": {
        "total": 4,
        "max_score": 1,
        "hits": [
            {
                "_index": "index",
                "_type": "testBase2",
                "_id": "3187",
                "_score": 1,
                "_source": {
                    "brandName": "test5",
                    "classifyId": 23,
                    "labelWord": "标签3",
                    "videoName": "Two segments, 9-12,14-16",
                    "brandId": 6,
                    "videoDesc": "Video Introduction 5",
                    "videoId": 3187,
                    "classifyName": "Life aa",
                    "keyWord": "Key 4"
                }
                }
            },
            ....
        ]
    }
}
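Reading such a response in code mostly means walking hits.hits and picking up _source. The sketch below is one possible way to do that, assuming hits.total is a plain number as in the response above; summarize_hits is a hypothetical helper name and the sample data is a trimmed copy of that response.

```python
def summarize_hits(response):
    """Return (total, rows), where each row merges _id, _score and _source."""
    hits = response["hits"]
    rows = []
    for hit in hits["hits"]:
        row = {"_id": hit["_id"], "_score": hit["_score"]}
        row.update(hit.get("_source", {}))
        rows.append(row)
    return hits["total"], rows


# Trimmed copy of the response above, used as sample input.
sample_response = {
    "took": 7,
    "timed_out": False,
    "hits": {
        "total": 4,
        "max_score": 1,
        "hits": [
            {
                "_index": "index",
                "_type": "testBase2",
                "_id": "3187",
                "_score": 1,
                "_source": {
                    "videoId": 3187,
                    "videoName": "Two segments, 9-12,14-16",
                },
            }
        ],
    },
}

total, rows = summarize_hits(sample_response)
print(total, rows[0]["videoName"])
```

Note that total counts every match in the index, while rows only holds the current page.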

******************** Update: a simpler way to perform complex queries
{
    "fields": ["videoName", "videoDesc"],
    "query": {
        "bool": {
            "must": [
                {
                    "query_string": {
                        "fields": ["videoName^9", "videoDesc^1"],
                        "analyzer": "ik",
                        "query": "Solution"
                    }
                }
            ]
        }
    },
    "from": 0,
    "size": 60
}

The top-level fields entry specifies the returned fields. The fields entry inside query_string lists the query fields, each followed by ^ and its weight.
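The field^weight strings are easy to generate from a plain dictionary. Here is a small sketch under that assumption; build_multi_field_query is a hypothetical helper, and the defaults simply mirror the request above.

```python
import json


def build_multi_field_query(text, field_boosts, returned_fields,
                            analyzer="ik", size=60, start=0):
    """Build a single query_string clause that searches several weighted fields."""
    weighted = [f"{field}^{boost}" for field, boost in field_boosts.items()]
    return {
        "fields": list(returned_fields),
        "query": {
            "bool": {
                "must": [
                    {
                        "query_string": {
                            "fields": weighted,
                            "analyzer": analyzer,
                            "query": text,
                        }
                    }
                ]
            }
        },
        "from": start,
        "size": size,
    }


body = build_multi_field_query(
    "Solution",
    {"videoName": 9, "videoDesc": 1},
    returned_fields=["videoName", "videoDesc"],
)
print(json.dumps(body, indent=4))
```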


The returned result has a format similar to the one returned by the previous query. The difference is that the data part of each hit contains only the fields we specified:
{
    "took": 38,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "failed": 0
    },
    "hits": {
        "total": 2444,
        "max_score": 1,
        "hits": [
            {
                "_index": "index",
                "_type": "testBase2",
                "_id": "1230",
                "_score": 1,
                "fields": {
                    "videoName": [
                        "All the way north"
                    ],
                    "videoDesc": [
                        "Driving a Mercedes-Benz, all the way north. On every road, it's fun to play, if I can change freely, I will run to the end. | Mercedes-Benz"
                    ]
                }
            }
        ]
    }
}


The difference is that the _source field of each inner hit has become a fields field, and the returned data has changed from plain JSON values to JSON with nested arrays (each field value is wrapped in an array), which is a little more troublesome to parse. Which form to use depends on personal preference.
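If the nested arrays are the only inconvenience, they can be unwrapped after parsing. The following is a minimal sketch, assuming a single-element array should collapse to its one value and longer arrays stay as lists; flatten_fields is a hypothetical helper name and the sample hit is shortened from the response above.

```python
def flatten_fields(hit):
    """Unwrap single-element arrays in a hit's "fields" object."""
    flat = {}
    for name, values in hit.get("fields", {}).items():
        flat[name] = values[0] if len(values) == 1 else values
    return flat


# Shortened copy of one hit from the response above.
sample_hit = {
    "_index": "index",
    "_type": "testBase2",
    "_id": "1230",
    "_score": 1,
    "fields": {
        "videoName": ["All the way north"],
        "videoDesc": ["Driving a Mercedes-Benz, all the way north."],
    },
}

flat = flatten_fields(sample_hit)
print(flat["videoName"])
```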
