Elasticsearch Series: Getting to Know Search

Overview

This part explains the structure of a search response and how search timeouts are handled. It then covers multi-index search and lightweight search, and finally gives a brief comparison of exact-value search and full-text search.

Empty search

The simplest form of the search API is the empty search. It specifies no index or type and returns all documents in all indices of the cluster (10 are shown by default):

GET /_search

Example response (trimmed to show just one document):

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 1,
    "hits": [
      {
        "_index": "music",
        "_type": "children",
        "_id": "2",
        "_score": 1,
        "_source": {
          "name": "wake me, shark me",
          "content": "don't let me sleep too late, gonna get up brightly early in the morning",
          "language": "english",
          "length": "55",
          "likes": 9
        }
      }
    ]
  }
}

A brief explanation of the response fields:

  • took: how many milliseconds the entire search request took.
  • timed_out: whether the query timed out.
  • _shards: the total number of shards that took part in the query, and how many of them succeeded, failed, or were skipped. Normally no shards fail. If a catastrophic failure exceeds the cluster's fault tolerance, both a shard's primary and its replicas may be lost; those shards are then reported as failed, but results from the remaining available shards are still returned.
  • hits: total is the total number of matching documents, and max_score is the highest _score among all matching documents.
  • hits.hits: an array holding the full details of the matching documents. By default it contains the first 10 results, sorted by _score in descending order.
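To make the field list concrete, here is a small sketch that reads those fields out of the sample response. The helper name summarize is hypothetical, and the _source is shortened further. The isinstance branch is there because Elasticsearch 7.x reports hits.total as an object with a value key instead of a plain number.

```python
import json

# The sample response from above, trimmed further (_source shortened).
RESPONSE = """
{
  "took": 2,
  "timed_out": false,
  "_shards": {"total": 5, "successful": 5, "skipped": 0, "failed": 0},
  "hits": {
    "total": 3,
    "max_score": 1,
    "hits": [
      {"_index": "music", "_type": "children", "_id": "2", "_score": 1,
       "_source": {"name": "wake me, shark me", "likes": 9}}
    ]
  }
}
"""


def summarize(resp: dict) -> dict:
    """Collect the response fields described above into one flat dict."""
    total = resp["hits"]["total"]
    if isinstance(total, dict):  # 7.x style: {"value": 3, "relation": "eq"}
        total = total["value"]
    return {
        "took_ms": resp["took"],
        "timed_out": resp["timed_out"],
        # True when some shards failed and the results are only partial.
        "partial": resp["_shards"]["failed"] > 0,
        "total_hits": total,
        "max_score": resp["hits"]["max_score"],
        # IDs of the returned page, already sorted by _score descending.
        "ids": [hit["_id"] for hit in resp["hits"]["hits"]],
    }


summary = summarize(json.loads(RESPONSE))
print(summary)
```

Note that total_hits (3) is larger than the number of returned IDs (1) here only because the sample was trimmed; in a real response the gap comes from paging.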

Timeout mechanism

By default no timeout is used. In scenarios where a fast response matters more than complete results, you can specify a timeout such as 10ms or 1s; when it expires, Elasticsearch returns whatever documents it has found so far.
Note that timeout does not stop the query from executing. It only tells the coordinating node to return the results collected up to that point and close the connection. In the background, the queries running on other nodes continue and are not interrupted; their results are simply discarded.

For example: an e-commerce platform has 3,000,000 product SKUs. A keyword search matches 2,000 records, but takes 15 seconds to return. That is far too slow, because production services have SLA requirements and results must come back within 1 second. The quickest fix is the query parameter timeout=1s. Since the front page shows only 20 results by default, filling those 20 within one second is relatively easy.

(Figure: setting the query timeout)
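As a sketch of the two ways to pass a timeout, the snippet below builds a URI with a timeout parameter and an equivalent query DSL body with a top-level "timeout" key. The cluster address localhost:9200 and the helper search_url are assumptions for illustration, and match_all stands in for the real keyword query.

```python
import json
from urllib.parse import urlencode

BASE = "http://localhost:9200"  # assumed local cluster address


def search_url(path: str, **params: str) -> str:
    """Build a URI search request with the given query-string parameters."""
    return f"{BASE}{path}?{urlencode(params)}"


# Option 1: timeout as a URI parameter.
url = search_url("/_search", timeout="1s")

# Option 2: timeout in the request body, asking only for the 20 hits
# the front page needs (match_all is a placeholder for the keyword query).
body = {"timeout": "1s", "size": 20, "query": {"match_all": {}}}

print(url)               # http://localhost:9200/_search?timeout=1s
print(json.dumps(body))
```

Either way, check timed_out in the response: if it is true, the hits are only what was collected before the deadline.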

Multi-index search

A single search request can name several indices at once; this is called multi-index search.

/_search: search all data of all types in all indices
/index1,index2/_search: search the data in both indices
/*1,*2/_search: search multiple indices matched by wildcards

When searching a single index, ES forwards the request to every shard of that index (either the primary or one of its replicas) and then collects the results. Searching multiple indices works on the same principle, just with more shards involved. Searching one index with five shards performs about the same as searching five indices with one shard each.
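A simplified model of this scatter/gather step: each shard returns its local top-k hits, and the coordinating node merges those lists into a global top-k. This is a sketch only, ignoring the fetch phase, replica selection, and networking, and the scores are made up.

```python
import heapq
from itertools import islice
from typing import List, Tuple

Hit = Tuple[float, str]  # (score, doc_id)


def shard_top_k(docs: List[Hit], k: int) -> List[Hit]:
    """What each shard does locally: return its k best hits, best first."""
    return heapq.nlargest(k, docs)


def coordinate(shards: List[List[Hit]], k: int) -> List[Hit]:
    """What the coordinating node does: merge the per-shard top-k lists."""
    local = [shard_top_k(docs, k) for docs in shards]
    # Each local list is sorted in descending order, so merge with reverse=True.
    return list(islice(heapq.merge(*local, reverse=True), k))


# Hypothetical scores spread over three shards.
shards = [
    [(0.9, "a"), (0.2, "b"), (0.5, "c")],
    [(0.7, "d"), (0.95, "e")],
    [(0.1, "f"), (0.6, "g")],
]
print(coordinate(shards, 3))  # [(0.95, 'e'), (0.9, 'a'), (0.7, 'd')]
```

coordinate only ever sees a flat list of shards, so it does not matter whether they belong to one index or to five. That is why the two layouts cost about the same.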

Here is a schematic of how a search is executed:

(Figure: search schematic)

Lightweight search

The search API comes in two forms. One is query-string search, where the query and sort rules are written in the request URI; this is also called lightweight search. The other is the query DSL, where the query and other parameters are written as JSON in the request body.
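The same query expressed both ways, as a sketch. The q parameter is parsed with query_string syntax, so a query_string query is the closest body equivalent; the index and type names are the ones used in this article's examples.

```python
import json
from urllib.parse import urlencode

# Lightweight search: the query travels in the URI as the q parameter.
uri = "/music/children/_search?" + urlencode({"q": "content:friend"})

# Query DSL: the same query written as JSON in the request body.
body = {"query": {"query_string": {"query": "content:friend"}}}

print(uri)               # /music/children/_search?q=content%3Afriend
print(json.dumps(body))
```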

Examples of lightweight search:

Single-field search: "q=" is followed by "field:text", where field is the field name and text is the search keyword. There are three prefix modifiers:

GET /music/children/_search?q=content:friend
GET /music/children/_search?q=+content:friend
GET /music/children/_search?q=-content:friend
  • A "+" prefix means the condition must match.
  • A "-" prefix means the condition must not match.
  • With no prefix, the condition is optional.

The more optional conditions a document matches, the more relevant it is.
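The prefix rules can be modeled with a toy evaluator. This is a hypothetical sketch, not how Elasticsearch works internally: tokens is a crude lowercase word splitter, relevance is just a count of matched optional clauses (real Elasticsearch computes a relevance score), and the documents are made up. When there is no "+" clause, at least one optional clause has to match, mirroring the default behavior of a bool query with only should clauses.

```python
import re
from typing import Dict, List, Tuple

Doc = Dict[str, str]
Clause = Tuple[str, str, str]  # (prefix, field, text)


def tokens(text: str) -> set:
    """Toy analyzer: lowercase the text and split it into word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def parse(q: str) -> List[Clause]:
    """Split a space-separated query string into (prefix, field, text)."""
    clauses = []
    for part in q.split():
        prefix = part[0] if part[0] in "+-" else ""
        field, _, text = part.lstrip("+-").partition(":")
        clauses.append((prefix, field, text.lower()))
    return clauses


def search(docs: List[Doc], q: str) -> List[Tuple[int, Doc]]:
    """Apply the +/-/no-prefix rules and rank by optional matches."""
    clauses = parse(q)
    has_required = any(p == "+" for p, _, _ in clauses)
    has_optional = any(p == "" for p, _, _ in clauses)
    results = []
    for doc in docs:
        keep, score = True, 0
        for prefix, field, text in clauses:
            hit = text in tokens(doc.get(field, ""))
            if prefix == "+" and not hit:
                keep = False          # "+": must match
            elif prefix == "-" and hit:
                keep = False          # "-": must not match
            elif prefix == "" and hit:
                score += 1            # no prefix: optional, raises relevance
        # Without any "+" clause, at least one optional clause must match.
        if has_optional and not has_required and score == 0:
            keep = False
        if keep:
            results.append((score, doc))
    # Stable sort: higher score first, ties keep input order.
    results.sort(key=lambda pair: pair[0], reverse=True)
    return results


docs = [  # hypothetical documents
    {"name": "wake me, shark me", "content": "don't let me sleep too late"},
    {"name": "you've got a friend", "content": "call my name, friend, and I'll be there"},
    {"name": "wake up", "content": "a friend wakes me"},
]
for q in ["content:friend", "+content:friend", "-content:friend"]:
    print(q, [d["name"] for _, d in search(docs, q)])
```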

To search on multiple fields, separate the conditions with spaces:

GET /music/children/_search?q=-content:friend +name:wake
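One practical detail when sending these requests with a plain HTTP client such as curl: in a URL query string a literal "+" is decoded as a space, which silently drops the "must match" prefix. The sketch below uses Python's standard library to show the problem and the percent-encoded form.

```python
from urllib.parse import parse_qs, urlencode

query = "-content:friend +name:wake"

# Unencoded: the server-side decoder turns "+" into a space.
raw = "q=" + query
print(parse_qs(raw)["q"])      # ['-content:friend  name:wake']

# Percent-encoded: "+" becomes %2B and the space becomes "+".
encoded = urlencode({"q": query})
print(encoded)                 # q=-content%3Afriend+%2Bname%3Awake
print(parse_qs(encoded)["q"])  # ['-content:friend +name:wake']
```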

How the _all metadata field works

If "q=" is followed directly by the search keyword with no field name, all fields of the specified index are searched, as follows:

GET /music/children/_search?q=friend

Any document in the music index that contains "friend" in any field will be found. How does _all make this work?

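_all is a metadata field in Elasticsearch. When a document containing several fields is indexed, ES automatically concatenates the values of all those fields into one long string, uses it as the value of the _all field, and indexes it too. Later, if a search does not name a specific field, it searches the _all field by default. Note that _all was deprecated in Elasticsearch 6.0 and removed in 7.0.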

Take a sample document:

"name": "wake me, shark me",
"content": "don't let me sleep too late, gonna get up brightly early in the morning",
"language": "english",
"length": "55",
"likes": 9

The string "wake me, shark me don't let me sleep too late, gonna get up brightly early in the morning english 55 9" becomes the value of this document's _all field; it is then tokenized and the corresponding inverted index is built.
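A minimal sketch of that process: concatenate the field values, tokenize the result, and record each term in an inverted index. The regex tokenizer is a simplifying assumption standing in for the index's real analyzer, and the helper names are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, Set

doc = {
    "name": "wake me, shark me",
    "content": "don't let me sleep too late, gonna get up brightly early in the morning",
    "language": "english",
    "length": "55",
    "likes": 9,
}


def all_field(source: Dict[str, object]) -> str:
    """Concatenate every field value, as a string, into one _all-style value."""
    return " ".join(str(value) for value in source.values())


def build_index(docs: Dict[str, Dict[str, object]]) -> Dict[str, Set[str]]:
    """Tokenize each document's _all value and map term -> set of doc IDs."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, source in docs.items():
        for term in re.findall(r"\w+", all_field(source).lower()):
            index[term].add(doc_id)
    return index


index = build_index({"2": doc})
print(all_field(doc))
print(sorted(index["morning"]))  # ['2']
```

A field-less search such as q=morning then only needs one lookup in this single index, rather than one per field.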

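Caveats

Lightweight search is handy for simple queries during development but is rarely used in production: the syntax is cryptic and error-prone, it is hard to read, and a heavyweight query can even bring down the ES cluster.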

Exact-value search vs. full-text search

Data in Elasticsearch falls into two categories: exact values and full text.

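  • Exact value
    Exact values include dates, IDs, and numbers; some text can also be an exact value, such as email addresses or common abbreviations. An exact value must match exactly, including case, which makes it easy to query: hello is not equal to Hello, and a field whose date is 2019-11-20 cannot be found by searching for 2019.

  • Full text
    Full text is much subtler. English has word-form variations, case differences, synonyms, and abbreviations; Chinese has word segmentation, dictionaries, internet slang, and so on. In every case we want matching to be as good as possible and to understand our intent. A few Chinese examples:
    • 南京市长江大桥 (Nanjing Yangtze River Bridge): some tokenizers produce 南京/市长/江大桥 (Nanjing / mayor / Jiang Daqiao), which is not what we mean at all. We want 南京/南京市/长江/大桥/长江大桥 (Nanjing / Nanjing City / Yangtze River / bridge / Yangtze River Bridge).
    • 长春市长春街长春药店 (Changchun Pharmacy on Changchun Street, Changchun City): a bad segmentation yields 长春/市长/春/街/长/春药/店, and since 春药 means "aphrodisiac", the result is rather embarrassing.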
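To show how segmentation depends on the dictionary, here is forward maximum matching, a classic dictionary-based technique. It is used purely as an illustration, not as the algorithm of any particular Elasticsearch analyzer plugin. The two dictionaries are hypothetical: with the smaller one the same algorithm reproduces the bad split above. Note that it yields a single path, whereas the desired result above lists overlapping terms of several granularities.

```python
from typing import List, Set


def forward_max_match(text: str, vocab: Set[str], max_len: int = 4) -> List[str]:
    """Greedy left-to-right segmentation: take the longest dictionary word
    starting at the current position, or a single character if none match."""
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                words.append(piece)
                i += size
                break
    return words


sentence = "南京市长江大桥"
poor_vocab = {"南京", "市长", "江大桥", "长江", "大桥"}
better_vocab = poor_vocab | {"南京市", "长江大桥"}

print("/".join(forward_max_match(sentence, poor_vocab)))    # 南京/市长/江大桥
print("/".join(forward_max_match(sentence, better_vocab)))  # 南京市/长江大桥
```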

The most basic steps of full-text search are tokenization, then indexing, then matching at search time. English is relatively easy to handle, while Chinese still has quite a few difficulties like these to overcome.
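A toy contrast between the two kinds of matching. analyze is a hypothetical stand-in for a real analyzer (it only lowercases and splits into words), but it covers the three steps: tokenize the stored text, keep its tokens as a set (a one-document index), and match the analyzed query against it.

```python
import re
from typing import List


def exact_match(stored: str, query: str) -> bool:
    """Exact value: the whole value must be identical, case included."""
    return stored == query


def analyze(text: str) -> List[str]:
    """Toy full-text analyzer: lowercase, then split into word tokens."""
    return re.findall(r"\w+", text.lower())


def full_text_match(stored: str, query: str) -> bool:
    """Full text: match if any analyzed query token is among the stored tokens."""
    return bool(set(analyze(query)) & set(analyze(stored)))


print(exact_match("Hello", "hello"))            # False
print(exact_match("2019-11-20", "2019"))        # False
print(full_text_match("Hello World", "hello"))  # True
```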

Summary

This article introduced the basics of search: the meaning of the search response, basic use of multi-index search and lightweight search, and finally a comparison of exact-value search and full-text search, including the notorious pitfalls of Chinese word segmentation. Thanks for reading.


Origin www.cnblogs.com/huangying2124/p/12071185.html