Elasticsearch Learning Path: Day 17

Reposted from: https://blog.csdn.net/chengyuqiang/column/info/18392 (Elasticsearch version 6.3.0)

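High-level full-text queries are usually used to run full-text searches on full-text fields, such as the body of an email. They understand how the queried field is analyzed, and they apply each field's analyzer (or search_analyzer) to the query string before executing.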

match query
(1) Introductory example

GET website/_search
{
  "query": {
    "term": {
        "title": "centos升级"
    }
  }
}

Response

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 0,
    "max_score": null,
    "hits": []
  }
}

(2) The and operator

GET website/_search
{
  "query": {
    "match": {
        "title": {
          "query":"centos升级",
          "operator":"and"
        }
    }
  }
}

Response

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.5753642,
    "hits": [
      {
        "_index": "website",
        "_type": "blog",
        "_id": "3",
        "_score": 0.5753642,
        "_source": {
          "title": "CentOS升级gcc",
          "author": "程裕强",
          "postdate": "2016-12-25",
          "abstract": "CentOS升级gcc",
          "url": "http://url.cn/53868915"
        }
      }
    ]
  }
}

(3) The or operator

GET website/_search
{
  "query": {
    "match": {
        "title": {
          "query":"centos升级",
          "operator":"or"
        }
    }
  }
}

Response

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 0.9227539,
    "hits": [
      {
        "_index": "website",
        "_type": "blog",
        "_id": "6",
        "_score": 0.9227539,
        "_source": {
          "title": "CentOS更换国内yum源",
          "author": "程裕强",
          "postdate": "2016-12-30",
          "abstract": "CentOS更换国内yum源",
          "url": "http://url.cn/53946911"
        }
      },
      {
        "_index": "website",
        "_type": "blog",
        "_id": "3",
        "_score": 0.5753642,
        "_source": {
          "title": "CentOS升级gcc",
          "author": "程裕强",
          "postdate": "2016-12-25",
          "abstract": "CentOS升级gcc",
          "url": "http://url.cn/53868915"
        }
      }
    ]
  }
}

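Summary: term is an exact match on a single term. The query text centos升级 is not analyzed, so the term query finds a document only if the inverted index of title contains the exact term centos升级; that is why the first query returns nothing. match analyzes the query text first and then matches the resulting terms. With operator set to and, the title must contain every term produced from centos升级; with or (the default), containing any one of them is enough.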
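The difference can be sketched with a toy inverted index in Python. This is only an illustration, not how Elasticsearch is implemented. The token lists are assumed analyzer output (the real tokens depend on the analyzer configured for title), and term_query and match_query are hypothetical helper names.

```python
# Toy model of term vs. match; not Elasticsearch's real implementation.
# Assumed analyzer output for the title field of two documents.
# The real tokens depend on the analyzer configured for the field.
INDEXED_TOKENS = {
    "3": ["centos", "升级", "gcc"],
    "6": ["centos", "更换", "国内", "yum", "源"],
}

# Assumed analyzer output for the query string "centos升级".
QUERY_TOKENS = ["centos", "升级"]


def term_query(term):
    """Look up the query text as one unanalyzed term."""
    return {doc_id for doc_id, tokens in INDEXED_TOKENS.items() if term in tokens}


def match_query(query_tokens, operator="or"):
    """Match analyzed query tokens: 'and' needs all of them, 'or' needs any."""
    combine = all if operator == "and" else any
    return {
        doc_id
        for doc_id, tokens in INDEXED_TOKENS.items()
        if combine(token in tokens for token in query_tokens)
    }


term_hits = term_query("centos升级")
and_hits = match_query(QUERY_TOKENS, operator="and")
or_hits = match_query(QUERY_TOKENS, operator="or")
print(sorted(term_hits), sorted(and_hits), sorted(or_hits))
```

Under these assumed tokens, the term lookup finds nothing, and hits only document 3, and or hits documents 3 and 6, which mirrors the three responses above.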

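match_phrase query (phrase query)
match_phrase is similar to the match query, but it matches exact phrases, so it is also called a phrase query.
The match_phrase query analyzes the query text (the analyzer can be customized). A document is returned only if it meets both of the following conditions: a. every term produced by the analysis appears in the field; b. the terms appear in the field in the same order as in the query.
(1) Create the index and insert data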

DELETE test
PUT test
PUT test/hello/1
{ "content":"World Hello"}
PUT test/hello/2
{ "content":"Hello World"}
PUT test/hello/3
{ "content":"I just said hello world"}

(2) Search for "hello world" with match_phrase

GET test/_search
{
  "query": {
    "match_phrase": {
      "content": "hello world"
    }
  }
}

Response

{
  "took": 16,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 0.5753642,
    "hits": [
      {
        "_index": "test",
        "_type": "hello",
        "_id": "2",
        "_score": 0.5753642,
        "_source": {
          "content": "Hello World"
        }
      },
      {
        "_index": "test",
        "_type": "hello",
        "_id": "3",
        "_score": 0.5753642,
        "_source": {
          "content": "I just said hello world"
        }
      }
    ]
  }
}
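Why document 1 ("World Hello") is missing can be sketched as a positional check in Python. This is a simplified model, not Elasticsearch's implementation: analyze is a stand-in that only lowercases and splits on whitespace, and phrase_match is a hypothetical helper that requires the query terms to be adjacent and in order.

```python
# Toy positional phrase match; a sketch, not Elasticsearch's implementation.
DOCS = {
    "1": "World Hello",
    "2": "Hello World",
    "3": "I just said hello world",
}


def analyze(text):
    """Simplified stand-in for an analyzer: lowercase and split on whitespace."""
    return text.lower().split()


def phrase_match(field_text, query_text):
    """True if the query tokens occur consecutively and in order in the field."""
    field_tokens = analyze(field_text)
    query_tokens = analyze(query_text)
    n = len(query_tokens)
    if n == 0:
        return False
    # Slide a window of n tokens over the field and compare it to the query.
    return any(
        field_tokens[i:i + n] == query_tokens
        for i in range(len(field_tokens) - n + 1)
    )


phrase_hits = [doc_id for doc_id, text in DOCS.items() if phrase_match(text, "hello world")]
print(phrase_hits)  # ['2', '3']
```

Document 1 contains both terms, so it satisfies condition a, but the order is reversed, so it fails condition b.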

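match_phrase_prefix query (prefix query)
match_phrase_prefix works like match_phrase, except that it allows prefix matching on the last term of the query text. In other words, it extends match_phrase: the last term of the analyzed query only needs to match as a prefix, while the preceding terms must still match exactly and in order.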

GET test/_search
{
  "query": {
    "match_phrase_prefix": {
      "content": "hello worl"
    }
  }
}

Response

{
  "took": 5,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 0.5753642,
    "hits": [
      {
        "_index": "test",
        "_type": "hello",
        "_id": "2",
        "_score": 0.5753642,
        "_source": {
          "content": "Hello World"
        }
      },
      {
        "_index": "test",
        "_type": "hello",
        "_id": "3",
        "_score": 0.5753642,
        "_source": {
          "content": "I just said hello world"
        }
      }
    ]
  }
}
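The prefix rule can be sketched in Python in the same spirit. This is a toy model under simplifying assumptions: analyze only lowercases and splits on whitespace, and phrase_prefix_match is a hypothetical helper, not an Elasticsearch API.

```python
# Toy phrase-prefix match; a sketch, not Elasticsearch's implementation.
DOCS = {
    "1": "World Hello",
    "2": "Hello World",
    "3": "I just said hello world",
}


def analyze(text):
    """Simplified stand-in for an analyzer: lowercase and split on whitespace."""
    return text.lower().split()


def phrase_prefix_match(field_text, query_text):
    """Leading query terms must match exactly; the last one may match as a prefix."""
    field_tokens = analyze(field_text)
    query_tokens = analyze(query_text)
    if not query_tokens:
        return False
    *head, last = query_tokens
    n = len(query_tokens)
    for i in range(len(field_tokens) - n + 1):
        window = field_tokens[i:i + n]
        if window[:-1] == head and window[-1].startswith(last):
            return True
    return False


prefix_hits = [doc_id for doc_id, text in DOCS.items() if phrase_prefix_match(text, "hello worl")]
print(prefix_hits)  # ['2', '3']
```

Only the last term is treated as a prefix, so "hell world" would match nothing in this sketch. The real query also limits how many indexed terms the prefix expands to through its max_expansions parameter (50 by default), which this sketch ignores.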


multi_match
The multi_match query builds on the match query and searches across multiple fields.

GET website/_search
{
  "query": {
    "multi_match": {
      "query": "centos",
      "fields": ["title","abstract"]
    }
  }
}

Response

{
  "took": 4,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 5,
    "max_score": 0.9227539,
    "hits": [
      {
        "_index": "website",
        "_type": "blog",
        "_id": "6",
        "_score": 0.9227539,
        "_source": {
          "title": "CentOS更换国内yum源",
          "author": "程裕强",
          "postdate": "2016-12-30",
          "abstract": "CentOS更换国内yum源",
          "url": "http://url.cn/53946911"
        }
      },
      {
        "_index": "website",
        "_type": "blog",
        "_id": "2",
        "_score": 0.41360322,
        "_source": {
          "title": "watchman源码编译",
          "author": "程裕强",
          "postdate": "2016-12-23",
          "abstract": "CentOS7.x的watchman源码编译",
          "url": "http://url.cn/53844169"
        }
      },
      {
        "_index": "website",
        "_type": "blog",
        "_id": "3",
        "_score": 0.2876821,
        "_source": {
          "title": "CentOS升级gcc",
          "author": "程裕强",
          "postdate": "2016-12-25",
          "abstract": "CentOS升级gcc",
          "url": "http://url.cn/53868915"
        }
      },
      {
        "_index": "website",
        "_type": "blog",
        "_id": "7",
        "_score": 0.20725916,
        "_source": {
          "title": "搭建Ember开发环境",
          "author": "程裕强",
          "postdate": "2016-12-30",
          "abstract": "CentOS下搭建Ember开发环境",
          "url": "http://url.cn/53947507"
        }
      },
      {
        "_index": "website",
        "_type": "blog",
        "_id": "1",
        "_score": 0.1627405,
        "_source": {
          "title": "Ambari源码编译",
          "author": "程裕强",
          "postdate": "2016-12-21",
          "abstract": "CentOS7.x下的Ambari2.4源码编译",
          "url": "http://url.cn/53788351"
        }
      }
    ]
  }
}

As the results show, a document is returned as long as either its title or its abstract field matches.
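The "any field is enough" behaviour can be sketched in Python. This is a toy model that ignores scoring. The token lists are assumed analyzer output for documents 6 and 2 from the response (the real tokens depend on the configured analyzer), and multi_match here is a hypothetical helper, not the Elasticsearch query itself.

```python
# Toy model of multi_match without scoring; not Elasticsearch's implementation.
# Assumed analyzer output per field; real tokens depend on the analyzer.
DOCS = {
    "6": {
        "title": ["centos", "更换", "国内", "yum", "源"],
        "abstract": ["centos", "更换", "国内", "yum", "源"],
    },
    "2": {
        "title": ["watchman", "源码", "编译"],
        "abstract": ["centos", "7", "x", "watchman", "源码", "编译"],
    },
}


def multi_match(query_tokens, fields):
    """Return ids of documents where any listed field contains any query token."""
    hits = []
    for doc_id, doc in DOCS.items():
        if any(
            token in doc.get(field, [])
            for field in fields
            for token in query_tokens
        ):
            hits.append(doc_id)
    return hits


title_only = multi_match(["centos"], ["title"])
both_fields = multi_match(["centos"], ["title", "abstract"])
print(title_only, both_fields)  # ['6'] ['6', '2']
```

Document 2 has no centos in its title, so it appears only once abstract is added to the field list, which is consistent with it showing up in the response above.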

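common_terms query (frequent-word query)
(1) Stop words
Some words appear very frequently in text but contribute little to the basic information it carries, for example a, an, the, and of in English, 的, 了, 着, and 是 in Chinese, and punctuation marks. After analysis, stop words are usually filtered out and not indexed. At search time, stop words in the user's query are filtered out as well, because the query string also goes through analysis. Excluding stop words speeds up indexing and reduces the size of the index files.
(2) Although stop words have little effect on document scoring, they sometimes carry important meaning, and removing them is then clearly inappropriate. Without stop words, "happy" and "not happy" cannot be told apart, "to be or not to be" cannot be indexed at all, and search accuracy drops.
(3) The common_terms query offers a solution. It divides the analyzed query terms into important terms (low frequency terms) and less important terms (high frequency terms, which would previously have been stop words). The search first finds documents that match the important terms, then runs a second query for the high frequency terms, which have little impact on the score.
Whether a term counts as high or low frequency is controlled by the cutoff_frequency threshold, which can be an absolute frequency (>= 1) or a relative frequency (0.0 to 1.0).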
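How the threshold splits a query can be sketched in Python. This is a simplified model, not the actual Elasticsearch or Lucene code: split_by_frequency is a hypothetical helper, and the document counts in the example are invented for illustration.

```python
# Toy model of the cutoff_frequency split; not Elasticsearch's implementation.
def split_by_frequency(query_terms, doc_freq, total_docs, cutoff_frequency):
    """Split terms into (low_freq, high_freq) groups.

    A cutoff below 1 is treated as a fraction of total_docs;
    a cutoff of 1 or more is treated as an absolute document count.
    """
    if cutoff_frequency >= 1:
        max_doc_freq = cutoff_frequency
    else:
        max_doc_freq = cutoff_frequency * total_docs

    low, high = [], []
    for term in query_terms:
        # Terms found in more documents than the threshold count as high frequency.
        if doc_freq.get(term, 0) > max_doc_freq:
            high.append(term)
        else:
            low.append(term)
    return low, high


# Invented document frequencies for an index of 1000 documents.
DOC_FREQ = {"to": 900, "be": 850, "elasticsearch": 12}

low_terms, high_terms = split_by_frequency(
    ["to", "be", "elasticsearch"], DOC_FREQ, total_docs=1000, cutoff_frequency=0.5
)
print(low_terms, high_terms)  # ['elasticsearch'] ['to', 'be']
```

With a relative cutoff of 0.5, only terms found in more than 500 of the 1000 documents are treated as high frequency, so the rare term drives the first search.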

GET website/_search
{
    "query": {
        "common": {
            "title": {
                "query": "to be",
                "cutoff_frequency": 0.0001,
                "low_freq_operator": "and"
            }
        }
    }
}

Response

{
  "took": 0,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 0,
    "max_score": null,
    "hits": []
  }
}

The blog I was following, however, shows this response:

{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 2.364739,
    "hits": [
      {
        "_index": "website",
        "_type": "blog",
        "_id": "9",
        "_score": 2.364739,
        "_source": {
          "title": "to be or not to be",
          "author": "somebody",
          "postdate": "2018-01-03",
          "abstract": "to be or not to be,that is the question",
          "url": "http://url/63991802"
        }
      }
    ]
  }
}

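I do not know why the results differ. One thing worth checking is whether document 9 (title "to be or not to be"), the only hit in the reference response, was ever indexed into my website index.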
 


Reposted from blog.csdn.net/qq_23536449/article/details/91366030