Elasticsearch: Learning Notes and Tuning

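Elasticsearch is a distributed, highly scalable, near-real-time search and analytics engine. It makes large volumes of data searchable, analyzable, and explorable, and its horizontal scalability helps that data stay useful in production. At a high level it works as follows: a client submits documents to Elasticsearch; an analyzer splits the text into terms; the terms are stored in the index together with the information needed for scoring; and when a user searches, the matching documents are scored, ranked, and returned.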

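Elasticsearch is a full-text search server. Full-text search is a way of searching unstructured data.

Structured data: data with a fixed format and fixed length, such as the columns of a database table.

Unstructured data: data whose format and length are not fixed, such as product descriptions on an e-commerce site.

Structured data usually lives in a database and can be queried quickly with SQL. Unstructured data is large and has no fixed format, so it is searched with full-text search, which speeds up lookups by building an inverted index.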

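Indexes

Index: part of the information in the data is extracted and reorganized into a data structure that can be searched quickly. An index is a table of contents. For example, a dictionary extracts the pinyin of each character into a table of contents, which lets you find the position of a character quickly.

Forward index

A forward index uses the document ID as the key, so a document can be found quickly by its ID. A primary key in a database, for example, gets a forward index.

Inverted index

Unstructured data is usually queried by keyword. An index that maps each keyword in the data to the documents containing it is called an inverted index.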
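To make the idea concrete, here is a minimal sketch of an inverted index in plain Java. It is only an illustration under simplifying assumptions: the class name `InvertedIndexSketch` is made up, tokens are produced by splitting on whitespace, and nothing here reflects how Lucene actually stores its postings.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

class InvertedIndexSketch {
    // term -> ids of the documents that contain the term (sorted for stable output)
    private final Map<String, SortedSet<Integer>> postings = new HashMap<>();

    // Split the text on whitespace, lowercase it, and record the document id under every term.
    void add(int docId, String text) {
        for (String token : text.toLowerCase().split("\\s+")) {
            if (!token.isEmpty()) {
                postings.computeIfAbsent(token, k -> new TreeSet<>()).add(docId);
            }
        }
    }

    // Look up a single term; an unknown term yields an empty set.
    SortedSet<Integer> search(String term) {
        return postings.getOrDefault(term.toLowerCase(), new TreeSet<>());
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.add(1, "iphone 13 oled screen");
        index.add(2, "iphone 12 pro");
        index.add(3, "oled tv");
        System.out.println(index.search("iphone")); // [1, 2]
        System.out.println(index.search("oled"));   // [1, 3]
    }
}
```

A lookup goes from keyword to documents without scanning any document text, which is the property that makes full-text search fast.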

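Elasticsearch use cases

GitHub: in early 2013 GitHub dropped Solr in favor of Elasticsearch for petabyte-scale search. GitHub uses Elasticsearch to search 20 TB of data, including 1.3 billion files and 130 billion lines of code.
Wikipedia: its core search architecture is built on Elasticsearch.
Baidu: Baidu uses Elasticsearch widely for text data analysis, collecting all kinds of metrics and user-defined data from all Baidu servers. It covers more than 20 internal business lines (including casio, cloud analytics, the ad network, forecasting, Wenku, Zhidahao, Wallet, and risk control). The largest single cluster has 100 machines and 200 ES nodes and ingests more than 30 TB of data per day.
Sina: uses ES to analyze and process 3.2 billion real-time log entries.
Alibaba: uses ES to build its own log collection and analysis system.
Elasticsearch can be used for site-wide search, search in online shops, log analysis, and similar features.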

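Elasticsearch compared with Solr

Elasticsearch's market share keeps growing, Spring stopped maintaining Spring Data Solr in 2020, and more and more companies use Elasticsearch as their search engine.

Solr is a top-level Apache open-source project written in Java. Like Elasticsearch, it is a full-text search server built on Lucene. Solr offers a richer query language than Lucene, is configurable and extensible, and optimizes indexing and search performance. It can run standalone or inside a servlet container such as Jetty or Tomcat.

Indexing with Solr is simple: send an HTTP POST to the Solr server with an XML document describing the fields and their contents, and Solr adds, deletes, or updates index entries accordingly. Searching only requires an HTTP GET; the client then parses the result, returned as XML, JSON, or another format, and lays out the page. Solr does not provide UI-building features, but it does provide an admin interface for inspecting its configuration and runtime state. In short, Solr is a standalone, enterprise-grade search server that exposes a web-service-like API on top of Lucene and is fast and highly scalable.

Solr relies on ZooKeeper for distributed coordination, while Elasticsearch has distributed coordination built in.
Solr supports more data formats, while Elasticsearch only supports JSON.
Solr ships more features officially, while Elasticsearch focuses on core features and leaves many advanced features to third-party plugins.
Solr performs better than Elasticsearch in traditional search applications but is noticeably less efficient in real-time search applications.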

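Elasticsearch data model

Document

A document is the smallest unit of data that can be queried; one document is one piece of data, similar to a row in a relational database.

Type

Documents sharing a set of common fields are defined as a type, similar to a table in a relational database.

Index

An index is a collection of documents of several types, similar to a database in a relational database.

Field

A document consists of multiple fields, similar to columns in a relational database.

Mapping types are deprecated as of ES 7.x, so an index no longer corresponds to a database; it is closer to a single table.

Mapping

A mapping defines how a document and the fields it contains are stored and indexed. With the default configuration, ES creates the mapping automatically from the inserted data; it can also be created manually. A mapping mainly contains field names, field types, and so on.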

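Elasticsearch index operations

Elasticsearch is accessed through RESTful requests; request bodies and responses are JSON.

Index operations

An index is a collection of documents. Before 7.x, when an index could hold several types, it was compared to a database in a relational system.

Creating an index

# 1. Create an index
# PUT /<index_name>
PUT /products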

# Change the number of replicas:
PUT /products/_settings
{
  "number_of_replicas": 0
}

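# 2. Create an index with explicit shard settings
#    number_of_shards:   number of primary shards
#    number_of_replicas: number of replica shards per primary
PUT /test_0001
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0
  }
}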

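Listing indexes

# List all indexes
GET /_cat/indices?v

green: every primary shard and replica shard of the index is active.
yellow: every primary shard of the index is active, but some replica shards are not active and are unavailable.
red: not all primary shards of the index are active, so part of the index data is unavailable.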

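Deleting an index

# 3. Delete an index
DELETE /<index_name>

# Delete all indexes (ES 8.x rejects wildcard deletes unless action.destructive_requires_name is set to false)
DELETE /*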

Mapping operations

Creating a mapping

PUT /test_0002
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "keyword"
      },
      "price": {
        "type": "double"
      },
      "created_at": {
        "type": "date"
      },
      "description": {
        "type": "text"
      }
    }
  }
}

Viewing a mapping

# 1. View the mapping of an index
GET /<index_name>/_mapping

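Document operations

A document is the smallest unit of data that can be queried; one document is one piece of data, similar to a row in a relational database.

Adding a document

# Index a document with an explicit id
POST /test_0002/_doc/1
{
  "title": "iphone13",
  "price": 8999.99,
  "created_at": "2022-02-15",
  "description": "iPhone 13屏幕采用6.1英寸OLED屏幕。"
}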

# Response

{
    
    
  "_index" : "test_0002",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    
    
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 0,
  "_primary_term" : 1
}

Getting a document

GET /test_0002/_doc/1

# Response
{
    
    
  "_index" : "test_0002",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 1,
  "_seq_no" : 0,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    
    
    "title" : "iphone13",
    "price" : 8999.99,
    "created_at" : "2022-02-15",
    "description" : "iPhone 13屏幕采用6.1英寸OLED屏幕。"
  }
}

Deleting a document

DELETE /test_0002/_doc/1

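Updating a document

# Indexing with PUT replaces the whole document: the old document is deleted and the new body is stored in its place.
PUT /test_0002/_doc/2
{
  "title": "iphone14"
}

# A partial update keeps the existing content and changes only the fields listed under "doc".
# The typed form POST /<index>/_doc/<id>/_update is deprecated in 7.x; use /<index>/_update/<id>.
POST /test_0002/_update/2
{
  "doc": {
    "title": "iphone13"
  }
}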

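Bulk operations

# Bulk-index two documents. Every action line and every source line must be a single line of JSON.
POST /test_0002/_bulk
{"index":{"_id":"3"}}
{"title":"iphone14","price":8999.99,"created_at":"2021-09-15","description":"iPhone 13屏幕采用6.8英寸OLED屏幕"}
{"index":{"_id":"4"}}
{"title":"iphone14","price":8999.99,"created_at":"2021-09-15","description":"iPhone 15屏幕采用10.8英寸OLED屏幕"}
# A bulk request does not fail as a whole when one action fails: the remaining actions still run, and the response reports the status of each action.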

Advanced queries

Syntax: GET /<index_name>/_search {JSON request body}

# 1. Create the index and its mapping
PUT /products
{
  "mappings": {
    "properties": {
      "title": {
        "type": "keyword"
      },
      "price": {
        "type": "double"
      },
      "created_at": {
        "type": "date"
      },
      "description": {
        "type": "text"
      }
    }
  }
}

# 2. Test data (ids are generated automatically because the index actions are empty)
POST /products/_bulk
{"index":{}}
{"title":"iphone12 pro","price":8999,"created_at":"2020-10-23","description":"iPhone 12 Pro采用超瓷晶面板和亚光质感玻璃背板，搭配不锈钢边框，有银色、石墨色、金色、海蓝色四种颜色。宽度:71.5毫米，高度:146.7毫米，厚度:7.4毫米，重量：187克"}
{"index":{}}
{"title":"iphone12","price":4999,"created_at":"2020-10-23","description":"iPhone 12 高度：146.7毫米；宽度：71.5毫米；厚度：7.4毫米；重量：162克（5.73盎司） [5]  。iPhone 12设计采用了离子玻璃，以及7000系列铝金属外壳。"}
{"index":{}}
{"title":"iphone13","price":6000,"created_at":"2021-09-15","description":"iPhone 13屏幕采用6.1英寸OLED屏幕；高度约146.7毫米，宽度约71.5毫米，厚度约7.65毫米，重量约173克。"}
{"index":{}}
{"title":"iphone13 pro","price":8999,"created_at":"2021-09-15","description":"iPhone 13Pro搭载A15 Bionic芯片，拥有四种配色，支持5G。有128G、256G、512G、1T可选，售价为999美元起。"}

Match all (match_all)

GET /products/_search
{
    
    
  "query": {
    
    
    "match_all": {
    
    }
  }
}

Term query (term)

GET /products/_search
{
    
    
 "query": {
    
    
   "term": {
    
    
     "price": {
    
    
       "value": 4999
     }
   }
 }
}

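NOTE 1: Experimenting with term queries shows that ES uses the standard analyzer (StandardAnalyzer) by default. It splits English text into words and Chinese text into single characters.

NOTE 2: Experimenting with term queries also shows that the mapping types keyword, date, integer, long, double, boolean, and ip are not analyzed; only text fields are analyzed.

Range query (range)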

GET /products/_search
{
    
    
  "query": {
    
    
    "range": {
    
    
      "price": {
    
    
        "gte": 1400,
        "lte": 9999
      }
    }
  }
}

Prefix query (prefix)

GET /products/_search
{
    
    
  "query": {
    
    
    "prefix": {
    
    
      "title": {
    
    
        "value": "ipho"
      }
    }
  }
}

Wildcard query (wildcard)

GET /products/_search
{
    
    
  "query": {
    
    
    "wildcard": {
    
    
      "description": {
    
    
        "value": "iphon*"
      }
    }
  }
}

Multi-id query (ids)

GET /products/_search
{
    
    
  "query": {
    
    
    "ids": {
    
    
      "values": ["1","2"]
    }
  }
}

Fuzzy query (fuzzy)

GET /products/_search
{
    
    
  "query": {
    
    
    "fuzzy": {
    
    
      "description": "iphooone"
    }
  }
}

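Note: the maximum edit distance of a fuzzy query must be between 0 and 2. By default it depends on the length of the search term:

Terms of up to 2 characters must match exactly (no edits allowed).
Terms of 3 to 5 characters allow one edit.
Terms longer than 5 characters allow at most two edits.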
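The length rule above can be sketched in Java as follows. This is an illustration only: the class `FuzzinessSketch` and its method names are made up, and the distance function is plain Levenshtein distance, a simplification that does not treat a transposition of two adjacent characters as a single edit.

```java
class FuzzinessSketch {
    // Maximum number of edits allowed for a search term, following the length rule above.
    static int maxEdits(String term) {
        int length = term.codePointCount(0, term.length());
        if (length <= 2) {
            return 0;
        }
        if (length <= 5) {
            return 1;
        }
        return 2;
    }

    // Classic Levenshtein distance: insertions, deletions, and substitutions each cost 1.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            int[] curr = new int[b.length() + 1];
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            prev = curr;
        }
        return prev[b.length()];
    }

    // A query term matches an indexed term if the distance stays within the allowed edits.
    static boolean fuzzyMatches(String query, String indexedTerm) {
        return editDistance(query, indexedTerm) <= maxEdits(query);
    }

    public static void main(String[] args) {
        System.out.println(fuzzyMatches("iphooone", "iphone")); // true: 8 characters, 2 edits allowed
        System.out.println(fuzzyMatches("ab", "ac"));           // false: 2 characters, no edits allowed
    }
}
```

This is why the query for "iphooone" in the example above can still find documents containing "iphone": the two extra letters are exactly two edits.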

Boolean query (bool)

GET /products/_search
{
    
    
  "query": {
    
    
    "bool": {
    
    
      "must": [
        {
    
    "term": {
    
    
          "price": {
    
    
            "value": 4999
          }
        }}
      ]
    }
  }
}

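bool: combines multiple conditions into a complex query.

must: like &&; every clause must match.

should: like ||; matching one clause is enough.

must_not: like !; none of the clauses may match.

Multi-field query (multi_match)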

GET /products/_search
{
    
    
  "query": {
    
    
    "multi_match": {
    
    
      "query": "iphone13 毫",
      "fields": ["title","description"]
    }
  }
}

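If a field's type is analyzed, the query text is analyzed first and the resulting terms are searched in that field. If the field is not analyzed, the query text is matched as a whole.

Query string on a default field (query_string)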

GET /products/_search
{
    
    
  "query": {
    
    
    "query_string": {
    
    
      "default_field": "description",
      "query": "屏幕真的非常不错"
    }
  }
}

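If the queried field is analyzed, the query text is analyzed before searching; if it is not, the query text is searched as is.

Highlighting (highlight)

The highlight keyword

# Highlights the matching keywords in the returned documents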
GET /products/_search
{
    
    
  "query": {
    
    
    "term": {
    
    
      "description": {
    
    
        "value": "iphone"
      }
    }
  },
  "highlight": {
    
    
    "fields": {
    
    
      "*":{
    
    }
    }
  }
}

Custom highlight HTML tags

# Use `pre_tags` and `post_tags` inside highlight
GET /products/_search
{
    
    
  "query": {
    
    
    "term": {
    
    
      "description": {
    
    
        "value": "iphone"
      }
    }
  },
  "highlight": {
    
    
    "post_tags": ["</span>"], 
    "pre_tags": ["<span style='color:red'>"],
    "fields": {
    
    
      "*":{
    
    }
    }
  }
}

Highlighting multiple fields

# Set `require_field_match` to false to highlight fields other than the queried one
GET /products/_search
{
    
    
  "query": {
    
    
    "term": {
    
    
      "description": {
    
    
        "value": "iphone"
      }
    }
  },
  "highlight": {
    "require_field_match": false,
    "post_tags": ["</span>"], 
    "pre_tags": ["<span style='color:red'>"],
    "fields": {
    
    
      "*":{
    
    }
    }
  }
}

Limiting the number of results (size)

# from sets the offset of the first hit to return; combined with size it implements pagination
GET /products/_search
{
    
    
  "query": {
    
    
    "match_all": {
    
    }
  },
  "size": 5,
  "from": 0
}
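The relationship between a page number and the from and size parameters can be sketched as follows. The class `PaginationSketch` is hypothetical, pages are assumed to be 1-based, and the 10,000 limit mirrors the default value of the index.max_result_window setting, beyond which from + size requests are rejected.

```java
class PaginationSketch {
    // Default value of index.max_result_window; an index may be configured differently.
    static final int MAX_RESULT_WINDOW = 10_000;

    // Translate a 1-based page number and a page size into the "from" offset of a search request.
    static int fromFor(int page, int size) {
        if (page < 1 || size < 1) {
            throw new IllegalArgumentException("page and size must be at least 1");
        }
        long from = (long) (page - 1) * size;
        if (from + size > MAX_RESULT_WINDOW) {
            throw new IllegalArgumentException("from + size exceeds the result window");
        }
        return (int) from;
    }

    public static void main(String[] args) {
        System.out.println(fromFor(1, 5)); // 0
        System.out.println(fromFor(3, 5)); // 10
    }
}
```

For deep pagination past that window, other mechanisms such as search_after are the usual choice; from and size are best kept for the first pages of results.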

Sorting by a field (sort)

GET /products/_search
{
    
    
  "query": {
    
    
    "match_all": {
    
    }
  },
  "sort": [
    {
    
    
      "price": {
    
    
        "order": "desc"
      }
    }
  ]
}

Returning selected fields (_source)

GET /products/_search
{
    
    
  "query": {
    
    
    "match_all": {
    
    }
  },
  "_source": ["title","description"]
}

Aggregations

Grouping by a field

# Group by a field and count the documents in each group
GET /products/_search
{
    
    
	"query": {
    
    
		"term": {
    
    
			"description": {
    
    
				"value": "iphone"
			}
		}
	},
	"aggs": {
		"price_group": {
			"terms": {
    
    
				"field": "price"
			}
		}
	}
}

Maximum

# Maximum value
GET /products/_search
{
    
    
  "aggs": {
    
    
    "price_max": {
    
    
      "max": {
    
    
        "field": "price"
      }
    }
  }
}

Minimum

# Minimum value
GET /products/_search
{
    
    
  "aggs": {
    
    
    "price_min": {
    
    
      "min": {
    
    
        "field": "price"
      }
    }
  }
}

Average

# Average value
GET /products/_search
{
  "aggs": {
    "price_avg": {
    
    
      "avg": {
    
    
        "field": "price"
      }
    }
  }
}

Sum

# Sum
GET /products/_search
{
    
    
  "aggs": {
    
    
    "price_sum": {
    
    
      "sum": {
    
    
        "field": "price"
      }
    }
  }
}

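Analyzers

Analysis and Analyzer

Analysis uses an analyzer to split a document into individual terms; each term points to the documents that contain it.

Analysis: the process of converting full text into a series of words (terms/tokens).

Analyzer: the component that performs analysis.

Analyzer components

By default ES uses the standard analyzer (StandardAnalyzer), which splits Chinese into single characters and English into words. For example, "我是中国人 this is good man" becomes "我 / 是 / 中 / 国 / 人 / this / is / good / man".

Analyzer

An analyzer has three parts: character filters, a tokenizer, and token filters.

Character filters

Preprocess a piece of text before it is tokenized. The most common examples are stripping HTML tags (<span>hello</span> --> hello) and replacing & with and (I&you --> I and you).

Tokenizer

English can be split into words on whitespace. Chinese word segmentation is more complex and can use machine-learning algorithms.

Token filters

Post-process the tokens: change case (for example, lowercase "Quick"), remove stop words (such as "a", "and", "the"), and add synonyms (such as "jump" and "leap").

Order: character filters -> tokenizer -> token filters

Count: character filters (zero or more) + one tokenizer + token filters (zero or more)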
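The three stages can be sketched as a small pipeline in Java. This is a toy model, not the Elasticsearch implementation: the class `AnalyzerPipelineSketch`, the regular expression used to strip tags, and the three-word stop list are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

class AnalyzerPipelineSketch {
    static final Set<String> STOP_WORDS = Set.of("a", "and", "the");

    // Stage 1, character filter: strip HTML tags and spell out "&" before tokenizing.
    static String charFilter(String text) {
        return text.replaceAll("<[^>]+>", "").replace("&", " and ");
    }

    // Stage 2, tokenizer: split the filtered text on whitespace.
    static List<String> tokenize(String text) {
        return Arrays.stream(text.trim().split("\\s+"))
                .filter(token -> !token.isEmpty())
                .collect(Collectors.toList());
    }

    // Stage 3, token filters: lowercase every token, then drop stop words.
    static List<String> tokenFilter(List<String> tokens) {
        return tokens.stream()
                .map(token -> token.toLowerCase(Locale.ROOT))
                .filter(token -> !STOP_WORDS.contains(token))
                .collect(Collectors.toList());
    }

    // The stages always run in this order.
    static List<String> analyze(String text) {
        return tokenFilter(tokenize(charFilter(text)));
    }

    public static void main(String[] args) {
        System.out.println(analyze("<p>The Quick Fox</p>")); // [quick, fox]
        System.out.println(analyze("I&you"));                // [i, you]
    }
}
```

In the second example the character filter first turns "I&you" into "I and you", and the stop-word filter then removes "and" again, which shows how the stages feed into one another.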

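Built-in analyzers

Standard Analyzer - the default analyzer; splits English into words and lowercases them
Simple Analyzer - splits into words (symbols are dropped) and lowercases
Stop Analyzer - lowercases and removes stop words (the, a, is)
Whitespace Analyzer - splits on whitespace, no lowercasing
Keyword Analyzer - no tokenization; the input is emitted unchanged as a single token

Standard analyzer

# Splits into words, lowercases English, drops punctuation, splits Chinese into single characters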
POST /_analyze
{
    
    
  "analyzer": "standard",
  "text": "this is a , good Man 中华人民共和国"
}

Simple analyzer

# Splits into words, lowercases English, drops symbols, does not split Chinese
POST /_analyze
{
    
    
  "analyzer": "simple",
  "text": "this is a , good Man 中华人民共和国"
}

Whitespace analyzer

# Splits both Chinese and English on whitespace; English is not lowercased and punctuation is kept
POST /_analyze
{
    
    
  "analyzer": "whitespace",
  "text": "this is a , good Man"
}

Setting the analyzer when creating an index

# The analyzer of the title field is specified explicitly
PUT /<index_name>
{
  "settings": {},
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "standard"
      }
    }
  }
}

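Chinese analyzer

IKAnalyzer is an open-source, lightweight Chinese word-segmentation toolkit written in Java. It provides two segmentation modes:

ik_smart: coarsest segmentation (fewest tokens)
ik_max_word: finest-grained segmentation

The version of the IK analyzer must match the ES version.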

POST /_analyze
{
    
    
  "analyzer": "ik_smart",
  "text": "中华人民共和国国歌"
}

POST /_analyze
{
    
    
  "analyzer": "ik_max_word",
  "text": "中华人民"
}

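Extension words and stop words

IK supports custom extension dictionaries and stop-word dictionaries.

Extension dictionary: words that are not treated as keywords by default but that you want ES to use as search keywords go here.
Stop-word dictionary: words that are keywords but that, for business reasons, should not be searchable go here.

Both dictionaries are configured in the IKAnalyzer.cfg.xml file in the config directory of the IK analyzer.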

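Filter queries

Filter Query

Strictly speaking, ES has two kinds of search operation: queries and filters. A query, as used in the sections above, computes a score for every returned document and sorts by score. A filter only selects the matching documents; it computes no score, and its results can be cached. In terms of performance alone, a filter is therefore faster than a query. Put differently, filters suit narrowing down large amounts of data, while queries suit precise matching. In practice, narrow the data with filters first and then match it with queries.

GET /ems/_search
{
  "query": {
    "bool": {
      "must": [
        { "match_all": {} }   // query clause
      ],
      "filter": { .... }      // filter clause
    }
  }
}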

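Notes:

When a request contains both filter and query clauses, the filter runs first and the query second.
Elasticsearch automatically caches frequently used filters to improve performance.

Common filter types include term, terms, range, exists, and ids.

term and terms filters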

# Using a term filter
GET /ems/_search
{
    
    
  "query": {
    
    
    "bool": {
    
    
      "must": [
        {
    
    "term": {
    
    
          "name": {
    
    
            "value": "小黑"
          }
        }}
      ],
      "filter": {
    
    
        "term": {
    
    
          "content":"框架"
        }
      }
    }
  }
}

# Using a terms filter
GET /dangdang/_search
{
    
    
  "query": {
    
    
    "bool": {
    
    
      "must": [
        {
    
    "term": {
    
    
          "name": {
    
    
            "value": "中国"
          }
        }}
      ],
      "filter": {
    
    
        "terms": {
    
    
          "content":[
              "科技",
              "声音"
            ]
        }
      }
    }
  }
}

range filter

GET /ems/_search
{
    
    
  "query": {
    
    
    "bool": {
    
    
      "must": [
        {
    
    "term": {
    
    
          "name": {
    
    
            "value": "中国"
          }
        }}
      ],
      "filter": {
    
    
        "range": {
    
    
          "age": {
    
    
            "gte": 7,
            "lte": 20
          }
        }
      }
    }
  }
}

exists filter

# Keeps only the documents in which the given field exists with a non-null value
GET /ems/_search
{
    
    
  "query": {
    
    
    "bool": {
    
    
      "must": [
        {
    
    "term": {
    
    
          "name": {
    
    
            "value": "中国"
          }
        }}
      ],
      "filter": {
    
    
        "exists": {
    
    
          "field":"aaa"
        }
      }
    }
  }
}

ids filter

# Keeps only the documents whose _id is in the given list
GET /ems/_search
{
    
    
  "query": {
    
    
    "bool": {
    
    
      "must": [
        {
    
    "term": {
    
    
          "name": {
    
    
            "value": "中国"
          }
        }}
      ],
      "filter": {
    
    
        "ids": {
    
    
          "values": ["1","2","3"]
        }
      }
    }
  }
}

ElasticsearchOperations

Adding dependencies

<parent>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-parent</artifactId>
        <version>2.5.5</version>
    </parent>
    <dependencies>
        <dependency>
            <groupId>org.springframework</groupId>
            <artifactId>spring-web</artifactId>
        </dependency>

        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-data-elasticsearch</artifactId>
        </dependency>
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>4.12</version>
        </dependency>
        <dependency>
            <groupId>org.projectlombok</groupId>
            <artifactId>lombok</artifactId>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-test</artifactId>
            <scope>test</scope>
        </dependency>
    </dependencies>

Configuring the client

import org.elasticsearch.client.RestHighLevelClient;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.data.elasticsearch.client.ClientConfiguration;
import org.springframework.data.elasticsearch.client.RestClients;
import org.springframework.data.elasticsearch.config.AbstractElasticsearchConfiguration;

@Configuration
public class RestClientConfig extends AbstractElasticsearchConfiguration {
    
    

    @Value("${elasticsearch.host}")
    private String host;

    @Override
    @Bean
    public RestHighLevelClient elasticsearchClient() {
    
    
        final ClientConfiguration clientConfiguration = ClientConfiguration.builder()
                .connectedTo(host)
                .build();
        return RestClients.create(clientConfiguration).rest();
    }

}

Entity class

import lombok.Data;
import org.springframework.data.annotation.Id;
import org.springframework.data.elasticsearch.annotations.Document;
import org.springframework.data.elasticsearch.annotations.Field;
import org.springframework.data.elasticsearch.annotations.FieldType;

@Document(indexName = "products", createIndex = true)
@Data
public class Product {
    
    
    @Id
    private String id;
    @Field(type = FieldType.Keyword)
    private String title;
    @Field(type = FieldType.Double) // matches the Java type Double and the "double" mapping used earlier
    private Double price;
    @Field(type = FieldType.Text)
    private String description;

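    /**
     *
     * 1. @Document(indexName = "products", createIndex = true): placed on the class; marks an object as a document
     *      indexName: name of the index
     *      createIndex: whether to create the index automatically
     * 2. @Id: placed on a field; maps the object's id field to the document's _id in ES
     * 3. @Field(type = FieldType.Keyword): placed on a field; describes the field's storage type in ES and whether it is analyzed
     *    type: the field type
     */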

}

Test class

import com.entity.Product;
import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.data.elasticsearch.core.ElasticsearchOperations;
import org.springframework.data.elasticsearch.core.SearchHits;
import org.springframework.data.elasticsearch.core.query.Query;


@SpringBootTest
public class TestElasticSearchOptions{
    
    
    private  final ElasticsearchOperations elasticsearchOperations;
    @Autowired
    public TestElasticSearchOptions(ElasticsearchOperations elasticsearchOperations) {
    
    
        this.elasticsearchOperations = elasticsearchOperations;
    }


    // Creates the index and mapping if needed and indexes one document
    // save() both inserts and updates: an existing id is updated, a new id is inserted
    @Test
    public void testIndex(){
    
    
        Product product = new Product();
        product.setId("1");
        product.setTitle("iphone14");
        product.setPrice(6799.89);
        product.setDescription("iPhone 14屏幕采用6.1英寸OLED屏幕");
        elasticsearchOperations.save(product);
    }

    // Delete a document
    @Test
    public void testDelete(){
    
    
        Product product = new Product();
        product.setId("1");
        elasticsearchOperations.delete(product);
    }

    // Get a document by id
    @Test
    public void testGet(){
    
    
        Product product = elasticsearchOperations.get("1", Product.class);
        System.out.println(product.getId());
        System.out.println(product.getPrice());
        System.out.println(product.getTitle());
        System.out.println(product.getDescription());
    }

    // Update a document
    @Test
    public void testUpdate() {
    
    
        Product product = new Product();
        product.setId("1");
        product.setTitle("iphone14");
        product.setPrice(6799.89);
        product.setDescription("iPhone 14屏幕采用6.1英寸OLED屏幕 更新啦");
        elasticsearchOperations.save(product); // inserts if the id does not exist, updates if it does
    }

    // Delete all documents
    @Test
    public void testDeleteAll() {
    
    
        elasticsearchOperations.delete(Query.findAll(), Product.class);
    }

    // Find all documents
    @Test
    public void testFindAll() {
    
    
        SearchHits<Product> productSearchHits = elasticsearchOperations.search(Query.findAll(), Product.class);
        productSearchHits.forEach(productSearchHit -> {
    
    
            System.out.println("id: " + productSearchHit.getId());
            System.out.println("score: " + productSearchHit.getScore());
            Product product = productSearchHit.getContent();
            System.out.println("product: " + product);
        });
    }
}

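RestHighLevelClient

RestHighLevelClient was for a long time the recommended Java client for Elasticsearch; it has been deprecated since 7.15 in favor of the newer Java API client. It wraps the operations an application performs against ES, including index management, document create/read/update/delete, and the common query types, and it can be combined with the native ES query syntax, which makes it very powerful.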

import org.elasticsearch.action.admin.indices.delete.DeleteIndexRequest;
import org.elasticsearch.action.get.GetRequest;
import org.elasticsearch.action.get.GetResponse;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.action.index.IndexResponse;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.action.support.master.AcknowledgedResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.client.indices.CreateIndexRequest;
import org.elasticsearch.client.indices.CreateIndexResponse;
import org.elasticsearch.common.xcontent.XContentType;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.SearchHits;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;

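import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.fetch.subphase.highlight.HighlightBuilder;
import org.elasticsearch.search.fetch.subphase.highlight.HighlightField;
import org.elasticsearch.search.sort.SortOrder;

import java.io.IOException;
import java.util.Map;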


/**
 * Index, mapping, document, and search operations
 */
@SpringBootTest
public class TestRestHighLevClient{
    
    

    private final RestHighLevelClient restHighLevelClient;
    @Autowired
    public TestRestHighLevClient(RestHighLevelClient restHighLevelClient) {
    
    
        this.restHighLevelClient = restHighLevelClient;
    }
    // Create an index together with its mapping
    @Test
    public void testCreateIndexAndMapping() throws IOException {
        // Argument 1: name of the index to create
        CreateIndexRequest indexRequest = new CreateIndexRequest("goods");

        // Create the mapping
        // Argument 1: the mapping source as JSON; argument 2: the content type (JSON)
        indexRequest.mapping("{\n" +
                "    \"properties\": {\n" +
                "      \"id\":{\n" +
                "        \"type\": \"integer\"\n" +
                "      },\n" +
                "      \"title\":{\n" +
                "        \"type\": \"keyword\"  \n" +
                "      },\n" +
                "      \"price\":{\n" +
                "        \"type\": \"double\"\n" +
                "      },\n" +
                "      \"description\":{\n" +
                "        \"type\": \"text\",\n" +
                "        \"analyzer\": \"ik_max_word\"\n" +
                "      }\n" +
                "    }\n" +
                "  }", XContentType.JSON);

        // Argument 1: the create-index request; argument 2: the default request options
        CreateIndexResponse createIndexResponse = restHighLevelClient.indices().create(indexRequest, RequestOptions.DEFAULT);
        System.out.println("Index created: " + createIndexResponse.isAcknowledged());
    }


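    // Index a document
    @Test
    public void testIndex() throws IOException {
        // An explicit id is set so that testGet below can fetch the document by id "1"
        IndexRequest indexRequest = new IndexRequest("goods").id("1");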
        indexRequest.source("{\n" +
                "          \"id\" : 1,\n" +
                "          \"title\" : \"保温杯\",\n" +
                "          \"price\" : 123.23,\n" +
                "          \"description\" : \"nice！\"\n" +
                "        }",XContentType.JSON);
        IndexResponse index = restHighLevelClient.index(indexRequest, RequestOptions.DEFAULT);
        System.out.println(index.status());
    }

    // Get a document by id
    @Test
    public void testGet() throws IOException {
    
    
        GetRequest getRequest = new GetRequest("goods","1");
        GetResponse getResponse = restHighLevelClient.get(getRequest, RequestOptions.DEFAULT);
        System.out.println(getResponse.getSourceAsString());
    }

    // Delete an index
    @Test
    public void deleteIndex() throws IOException {
        AcknowledgedResponse acknowledgedResponse = restHighLevelClient.indices().delete(new DeleteIndexRequest("products"), RequestOptions.DEFAULT);
        System.out.println("Index deleted: " + acknowledgedResponse.isAcknowledged());
    }

    // Match-all search
    @Test
    public void testSearch() throws IOException {
        SearchRequest searchRequest = new SearchRequest("goods");
        SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
        sourceBuilder.query(QueryBuilders.matchAllQuery());
        searchRequest.source(sourceBuilder);
        SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
        //System.out.println(searchResponse.getHits().getTotalHits().value);
        SearchHit[] hits = searchResponse.getHits().getHits();
        for (SearchHit hit : hits) {
    
    
            System.out.println(hit.getSourceAsString());
        }
    }

    // Combined search: paging, sorting, source filtering, and highlighting
    // (renamed from testSearch, which would clash with the method above)
    @Test
    public void testSearchCombined() throws IOException {
        SearchRequest searchRequest = new SearchRequest("goods");
        SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
        sourceBuilder
                .from(0)
                .size(2)
                .sort("price", SortOrder.DESC)
                .fetchSource(new String[]{"title"}, new String[]{})
                .highlighter(new HighlightBuilder().field("description").requireFieldMatch(false).preTags("<span style='color:red;'>").postTags("</span>"))
                .query(QueryBuilders.termQuery("description","111"));
        searchRequest.source(sourceBuilder);
        SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
        System.out.println("Total hits: "+searchResponse.getHits().getTotalHits().value);
        SearchHit[] hits = searchResponse.getHits().getHits();
        for (SearchHit hit : hits) {
    
    
            System.out.println(hit.getSourceAsString());
            Map<String, HighlightField> highlightFields = hit.getHighlightFields();
            highlightFields.forEach((k,v)-> System.out.println("key: "+k + " value: "+v.fragments()[0]));
        }
    }



}

Spring Data Elasticsearch

Spring Data ES is Spring's wrapper around the native Java API for ES; by wrapping the native API it simplifies working with ES.

Creating the entity class

import lombok.Data;
import org.springframework.data.annotation.Id;
import org.springframework.data.elasticsearch.annotations.Document;
import org.springframework.data.elasticsearch.annotations.Field;
import org.springframework.data.elasticsearch.annotations.FieldType;

@Document(indexName = "product",createIndex = true)
@Data
public class Products {
    
    

    @Id
    @Field(type = FieldType.Integer,store = true,index = true)
    private Integer id;

    // Analyzed String fields must be mapped as Text, not Integer
    @Field(type = FieldType.Text,store = true,index = true,analyzer = "ik_max_word",searchAnalyzer = "ik_max_word")
    private String productName;

    @Field(type = FieldType.Text,store = true,index = true,analyzer = "ik_max_word",searchAnalyzer = "ik_max_word")
    private String productDesc;

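    /**
     @Document: placed on the class; marks the entity class as a document. Common attributes:
        indexName: name of the corresponding index
        createIndex: whether to create the index automatically

     @Id: placed on a field; marks it as the primary key, whose value is used as the document ID in ES

     @Field: placed on a field; marks it as a field of the document. Common attributes:
         type: the field type
         index: whether the field is indexed, true by default
         store: whether the field is stored separately, false by default
         analyzer: the analyzer used at index time
         searchAnalyzer: the analyzer used at search time
     */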
}

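Repository interface

import com.entity.Products;
import org.springframework.data.elasticsearch.repository.ElasticsearchRepository;

public interface ProductRepository extends ElasticsearchRepository<Products,Integer> {
    // The tests below also call findByProductDescMatch, findByProductDescFuzzy, findByProductName,
    // findByProductNameOrProductDesc, and findByIdBetween. They must be declared here;
    // their declarations are not shown in the source.
}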

Tests

import com.entity.Products;
import com.repository.ProductRepository;
import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.data.domain.Page;
import org.springframework.data.domain.PageRequest;
import org.springframework.data.domain.Sort;

import java.util.List;
import java.util.Optional;

@SpringBootTest
public class ProductRepositoryTest {
    
    

    @Autowired
    private ProductRepository productRepository;

    @Test
    public void addDocument(){
    
    
        Products products = new Products();
        products.setId(1);
        products.setProductName("iphone14");
        products.setProductDesc("iPhone 14屏幕采用6.1英寸OLED屏幕 更新啦");
        productRepository.save(products);
    }

    @Test
    public void updateDocument(){
    
    
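        // Products only has Lombok's @Data, so there is no all-args constructor; use the setters instead
        Products products = new Products();
        products.setId(1);
        products.setProductName("iphone15");
        products.setProductDesc("iPhone 14屏幕采用6.1英寸OLED屏幕 更新啦");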
        productRepository.save(products);
    }

    @Test
    public void findAllDocument(){
    
    
        Iterable<Products> all = productRepository.findAll();
        for (Products product : all) {
    
    
            System.out.println(product);
        }
    }

    @Test
    public void findDocumentById(){
    
    
        Optional<Products> product = productRepository.findById(1);
        System.out.println(product.get());
    }

    @Test
    public void deleteDocument(){
    
    
        productRepository.deleteById(1);
    }

    @Test
    public void testFindByProductDescMatch(){
    
    
        List<Products> list = productRepository.findByProductDescMatch("iphone");
        list.forEach(System.out::println);
    }


    @Test
    public void testFindByProductDescFuzzy(){
    
    
        List<Products> list = productRepository.findByProductDescFuzzy("elasticsearcha");
        list.forEach(System.out::println);
    }

    @Test
    public void testFindByProductName(){
    
    
        List<Products> list = productRepository.findByProductName("elasticsearch");
        list.forEach(System.out::println);
    }

    @Test
    public void testFindByProductNameOrProductDesc(){
    
    
        List<Products> list = productRepository.findByProductNameOrProductDesc("elasticsearch","手机");
        list.forEach(System.out::println);
    }

    @Test
    public void testFindByIdBetween(){
    
    
        List<Products> list = productRepository.findByIdBetween(1,3);
        list.forEach(System.out::println);
    }

    @Test
    public void testFindPage(){
    
    
        // Argument 1: zero-based page index (1 is the second page); argument 2: page size
        PageRequest pageable = PageRequest.of(1, 3);
        Page<Products> page = productRepository.findAll(pageable);
        System.out.println("Total elements: "+page.getTotalElements());
        System.out.println("Total pages: "+page.getTotalPages());
        System.out.println("Content: "+page.getContent());
    }

    @Test
    public void testFindPage2(){
    
    
        Sort sort = Sort.by(Sort.Direction.DESC,"id");
        PageRequest pageable = PageRequest.of(0, 2,sort);
        Page<Products> page = productRepository.findByProductDescMatch("iphone", pageable);
        System.out.println("Total elements: "+page.getTotalElements());
        System.out.println("Total pages: "+page.getTotalPages());
        System.out.println("Content: "+page.getContent());
    }

    @Test
    public void testFindSort(){
    
    
        Sort sort = Sort.by(Sort.Direction.DESC, "id");
        Iterable<Products> all = productRepository.findAll(sort);
        for (Products product : all) {
    
    
            System.out.println(product);
        }
    }


}

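Elasticsearch cluster

A cluster is one or more nodes organized together that jointly hold all of your data and jointly provide indexing and search. A cluster is identified by a unique name, which defaults to elasticsearch. The name matters because a node joins a cluster only by specifying that cluster's name.

Cluster terminology

Node

A node is a single server in the cluster. As part of the cluster it stores your data and takes part in the cluster's indexing and search. Like a cluster, a node is identified by a name, which is assigned at startup. In early versions the default was the name of a random Marvel Comics character; newer versions default to the host name.

Index

A collection of similar documents.

Mapping

Defines the structure of the documents stored in an index, such as fields and types.

Document

A single record in an index; the smallest unit that can be indexed.

Shard

Elasticsearch can split an index into several pieces, called shards. When you create an index, you can specify the number of shards you want. Each shard is itself a fully functional, independent "index" that can be placed on any node in the cluster.

Replica

One or more copies of an index's shards.

Notes:

The number of shards can only be set when the index is created and cannot be changed afterwards, but the number of replicas can be changed.
To keep the cluster running after a node fails, ES never stores a shard and its replica on the same node.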

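Cluster tuning

Horizontal scaling

Shut down a node and you can see that the ES cluster handles the failure automatically.
Start the node again and you can see that the ES cluster scales out automatically.
The number of shards cannot be changed, but the number of replicas per shard can, through the number_of_replicas setting of the index.

Disk selection

Tuning ES means adjusting parameters to make reads and writes faster, and the disk is usually the bottleneck of a server. Elasticsearch uses the disk heavily, so the faster the disk, the faster Elasticsearch runs. Some tips for optimizing disks:

Use SSDs; they are far better than mechanical disks.
Use RAID 0 (consecutive data is striped across several disks so that I/O can run in parallel). The cost is that the failure of a single disk brings down the whole array.
Do not use remotely mounted storage.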

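Shard strategy

More shards and replicas are not always better. Each shard is a Lucene index underneath and consumes system resources, and a search request has to hit every shard of the index, so too many shards reduce search performance. Choosing the shard count requires architects and engineers to anticipate how the business will grow. As a rule of thumb:

Keep the disk size of each shard below the maximum JVM heap configured for ES (usually no more than 32 GB). For example, an index of about 500 GB needs about 16 shards.
Keep the number of shards at or below 3 times the number of nodes. For example, a cluster of 10 nodes should have no more than 30 shards.
Delay shard allocation: when a node drops out, the cluster reallocates its shards, but by default it first waits one minute to see whether the node rejoins. Lengthening this wait with the index.unassigned.node_left.delayed_timeout setting reduces the number of reallocations.
Reduce the number of replicas: every write has to be synchronized to the replicas, so the more replicas there are, the slower writes become. For a large bulk load, set the replica count to 0 first and restore the normal value once the load is finished.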
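The first two rules of thumb are simple arithmetic, sketched below in Java. The class `ShardSizingSketch` and its method names are hypothetical, and the numbers are the examples from the text rather than official limits.

```java
class ShardSizingSketch {
    // Smallest shard count that keeps every shard at or below the given size (ceiling division).
    static int shardsForSize(long indexSizeGb, long maxShardSizeGb) {
        if (indexSizeGb < 1 || maxShardSizeGb < 1) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        return (int) ((indexSizeGb + maxShardSizeGb - 1) / maxShardSizeGb);
    }

    // Upper bound on the shard count from the "3 times the number of nodes" rule.
    static int maxShardsForNodes(int nodeCount) {
        return nodeCount * 3;
    }

    public static void main(String[] args) {
        System.out.println(shardsForSize(500, 32)); // 16
        System.out.println(maxShardsForNodes(10));  // 30
    }
}
```

When the two rules disagree, for instance a very large index on a small cluster, the conflict is a signal to add nodes or to split the data across several indexes rather than to exceed either bound.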

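Memory settings

The default heap size depends on the ES version (4 GB in the setup described here). The heap size is set in config/jvm.options: Xms is the initial heap size and Xmx is the maximum heap size.

Set Xmx and Xms to the same value to avoid the cost of growing and shrinking the heap.
Keep Xmx and Xms at or below 50% of physical memory, because Lucene inside ES also needs a share of physical memory.
Keep Xmx and Xms below 32 GB. Above roughly 32 GB the JVM can no longer use compressed object pointers, which wastes a large amount of memory, so on machines with enough memory the heap is usually set to 31 GB (-Xms31g -Xmx31g).

For example, on a machine with 128 GB of memory we can run two nodes and give each node a 31 GB heap.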
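The heap rules can be combined into one small calculation, sketched here in Java. The class `HeapSizingSketch` is hypothetical, and the assumption that several nodes on one machine share its memory equally is mine, not the source's.

```java
class HeapSizingSketch {
    // Stay just under the 32 GB threshold mentioned above.
    static final int MAX_HEAP_GB = 31;

    // Heap per node: at most half of the node's share of physical memory, capped at 31 GB.
    static int heapGbPerNode(int physicalMemoryGb, int nodesOnMachine) {
        if (physicalMemoryGb < 1 || nodesOnMachine < 1) {
            throw new IllegalArgumentException("arguments must be positive");
        }
        int halfOfShare = physicalMemoryGb / nodesOnMachine / 2;
        return Math.min(halfOfShare, MAX_HEAP_GB);
    }

    // Xms and Xmx get the same value so that the heap never has to be resized.
    static String jvmOptions(int heapGb) {
        return "-Xms" + heapGb + "g -Xmx" + heapGb + "g";
    }

    public static void main(String[] args) {
        int heap = heapGbPerNode(128, 2);
        System.out.println(heap);             // 31
        System.out.println(jvmOptions(heap)); // -Xms31g -Xmx31g
    }
}
```

For the 128 GB machine with two nodes, each node's share is 64 GB, half of that is 32 GB, and the cap brings it down to 31 GB, which leaves the rest of the memory to the operating system's file cache.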