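Building a Short-Video E-commerce Platform from Scratch: Full-Text Product Search with Elasticsearch and Spring Boot

For Elasticsearch deployment, see: https://laker.blog.csdn.net/article/details/120977484

Official Elasticsearch client documentation: https://www.elastic.co/guide/en/elasticsearch/client/index.html

Spring Boot integration

I am using Spring Boot 2.3.7.RELEASE, which corresponds to Elasticsearch 7.6.2.

Add the POM dependency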

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-data-elasticsearch</artifactId>
</dependency>

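The Spring Boot version determines which Elasticsearch version the starter is built against. This article uses Spring Boot 2.3.7.RELEASE as the example.

As of Spring Data Elasticsearch 4.0, ElasticsearchTemplate is deprecated; use ElasticsearchRestTemplate instead.

The table below lists, for each Spring Data release train, the Spring Data Elasticsearch version it contains, the Elasticsearch version that release uses, and the matching Spring Framework and Spring Boot versions: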

Spring Data Release Train	Spring Data Elasticsearch	Elasticsearch	Spring Framework	Spring Boot
2021.1 (Q)	4.3.x	7.15.2	5.3.x	2.6.x
2021.0 (Pascal)	4.2.x	7.12.0	5.3.x	2.5.x
2020.0 (Ockham)	4.1.x	7.9.3	5.3.2	2.4.x
Neumann	4.0.x	7.6.2	5.2.12	2.3.x
Moore	3.2.x	6.8.12	5.2.12	2.2.x
Lovelace	3.1.x	6.2.2	5.1.19	2.1.x
Kay	3.0.x	5.5.0	5.0.13	2.0.x
Ingalls	2.1.x	2.4.0	4.3.25	1.5.x

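Configuration file

spring:
  elasticsearch:
    rest:
      # Elasticsearch addresses; separate multiple addresses with commas
      uris: http://127.0.0.1:9200
      # Read timeout
      read-timeout: 30s
      # Connection timeout
      connection-timeout: 1s
      # Username
      username:
      # Password
      password:

Note: 9300 is the port of the native transport protocol used by the legacy Java transport client. 9200 is the port of the RESTful HTTP API, which is the one the REST client configured above connects to.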

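Create the mapped entity

import com.fasterxml.jackson.annotation.JsonFormat;
import lombok.Data;
import org.springframework.data.annotation.Id;
import org.springframework.data.elasticsearch.annotations.DateFormat;
import org.springframework.data.elasticsearch.annotations.Document;
import org.springframework.data.elasticsearch.annotations.Field;
import org.springframework.data.elasticsearch.annotations.FieldType;

import java.util.Date;
import java.util.List;

/**
 * @Document declares the relationship between the Java object and the Elasticsearch index.
 * indexName   index name
 * shards      number of primary shards, default 1
 * replicas    number of replica shards, default 1
 * createIndex whether to create the index automatically when it does not exist, default true
 */
@Data
@Document(indexName = "goods_index", shards = 1, replicas = 0, createIndex = true)
public class Goods {

    /** Document id. */
    @Id
    private Long id;

    /** Product code. Not analyzed, exact match only (FieldType.Keyword). */
    @Field(type = FieldType.Keyword)
    private String goodsCode;

    /** Product name. Analyzed, so it supports full-text (fuzzy) matching. */
    @Field(type = FieldType.Text, analyzer = "ik_max_word", searchAnalyzer = "ik_smart")
    private String goodsName;

    /** Product publish time. */
    @JsonFormat(timezone = "GMT+8", pattern = "yyyy-MM-dd HH:mm:ss")
    @Field(type = FieldType.Date, format = DateFormat.basic_date_time)
    private Date createDate;

    /** Ranking score, 100 by default. */
    @Field(type = FieldType.Integer)
    private Integer hotScore = 100;

    /** Nested field. GoodsAttr is a separate class that is not shown in this article. */
    @Field(type = FieldType.Nested)
    private List<GoodsAttr> attrs;
}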
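To get a feel for what DateFormat.basic_date_time looks like on the wire, the following minimal sketch formats a timestamp with the pattern that the Elasticsearch documentation gives for that format, yyyyMMdd'T'HHmmss.SSSZ. It uses only java.time. The class name BasicDateTimeSketch is made up for illustration, and the way Elasticsearch itself renders the offset part may differ from java.time.

```java
import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Illustrative helper (hypothetical name), not part of Spring Data Elasticsearch.
final class BasicDateTimeSketch {

    // Pattern documented for Elasticsearch's basic_date_time format.
    static final DateTimeFormatter BASIC_DATE_TIME =
            DateTimeFormatter.ofPattern("yyyyMMdd'T'HHmmss.SSSZ");

    // Renders a timestamp in the compact form (no dashes or colons).
    static String format(OffsetDateTime timestamp) {
        return BASIC_DATE_TIME.format(timestamp);
    }

    public static void main(String[] args) {
        OffsetDateTime sample = OffsetDateTime.of(
                2021, 12, 28, 20, 57, 37, 235_000_000, ZoneOffset.ofHours(8));
        System.out.println(format(sample)); // 20211228T205737.235+0800
    }
}
```

The compact form is quite different from the yyyy-MM-dd HH:mm:ss pattern given to @JsonFormat on the same field. As far as I can tell, the @Field format is the one that decides what is sent to Elasticsearch, so it is worth checking a stored document before relying on either pattern.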

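Mapping details: https://docs.spring.io/spring-data/elasticsearch/docs/4.0.6.RELEASE/reference/html/#elasticsearch.mapping.meta-model

@Id is placed on a member variable and marks that field as the document id (primary key).
@Field is placed on a member variable, marks it as a document field, and specifies the field's mapping attributes:
- type: the field type, an enum (FieldType); values include text, long, short, date, integer, object, and so on
  - text: the value is analyzed (tokenized) when stored, and the resulting tokens are indexed
  - keyword: the value is not analyzed; it is indexed as a single exact term
  - Numeric: numeric types, in two groups
    - Basic types: long, integer, short, byte, double, float, half_float
    - High-precision floating-point type: scaled_float
      - A scaling factor such as 10 or 100 must be specified. Elasticsearch multiplies the real value by this factor before storing it and restores it when the value is read
  - Date: date type
    - Elasticsearch accepts dates as formatted strings, but internally it keeps them as a long holding milliseconds since the epoch. The recommendation here is to send the millisecond value directly.
- index: whether the field is indexed and therefore searchable; boolean, default true
- store: whether the field is stored separately; boolean, default false
- analyzer: name of the analyzer used at index time; ik_max_word here means the IK analyzer is used
- searchAnalyzer: name of the analyzer used at search time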
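The scaled_float arithmetic described in the list above can be sketched in plain Java. This is only a model of the multiply-and-round idea, not Elasticsearch code; the class and method names are made up for illustration.

```java
// Illustrative model (hypothetical class name) of scaled_float storage.
final class ScaledFloatSketch {

    // Value as kept in the index: value * scalingFactor, rounded to a long.
    static long encode(double value, double scalingFactor) {
        return Math.round(value * scalingFactor);
    }

    // Value as read back: the stored long divided by the scaling factor.
    static double decode(long stored, double scalingFactor) {
        return stored / scalingFactor;
    }

    public static void main(String[] args) {
        long stored = encode(19.99, 100);
        System.out.println(stored);              // 1999
        System.out.println(decode(stored, 100)); // 19.99
        // Precision beyond the factor is lost: 19.994 also encodes to 1999.
        System.out.println(encode(19.994, 100)); // 1999
    }
}
```

A factor of 100 is therefore a natural fit for prices with two decimal places, while anything finer than 0.01 is rounded away.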

Create the Repository interface

@Repository
public interface GoodsRepository extends ElasticsearchRepository<Goods, Long> {
}

Common operations

Operating on indices and data requires the following beans:

// Used for advanced queries
@Autowired
private ElasticsearchRestTemplate elasticsearchRestTemplate;
// Used for basic CRUD
@Autowired
private GoodsRepository goodsRepository;
@Autowired
private RestHighLevelClient restHighLevelClient;

// Optional: declare the client bean yourself in a @Configuration class
@Bean
RestHighLevelClient client() {
    ClientConfiguration clientConfiguration = ClientConfiguration.builder()
            .connectedTo("localhost:9200")
            .build();

    return RestClients.create(clientConfiguration)
            .rest();
}

Index operations

// Check whether the index exists
elasticsearchRestTemplate.indexOps(Goods.class).exists();
elasticsearchRestTemplate.indexOps(IndexCoordinates.of(indexName)).exists();
// Create the index
elasticsearchRestTemplate.indexOps(Goods.class).create();
// Delete the index
elasticsearchRestTemplate.indexOps(IndexCoordinates.of(indexName)).delete();
// Create the index only when it is missing
// (esService is a wrapper service that is not shown in this article)
if (!esService.indexExists()) {
    esService.indexCreate();
}

Adding data

// i is the index of an enclosing loop that is not shown
Goods goods = new Goods();
goods.setId((long) i);
goods.setGoodsCode("中国人" + i);
goods.setCreateDate(new Date());
goods.setHotScore(10);
goods.setGoodsName("我是中国人" + i);
// Insert one document
goodsRepository.save(goods);
// Insert in bulk
goodsRepository.saveAll(goodsList);

Deleting data

// Delete by id
goodsRepository.deleteById(id);
// Delete by entity; the id must not be null
goodsRepository.delete(bean);
// Delete a collection of entities in bulk
goodsRepository.deleteAll(beanList);
// Delete everything
goodsRepository.deleteAll();

Updating data

Goods goods = new Goods();
goods.setId(id); // the id must not be null
goods.setGoodsCode("中国人");
// Saving with an existing id re-indexes the document under that id
goodsRepository.save(goods);

Updating only some fields
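Calling save() with an existing id re-indexes the whole document, so fields that are not set on the new object are not carried over from the old one. A partial update instead merges the changed fields into the stored document; in Spring Data Elasticsearch 4.x this is exposed through the update(UpdateQuery, IndexCoordinates) operation. The sketch below does not call Elasticsearch at all. It models the index as an in-memory map, purely to show the difference between the two behaviours, and every name in it is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// In-memory model (hypothetical) of "index" versus "partial update" semantics.
final class PartialUpdateSketch {

    private final Map<Long, Map<String, Object>> index = new HashMap<>();

    // Like save(): the stored document is replaced as a whole.
    void save(long id, Map<String, Object> document) {
        index.put(id, new HashMap<>(document));
    }

    // Like a partial update: only the given fields are overwritten.
    void update(long id, Map<String, Object> changedFields) {
        Map<String, Object> existing = index.get(id);
        if (existing == null) {
            throw new IllegalStateException("document " + id + " does not exist");
        }
        existing.putAll(changedFields);
    }

    Map<String, Object> get(long id) {
        return index.get(id);
    }

    public static void main(String[] args) {
        PartialUpdateSketch store = new PartialUpdateSketch();
        Map<String, Object> goods = new HashMap<>();
        goods.put("goodsCode", "A001");
        goods.put("hotScore", 100);
        store.save(1L, goods);

        Map<String, Object> change = new HashMap<>();
        change.put("hotScore", 10);
        store.update(1L, change);
        System.out.println(store.get(1L).get("goodsCode")); // A001 (kept)
        System.out.println(store.get(1L).get("hotScore"));  // 10

        store.save(1L, change); // full replace
        System.out.println(store.get(1L).containsKey("goodsCode")); // false
    }
}
```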

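Queries

Basic ElasticsearchRestTemplate API

SearchQuery: the overall query object
BoolQueryBuilder: bool query; must, mustNot, should, and so on can be chained onto it
MatchQueryBuilder: match query (the input is analyzed)
TermQueryBuilder: term query, an exact lookup against the terms in the inverted index
HighlightBuilder: highlighting; used to set which fields are highlighted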

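Query results

SearchHit

Contains the following information:

Id
Score
Sort Values
Highlight fields
The retrieved entity of type <T>

SearchHits

Contains the following information:

Number of total hits
Total hits relation
Maximum score
A list of SearchHit<T> objects
Returned aggregations

SearchPage

Defines a Spring Data Page that contains a SearchHits<T> element and can be used for paged access through repository methods.

SearchScrollHits

Returned by the low-level scroll API functions in ElasticsearchRestTemplate; it enriches a SearchHits<T> with the Elasticsearch scroll id.

SearchHitsIterator

An iterator returned by the streaming functions of the SearchOperations interface.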

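QueryBuilders provides a large number of static methods for building the different query types:

matchQuery: term matching; the input is analyzed first and the resulting terms are then matched
termQuery: term matching without analysis
wildcardQuery: wildcard matching
fuzzyQuery: fuzzy matching
rangeQuery: range matching
boolQuery: boolean query that combines other queries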
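The difference between matchQuery and termQuery is easiest to see on a toy inverted index. The sketch below is a plain-Java model, not Elasticsearch code: the analyzer is a stand-in that lower-cases and splits on whitespace (the real one would be ik_max_word or similar), and all names are hypothetical.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy inverted index (hypothetical) contrasting term and match lookups.
final class MatchVsTermSketch {

    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Stand-in analyzer: lower-cases and splits on whitespace.
    static String[] analyze(String text) {
        return text.toLowerCase(Locale.ROOT).trim().split("\\s+");
    }

    // Indexing a text field: each analyzed token points back to the document.
    void index(int docId, String text) {
        for (String token : analyze(text)) {
            postings.computeIfAbsent(token, k -> new TreeSet<>()).add(docId);
        }
    }

    // termQuery: the input is looked up as one exact term, without analysis.
    Set<Integer> term(String value) {
        return new TreeSet<>(postings.getOrDefault(value, new TreeSet<>()));
    }

    // matchQuery: the input is analyzed first, then every token is looked up
    // and the results are combined with OR.
    Set<Integer> match(String text) {
        Set<Integer> result = new TreeSet<>();
        for (String token : analyze(text)) {
            result.addAll(term(token));
        }
        return result;
    }

    public static void main(String[] args) {
        MatchVsTermSketch idx = new MatchVsTermSketch();
        idx.index(1, "Xiaomi Phone");
        idx.index(2, "Huawei Phone");
        System.out.println(idx.term("Xiaomi Phone"));  // [] (no such single term)
        System.out.println(idx.term("phone"));         // [1, 2]
        System.out.println(idx.match("Xiaomi Phone")); // [1, 2]
    }
}
```

This is why a term query against an analyzed text field often returns nothing for a multi-word input, while the same input works as a match query.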

// Build the query
NativeSearchQueryBuilder queryBuilder = new NativeSearchQueryBuilder();
// Add a basic analyzed (match) query
queryBuilder.withQuery(QueryBuilders.matchQuery("title", "小米手机"));

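BooleanQuery (compound queries)

Notes:

BooleanClause is the Lucene class that expresses how the clauses of a boolean query relate to each other: BooleanClause.Occur.MUST, BooleanClause.Occur.MUST_NOT, and BooleanClause.Occur.SHOULD, meaning must match, must not match, and may match. In the Elasticsearch bool query they correspond to must, mustNot, and should. There are six combinations:

1. MUST with MUST: the intersection of the clauses.

2. MUST with MUST_NOT: the results of the MUST clause, excluding anything matched by the MUST_NOT clause.

3. SHOULD with MUST_NOT: behaves the same as MUST with MUST_NOT.

4. SHOULD with MUST: the result set is that of the MUST clause, but SHOULD can affect the ordering.

5. SHOULD with SHOULD: the union of the clauses.

6. MUST_NOT with MUST_NOT: meaningless in plain Lucene, where it returns no results. (An Elasticsearch bool query that contains only must_not clauses instead returns every document that matches none of them.)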
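The six combinations above can be checked with a small set-based model. This is a sketch in plain Java that treats each clause as the set of document ids it matches; it ignores scoring entirely, and it follows the Elasticsearch behaviour for a query that has only must_not clauses. The class and method names are made up for illustration.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Set-based model (hypothetical) of which documents a bool query matches.
final class BoolQuerySketch {

    static Set<Integer> evaluate(Set<Integer> allDocs,
                                 List<Set<Integer>> must,
                                 List<Set<Integer>> should,
                                 List<Set<Integer>> mustNot) {
        Set<Integer> result = new TreeSet<>(allDocs);
        // Every must clause has to match: intersection.
        for (Set<Integer> clause : must) {
            result.retainAll(clause);
        }
        // Without must clauses, at least one should clause has to match: union.
        // With must clauses present, should only influences scoring.
        if (must.isEmpty() && !should.isEmpty()) {
            Set<Integer> union = new TreeSet<>();
            for (Set<Integer> clause : should) {
                union.addAll(clause);
            }
            result.retainAll(union);
        }
        // Anything matched by a must_not clause is excluded.
        for (Set<Integer> clause : mustNot) {
            result.removeAll(clause);
        }
        return result;
    }

    static Set<Integer> docs(Integer... ids) {
        return new TreeSet<>(Arrays.asList(ids));
    }

    public static void main(String[] args) {
        Set<Integer> all = docs(1, 2, 3, 4, 5);
        Set<Integer> title = docs(1, 2, 3);
        Set<Integer> brand = docs(2, 3, 4);
        List<Set<Integer>> none = Collections.emptyList();

        System.out.println(evaluate(all, Arrays.asList(title, brand), none, none)); // [2, 3]
        System.out.println(evaluate(all, Arrays.asList(title), none, Arrays.asList(brand))); // [1]
        System.out.println(evaluate(all, none, Arrays.asList(title, brand), none)); // [1, 2, 3, 4]
        System.out.println(evaluate(all, none, none, Arrays.asList(brand))); // [1, 5]
    }
}
```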

public void testBooleanQuery() {
    NativeSearchQueryBuilder builder = new NativeSearchQueryBuilder();
    // Both term clauses must match (intersection)
    builder.withQuery(QueryBuilders.boolQuery()
            .must(QueryBuilders.termQuery("title", "手机"))
            .must(QueryBuilders.termQuery("brand", "小米")));
    Page<Item> list = this.itemRepository.search(builder.build());
}

Sorting

    NativeSearchQueryBuilder queryBuilder = new NativeSearchQueryBuilder();
    // Add a basic analyzed (match) query
    queryBuilder.withQuery...
    // Sort
    queryBuilder.withSort(SortBuilders.fieldSort("price").order(SortOrder.ASC));

Simple queries

These can be done through goodsRepository.

@Repository
public interface GoodsRepository extends ElasticsearchRepository<Goods, Long> {

    long countByGoodsCode(String goodsCode);

    long deleteByGoodsCode(String goodsCode);

    List<Goods> removeByGoodsCode(String goodsCode);

    List<Goods> findByGoodsCode(String goodsCode);

    Page<Goods> findByGoodsCode(String goodsCode, Pageable pageable);

    // Alternative return type for the same method. It has the same parameter
    // list as the Page variant, so only one of the two can be declared.
    // Slice<Goods> findByGoodsCode(String goodsCode, Pageable pageable);

    List<Goods> findByGoodsCode(String goodsCode, Sort sort);

    List<Goods> findFirst10ByGoodsCode(String goodsCode, Pageable pageable);

    // Custom JSON query; ?0 is replaced with the first method argument.
    // Same name and parameters as the method above, so declare only one of them.
    // @Query("{\"bool\": {\"must\": [{\"match\": {\"tags\": \"?0\"}}]}}")
    // Page<Goods> findFirst10ByGoodsCode(String tag, Pageable pageable);

    // The remaining examples query an Article entity (with tags and authors
    // fields) and would live in that entity's own repository.
    Page<Article> findByAuthorsName(String name, Pageable pageable);

    @Query("{\"bool\": {\"must\": [{\"match\": {\"authors.name\": \"?0\"}}]}}")
    Page<Article> findByAuthorsNameUsingCustomQuery(String name, Pageable pageable);

    @Query("{\"bool\": {\"must\": {\"match_all\": {}}, \"filter\": {\"term\": {\"tags\": \"?0\" }}}}")
    Page<Article> findByFilteredTagQuery(String tag, Pageable pageable);

    @Query("{\"bool\": {\"must\": {\"match\": {\"authors.name\": \"?0\"}}, \"filter\": {\"term\": {\"tags\": \"?1\" }}}}")
    Page<Article> findByAuthorsNameAndFilteredTagQuery(String name, String tag, Pageable pageable);
}

The API accepts Sort and Pageable. If you do not want any sorting or paging, pass Sort.unsorted() and Pageable.unpaged() rather than null.

Sort sort = Sort.by("firstname").ascending()
        .and(Sort.by("lastname").descending());

The mapping from method-name keywords to Elasticsearch query strings is documented at: https://docs.spring.io/spring-data/elasticsearch/docs/4.0.6.RELEASE/reference/html/#elasticsearch.query-methods.criterions

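Paged queries

// Repository paging; page numbers are zero-based, so this is the second page of 20
Page<User> users = repository.findAll(PageRequest.of(1, 20));

// Template paging. This snippet uses the older 3.x-style API (SearchQuery,
// queryForPage, AggregatedPage); in 4.x the search(...) methods that return
// SearchHits are the replacement.
MatchQueryBuilder builder = QueryBuilders.matchQuery(field, value);
SearchQuery searchQuery = new NativeSearchQuery(builder).setPageable(PageRequest.of(0, 100));
AggregatedPage<EmployeeBean> page = restTemplate.queryForPage(searchQuery, EmployeeBean.class);
long totalElements = page.getTotalElements(); // total number of records
int totalPages = page.getTotalPages();  // total number of pages
int pageNumber = page.getPageable().getPageNumber(); // current page number
List<EmployeeBean> beanList = page.toList();  // content of the current page
Set<EmployeeBean> beanSet = page.toSet();  // content of the current page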
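Because PageRequest.of(page, size) is zero-based, it helps to spell out the arithmetic behind the page values. The following is a minimal sketch in plain Java; PagingSketch and its methods are hypothetical names, and the total-pages formula mirrors how a Spring Data Page is normally computed rather than quoting its source.

```java
// Plain-Java sketch (hypothetical names) of zero-based paging arithmetic.
final class PagingSketch {

    // Offset of the first hit on a zero-based page (the "from" value).
    static long offset(int page, int size) {
        return (long) page * size;
    }

    // Number of pages needed for totalElements hits, rounding up.
    static int totalPages(long totalElements, int size) {
        return size == 0 ? 1 : (int) Math.ceil((double) totalElements / size);
    }

    public static void main(String[] args) {
        System.out.println(offset(0, 100));      // 0  -> first page
        System.out.println(offset(1, 20));       // 20 -> second page of 20
        System.out.println(totalPages(101, 20)); // 6
    }
}
```

Keep in mind that Elasticsearch rejects requests whose offset plus size exceeds index.max_result_window (10,000 by default), which is where the scroll API behind SearchScrollHits comes in.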

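Highlight queries

Weighted (boost) queries

Aggregation queries

Monitoring and management

ElasticHQ is available as a Docker container. Run the following command to start a container with ElasticHQ: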

$ docker run -d --name elastichq -p 5000:5000 elastichq/elasticsearch-hq

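Troubleshooting summary

Creating the index mapping

Method 1: annotate the entity class with @Document. The annotation has a createIndex attribute that defaults to true, meaning that if the index does not yet exist in ES when the application starts, it is created.

Below is the index and mapping created this way by default. The mapping is clearly wrong, so this approach is not recommended.

The ES server here is 7.3.0. I am not sure whether the version mismatch is the cause; I also tried 7.6.2 and the mapping was still wrong.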

{
	"state": "open",
	"settings": {
		"index": {
			"refresh_interval": "1s",
			"number_of_shards": "1",
			"provided_name": "goods_index",
			"creation_date": "1640695057235",
			"store": {
				"type": "fs"
			},
			"number_of_replicas": "0",
			"uuid": "La3XlrTTTu6-SoNtsuPl1g",
			"version": {
				"created": "7030099"
			}
		}
	},
	"mappings": {
		"_doc": {
			"properties": {
				"hotScore": {
					"type": "long"
				},
				"_class": {
					"type": "text",
					"fields": {
						"keyword": {
							"ignore_above": 256,
							"type": "keyword"
						}
					}
				},
				"goodsCode": {
					"type": "text",
					"fields": {
						"keyword": {
							"ignore_above": 256,
							"type": "keyword"
						}
					}
				},
				"id": {
					"type": "long"
				},
				"goodsName": {
					"type": "text",
					"fields": {
						"keyword": {
							"ignore_above": 256,
							"type": "keyword"
						}
					}
				},
				"createDate": {
					"type": "text",
					"fields": {
						"keyword": {
							"ignore_above": 256,
							"type": "keyword"
						}
					}
				}
			}
		}
	}

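Method 2: use Postman, elasticsearch-head, or a similar tool to send a PUT request that creates the index and mapping.

This is the most basic approach and needs no further explanation.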
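For readers who would rather script that PUT request than click through Postman, here is a minimal sketch using the JDK's java.net.http client (Java 11+). The request body is an assumption on my part: it restates the Goods mapping from this article in the typeless form that Elasticsearch 7.x accepts, and the ik_max_word and ik_smart analyzers only work if the IK plugin is installed. The class name is hypothetical.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical helper that sends the same PUT a tool like Postman would send.
final class CreateIndexRequestSketch {

    // Settings and mapping for goods_index (assumed body, see the note above).
    static final String BODY = "{"
            + "\"settings\":{\"number_of_shards\":1,\"number_of_replicas\":0},"
            + "\"mappings\":{\"properties\":{"
            + "\"goodsCode\":{\"type\":\"keyword\"},"
            + "\"goodsName\":{\"type\":\"text\",\"analyzer\":\"ik_max_word\",\"search_analyzer\":\"ik_smart\"},"
            + "\"createDate\":{\"type\":\"date\",\"format\":\"basic_date_time\"},"
            + "\"hotScore\":{\"type\":\"integer\"},"
            + "\"attrs\":{\"type\":\"nested\"}"
            + "}}}";

    // Builds PUT {baseUrl}/{indexName} with the JSON body; nothing is sent yet.
    static HttpRequest buildRequest(String baseUrl, String indexName) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/" + indexName))
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(BODY))
                .build();
    }

    public static void main(String[] args) throws Exception {
        HttpRequest request = buildRequest("http://127.0.0.1:9200", "goods_index");
        // Requires a running Elasticsearch node at the address above.
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```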

Method 3: set the createIndex attribute of @Document to false, then initialize the index and mapping in code.

public boolean indexAndMappingCreate() {
    IndexOperations indexOperations = elasticsearchRestTemplate.indexOps(Goods.class);
    // Nothing to do if the index already exists
    if (indexOperations.exists()) {
        return true;
    }
    // Create the index, then build the mapping from the annotations and apply it
    boolean index = indexOperations.create();
    Document mapping = indexOperations.createMapping(Goods.class);
    boolean putMapping = indexOperations.putMapping(mapping);
    return index && putMapping;
}

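fielddata = true is not applied

@Field(type = FieldType.Integer, fielddata = true)
private Integer hotScore = 100;

The resulting mapping is shown below. hotScore carries no fielddata setting (in Elasticsearch, fielddata only applies to text fields):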

{
	"state": "open",
	"settings": {
		"index": {
			"refresh_interval": "1s",
			"number_of_shards": "1",
			"provided_name": "goods_index",
			"creation_date": "1640697770606",
			"store": {
				"type": "fs"
			},
			"number_of_replicas": "0",
			"uuid": "_N7I0GAJRsitjTbs9XnOcQ",
			"version": {
				"created": "7030099"
			}
		}
	},
	"mappings": {
		"_doc": {
			"properties": {
				"hotScore": {
					"type": "integer"
				},
				"goodsCode": {
					"type": "keyword"
				},
				"goodsName": {
					"search_analyzer": "ik_smart",
					"analyzer": "ik_max_word",
					"type": "text"
				},
				"attrs": {
					"type": "nested"
				},
				"createDate": {
					"format": "basic_date_time",
					"type": "date"
				}
			}
		}
	},

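Method 4: send the request with RestHighLevelClient, much like sending a PUT request from Postman.

@Autowired
@Qualifier("elasticsearchClient")
protected RestHighLevelClient client;

Method 5

Sorting query error

Sorting on goodsCode fails when the field is mapped as text, as it is in the auto-generated mapping of method 1. A text field can only be sorted on if fielddata=true is set, which loads the field into memory by uninverting the inverted index and can use a significant amount of memory. As the error message itself suggests, the alternative is to use a keyword field, for example by mapping goodsCode as keyword or by sorting on its goodsCode.keyword sub-field.

The query fails with the following error: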

Caused by: ElasticsearchException[Elasticsearch exception [type=illegal_argument_exception, reason=Fielddata is disabled on text fields by default. Set fielddata=true on [goodsCode] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead.]]
	at org.elasticsearch.ElasticsearchException.innerFromXContent(ElasticsearchException.java:496)
	at org.elasticsearch.ElasticsearchException.fromXContent(ElasticsearchException.java:407)
	at org.elasticsearch.ElasticsearchException.innerFromXContent(ElasticsearchException.java:437)
	... 46 more

References:

https://www.baeldung.com/spring-data-elasticsearch-queries
https://blog.csdn.net/weixin_43814195/article/details/85281287
https://www.cnblogs.com/huanshilang/p/14382279.html
https://github.com/eugenp/tutorials/tree/master/persistence-modules/spring-data-elasticsearch