Elasticsearch deep paging and search in practice (based on ES 7.x)

@Service
@Slf4j
public class DynamicSecurityScanServiceImpl implements DynamicSecurityScanService {

    @Qualifier("elasticsearchTemplate")
    @Autowired
    private ElasticsearchRestTemplate elasticsearchRestTemplate;

    @Qualifier("elasticsearchClient")
    @Autowired
    private RestHighLevelClient restHighLevelClient;

    @Autowired
    private DynamicSecurityScanDocRepository repository;
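    /**
     * bool supports four clause types: must, should, must_not and filter; filter clauses are not scored and their results can be cached.
     * match is analyzed (full-text, fuzzy matching); term is an exact match; multi_match matches several fields at once.
     * from + size is shallow paging; scroll is deep paging.
     * Shallow-paging alternative:
     * int from  = (paramPI.getPageNum()-1)*paramPI.getPageSize();
     * sourceBuilder.from(from);
     *
     * @param overviewDto search filters plus the current page number and page size
     * @return Page custom pagination object
     * @throws IOException if the request to Elasticsearch fails
     */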
    @Override
    public Page<DynamicSecurityScanDoc> searchSystemDoc(OverviewDto overviewDto) throws IOException {
        Page<DynamicSecurityScanDoc> page = new Page<>(overviewDto.getCurPage(), overviewDto.getPageSize());
        // Keep the scroll context alive for 60 seconds between requests
        final Scroll scroll = new Scroll(TimeValue.timeValueSeconds(60));
        // 1. Create the search request
        SearchRequest searchRequest = new SearchRequest(Constant.INDEXFORVERSIONDETAUL);
        searchRequest.scroll(scroll);
        // 2. Build the request body with SearchSourceBuilder; it exposes the methods for every kind of query
        SearchSourceBuilder sourceBuilder = new SearchSourceBuilder().trackTotalHits(true);
        sourceBuilder.size(overviewDto.getPageSize());
        // Search conditions
        BoolQueryBuilder boolQuery = QueryBuilders.boolQuery();


        if (!StringUtils.isEmpty(overviewDto.getUser())) {
            MultiMatchQueryBuilder userQuery = QueryBuilders.multiMatchQuery(overviewDto.getUser(), "owner", "developer", "tester");
            boolQuery.must(userQuery);
        }

        if (!StringUtils.isEmpty(overviewDto.getAppName())) {
            MatchQueryBuilder appNameQuery = QueryBuilders.matchQuery("appName", overviewDto.getAppName());
            boolQuery.must(appNameQuery);
        }

        if (!StringUtils.isEmpty(overviewDto.getAppVersion())) {
            TermQueryBuilder appversionQuery = QueryBuilders.termQuery("appVersion.keyword", overviewDto.getAppVersion());
            boolQuery.must(appversionQuery);
        }
        if (!StringUtils.isEmpty(overviewDto.getBranchName())) {
            MatchQueryBuilder branchNameQuery = QueryBuilders.matchQuery("branchName", overviewDto.getBranchName());
            boolQuery.must(branchNameQuery);
        }

        if (!StringUtils.isEmpty(overviewDto.getVulnerabilityResult())) {
            TermQueryBuilder vulnerabilityResultQuery = QueryBuilders.termQuery("vulnerabilityResult.keyword", overviewDto.getVulnerabilityResult());
            boolQuery.must(vulnerabilityResultQuery);
        }

        if (!StringUtils.isEmpty(overviewDto.getType())) {
            TermQueryBuilder typeQuery = QueryBuilders.termQuery("type.keyword", overviewDto.getType());
            boolQuery.must(typeQuery);
        }
        sourceBuilder.query(boolQuery);
        DslService.printDsl(sourceBuilder);
        // Attach the request body to the request
        searchRequest.source(sourceBuilder);
        // 3. Send the request
        SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
        // Total number of matching documents (exact, because trackTotalHits(true) is set)
        int totalHits = (int) searchResponse.getHits().getTotalHits().value;
        List<DynamicSecurityScanDoc> list = new ArrayList<>();
        // Take the scroll id from the first response so the context is cleared even for page 1
        String scrollId = searchResponse.getScrollId();
        int pageNum = overviewDto.getCurPage();
        int count = 1;
        while (searchResponse.getHits().getHits().length != 0) {
            if (count == pageNum) {
                // Convert the hits of the current batch, not those of the first response
                execute(searchResponse.getHits(), list);
                log.info("ES paged query succeeded");
                break;
            }
            count++;
            SearchScrollRequest scrollRequest = new SearchScrollRequest(scrollId);
            scrollRequest.scroll(scroll);
            searchResponse = restHighLevelClient.scroll(scrollRequest, RequestOptions.DEFAULT);
            // After each fetch, keep the latest scrollId; the next fetch continues from this cursor
            scrollId = searchResponse.getScrollId();
        }
        if (scrollId != null) {
            // Clear the scroll context
            ClearScrollRequest clearScrollRequest = new ClearScrollRequest();
            // setScrollIds() can be used instead to clear several scroll ids at once
            clearScrollRequest.addScrollId(scrollId);
            restHighLevelClient.clearScroll(clearScrollRequest, RequestOptions.DEFAULT);
        }
        page.setTotalCount(totalHits);
        page.setList(list);
        return page;
    }

ES paging

In the usual from+size (shallow paging) flow, suppose I want the first 10 documents. The client sends the request to one node, and that node forwards it to every shard. Each shard finds its own top 10 candidates and returns them to the node, which merges the results, picks the overall top 10, and returns them to the client.

GET sdl-overview/_search
{
  "from": 1, 
  "size": 20, 
  "query": {
    "wildcard": {
        "appName.keyword": {
          "value": "*0*"
        }
        
    }
  }
}
  • This kind of shallow paging is only suitable for small amounts of data: the larger the from value, the longer the query takes, and the more data there is, the less efficient the query becomes.

  • Advantages: from+size is efficient when the amount of data is not large.

  • Disadvantages: with a very large amount of data, from+size paging loads every record up to the requested offset into memory. This not only makes the query very slow but can also bring ES down by exhausting its memory.
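
The page arithmetic behind from+size can be sketched as below. This is only a minimal illustration: the class `ShallowPaging` and its method names are hypothetical and not part of the original service. The constant reflects the default of the `index.max_result_window` setting (10,000), beyond which Elasticsearch rejects a from+size request.

```java
/** Hypothetical helper (not from the original project): page arithmetic for from+size paging. */
final class ShallowPaging {

    /** Default of the index.max_result_window index setting. */
    static final int DEFAULT_MAX_RESULT_WINDOW = 10_000;

    /** Zero-based offset of the first document on a 1-based page. */
    static int from(int pageNum, int pageSize) {
        if (pageNum < 1 || pageSize < 1) {
            throw new IllegalArgumentException("pageNum and pageSize must be >= 1");
        }
        // multiplyExact throws on int overflow instead of silently wrapping
        return Math.multiplyExact(pageNum - 1, pageSize);
    }

    /** from + size: how many hits every shard has to collect for this page. */
    static long window(int pageNum, int pageSize) {
        return (long) from(pageNum, pageSize) + pageSize;
    }

    /** True if the page can still be served with from+size under the given window limit. */
    static boolean fitsWindow(int pageNum, int pageSize, int maxResultWindow) {
        return window(pageNum, pageSize) <= maxResultWindow;
    }
}
```

With 20 documents per page, page 500 is the last one that fits the default window (9980 + 20 = 10000); page 501 does not, which is the point where scroll becomes necessary.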

scroll deep paging

If the requested page number is small (assuming 20 docs per page), Elasticsearch has no problem. But if the page number is large, for example page 20, Elasticsearch has to fetch all the docs from page 1 through page 20, discard the docs of pages 1 to 19, and return only the docs of page 20.
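
The cost described above can be made concrete with a small in-memory simulation. It is a sketch under simplifying assumptions, not how Elasticsearch is implemented: the class `ScatterGather` is hypothetical, each shard is a plain list, and documents are represented only by an integer sort key.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical simulation of from+size across shards; documents are just integer sort keys. */
final class ScatterGather {

    /** Returns the page [from, from + size) of the globally sorted documents. */
    static List<Integer> page(List<List<Integer>> shards, int from, int size) {
        List<Integer> merged = new ArrayList<>();
        for (List<Integer> shard : shards) {
            List<Integer> sorted = new ArrayList<>(shard);
            Collections.sort(sorted);
            // Every shard must hand over its first from + size candidates,
            // because any of them could belong to the requested page.
            merged.addAll(sorted.subList(0, Math.min(from + size, sorted.size())));
        }
        // The coordinating node sorts all candidates, skips the first `from`
        // and keeps only `size` of them; everything else is thrown away.
        Collections.sort(merged);
        int start = Math.min(from, merged.size());
        int end = Math.min(from + size, merged.size());
        return new ArrayList<>(merged.subList(start, end));
    }
}
```

For page 20 with 20 docs per page, from is 380 and size is 20, so every shard contributes up to 400 candidates and the coordinating node keeps just 20 of them.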

The solution is to use scroll. A scroll keeps a snapshot of the index segments as they were when the scroll query was first executed, so later changes to the index do not affect the results being traversed.

  • Scroll works in two steps, initialization and traversal: 1. On initialization, all results matching the search criteria are cached, which can be thought of as a snapshot. 2. On traversal, data is read from this snapshot. The request below is the initialization step:

GET sdl-overview/_search?scroll=5m
{
  "size": 5, 
  "query": {
    "wildcard": {
        "appName.keyword": {
          "value": "*0*"
        }
        
    }
  }
}
  • When traversing, take the _scroll_id returned by the previous request, pass it together with the scroll parameter, and repeat this step until the returned data is empty, which means the traversal is complete.
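
The traversal loop can be sketched without a running cluster. Everything here is a hypothetical stand-in: `Cursor.next()` plays the role of one scroll request, `close()` plays the role of clearing the scroll, and the in-memory copy plays the role of the snapshot taken at initialization.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical in-memory model of a scroll traversal; no Elasticsearch classes are used. */
final class ScrollWalk {

    /** Stand-in for a scroll context. */
    interface Cursor<T> extends AutoCloseable {
        /** One scroll request: the next batch, or an empty list when the traversal is done. */
        List<T> next();

        /** Stand-in for the clear-scroll request. */
        @Override
        void close();
    }

    /** Cursor over a copy of docs taken now, so later changes to docs are not seen. */
    static <T> Cursor<T> inMemory(List<T> docs, int batchSize) {
        final List<T> snapshot = new ArrayList<>(docs);
        return new Cursor<T>() {
            private int pos = 0;

            @Override
            public List<T> next() {
                int end = Math.min(pos + batchSize, snapshot.size());
                List<T> batch = new ArrayList<>(snapshot.subList(pos, end));
                pos = end;
                return batch;
            }

            @Override
            public void close() {
                // nothing to release in this in-memory model
            }
        };
    }

    /** Reads batches until an empty one comes back; returns the number of non-empty batches. */
    static <T> int drain(Cursor<T> cursor, Consumer<List<T>> handler) {
        int batches = 0;
        // try-with-resources guarantees close() runs even if the handler throws
        try (Cursor<T> c = cursor) {
            for (List<T> batch = c.next(); !batch.isEmpty(); batch = c.next()) {
                handler.accept(batch);
                batches++;
            }
        }
        return batches;
    }
}
```

Releasing the cursor in a try-with-resources block mirrors the advice to clear the scroll as soon as it is no longer needed, rather than waiting for the keep-alive to expire.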

Complex business query DSL (bool)

 

{
  "query": {
    "bool": {
      "must": [
         {"term":{"appVersion.keyword":"external-20200702-5009"}},
         {"match":{"appName":"葵花谱"}},
         {"term":{"type.keyword":"0"}},
         {"match":{"branchName":"release-1.0"}},
         {"term":{"vulnerabilityResult.keyword":"1"}},
         {"multi_match": {
            "query": "Gosaint3",
            "fields": ["developer","tester","owner"]
           }
        }
      ]
    }
  }
}

 

Origin blog.csdn.net/GoSaint/article/details/107183201