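Distributed Search Engine: ElasticSearch (Part 2)

About the author

The author is a third-year undergraduate from Heyuan. These notes record what I have learned through self-study; corrections are welcome. I will keep improving them in the hope of helping more Java enthusiasts get started.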

  • Note: these notes use ElasticSearch 7.6.1.

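What is ElasticSearch

ElasticSearch is a search server built on Lucene. It provides a distributed, multi-tenant full-text search engine exposed through a RESTful web interface. Elasticsearch is written in Java and released as open source under the Apache License, and it is a popular enterprise search engine. It is designed for cloud environments and offers near-real-time search that is stable, reliable, fast, and easy to install and use.

Suppose we are building a website or application and want to add search to it. Building search from scratch is hard. We want the search solution to be fast; we want zero configuration and a completely free model; we want to index data simply by sending JSON over HTTP; we want the search server to be always available; we want to start with one machine and scale to hundreds; we want real-time search and simple multi-tenancy; and we want a solution that works in the cloud. Elasticsearch is meant to address all of these problems and more. (Excerpted from Baidu Baike.)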

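Pagination

from is the zero-based offset of the first hit to return, and size is the number of hits per page. The request below sorts by od in descending order and returns the first two hits.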

GET goods/_search
{
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "od": {
        "order": "desc"
      }
    }
  ],
  "from": 0,
  "size": 2
}

Result:

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 4,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [
      {
        "_index" : "goods",
        "_type" : "_doc",
        "_id" : "4",
        "_score" : null,
        "_source" : {
          "title" : "IQOONEO5",
          "content" : "IQOONEO5 高通骁龙870Soc ,",
          "price" : "2499",
          "od" : 4
        },
        "sort" : [
          4
        ]
      },
      {
        "_index" : "goods",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : null,
        "_source" : {
          "title" : "小米11",
          "content" : "小米11 高通骁龙888Soc ,1亿像素",
          "price" : "4500",
          "od" : 3
        },
        "sort" : [
          3
        ]
      }
    ]
  }
}
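
Because from is an offset rather than a page number, application code usually has to convert between the two. The following is a minimal sketch of that conversion in plain Java. The class and method names (PageWindow, from, withinDefaultWindow) are hypothetical and not part of any Elasticsearch API. The 10,000 limit is the default value of the index.max_result_window setting in Elasticsearch 7.x; a request whose from + size exceeds it is rejected unless the setting is changed.

```java
// Hypothetical helper, not part of the Elasticsearch client:
// converts a 1-based page number into the zero-based "from" offset.
final class PageWindow {
    // Default value of index.max_result_window in Elasticsearch 7.x
    static final int DEFAULT_MAX_RESULT_WINDOW = 10_000;

    // Offset of the first hit on the given 1-based page
    static int from(int page, int size) {
        if (page < 1 || size < 1) {
            throw new IllegalArgumentException("page and size must be >= 1");
        }
        return Math.multiplyExact(page - 1, size);
    }

    // True if the page can be fetched with plain from/size under the default limit
    static boolean withinDefaultWindow(int page, int size) {
        return (long) from(page, size) + size <= DEFAULT_MAX_RESULT_WINDOW;
    }

    public static void main(String[] args) {
        // Page 1 with size 2 corresponds to the request shown above ("from": 0, "size": 2)
        System.out.println(from(1, 2));
        System.out.println(withinDefaultWindow(5001, 2));
    }
}
```

For the request above, page 1 with size 2 gives from = 0, and page 2 would give from = 2.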

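Field highlighting (highlight)

You can highlight one or more fields. When a selected field matches the query, the matched terms are wrapped in <em> tags by default.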

GET goods/_search
{
  "query": {
    "match": {
      "title": "华为P40"
    }
  },
  "highlight": {
    "fields": {
      "title": {}
    }
  }
}

Result:

{
  "took" : 6,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 2.7309713,
    "hits" : [
      {
        "_index" : "goods",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 2.7309713,
        "_source" : {
          "title" : "华为P40",
          "content" : "华为P40 8+256G,麒麟990Soc,贼牛逼",
          "price" : "4999",
          "od" : 1
        },
        "highlight" : {
          "title" : [
            "<em>华</em><em>为</em><em>P40</em>"
          ]
        }
      },
      {
        "_index" : "goods",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 1.5241971,
        "_source" : {
          "title" : "华为Mate30",
          "content" : "华为Mate30 8+128G,麒麟990Soc",
          "price" : "3998",
          "od" : 2
        },
        "highlight" : {
          "title" : [
            "<em>华</em><em>为</em>Mate30"
          ]
        }
      }
    ]
  }
}

The default tag is <em>. The prefix and suffix can be changed with pre_tags and post_tags, for example to a styled HTML span:

GET goods/_search
{
  "query": {
    "match": {
      "title": "华为P40"
    }
  },
  "highlight": {
    "pre_tags": "<span style='color: red'>",
    "post_tags": "</span>",
    "fields": {
      "title": {}
    }
  }
}

Result:

{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 2.7309713,
    "hits" : [
      {
        "_index" : "goods",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 2.7309713,
        "_source" : {
          "title" : "华为P40",
          "content" : "华为P40 8+256G,麒麟990Soc,贼牛逼",
          "price" : "4999",
          "od" : 1
        },
        "highlight" : {
          "title" : [
            "<span style='color: red'>华</span><span style='color: red'>为</span><span style='color: red'>P40</span>"
          ]
        }
      },
      {
        "_index" : "goods",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 1.5241971,
        "_source" : {
          "title" : "华为Mate30",
          "content" : "华为Mate30 8+128G,麒麟990Soc",
          "price" : "3998",
          "od" : 2
        },
        "highlight" : {
          "title" : [
            "<span style='color: red'>华</span><span style='color: red'>为</span>Mate30"
          ]
        }
      }
    ]
  }
}

Imitating Baidu's search highlighting

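For example, when you search Baidu for 华为P40, matches are highlighted in the content as well as in the title. We can get the same effect by combining multi_match with highlight: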

GET goods/_search
{
  "query": {
    "multi_match": {
      "query": "华为P40",
      "fields": ["title", "content"]
    }
  },
  "highlight": {
    "pre_tags": "<span style='color: red'>",
    "post_tags": "</span>",
    "fields": {
      "title": {},
      "content": {}
    }
  }
}

Result:

{
  "took" : 8,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 2.8157697,
    "hits" : [
      {
        "_index" : "goods",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 2.8157697,
        "_source" : {
          "title" : "华为P40",
          "content" : "华为P40 8+256G,麒麟990Soc,贼牛逼",
          "price" : "4999",
          "od" : 1
        },
        "highlight" : {
          "title" : [
            "<span style='color: red'>华</span><span style='color: red'>为</span><span style='color: red'>P40</span>"
          ],
          "content" : [
            "<span style='color: red'>华</span><span style='color: red'>为</span><span style='color: red'>P40</span> 8+256G,麒麟990Soc,贼牛逼"
          ]
        }
      },
      {
        "_index" : "goods",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 1.8023796,
        "_source" : {
          "title" : "华为Mate30",
          "content" : "华为Mate30 8+128G,麒麟990Soc",
          "price" : "3998",
          "od" : 2
        },
        "highlight" : {
          "title" : [
            "<span style='color: red'>华</span><span style='color: red'>为</span>Mate30"
          ],
          "content" : [
            "<span style='color: red'>华</span><span style='color: red'>为</span>Mate30 8+128G,麒麟990Soc"
          ]
        }
      }
    ]
  }
}

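bool query (for combining multiple conditions)

Similar to AND and OR in MySQL.

Key point: must behaves like AND and should behaves like OR. Note that a should match is only required when the bool query has no must or filter clause; otherwise the should clauses are optional and only raise the score, unless minimum_should_match is set.

Using must (AND):

The must array below contains two conditions, and a document is returned only if it satisfies both.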

GET goods/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "title": "华为"
          }
        },
        {
          "match": {
            "content": "MATE30"
          }
        }
      ]
    }
  }
}

Result:

{
  "took" : 10,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 2.9512205,
    "hits" : [
      {
        "_index" : "goods",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 2.9512205,
        "_source" : {
          "title" : "华为Mate30",
          "content" : "华为Mate30 8+128G,麒麟990Soc",
          "price" : "3998",
          "od" : 2
        }
      }
    ]
  }
}

Using should (OR):

The should array contains the same two conditions, but a document only needs to satisfy one of them.

GET goods/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "title": "华为"
          }
        },
        {
          "match": {
            "content": "MATE30"
          }
        }
      ]
    }
  }
}

Result:

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 2.9512205,
    "hits" : [
      {
        "_index" : "goods",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 2.9512205,
        "_source" : {
          "title" : "华为Mate30",
          "content" : "华为Mate30 8+128G,麒麟990Soc",
          "price" : "3998",
          "od" : 2
        }
      },
      {
        "_index" : "goods",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.5241971,
        "_source" : {
          "title" : "华为P40",
          "content" : "华为P40 8+256G,麒麟990Soc,贼牛逼",
          "price" : "4999",
          "od" : 1
        }
      }
    ]
  }
}
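
The difference between must and should can be pictured with a toy model in plain Java. This is only a sketch of the matching rule, assuming that substring checks stand in for the analyzed match queries; it does not model scoring, and the names (BoolModel, mustExample, shouldExample) are hypothetical. It covers only the two cases shown above: a bool query with must clauses only, and one with should clauses only.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.function.Predicate;

// Toy model of bool matching; not how Elasticsearch evaluates queries internally.
final class BoolModel {
    // A document matches if it satisfies every must clause and,
    // when there are no must clauses, at least one should clause.
    static <T> boolean matches(T doc, List<Predicate<T>> must, List<Predicate<T>> should) {
        for (Predicate<T> clause : must) {
            if (!clause.test(doc)) {
                return false;
            }
        }
        if (must.isEmpty() && !should.isEmpty()) {
            return should.stream().anyMatch(clause -> clause.test(doc));
        }
        return true;
    }

    // The two conditions from the examples, modelled as substring checks.
    // A document is represented as {title, content}.
    static final Predicate<String[]> TITLE_HUAWEI = d -> d[0].contains("华为");
    static final Predicate<String[]> CONTENT_MATE30 =
            d -> d[1].toLowerCase(Locale.ROOT).contains("mate30");

    static boolean mustExample(String title, String content) {
        return matches(new String[] {title, content},
                Arrays.asList(TITLE_HUAWEI, CONTENT_MATE30),
                Collections.<Predicate<String[]>>emptyList());
    }

    static boolean shouldExample(String title, String content) {
        return matches(new String[] {title, content},
                Collections.<Predicate<String[]>>emptyList(),
                Arrays.asList(TITLE_HUAWEI, CONTENT_MATE30));
    }
}
```

With the sample data, only 华为Mate30 passes mustExample, while both 华为Mate30 and 华为P40 pass shouldExample, which mirrors the two result sets above.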

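Filter with a range condition (filter range)

Suppose we search on title and also want to keep only the documents with price > 4000. A range clause inside filter does this; filter clauses restrict the results without affecting the score.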

GET goods/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "title": "小米"
          }
        }
      ],
      "filter": {
        "range": {
          "price": {
            "gt": 4000
          }
        }
      }
    }
  }
}

Result:

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 2.4135482,
    "hits" : [
      {
        "_index" : "goods",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : 2.4135482,
        "_source" : {
          "title" : "小米11",
          "content" : "小米11 高通骁龙888Soc ,1亿像素",
          "price" : "4500",
          "od" : 3
        }
      }
    ]
  }
}
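
One thing to watch: in the sample documents price is stored as a JSON string ("4500"), and this post does not show the mapping of the goods index. If price is mapped as a numeric type, the range filter compares numbers. If it was left to dynamic mapping and became a text/keyword field, a range query compares terms lexicographically instead. The sketch below only illustrates the difference between the two orderings in plain Java; the class and method names are hypothetical.

```java
import java.math.BigDecimal;

// Illustrates lexicographic versus numeric ordering of price strings.
// This is an illustration of the pitfall, not Elasticsearch code.
final class PriceOrdering {
    // Ordering used when values are compared as strings
    static boolean greaterAsText(String value, String bound) {
        return value.compareTo(bound) > 0;
    }

    // Ordering used when values are compared as numbers
    static boolean greaterAsNumber(String value, String bound) {
        return new BigDecimal(value).compareTo(new BigDecimal(bound)) > 0;
    }

    public static void main(String[] args) {
        // "999" sorts after "4000" as text because '9' > '4'
        System.out.println(greaterAsText("999", "4000"));
        System.out.println(greaterAsNumber("999", "4000"));
    }
}
```

If numeric comparison is intended, it is safer to declare price with a numeric type in the index mapping.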

View information about all indices in the cluster

GET _cat/indices?v

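The Elasticsearch Java API

Preparation

1. Add the Elasticsearch high-level REST client dependency and the Elasticsearch dependency. Their versions should match the version of your Elasticsearch server; ours is 7.6.1.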

        <!-- The two Elasticsearch Java dependencies -->
        <dependency>
            <groupId>org.elasticsearch.client</groupId>
            <artifactId>elasticsearch-rest-high-level-client</artifactId>
            <!-- This version should match your Elasticsearch server version -->
            <version>7.6.1</version>
        </dependency>

        <dependency>
            <groupId>org.elasticsearch</groupId>
            <artifactId>elasticsearch</artifactId>
            <!-- This version should match your Elasticsearch server version -->
            <version>7.6.1</version>
        </dependency>
        <!-- fastjson -->
        <dependency>
            <groupId>com.alibaba</groupId>
            <artifactId>fastjson</artifactId>
            <version>1.2.75</version>
        </dependency>

2. Open the RestHighLevelClient constructor:

    public RestHighLevelClient(RestClientBuilder restClientBuilder) {
        this(restClientBuilder, Collections.emptyList());
    }

The constructor takes a RestClientBuilder. We do not create that builder directly; it is obtained from the static RestClient.builder(...) method.

3. Open RestClient:

 public static RestClientBuilder builder(HttpHost... hosts) {
        if (hosts == null || hosts.length == 0) {
            throw new IllegalArgumentException("hosts must not be null nor empty");
        }
        List<Node> nodes = Arrays.stream(hosts).map(Node::new).collect(Collectors.toList());
        return new RestClientBuilder(nodes);
    }

RestClient.builder(...) returns a RestClientBuilder. Next, look at the HttpHost constructor:

    public HttpHost(String hostname, int port, String scheme) {
        // Arguments: the Elasticsearch host name, its port, and the scheme (defaults to http)
        this.hostname = (String)Args.containsNoBlanks(hostname, "Host name");
        this.lcHostname = hostname.toLowerCase(Locale.ROOT);
        if (scheme != null) {
            this.schemeName = scheme.toLowerCase(Locale.ROOT);
        } else {
            this.schemeName = "http";
        }

        this.port = port;
        this.address = null;
    }

4. Putting it together, the client is configured as follows:

        HttpHost httpHost = new HttpHost("localhost", 9200, "http");
        RestClientBuilder restClientBuilder = RestClient.builder(httpHost);
        RestHighLevelClient restHighLevelClient = new RestHighLevelClient(restClientBuilder);

5. For convenience, we can register the RestHighLevelClient as a bean in the Spring IoC container and inject it wherever it is needed.

@Configuration
public class esConfig {

    @Bean
    public RestHighLevelClient restHighLevelClient() {
        HttpHost httpHost = new HttpHost("localhost", 9200, "http");
        RestClientBuilder builder = RestClient.builder(httpHost);
        return new RestHighLevelClient(builder);
    }

}

Index operations

Index operations in the Java API all take the form restHighLevelClient.indices().xxxxx().

Create an index

    // Create an index
    @Test
    public void createIndex() throws IOException {
        RestClientBuilder builder = RestClient.builder(new HttpHost("localhost", 9200, "http"));
        RestHighLevelClient restHighLevelClient = new RestHighLevelClient(builder);
        // Build a create-index request with the name of the new index
        CreateIndexRequest createIndexRequest = new CreateIndexRequest("java01");
        // Send the create-index request to Elasticsearch
        CreateIndexResponse createIndexResponse = restHighLevelClient.indices().create(createIndexRequest, RequestOptions.DEFAULT);
        // true if the cluster acknowledged the request
        System.out.println(createIndexResponse.isAcknowledged());

        restHighLevelClient.close();
    }

Delete an index

    // Delete an index
    @Test
    public void deleteIndex() throws IOException {
        RestClientBuilder builder = RestClient.builder(new HttpHost("localhost", 9200, "http"));
        RestHighLevelClient restHighLevelClient = new RestHighLevelClient(builder);

        // Build a delete-index request with the name of the index to delete
        DeleteIndexRequest deleteIndexRequest = new DeleteIndexRequest("java01");
        // Send the delete-index request to Elasticsearch
        restHighLevelClient.indices().delete(deleteIndexRequest, RequestOptions.DEFAULT);
        restHighLevelClient.close();
    }

Check whether an index exists

    // Check whether an index exists
    @Test
    public void indexExists() throws IOException {
        RestClientBuilder builder = RestClient.builder(new HttpHost("localhost", 9200, "http"));
        RestHighLevelClient restHighLevelClient = new RestHighLevelClient(builder);

        GetIndexRequest getIndexRequest = new GetIndexRequest("goods");

        boolean exists = restHighLevelClient.indices().exists(getIndexRequest, RequestOptions.DEFAULT);

        System.out.println(exists);

        restHighLevelClient.close();
    }

Document operations

Create a document with a specified id

    // Create a document
    @Test
    public void createIndexDoc() throws IOException {
        RestClientBuilder builder = RestClient.builder(new HttpHost("localhost", 9200, "http"));
        RestHighLevelClient restHighLevelClient = new RestHighLevelClient(builder);

        IndexRequest indexRequest = new IndexRequest("hello");
        // Set the document id
        indexRequest.id("1");
        /**
         *  public IndexRequest source(Map<String, ?> source, XContentType contentType) throws ElasticsearchGenerationException {
         *         try {
         *             XContentBuilder builder = XContentFactory.contentBuilder(contentType);
         *             builder.map(source);
         *             return this.source(builder);
         *         } catch (IOException var4) {
         *             throw new ElasticsearchGenerationException("Failed to generate [" + source + "]", var4);
         *         }
         *     }
         *     source(...) has several overloads and any of them works. The Map overload is shown above;
         *     the code below collects the fields in a Map and passes them as a JSON string.
         */
        Map<String,Object> source = new HashMap<>();
        source.put("a_age", "50");
        source.put("a_address", "广州");
        // Elasticsearch documents are JSON: serialize the Map with fastjson and declare the content type as JSON
        indexRequest.source(JSON.toJSONString(source), XContentType.JSON);

        IndexResponse response = restHighLevelClient.index(indexRequest, RequestOptions.DEFAULT);

        System.out.println("response:" + response);
        System.out.println("status:" + response.status());

        restHighLevelClient.close();
    }

Delete a document with a specified id

    // Delete a document
    @Test
    public void deleteDoc() throws IOException {
        RestClientBuilder builder = RestClient.builder(new HttpHost("localhost", 9200, "http"));
        RestHighLevelClient restHighLevelClient = new RestHighLevelClient(builder);

        DeleteRequest deleteRequest = new DeleteRequest("hello");
        deleteRequest.id("1");

        DeleteResponse delete = restHighLevelClient.delete(deleteRequest, RequestOptions.DEFAULT);
        System.out.println(delete.status());

        restHighLevelClient.close();
    }

Update a document with a specified id

    // Update a document
    @Test
    public void updateDoc() throws IOException {
        RestClientBuilder builder = RestClient.builder(new HttpHost("localhost", 9200, "http"));
        RestHighLevelClient restHighLevelClient = new RestHighLevelClient(builder);

        /**
         * We use the following constructor:
         *     public UpdateRequest(String index, String id) {
         *         super(index);
         *         this.refreshPolicy = RefreshPolicy.NONE;
         *         this.waitForActiveShards = ActiveShardCount.DEFAULT;
         *         this.scriptedUpsert = false;
         *         this.docAsUpsert = false;
         *         this.detectNoop = true;
         *         this.id = id;
         *     }
         */
        UpdateRequest updateRequest = new UpdateRequest("hello", "1");

        // Only the fields in this partial document are changed
        Map<String,Object> source = new HashMap<>();
        source.put("a_address", "河源");
        updateRequest.doc(JSON.toJSONString(source), XContentType.JSON);
        UpdateResponse response = restHighLevelClient.update(updateRequest, RequestOptions.DEFAULT);
        System.out.println(response.status());

        restHighLevelClient.close();
    }

Get a document with a specified id

    // Get a document
    @Test
    public void getDoc() throws IOException {
        RestClientBuilder builder = RestClient.builder(new HttpHost("localhost", 9200, "http"));
        RestHighLevelClient restHighLevelClient = new RestHighLevelClient(builder);

        GetRequest getRequest = new GetRequest("hello");
        getRequest.id("1");

        GetResponse response = restHighLevelClient.get(getRequest, RequestOptions.DEFAULT);

        // The document body (_source) as a JSON string
        String sourceAsString = response.getSourceAsString();
        System.out.println(sourceAsString);

        restHighLevelClient.close();
    }

Search (match every document with match_all)

    // Search (match every document with match_all)
    @Test
    public void search_matchAll() throws IOException {
        RestClientBuilder builder = RestClient.builder(new HttpHost("localhost", 9200, "http"));
        RestHighLevelClient restHighLevelClient = new RestHighLevelClient(builder);

        /**
         *  public SearchRequest(String... indices) {
         *         this(indices, new SearchSourceBuilder());
         *     }
         */
        SearchRequest searchRequest = new SearchRequest("hello");

        // SearchSourceBuilder builds the body of the search request
        SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
        MatchAllQueryBuilder matchAllQueryBuilder = QueryBuilders.matchAllQuery();
        searchSourceBuilder.query(matchAllQueryBuilder); // corresponds to "query" in the request body

        searchRequest.source(searchSourceBuilder);

        SearchResponse search = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);

        SearchHit[] hits = search.getHits().getHits();
        for (SearchHit hit : hits) {
            System.out.println(hit.getSourceAsString());
        }

        restHighLevelClient.close();
    }

Search (full-text match query)

    // Full-text search with match (the query text is analyzed; this is not the fuzzy query)
    @Test
    public void search_match() throws IOException {
        RestClientBuilder builder = RestClient.builder(new HttpHost("localhost", 9200, "http"));
        RestHighLevelClient restHighLevelClient = new RestHighLevelClient(builder);

        // No index name is given, so the search runs against all indices
        SearchRequest searchRequest = new SearchRequest();

        // Build the query
        SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
        MatchQueryBuilder matchQueryBuilder = QueryBuilders.matchQuery("a_address", "广州");
        searchSourceBuilder.query(matchQueryBuilder);

        searchRequest.source(searchSourceBuilder);

        SearchResponse search = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);

        SearchHit[] hits = search.getHits().getHits();
        for (SearchHit hit : hits) {
            System.out.println(hit.getSourceAsString());
        }

        restHighLevelClient.close();
    }

Search (multiple fields with multi_match)

    // Search (multiple fields with multi_match)
    @Test
    public void search_multiMatch() throws IOException {
        RestClientBuilder builder = RestClient.builder(new HttpHost("localhost", 9200, "http"));
        RestHighLevelClient restHighLevelClient = new RestHighLevelClient(builder);

        SearchRequest searchRequest = new SearchRequest("goods");

        SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
        // First argument: the query text; remaining arguments: the fields to search
        searchSourceBuilder.query(QueryBuilders.multiMatchQuery("华为", "title", "content"));

        searchRequest.source(searchSourceBuilder);

        SearchResponse search = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);

        SearchHit[] hits = search.getHits().getHits();
        for (SearchHit hit : hits) {
            System.out.println(hit.getSourceAsString());
        }

        restHighLevelClient.close();
    }

Search (select returned fields with fetchSource)

The fetchSource method corresponds to _source in the request body.

    // Use fetchSource to select the returned fields (_source)
    @Test
    public void search_source() throws IOException {
        RestClientBuilder builder = RestClient.builder(new HttpHost("localhost", 9200, "http"));
        RestHighLevelClient restHighLevelClient = new RestHighLevelClient(builder);

        SearchRequest searchRequest = new SearchRequest("goods");

        SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();

        searchSourceBuilder.query(QueryBuilders.matchAllQuery());
        /**
         * public SearchSourceBuilder fetchSource(@Nullable String[] includes, @Nullable String[] excludes) {
         *         FetchSourceContext fetchSourceContext = this.fetchSourceContext != null ? this.fetchSourceContext : FetchSourceContext.FETCH_SOURCE;
         *         this.fetchSourceContext = new FetchSourceContext(fetchSourceContext.fetchSource(), includes, excludes);
         *         return this;
         *     }
         *
         */
        String[] includes = {"title"}; // fields to include
        String[] excludes = {};        // fields to exclude
        searchSourceBuilder.fetchSource(includes, excludes);

        searchRequest.source(searchSourceBuilder);

        SearchResponse search = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);

        SearchHit[] hits = search.getHits().getHits();
        for (SearchHit hit : hits) {
            System.out.println(hit.getSourceAsString());
        }

        restHighLevelClient.close();
    }
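
To make the effect of includes and excludes concrete, here is a toy model in plain Java that filters a document map the same way for exact field names. It is only a sketch: the real _source filtering also supports wildcard patterns and nested paths, which are not modelled here, and the names (SourceFilter, filter, sample) are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Toy model of include/exclude filtering on exact field names only.
final class SourceFilter {
    static Map<String, Object> filter(Map<String, Object> source, String[] includes, String[] excludes) {
        Set<String> inc = new HashSet<>(Arrays.asList(includes));
        Set<String> exc = new HashSet<>(Arrays.asList(excludes));
        Map<String, Object> out = new LinkedHashMap<>();
        for (Map.Entry<String, Object> entry : source.entrySet()) {
            // An empty include list means "include everything"
            boolean included = inc.isEmpty() || inc.contains(entry.getKey());
            if (included && !exc.contains(entry.getKey())) {
                out.put(entry.getKey(), entry.getValue());
            }
        }
        return out;
    }

    // A document shaped like the ones in the goods index
    static Map<String, Object> sample() {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("title", "华为P40");
        doc.put("content", "华为P40 8+256G,麒麟990Soc,贼牛逼");
        doc.put("price", "4999");
        doc.put("od", 1);
        return doc;
    }

    public static void main(String[] args) {
        // Same arguments as the test method above: keep only the title field
        System.out.println(filter(sample(), new String[] {"title"}, new String[0]));
    }
}
```

With includes = {"title"} and no excludes, as in the test method above, only the title field remains in each hit.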

Pagination, sorting, and field highlighting

We want to translate the following Elasticsearch request into Java code:

GET goods/_search
{
  "query": {
    "match": {
      "title": "华为"
    }
  },
  "sort": [
    {
      "od": {
        "order": "desc"
      }
    }
  ],
  "from": 0,
  "size": 1,
  "highlight": {
    "pre_tags": "<span style='color:blue'>",
    "post_tags": "</span>",
    "fields": {
      "title": {}
    }
  }
}

Java implementation

    // Pagination, sorting, and field highlighting
    @Test
    public void page_sort_HighLight() throws IOException {
        RestClientBuilder builder = RestClient.builder(new HttpHost("localhost", 9200, "http"));
        RestHighLevelClient restHighLevelClient = new RestHighLevelClient(builder);

        SearchRequest searchRequest = new SearchRequest("goods");

        SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();

        MatchQueryBuilder matchQueryBuilder = QueryBuilders.matchQuery("title", "华为");
        searchSourceBuilder.query(matchQueryBuilder);

        // Pagination
        searchSourceBuilder.from(0);
        searchSourceBuilder.size(1);

        // Sorting
        searchSourceBuilder.sort("od", SortOrder.DESC);

        // ===== Highlighting: start =====
        HighlightBuilder highlightBuilder = new HighlightBuilder();

        // Set the prefix and suffix tags (pre_tags and post_tags)
        highlightBuilder.preTags("<span style='color:blue'>");
        highlightBuilder.postTags("</span>");

        // We use the field(...) overload that takes the field name as a String
        /**
         * public HighlightBuilder field(String name) {
         *         return this.field(new HighlightBuilder.Field(name));
         *     }
         */
        highlightBuilder.field("title");
        // To highlight more fields, call field(...) once per additional field
//        highlightBuilder.field(); // second highlighted field
//        highlightBuilder.field(); // third highlighted field, and so on

        searchSourceBuilder.highlighter(highlightBuilder);
        // ===== Highlighting: end =====

        searchRequest.source(searchSourceBuilder);

        SearchResponse search = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);

        SearchHit[] hits = search.getHits().getHits(); // all hits returned by the search
        for (SearchHit hit : hits) {
            Map<String, HighlightField> highlightFields = hit.getHighlightFields();
            System.out.println("highlightMap:" + highlightFields);
            // Look up the fragments by the field name "title".
            // The fragments hold the highlighted field content, which is important because it can replace
            // the original, unhighlighted value, for example:
            // <span style='color:blue'>华</span><span style='color:blue'>为</span>Mate30
            System.out.println("fragments:" + Arrays.toString(highlightFields.get("title").getFragments()));
        }

        restHighLevelClient.close();
    }
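
The comment in the loop notes that the highlighted fragments can replace the original field values before the document is sent to the page. A minimal sketch of that merge step in plain Java follows. It assumes the source has already been read into a map (for example with hit.getSourceAsMap()) and that the fragments have been converted to strings; the class and method names (HighlightMerger, merge, demo) are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: overwrite plain field values with their highlighted versions.
final class HighlightMerger {
    static Map<String, Object> merge(Map<String, Object> source, Map<String, String[]> fragments) {
        // Copy the source so the original map is left untouched
        Map<String, Object> merged = new LinkedHashMap<>(source);
        for (Map.Entry<String, String[]> entry : fragments.entrySet()) {
            // Only replace fields that exist in the source and actually have fragments
            if (merged.containsKey(entry.getKey()) && entry.getValue().length > 0) {
                merged.put(entry.getKey(), String.join("", entry.getValue()));
            }
        }
        return merged;
    }

    // Demo data shaped like the hit from the example above
    static Map<String, Object> demo() {
        Map<String, Object> source = new LinkedHashMap<>();
        source.put("title", "华为Mate30");
        source.put("od", 2);
        Map<String, String[]> fragments = new LinkedHashMap<>();
        fragments.put("title", new String[] {
            "<span style='color:blue'>华</span><span style='color:blue'>为</span>Mate30"
        });
        return merge(source, fragments);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Fields without highlight fragments, such as od here, keep their original values.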

Boolean search (bool)

We want to implement a request similar to the following:

GET goods/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "term": {
            "title": {
              "value": "华"
            }
          }
        },
        {
          "term": {
            "title": {
              "value": "米"
            }
          }
        }
      ]
    }
  }
}

Java implementation:

    // Boolean search (bool)
    @Test
    public void search_bool() throws IOException {
        RestClientBuilder builder = RestClient.builder(new HttpHost("localhost", 9200, "http"));
        RestHighLevelClient restHighLevelClient = new RestHighLevelClient(builder);

        SearchRequest searchRequest = new SearchRequest("goods");

        SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();

        // Build the bool query with QueryBuilders
        BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery();

        // Each should(...) call adds one clause. The should array in the request above
        // has two term conditions, so we call should(...) twice.
        boolQueryBuilder.should(QueryBuilders.termQuery("title", "华"));
        boolQueryBuilder.should(QueryBuilders.termQuery("title", "米"));
        searchSourceBuilder.query(boolQueryBuilder);

        searchRequest.source(searchSourceBuilder);

        SearchResponse search = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);

        SearchHit[] hits = search.getHits().getHits();
        for (SearchHit hit : hits) {
            System.out.println(hit.getSourceAsString());
        }

        restHighLevelClient.close();
    }

Elasticsearch in practice (JD product search)

Scraping data from JD

1. Add the dependency:

        <!-- jsoup -->
        <dependency>
            <groupId>org.jsoup</groupId>
            <artifactId>jsoup</artifactId>
            <version>1.12.1</version>
        </dependency>

2. Create the entity class:

public class goods {

    private String img;   // product image
    private String price; // product price
    private String title; // product title

    public goods() {
    }

    public goods(String img, String price, String title) {
        this.img = img;
        this.price = price;
        this.title = title;
    }

    public String getImg() {
        return img;
    }

    public void setImg(String img) {
        this.img = img;
    }

    public String getPrice() {
        return price;
    }

    public void setPrice(String price) {
        this.price = price;
    }

    public String getTitle() {
        return title;
    }

    public void setTitle(String title) {
        this.title = title;
    }

    @Override
    public String toString() {
        return "goods{" +
                "img='" + img + '\'' +
                ", price='" + price + '\'' +
                ", title='" + title + '\'' +
                '}';
    }
}

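3. Use jsoup to fetch and parse JD search results (the core step) and wrap it in a utility class:

import com.alibaba.fastjson.JSON;
import org.elasticsearch.action.bulk.BulkRequest;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.xcontent.XContentType;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Component;

import java.net.URL;
import java.net.URLEncoder;
import java.util.HashMap;
import java.util.Map;

@Component
public class jsoupUtils {

    private static RestHighLevelClient restHighLevelClient;

    // Spring injects the client through this setter so that the static method below can use it
    @Autowired
    public void setRestHighLevelClient(RestHighLevelClient restHighLevelClient) {
        jsoupUtils.restHighLevelClient = restHighLevelClient;
    }

    /**
     * Searches JD for the keyword and adds the results to the jd_goods index in Elasticsearch.
     */
    public static void searchData_JD(String keyword) {
        BulkRequest bulkRequest = new BulkRequest();

        try {
            // URL-encode the keyword so that non-ASCII characters and spaces are safe in the query string
            URL url = new URL("https://search.jd.com/Search?keyword=" + URLEncoder.encode(keyword, "UTF-8"));
            // Let jsoup fetch and parse the page, with a 30-second timeout
            Document document = Jsoup.parse(url, 30000);

            Element e1 = document.getElementById("J_goodsList");
            if (e1 == null) {
                // The product list is missing, for example because the page layout changed
                System.out.println("J_goodsList not found for keyword: " + keyword);
                return;
            }

            Elements e_lis = e1.getElementsByTag("li");

            for (Element e_li : e_lis) {
                // The price element may contain several prices (for example the regular price
                // and the JD PLUS member price), so we keep only the first one
                Elements e_price = e_li.getElementsByClass("p-price");
                String text = e_price.get(0).text();
                StringBuilder realPrice = new StringBuilder("¥");
                // The first character is the currency sign, so copy from index 1
                // and stop at the next currency sign (full-width or half-width)
                for (int i = 1; i < text.length(); i++) {
                    char c = text.charAt(i);
                    if (c == '¥' || c == '¥') {
                        break;
                    }
                    realPrice.append(c);
                }

                // Product image: JD keeps the URL in the lazy-loading attribute data-lazy-img, not in src
                Elements e_img = e_li.getElementsByClass("p-img");
                Elements img = e_img.get(0).getElementsByTag("img");
                String src = img.get(0).attr("data-lazy-img");
                System.out.println("http:" + src);

                // Price
                System.out.println(realPrice);
                // Product title
                Elements e_title = e_li.getElementsByClass("p-name");
                String title = e_title.get(0).getElementsByTag("em").text();
                System.out.println(title);

                IndexRequest indexRequest = new IndexRequest("jd_goods");

                // Fill in the document fields
                Map<String,Object> good = new HashMap<>();
                good.put("img", "http:" + src);
                good.put("price", realPrice.toString());
                good.put("title", title);
                IndexRequest source = indexRequest.source(JSON.toJSONString(good), XContentType.JSON);

                bulkRequest.add(source);
            }

            // Send everything in one bulk request to reduce round trips to Elasticsearch
            if (bulkRequest.numberOfActions() > 0) {
                restHighLevelClient.bulk(bulkRequest, RequestOptions.DEFAULT);
            }
        } catch (Exception e) {
            System.out.println(e.getMessage());
        }
    }
}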
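
The price-splitting loop is easier to test when it is pulled out into its own method. The sketch below is one way to do that in plain Java. It assumes the price text looks like a currency sign followed by digits, possibly repeated for a second price; that format is an assumption for illustration, not something confirmed from JD's pages. The names PriceText and firstPrice are hypothetical.

```java
// Sketch: extract the first price from text that may contain several prices.
final class PriceText {
    // Treat both the full-width yen sign (U+FFE5) and the half-width one (U+00A5) as markers
    static boolean isCurrencySign(char c) {
        return c == '\uFFE5' || c == '\u00A5';
    }

    // Returns the first "sign + amount" run, or an empty string if no sign is found
    static String firstPrice(String text) {
        String t = text.trim();
        int start = 0;
        while (start < t.length() && !isCurrencySign(t.charAt(start))) {
            start++;
        }
        if (start == t.length()) {
            return "";
        }
        int end = start + 1;
        while (end < t.length() && !isCurrencySign(t.charAt(end))) {
            end++;
        }
        return t.substring(start, end).trim();
    }

    public static void main(String[] args) {
        // Two prices back to back: only the first one is kept
        System.out.println(firstPrice("\uFFE52499.00\uFFE52399.00"));
    }
}
```

Unlike the loop in the utility class, this version also trims the whitespace that can sit between two prices.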


4. Use the utility class:

    public static void main(String[] args) {
        SpringApplication.run(DemoApplication.class, args);

        // Run after the Spring context has started, so the client has been injected
        jsoupUtils.searchData_JD("vivo");
    }

With the data indexed, we can now display it on a page.

Reposted from blog.csdn.net/weixin_50071998/article/details/123896805