ElasticSearch full-text search summary

Welcome to my personal blog: www.ifueen.com

ElasticSearch

Outline

ES is a wrapper around Lucene: a set of tools that fixes Lucene's shortcomings, namely cumbersome configuration and the lack of distributed support.

ES is developed in Java and uses Lucene at its core for all indexing and search functionality. Its goal is to hide Lucene's complexity behind a simple RESTful API, making full-text search easy.

Installation

Official download: https://www.elastic.co/downloads/elasticsearch

Unzip it, then run elasticsearch.bat under the bin directory.


Once it is running, enter localhost:9200 in the browser.


If this page appears, the startup succeeded.

Auxiliary tool: Kibana5

Just as Navicat gives you visual management for MySQL, Kibana5 is a visual management tool for ElasticSearch.

Official download: https://www.elastic.co/downloads/kibana

After unzipping, edit config/kibana.yml and set elasticsearch.url to the address of your ES instance (the default is usually fine and needs no change).

Start Kibana5: bin\kibana.bat

Default access address: http://localhost:5601

If the page loads, Kibana is up and running.

CRUD operations

First, be clear that ES fully follows the RESTful style. There are plenty of articles on REST, so it is not covered in detail here.

In ES, the act of storing data is called indexing. A document belongs to a type, and types live in an index. We can draw a simple correspondence between a traditional database and ES:

Relational database (MySQL) -> Database (DB) -> Table -> Row -> Column
ElasticSearch -> Index -> Type -> Document -> Field

Basic ES syntax

  • Create: PUT {index}/{type}/{id}
  • Update: same as create, PUT {index}/{type}/{id}
  • Delete: DELETE {index}/{type}/{id}
  • Query: GET {index}/{type}/{id}

A few demos below illustrate this.
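For instance, a minimal CRUD round trip in the Kibana console might look like this (a sketch; the index fueen, type person, and the field values are illustrative):

# Create (or fully replace) document 1
PUT fueen/person/1
{
  "id": 1,
  "username": "fueen"
}

# Read it back
GET fueen/person/1

# Delete it
DELETE fueen/person/1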

IK Analyzer

ES's default analyzers support English text well, but, as with Lucene, full-text search over Chinese requires a Chinese analyzer. So, just as with Lucene, the IK analyzer must be integrated before doing Chinese full-text search.

GitHub download: https://github.com/medcl/elasticsearch-analysis-ik

Unzip it and place its contents under {ES root directory}/plugins/ik

Then restart ES.

Testing the analyzer

Note: the IK analyzer comes in two variants, ik_smart and ik_max_word.

ik_smart performs the coarsest-grained split: for example, "中华人民共和国国歌" is split into "中华人民共和国, 国歌".

ik_max_word performs the finest-grained split: for example, "中华人民共和国国歌" is split into "中华人民共和国, 中华人民, 中华, 华人, 人民共和国, 人民, 人, 民, 共和国, 共和, 和, 国国, 国歌", exhausting every possible combination.
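You can compare the two variants with the _analyze API in the Kibana console (a sketch; this request-body form is the ES 5.x style):

GET _analyze
{
  "analyzer": "ik_smart",
  "text": "中华人民共和国国歌"
}

Swap in ik_max_word as the analyzer to see the exhaustive split.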

Indexes

CRUD for ES indexes

# Create an index

PUT imp
{
  "settings": {
    "number_of_shards": 5,
    "number_of_replicas": 1
  }
}
# List all indexes

GET _cat/indices

# View a specific index
GET _cat/indices/imp

# Update an index (see the sketch after this block)

# Delete an index
DELETE imp
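The update step is left empty above. For reference, the number of replicas (though not the number of shards) can be changed after creation through the _settings endpoint (a sketch):

# Update index settings
PUT imp/_settings
{
  "number_of_replicas": 2
}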

DSL Queries

DSL filter clauses and DSL query clauses look very similar, but they serve different purposes: a DSL filter checks whether a document does or does not meet a condition (equals / not equals), while a DSL query asks how well a document matches (fuzzy matching).

Performance differences between DSL filters and DSL queries:

  • Filter results can be cached and applied to subsequent requests. -> Running a fuzzy query over precisely pre-filtered results performs well.
  • Query clauses both match documents and compute relevance scores, so they are more expensive and are not cached.
  • Filter clauses can therefore effectively complement query clauses when narrowing down documents.

Example:

# DSL query
GET fueen/person/_search
{
  "query": {
    "match": {
      "speak": "高级动物"
    }
  }
}

# Create test data
PUT fueen/user/5
{
  "id":5,
  "sex":"SAUMAG Note10 Pro",
  "ceagtor":"手机",
  "money":8000
}

GET fueen/user/_search?_source

# DSL filter

GET fueen/user/_search
{
  "query": {
    "bool": {
      "must": [
        {"match": {
          "ceagtor": "手机"
        }}
      ],
      "filter": {
        "range": {
          "money": {
            "gte": 6000,
            "lte": 8000
          }
        }
      }
    }
  },
  "from": 0, 
  "size": 10,
  "_source": ["sex","ceagtor","money"],
  "sort": [
    {
      "money": "desc"
    }
  ]
}
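For a pure yes/no condition where no relevance score is needed at all, a bool query containing only a filter clause also works (a sketch against the same test data):

GET fueen/user/_search
{
  "query": {
    "bool": {
      "filter": {
        "term": {
          "money": 8000
        }
      }
    }
  }
}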

Document Mapping

ES's document mapping mechanism is used to pin down field types, matching each field to a definite data type.

That is, it specifies the data type of each input field's value.

Example

# Create a mapping
PUT imp/life/_mapping
{
  "life":{
    "properties":{
      "id":{
        "type":"long"
      },
      "name":{
        "type":"text",
        "analyzer":"ik_smart",
        "search_analyzer":"ik_smart"
      }
    }
  }
}

# View the document mapping
GET imp/_mapping/life
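
To exercise the mapping before tearing it down, index a document and search it with a query that goes through the ik_smart analyzer (a sketch; the document content is illustrative):

PUT imp/life/1
{
  "id": 1,
  "name": "我们不能失去信仰"
}

GET imp/life/_search
{
  "query": {
    "match": {
      "name": "信仰"
    }
  }
}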

# Delete the mapping (by deleting the index)
DELETE imp

# Dynamic template
PUT _template/kof_template  
{
  "template":   "*",  
  "settings": { "number_of_shards": 1 }, 
  "mappings": {
    "_default_": {
      "_all": { 
        "enabled": false 
      },
      "dynamic_templates": [
        {
          "string_as_text": { 
            "match_mapping_type": "string",
            "match":   "*_text", 
            "mapping": {
              "type": "text",
              "analyzer": "ik_max_word",
              "search_analyzer": "ik_max_word",
              "fields": {
                "raw": {
                  "type":  "keyword",
                  "ignore_above": 256
                }
              }
            }
          }
        },
        {
          "string_as_keyword": { 
            "match_mapping_type": "string",
            "mapping": {
              "type": "keyword"
             }
          }
        }
      ]
    }
  }}
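
With this template installed, every newly created index picks it up: string fields whose names end in _text are mapped as ik_max_word-analyzed text with a raw keyword sub-field, while all other strings become keyword. A quick check (a sketch; the index and field names are illustrative):

PUT kof_demo/doc/1
{
  "title_text": "中华人民共和国国歌",
  "tag": "music"
}

# The generated mapping should show title_text as text and tag as keyword
GET kof_demo/_mapping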

Clusters

Why do we need an ES cluster?

As with Redis, a cluster buys you a lot:

  • Tolerance of single-node failure
  • Support for high concurrency
  • Storage of massive amounts of data

ES cluster node types

Master node

node.master = true means the node is eligible to be elected master. The master node is mainly responsible for cluster-level operations, such as creating or deleting indexes, tracking which nodes are part of the cluster, and deciding which shards to allocate to which nodes. Master and data roles are usually separated: node.master = true, node.data = false.

Data Node

node.data = true means the node stores index data. Data nodes mainly handle document CRUD and aggregation operations, so they place heavy demands on CPU, IO, and memory; when tuning, monitor data-node status and scale out when resources run short. Configuration: node.master = false, node.data = true.

Load-balancing (client) node

When both node.master and node.data are set to false, the node can handle routing requests, search processing, distribution of index operations, and so on. In essence, such a client node acts as an intelligent load balancer. Configuration: node.master = false, node.data = false.
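In elasticsearch.yml the three roles come down to two flags (a sketch; in ES 5.x both default to true):

# dedicated master node
node.master: true
node.data: false

# data node
# node.master: false
# node.data: true

# client / load-balancing node
# node.master: false
# node.data: false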

Simulating a cluster

Prepare three ES instances

You can simulate this with three copies of the ES directory, each configured with different ports.

Configuration

  • Node1 configuration
# Shared cluster name
cluster.name: my-ealsticsearch
# Name of this node
node.name: node-1
# Bind address so the node can be reached externally
network.host: 127.0.0.1
# Exposed HTTP port
http.port: 9201
# Inter-node transport port
transport.tcp.port: 9301
# Cluster host list; a port may be specified, default is 9300
discovery.zen.ping.unicast.hosts: ["127.0.0.1:9301","127.0.0.1:9302","127.0.0.1:9303"]
  • Node2 configuration
# Shared cluster name
cluster.name: my-ealsticsearch
# Name of this node
node.name: node-2
# Bind address so the node can be reached externally
network.host: 127.0.0.1
# Exposed HTTP port
http.port: 9202
# Inter-node transport port
transport.tcp.port: 9302
# Cluster host list; a port may be specified, default is 9300
discovery.zen.ping.unicast.hosts: ["127.0.0.1:9301","127.0.0.1:9302","127.0.0.1:9303"]
  • Node3 configuration
# Shared cluster name
cluster.name: my-ealsticsearch
# Name of this node
node.name: node-3
# Bind address so the node can be reached externally
network.host: 127.0.0.1
# Exposed HTTP port
http.port: 9203
# Inter-node transport port
transport.tcp.port: 9303
# Cluster host list; a port may be specified, default is 9300
discovery.zen.ping.unicast.hosts: ["127.0.0.1:9301","127.0.0.1:9302","127.0.0.1:9303"]

Start all three ES nodes, then visit http://127.0.0.1:9201/

Then access the cluster through Kibana5 by updating its default configuration: elasticsearch.url: "http://localhost:9201"

If the page loads, the cluster is reachable.
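
From the Kibana console you can then confirm cluster health and membership with the standard health and cat APIs (a sketch):

GET _cluster/health

GET _cat/nodes?v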

Operating ES from Java

Create a Maven project and add the following to pom.xml:

<dependencies>
        <dependency>
            <groupId>org.elasticsearch.client</groupId>
            <artifactId>transport</artifactId>
            <version>5.2.2</version>
        </dependency>
        <dependency>
            <groupId>org.apache.logging.log4j</groupId>
            <artifactId>log4j-api</artifactId>
            <version>2.7</version>
        </dependency>
        <dependency>
            <groupId>org.apache.logging.log4j</groupId>
            <artifactId>log4j-core</artifactId>
            <version>2.7</version>
        </dependency>
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>4.12</version>
            <scope>compile</scope>
        </dependency>
    </dependencies>

Write a utility class that connects to ES:

package com.ifueen.es;

import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.transport.InetSocketTransportAddress;
import org.elasticsearch.transport.client.PreBuiltTransportClient;

import java.net.InetAddress;
import java.net.UnknownHostException;

/**
 * Utility class for connecting to the ES cluster
 */
public class ESClientUtil {

    public static TransportClient getClient(){
        Settings settings = Settings.builder()
        .put("cluster.name","my-ealsticsearch")
        .put("client.transport.sniff", true).build();
        
        TransportClient client = null;
        try {
            client = new PreBuiltTransportClient(settings)
                    .addTransportAddress(
                    		new InetSocketTransportAddress(InetAddress.getByName("127.0.0.1"), 9302));
        } catch (UnknownHostException e) {
            e.printStackTrace();
        }
        return client;
    }
}

Then write some tests:

package com.ifueen.es;

import org.elasticsearch.action.bulk.BulkItemResponse;
import org.elasticsearch.action.bulk.BulkRequestBuilder;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.action.delete.DeleteRequestBuilder;
import org.elasticsearch.action.delete.DeleteResponse;
import org.elasticsearch.action.get.GetResponse;
import org.elasticsearch.action.index.IndexRequestBuilder;
import org.elasticsearch.action.index.IndexResponse;
import org.elasticsearch.action.search.SearchRequestBuilder;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.action.update.UpdateRequestBuilder;
import org.elasticsearch.action.update.UpdateResponse;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.index.query.BoolQueryBuilder;
import org.elasticsearch.index.query.QueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHits;
import org.junit.Test;

import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

/**
 * Cluster tests
 */
public class TestESCluster {
    // obtain the client
    TransportClient client = ESClientUtil.getClient();
    /**
     * Add a document
     */
    @Test
    public void testadd(){

        // build the index request
        IndexRequestBuilder index = client.prepareIndex("fueen", "person", "1");
        Map<String,Object> data = new HashMap<String, Object>();
        data.put("id","1");
        data.put("username","fueen");
        data.put("speak","我们不能失去信仰");
        // execute the add
        IndexResponse indexResponse = index.setSource(data).get();
        System.out.println(indexResponse);
        client.close();

    }

    /**
     * Get a document
     */
    @Test
    public void testquery(){
        GetResponse getFields = client.prepareGet("fueen", "person", "1").get();
        System.out.println(getFields.getSource());
    }

    /**
     * Update a document
     */
    @Test
    public void testupdate(){
        HashMap<String, Object> map = new HashMap<>();
        map.put("id","1");
        map.put("username","这个世界会好吗");
        map.put("speak","李志");
        UpdateRequestBuilder builder = client.prepareUpdate("fueen", "person", "1");
        UpdateResponse updateResponse = builder.setDoc(map).get();
        System.out.println(updateResponse);
        client.close();
    }

    /**
     * Delete a document
     */
    @Test
    public void testdel(){
        DeleteRequestBuilder del = client.prepareDelete("fueen", "person", "1");
        DeleteResponse deleteResponse = del.get();
        System.out.println(deleteResponse);
        client.close();
    }

    /**
     * Bulk add
     */
    @Test
    public void testbulikadd(){
        BulkRequestBuilder builder = client.prepareBulk();
        Map<String, Object> map = new HashMap<>();
        map.put("id","1");
        map.put("username","会沉寂吗");
        map.put("speak","我的金桔");
        builder.add(client.prepareIndex("fueen","person","1")
                .setSource(map));


        Map<String, Object> map1 = new HashMap<>();
        map1.put("id","2");
        map1.put("username","会沉寂吗");
        map1.put("speak","人民不需要自由");
        builder.add(client.prepareIndex("fueen","person","2")
                .setSource(map1));

        BulkResponse bulkItemResponses = builder.get();
        Iterator<BulkItemResponse> iterator = bulkItemResponses.iterator();
        while (iterator.hasNext()){
            BulkItemResponse next = iterator.next();
            System.out.println(next.getResponse());
        }
        client.close();
    }

    /**
     * Bulk query (search)
     */
    @Test
    public void testbulikquery(){
        SearchRequestBuilder fueen = client.prepareSearch("fueen");
        fueen.setTypes("person");
        fueen.setFrom(0);
        fueen.setSize(10);

        // query conditions
        BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery();
        List<QueryBuilder> must = boolQueryBuilder.must();
        must.add(QueryBuilders.matchQuery("username","会沉寂吗"));

        SearchResponse searchResponse = fueen.setQuery(boolQueryBuilder).get();
        SearchHits hits = searchResponse.getHits();
        System.out.println("条数:"+hits.getTotalHits());
        hits.forEach(h->{
            System.out.println(h);
        });
        client.close();
    }

}

