ElasticSearch集群搭建,集成中文分词,建立全文检索索引(笔记)

版权声明:本文为博主原创文章,未经博主允许不得转载。 https://blog.csdn.net/sinat_30276961/article/details/82872672

准备工作:

1.三个虚拟机节点,安装centos6x
2.根据客户端的jdk情况,准备elasticsearch版本
3.对应版本jdk
4.elasticSearch对应版本的中文分词插件
5.对应版本的head插件
6.不考虑kibana,所以直接考虑chrome的sense插件

1.虚拟机每个节点建立elastic用户和组

groupadd elastic
useradd -m -g elastic elastic

2.把安装包们上传到一个节点里

有多种方式:
a.通过scp命令(针对宿主机是macbook或者Linux系统电脑)
b.通过ftp上传

注:
1.系统是传统spingmvc,jdk是1.7,所以选择elasticsearch2.4版本
2.中文分词路径https://github.com/medcl/elasticsearch-analysis-ik/,里面注明了对应版本关系。安装方式选择release版直接放到/home/elastic/env/elasticsearch-2.4.0/plugins/ik下
3.jdk选择jdk-7u80-linux-x64.tar.gz
4.head插件路径https://github.com/mobz/elasticsearch-head,在5.0之前的版本,不需要用到nodejs环境,直接放到/home/elastic/env/elasticsearch-2.4.0/plugins/head下面就行;
5.0之后要安装nodejs

3.配置jdk环境

export PATH
export JAVA_HOME=/home/elastic/env/jdk1.7.0_80
export CLASSPATH=.:$JAVA_HOME/jre/lib/rt.jar:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$JAVA_HOME/bin:$PATH

4.配置elasticsearch配置文件

172.16.30.100节点
vi elasticsearch.yml(关键配置)

cluster.name: blog_cluster
node.name: node-1
path.data: /home/elastic/data/elastic
path.logs: /home/elastic/data/elastic/log
network.host: 172.16.30.100  //绑定本地地址
http.port: 9200   //设置访问端口
discovery.zen.ping.unicast.hosts: ["172.16.30.100", "172.16.30.101","172.16.30.102"] //配置节点发现地址,自己的ip也配上

其余节点调整node_name、network.host这两个就可以了。

5.启动es

./bin/elasticsearch -d(后台启动)
调试阶段可以./bin/elasticsearch,日志输出在当前页面,方便排查问题
每个节点都启动起来

6.检查节点情况

网页访问http://172.16.30.100:9200/_plugin/head/
在这里插入图片描述

7.检查中文分词

http://172.16.30.100:9200/blog_index/_analyze?analyzer=ik_max_word&text=测试中文分词

{
  "tokens": [
    {
      "token": "测试",
      "start_offset": 0,
      "end_offset": 2,
      "type": "CN_WORD",
      "position": 0
    },
    {
      "token": "中文",
      "start_offset": 2,
      "end_offset": 4,
      "type": "CN_WORD",
      "position": 1
    },
    {
      "token": "分词",
      "start_offset": 4,
      "end_offset": 6,
      "type": "CN_WORD",
      "position": 2
    },
    {
      "token": "词",
      "start_offset": 5,
      "end_offset": 6,
      "type": "CN_WORD",
      "position": 3
    }
  ]
}

8.全文检索

8.1建立索引

PUT /blog_index
{
  "settings": {
     "refresh_interval": "5s",
     "number_of_shards" :   3, 
     "number_of_replicas" : 2 
  },
  "mappings": {
    "blog": {
      "dynamic": false, 
      "properties": {
        "title": {
          "type": "string",
          "analyzer": "ik_max_word",
          "fields": {
            "cn": {
              "type": "string",
              "analyzer": "ik_max_word"
            },
            "en": {
              "type": "string",
              "analyzer": "english"
            }
          }
        },
        "content": {
          "type": "string",
          "analyzer": "ik_max_word"
        },
        "depcode": {
          "type": "string",
          "index": "not_analyzed"
        },
        "blog_type": {
          "type": "string",
          "index": "not_analyzed"
        },
        "create_time": {
          "type": "date"
        },
        "blog_id": {
          "type": "string",
          "index": "not_analyzed"
        }
      }
    }
  }
}

8.2批量插入测试数据

POST /blog_index/blog/_bulk
{ "index": {}}
{ "content": "硬盘对集群非常重要,特别是建索引多的情况。磁盘是一个服务器最慢的系统,对于写比较重的集群,磁盘很容易成为集群的瓶颈。","title": "周星驰最新电影","blog_id": "a001","create_time": "2018-07-01","depcode": "3302","blog_type": "电影类"}
{ "index": {}}
{ "content": "如果可以承担的器SSD盘,最好使用SSD盘。如果使用SSD,最好调整I/O调度算法。RAID0是加快速度的不错方法。统计一下","title": "自动调整存储带宽","blog_id": "a002","create_time": "2018-07-02","depcode": "3302","blog_type": "其他类"}

8.3全文检索

POST /blog_index/blog/_search
{
  "query": {
    "multi_match": {
      "query": "系统",
      "fields": ["title","content"]
    }
  },
  "highlight": {
    "fields": {
      "title":{},
      "content":{}
    },
    "pre_tags": [
      "<em>"
    ],
    "post_tags": [
      "</em>"
    ]
  }
}

结果如下:

{
  "took": 119,
  "timed_out": false,
  "_shards": {
    "total": 3,
    "successful": 3,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.09228377,
    "hits": [
      {
        "_index": "blog_index",
        "_type": "blog",
        "_id": "AWXDay9KXenDqWqMO57c",
        "_score": 0.09228377,
        "_source": {
          "content": "硬盘对集群非常重要,特别是建索引多的情况。磁盘是一个服务器最慢的系统,对于写比较重的集群,磁盘很容易成为集群的瓶颈。",
          "title": "周星驰最新电影",
          "blog_id": "a001",
          "create_time": "2018-07-01",
          "depcode": "3302",
          "blog_type": "电影类"
        },
        "highlight": {
          "content": [
            "硬盘对集群非常重要,特别是建索引多的情况。磁盘是一个服务器最慢的<em>系统</em>,对于写比较重的集群,磁盘很容易成为集群的瓶颈。"
          ]
        }
      }
    ]
  }
}

至此,搭建测试完成。

猜你喜欢

转载自blog.csdn.net/sinat_30276961/article/details/82872672
今日推荐