Big Data Study Notes: Elasticsearch Installation and Usage Tutorial

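Elasticsearch is a full-text search engine: once documents are stored in it as text, they can be searched quickly and conveniently.

This tutorial covers:

1. Installing Elasticsearch on Linux

2. Basic Elasticsearch operations

3. Full-text search over documents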

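I. Installing Elasticsearch

Download on a networked Linux machine: wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-5.5.1.zip

Baidu Netdisk backup: link: https://pan.baidu.com/s/154lH8bt8dHZyO7UNaIVvbA extraction code: 7prd

Unzip: unzip elasticsearch-5.5.1.zip

Run: cd elasticsearch-5.5.1 and then ./bin/elasticsearch (note: first pitfall, startup fails for permission reasons)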

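As the error message indicates, Elasticsearch refuses to start as the root user for security reasons, so a dedicated user with suitable permissions is needed. To keep this test setup simple, I create a user named elastic in the root group and give it full (777) permissions on the Elasticsearch installation path.

Command: id shows information about the current user (I ran id as root only to look at the root group).

Command: useradd -m -g root elastic creates the user elastic and puts it in the root group.

Note where the new user's home directory is.

Back in the Elasticsearch installation path, as root, give ownership of the elasticsearch folder to the elastic user.

Ownership command: chown -R user[:group] folder, for example chown -R elastic elasticsearch-5.5.1. Note: the user comes first and the :group part is optional; it can be omitted when the user already belongs to the intended group.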

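Switch to the dedicated elastic user and run elasticsearch again (note: second pitfall, the maximum number of file descriptors is too low and the maximum number of virtual memory areas is too low).

The key information in the startup output is the two warnings just mentioned: max file descriptors too low and max virtual memory areas too low.

For fixes, see this blog post, which covers several such pitfalls: https://blog.csdn.net/qq_33363618/article/details/78882827

These are in fact only warnings: once a log line containing INFO ... started appears, Elasticsearch has started successfully. Open another console and try an HTTP connection to port 9200.

Run: curl localhost:9200

The response contains the node's name, version, and other information.

Being somewhat obsessive, I still dislike seeing WARN entries in the log even though Elasticsearch started successfully.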

Item 1: max file descriptors [4096] ...

As root, edit /etc/security/limits.conf and add the following:

* hard nofile 65536
* soft nofile 65536

After saving, log the user out and back in, or simply reboot, for the change to take effect.

Item 2: max virtual memory areas ...

As root, edit /etc/sysctl.conf and add the following:

vm.max_map_count=655360

After saving, run sysctl -p to apply the change.
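To confirm that both settings took effect before restarting Elasticsearch, the current values can be compared with the minimums the warnings ask for. The following is a minimal sketch: check_limit is a hypothetical helper name, and the thresholds 65536 and 262144 are assumed to be the minimums reported by the Elasticsearch 5.x startup checks.

```shell
# Compare a current limit with a required minimum.
# $1 = label, $2 = current value, $3 = required minimum
# Prints one line and returns 0 when the limit is sufficient, 1 otherwise.
check_limit() {
  # "ulimit -n" may print "unlimited", which always passes.
  if [ "$2" = "unlimited" ] || [ "$2" -ge "$3" ]; then
    echo "$1: $2 (ok)"
  else
    echo "$1: $2 (too low, need at least $3)"
    return 1
  fi
}

# Usage, run as the elastic user after logging in again:
#   check_limit "max file descriptors" "$(ulimit -n)" 65536
#   check_limit "vm.max_map_count" "$(cat /proc/sys/vm/max_map_count)" 262144
```

If either line reports "too low", the corresponding file edit has not been picked up yet (for limits.conf, usually because the user has not logged in again).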

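II. Basic Elasticsearch operations

Before trying any operations, a few basic concepts:

Elasticsearch is a distributed data store. A single machine can run several instances, each called a node, and several nodes form a cluster.

Elasticsearch indexes every field, and the top-level structure is itself called an Index. It can be thought of as a workspace.

A Type is a virtual category used to filter documents (it is optional). It groups a set of data under one type, comparable to a table.

A document is the unit that actually stores data, and many documents make up an index. Documents in an index should have similar, or even identical, structure to improve search efficiency. A document is comparable to a row.

Unlike a relational database, where several tables are created in one workspace to separate rows of different formats, Elasticsearch usually does not use several types within one index to separate document formats. Instead, create one index per document format where possible.

The plan for the test exercises:

Create an index and list indices, insert a document and list documents, then query documents by condition.

Why not test types? Types still work in my Elasticsearch 5.x, but from 6.x an index can contain only one type, and 7.x deprecates types.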

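2.1 Creating an index

API to create an index: curl -X PUT 'localhost:9200/qftest0' Response JSON: {"acknowledged":true,"shards_acknowledged":true}

Here acknowledged: true means the index was created, and shards_acknowledged: true means the required shard copies were started before the request timed out.

API to list indices: curl -X GET 'http://localhost:9200/_cat/indices?v'

The listing showed that I had created three indices.

API to delete an index: curl -X DELETE 'localhost:9200/qftest0'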
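The three index calls above differ only in HTTP method and path, so they can be wrapped in small shell functions. This is a sketch under assumptions: the function names and the ES_HOST variable are hypothetical conveniences, not part of Elasticsearch, and the node is assumed to listen on localhost:9200 as in this tutorial.

```shell
# Address of the node; override ES_HOST if it is not local.
ES_HOST="${ES_HOST:-localhost:9200}"

# Build the URL for an index name (or any other path) on that node.
es_url() {
  printf 'http://%s/%s' "$ES_HOST" "$1"
}

# Thin wrappers around the curl calls shown above.
create_index() { curl -s -X PUT "$(es_url "$1")"; }
delete_index() { curl -s -X DELETE "$(es_url "$1")"; }
list_indices() { curl -s -X GET "$(es_url '_cat/indices?v')"; }

# Usage:
#   create_index qftest0
#   list_indices
#   delete_index qftest0
```

Keeping the URL construction in one function means a different host or port only has to be changed in one place.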

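2.2 Basic operations on document data

This covers installing the Chinese word segmentation plugin, creating an index with analyzed fields, and inserting, querying, updating, and deleting document data.

2.2.1 To support Chinese, shut down Elasticsearch (Ctrl+C is enough) and install the Chinese word segmentation plugin.

Command: ./bin/elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v5.5.1/elasticsearch-analysis-ik-5.5.1.zip

Restart Elasticsearch after the installation.

Because Elasticsearch prints its log to the console it was started from, use another console for testing.

2.2.2 Create an index and specify the fields to analyze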

curl -X PUT 'localhost:9200/qftest4' -d '
{
  "mappings": {
    "person": {
      "properties": {
        "user": {
          "type": "text",
          "analyzer": "ik_max_word",
          "search_analyzer": "ik_max_word"
        },
        "title": {
          "type": "text",
          "analyzer": "ik_max_word",
          "search_analyzer": "ik_max_word"
        },
        "desc": {
          "type": "text",
          "analyzer": "ik_max_word",
          "search_analyzer": "ik_max_word"
        }
      }
    }
  }
}'

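For convenience I copied someone else's code. It creates an index named qftest4 containing a type named person, with three analyzed fields: user, title, and desc. type sets the field type, analyzer sets the analyzer used when indexing, search_analyzer sets the analyzer applied to the query text, and ik_max_word is an analyzer provided by the Chinese word segmentation plugin installed above.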
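One way to check that the plugin is loaded and to see how ik_max_word splits a piece of text is the _analyze endpoint. The sketch below assumes the node runs on localhost:9200; analyze_body and analyze_text are hypothetical helper names, and the values are pasted into the JSON verbatim, so they must not contain double quotes or backslashes.

```shell
# Build the request body for the _analyze endpoint.
# $1 = analyzer name, $2 = text to analyze
analyze_body() {
  printf '{"analyzer":"%s","text":"%s"}' "$1" "$2"
}

# Ask the node to tokenize the text with the given analyzer.
analyze_text() {
  curl -s -X POST 'localhost:9200/_analyze?pretty' \
       -H 'Content-Type: application/json' \
       -d "$(analyze_body "$1" "$2")"
}

# Usage:
#   analyze_text ik_max_word "系统软件管理"
```

The response lists the tokens produced. If the analyzer name is unknown, the request fails, which usually means the plugin was not installed or the node was not restarted.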

2.2.3 API to insert document data:

curl -X PUT 'localhost:9200/qftest4/person/1' -d '
{
  "user": "张三",
  "title": "工程师",
  "desc": "数据库管理"
}'

curl -X PUT 'localhost:9200/qftest4/person/2' -d '
{
  "user": "李四",
  "title": "架构师",
  "desc": "系统软件管理"
}'

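The JSON returned on a successful insert is clear and useful.

2.2.4 Querying document data

Basic query API: curl 'localhost:9200/qftest4/person/_search' which is a plain GET request to the _search endpoint.

To pass query conditions, add the -d option followed by the conditions in JSON format. For more operations, see the official documentation:

https://www.elastic.co/guide/en/elasticsearch/reference/current/getting-started.html

2.2.5 Updating document data

Updating is simple: run the insert above again with the same id. In the returned JSON, the document's _version increases, result becomes updated, and created becomes false.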
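Re-running the insert replaces the whole document. When only one field should change, Elasticsearch 5.x also offers the _update endpoint, which merges a partial document into the stored one. This is a different technique from the re-insert described above, shown as a sketch: update_body and update_field are hypothetical helper names, and the values are pasted into the JSON verbatim, so they must not contain double quotes or backslashes.

```shell
# Build the body for a partial update of a single string field.
# $1 = field name, $2 = new value
update_body() {
  printf '{"doc":{"%s":"%s"}}' "$1" "$2"
}

# Send the partial update for one document in qftest4/person.
# $1 = document id, $2 = field name, $3 = new value
update_field() {
  curl -s -X POST "localhost:9200/qftest4/person/$1/_update" \
       -H 'Content-Type: application/json' \
       -d "$(update_body "$2" "$3")"
}

# Usage:
#   update_field 1 title "高级工程师"
```

Fields not mentioned in the doc object keep their stored values.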

2.2.6 Deleting document data

Add one document, query it, then delete it.

Insert:

curl -X PUT 'localhost:9200/qftest4/person/3' -d '
{
  "user": "测试",
  "title": "测试",
  "desc": "测试"
}'

Query: curl 'localhost:9200/qftest4/person/_search'

Delete:

curl -X DELETE 'localhost:9200/qftest4/person/3'

[Screenshot: output of the insert, query, and delete commands]

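III. Full-text search over documents

Elasticsearch queries are specified in JSON with a specific format. For details, see the official website or simply buy a reference book.

From the full-text search point of view, suppose desc holds the text extracted from a document. We then run analyzed match queries against the desc field (loosely, a "fuzzy" search), which gives segmented full-text search across all documents stored in the index.

Test case: in the index qftest4, find the documents whose desc contains the words "软件" (software) and "管理" (management), using one query per word.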

curl 'localhost:9200/qftest4/person/_search' -d '
{
  "query" : { "match" : { "desc" : "软件" }}
}'

curl 'localhost:9200/qftest4/person/_search' -d '
{
  "query" : { "match" : { "desc" : "管理" }}
}'
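The two queries above each search for a single word. To require both words in the same document, the match query accepts an operator option set to and. The sketch below builds such a query body; match_all_terms and search_all_terms are hypothetical helper names, and the terms are pasted into the JSON verbatim, so they must not contain double quotes or backslashes.

```shell
# Build a match query that requires every term to be present in the field.
# $1 = field name, $2 = terms separated by spaces
match_all_terms() {
  printf '{"query":{"match":{"%s":{"query":"%s","operator":"and"}}}}' "$1" "$2"
}

# Run the query against the qftest4 index used in this tutorial.
search_all_terms() {
  curl -s 'localhost:9200/qftest4/person/_search?pretty' \
       -H 'Content-Type: application/json' \
       -d "$(match_all_terms "$1" "$2")"
}

# Usage:
#   search_all_terms desc "软件 管理"
```

Without the operator option a match query defaults to or, so a document containing either word would be returned. Which documents match still depends on how the analyzer segments the stored text.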

[Screenshot: results of the two queries]

This tutorial drew on the following articles:

Installation

http://www.ruanyifeng.com/blog/2017/08/elasticsearch.html

Installation problems

https://blog.csdn.net/qq_33363618/article/details/78882827

Operations

https://blog.csdn.net/wenxindiaolong061/article/details/82562450

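Finally, during development I recommend keeping the Elasticsearch console open and watching the log. If you would rather not keep several consoles open, run it in the background with nohup XXX &. The jps command then lists Elasticsearch as a Java process with its process ID and name (ES_ID and ES_NAME here), and to shut it down run kill -9 ES_ID (a plain kill ES_ID, which sends SIGTERM, gives it the chance to shut down cleanly).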
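As an alternative to nohup, Elasticsearch has its own -d (daemonize) and -p (write a pid file) options. The sketch below uses those instead of nohup and jps, and adds a small retry loop that waits until the HTTP port answers. wait_until, start_es, and stop_es are hypothetical helper names, es.pid is an arbitrary file name, and the functions are assumed to be run from the Elasticsearch installation directory as the elastic user.

```shell
# Retry a command until it succeeds or the attempts run out.
# $1 = maximum attempts, $2 = seconds between attempts, rest = command to run
wait_until() {
  attempts=$1
  delay=$2
  shift 2
  while [ "$attempts" -gt 0 ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    attempts=$((attempts - 1))
    sleep "$delay"
  done
  return 1
}

# Start Elasticsearch as a daemon and wait for port 9200 to respond.
start_es() {
  ./bin/elasticsearch -d -p es.pid
  wait_until 30 2 curl -s localhost:9200
}

# Stop it with the default SIGTERM, using the recorded process ID.
stop_es() {
  kill "$(cat es.pid)"
}

# Usage:
#   start_es && echo "Elasticsearch is up"
#   stop_es
```

When started this way the log no longer appears on the console, so it has to be read from the log files under the installation's logs directory.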
