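I. Background

An earlier post covered installing ELK, but it left the individual components, and how to use them, only half understood.
This post explains ELK in detail and closes with a small end-to-end project.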

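II. Concepts

Elasticsearch is a search engine built on Lucene, Logstash is an open-source log collection engine, and Kibana is an open-source analytics and visualization platform.
Those are the usual one-line descriptions found online. They are hard going for a newcomer and not much help, so here is how I think about each component.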

First, think of Elasticsearch, the search engine, as a large database system whose job is to store data. An index then corresponds to a database, a type to a table, and a Document and a Field to a Row and a Column.

Relational DB -> Database -> Table -> Row      -> Column
Elasticsearch -> Index    -> Type  -> Document -> Field

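Of course, this description is not exact; the analogy is only meant to make the components easier to understand.
With this picture in mind, Logstash is what inserts data into the database: it takes the log data it collects and writes it into Elasticsearch indices. Kibana visualizes and summarizes the data in the database, that is, the information stored in Elasticsearch.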

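III. Elasticsearch

1. Basic concepts

Index, type, and the other terms introduced above.
Shards:
When the data volume is very large, a single disk may not have room for all of it, and searching that much data in one place is too slow to respond. Elasticsearch can split an index into shards, which makes it easier to scale out and raises throughput.
Replicas:
Copies of shards. When a primary shard becomes unavailable, a replica takes over as the primary.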

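By default (in Elasticsearch versions before 7.0), each index is allocated 5 primary shards and 1 replica.
If the cluster has at least two nodes, the index will have 5 primary shards and 5 replica shards (one full copy), for a total of 10 shards per index.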
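
The arithmetic behind those shard counts can be checked with a short sketch. The function name `total_shards` is made up for this illustration and is not part of any Elasticsearch API.

```python
def total_shards(primaries: int, replicas: int) -> int:
    """Total shard copies for one index: every primary plus its replicas."""
    return primaries * (1 + replicas)


# 5 primaries with 1 replica each, as in the default described above
print(total_shards(5, 1))
```

With 0 replicas the same function returns just the number of primaries, which is the most a single-node cluster can actually allocate.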

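2. Getting started with queries

(1) URL search with _search

This part covers basic use of the _search endpoint in the URL, pretty-printing responses, and basic index operations.

Cluster health check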

[root@hadoop02 ~]# curl http://localhost:9200/_cat/health?v
epoch      timestamp cluster       status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
1532340847 18:14:07  elasticsearch yellow          1         1     16  16    0    0       15             0                  -                 51.6%
[root@hadoop02 ~]# curl http://localhost:9200/_cluster/health 
{"cluster_name":"elasticsearch","status":"yellow","timed_out":false,"number_of_nodes":1,"number_of_data_nodes":1,"active_primary_shards":16,"active_shards":16,"relocating_shards":0,"initializing_shards":0,"unassigned_shards":15,"delayed_unassigned_shards":0,"number_of_pending_tasks":0,"number_of_in_flight_fetch":0,"task_max_waiting_in_queue_millis":0,"active_shards_percent_as_number":51.61290322580645}
[root@hadoop02 ~]# curl http://localhost:9200/_cluster/health?pretty
{
  "cluster_name" : "elasticsearch",
  "status" : "yellow",
  "timed_out" : false,
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 16,
  "active_shards" : 16,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 15,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 51.61290322580645
}
[root@hadoop02 ~]#

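The formatted forms are easier to read: appending ?v to a _cat request prints the result as a table with a header row, and appending ?pretty pretty-prints the JSON response.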
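
The `active_shards_percent_as_number` value in the output above can be reproduced from the other fields. The sketch below assumes the percentage is active shards divided by all shard copies (active, initializing, and unassigned), which matches the figures shown here. The function name `active_percent` is hypothetical, and the JSON string is a trimmed copy of the response above.

```python
import json

# Trimmed copy of the /_cluster/health response shown above
health_json = (
    '{"status":"yellow","number_of_nodes":1,"active_shards":16,'
    '"initializing_shards":0,"unassigned_shards":15}'
)


def active_percent(health: dict) -> float:
    """Share of shard copies that are active, as a percentage (assumed formula)."""
    total = (
        health["active_shards"]
        + health["initializing_shards"]
        + health["unassigned_shards"]
    )
    return 100.0 * health["active_shards"] / total


health = json.loads(health_json)
print(health["status"], round(active_percent(health), 1))
```

The unassigned shards here are most likely replicas: Elasticsearch does not place a replica on the same node as its primary, so a single-node cluster with replicas configured reports yellow.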
List nodes

[root@hadoop02 ~]# curl http://localhost:9200/_cat/nodes?v
ip          heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
10.25.0.221            9          78   2    0.00    0.07     0.10 mdi       *      hadoop02_1
[root@hadoop02 ~]#

List all indices

[root@hadoop02 ~]# curl http://localhost:9200/_cat/indices?v
health status index                 uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   es-message-2018.07.19 9giNfDckTSCCxEiykvmp4A   5   1         33            0     61.2kb         61.2kb
[root@hadoop02 ~]#

(2) Basic index operations with curl

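A few notes on the curl options used here:
curl
- -X specifies the HTTP request method: HEAD, GET, POST, PUT, or DELETE
- -d specifies the data to send in the request body
- -H specifies an HTTP request header

Concrete examples follow.
Create an index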

curl -XPUT 'localhost:9200/blog_test?pretty'
curl -XPUT 'localhost:9200/blog?pretty'

Delete an index

curl -XDELETE 'localhost:9200/blog_test?pretty'

Add a document of type article with ID 1

curl -XPUT -H "Content-Type: application/json" 'localhost:9200/blog/article/1?pretty' -d '
{
  "title": "小D课堂啦啦啦",
  "content":"xdclass.net  小D课堂成立于2016年的，专注互联网在线教育，课程范围包括前端，后端，大数据，人工智能，微信开发等"
}'
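
For readers who would rather issue the same request from a program, here is a minimal sketch using only Python's standard library. It maps the three curl options onto `urllib.request.Request` arguments. The helper name `build_index_request` is hypothetical, and the host and port are taken from the curl example above. The request is only built, not sent, so the sketch runs without a live node.

```python
import json
import urllib.request


def build_index_request(base_url: str, index: str, doc_type: str,
                        doc_id: str, doc: dict) -> urllib.request.Request:
    """Build (but do not send) a PUT request equivalent to the curl call above."""
    url = f"{base_url}/{index}/{doc_type}/{doc_id}?pretty"
    body = json.dumps(doc, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,                                     # curl -d
        method="PUT",                                  # curl -X
        headers={"Content-Type": "application/json"},  # curl -H
    )


req = build_index_request("http://localhost:9200", "blog", "article", "1",
                          {"title": "小D课堂啦啦啦"})
print(req.get_method(), req.full_url)

# To actually send it (requires a running node at that address):
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode("utf-8"))
```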

Get a document by ID

[root@hadoop02 ~]# curl -XGET 'localhost:9200/blog/article/1'
{"_index":"blog","_type":"article","_id":"1","_version":1,"found":true,"_source":
{
  "title": "小D课堂啦啦啦",
  "content":"xdclass.net  小D课堂成立于2016年的，专注互联网在线教育，课程范围包括前端，后端，大数据，人工智能，微信开发等"
}}
// pretty-printed (recommended)
[root@hadoop02 ~]# curl -XGET 'localhost:9200/blog/article/1?pretty'  
{
  "_index" : "blog",
  "_type" : "article",
  "_id" : "1",
  "_version" : 1,
  "found" : true,
  "_source" : {
    "title" : "小D课堂啦啦啦",
    "content" : "xdclass.net  小D课堂成立于2016年的，专注互联网在线教育，课程范围包括前端，后端，大数据，人工智能，微信开发等"
  }
}
[root@hadoop02 ~]#

Content search example

[root@hadoop02 ~]# curl -XGET 'http://localhost:9200/blog/article/_search?q=title:小A'
{"took":196,"timed_out":false,"_shards":{"total":5,"successful":5,"skipped":0,"failed":0},"hits":{"total":0,"max_score":null,"hits":[]}}
// pretty-printed example
[root@hadoop02 ~]# curl -XGET 'http://localhost:9200/blog/article/_search?pretty&q=title:小A'
{
  "took" : 15,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 0,
    "max_score" : null,
    "hits" : [ ]
  }
}
[root@hadoop02 ~]#
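
The curl commands above put non-ASCII text straight into the URL. When the same search is issued from a program, it is usually safer to percent-encode the `q` parameter as UTF-8 first. The following is a small sketch under that assumption; the helper name `search_url` is made up for the example.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def search_url(base_url: str, index: str, doc_type: str, query: str) -> str:
    """Build a URL-search address with the q parameter percent-encoded as UTF-8."""
    params = urlencode({"pretty": "true", "q": query})
    return f"{base_url}/{index}/{doc_type}/_search?{params}"


url = search_url("http://localhost:9200", "blog", "article", "title:小A")
print(url)

# Decoding the query string gives back the original search text
print(parse_qs(urlsplit(url).query)["q"])
```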

(3) Searching with the query DSL

This part uses the structured query DSL: bool queries, filter queries, and so on.

First, add some data

curl -XPUT -H "Content-Type: application/json" 'localhost:9200/blog/article/7?pretty' -d '
{
  "title": "elk搭建日志采集系统",
  "content":"elk elasticsearch logstash kibana",
  "PV":18
}'

Query with curl

[root@hadoop02 ~]# curl -XPOST -H "Content-Type: application/json" 'http://localhost:9200/blog/article/_search' -d '{
    "query" : {
        "term" : { "title" : "elk" }
    }
}'

Result:

{"took":141,"timed_out":false,"_shards":{"total":5,"successful":5,"skipped":0,"failed":0},"hits":{"total":1,"max_score":0.6594265,"hits":[{"_index":"blog","_type":"article","_id":"7","_score":0.6594265,"_source":
{
  "title": "elk搭建日志采集系统",
  "content":"elk elasticsearch logstash kibana",
  "PV":18
}}]}}
[root@hadoop02 ~]#

Getting started with bool queries

{
  "query": {
    "bool": {
      "must": [
        { "match": { "title": "elk" } }
      ],
      "must_not": [
        { "match": { "title": "小D" } }
      ]
    }
  }
}

Getting started with filter queries

The filtered query was deprecated in 2.0 and removed in 5.0; use the filter clause of a bool query instead.

{
  "query": {
    "bool": {
      "filter": {
        "range": {
          "PV": { "gt": 15 }
        }
      },
      "must": {
        "match": {
          "title": "ELK"
        }
      }
    }
  }
}

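Summary (based on the official documentation):
1. Most filters run faster than queries.
2. A filter does not compute a relevance score and its results can be cached, so it is efficient.
3. For full-text search and relevance-ranked results, use query.
4. For yes/no filtering and exact matching, use filter.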
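
To make the split between scored and unscored clauses concrete, here is a small sketch that assembles a bool query body as a Python dict and prints it as JSON, ready to be used as a request body for `_search`. The helper `bool_query` is hypothetical and written for this post; it is not part of any Elasticsearch client library.

```python
import json


def bool_query(must=None, must_not=None, filter_=None) -> dict:
    """Assemble a bool query body, leaving out clauses that were not given."""
    clauses = {"must": must, "must_not": must_not, "filter": filter_}
    return {"query": {"bool": {k: v for k, v in clauses.items() if v}}}


body = bool_query(
    must=[{"match": {"title": "elk"}}],       # full-text, contributes to the score
    filter_=[{"range": {"PV": {"gt": 15}}}],  # yes/no condition, not scored
)
print(json.dumps(body, indent=2))
```

Putting the PV range under filter rather than must follows points 2 and 4 above: the condition only includes or excludes documents, so there is no reason to score it.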

IV. Logstash

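1. Basic concepts

What is [logstash](https://www.elastic.co/guide/en/logstash/current/index.html)? It is an open-source log collection engine with real-time pipelining: it reads from a variety of data sources, filters the data, and writes it to a destination in a format the developer defines.

**How it works**
- Logstash moves data through a pipeline. Two stages are required, input and output; the filter stage is optional.
- Logstash calls each record in the data stream an event; for example, each line read from a log file becomes one event.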

# Input
input {
  ...
}
# Filter
filter {
  ...
}
# Output
output {
  ...
}
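
The input, filter, output flow can be mimicked in a few lines of Python to show what "one event per line" means in practice. This is only a conceptual sketch: Logstash is configured, not programmed this way, and every name below (`read_lines`, `drop_blank`, `tag_system`, `run_pipeline`) is made up for the illustration.

```python
from typing import Callable, Iterable, Iterator, List, Optional

Event = dict  # one event: a bag of fields, e.g. {"message": "..."}
Filter = Callable[[Event], Optional[Event]]


def read_lines(lines: Iterable[str]) -> Iterator[Event]:
    """Input stage: turn each line into one event."""
    for line in lines:
        yield {"message": line.rstrip("\n")}


def drop_blank(event: Event) -> Optional[Event]:
    """Filter stage: discard events whose message is empty."""
    return event if event["message"].strip() else None


def tag_system(event: Event) -> Optional[Event]:
    """Filter stage: add a type field to the event."""
    return {**event, "type": "system"}


def run_pipeline(events: Iterable[Event], filters: List[Filter],
                 output: Callable[[Event], None]) -> None:
    """Pass every event through the filters in order, then to the output."""
    for event in events:
        current: Optional[Event] = event
        for apply_filter in filters:
            current = apply_filter(current)
            if current is None:  # a filter dropped the event
                break
        if current is not None:
            output(current)


collected: List[Event] = []
run_pipeline(read_lines(["first line\n", "\n", "second line\n"]),
             [drop_blank, tag_system], collected.append)
print(collected)
```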

2. Logstash plugins

A simple configuration, test.conf

input {
  # Read log lines from a file
  file {
    path => "/var/log/messages"
    type => "system"
    start_position => "beginning"
  }
}

filter {

}

output {
  # Send events to Elasticsearch
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "logstash-test-%{type}-%{host}"
  }
}

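input plugins: file, http, kafka, rabbitmq, and others
filter plugins:
grok (often described as the best way to turn unstructured log data into structured, searchable data; commonly used for Nginx, syslog, and similar logs)
drop (discards selected events so they never reach output)
geoip (looks up geographic information for an IP address)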
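
What grok does can be approximated with an ordinary regular expression with named groups, since grok patterns are themselves built on regular expressions. The sketch below is not grok and does not use any Logstash API; the pattern, the function name `parse_syslog`, and the sample log line are all invented for illustration.

```python
import re
from typing import Optional

# A hand-written pattern for a syslog-style line (an assumption for this
# example, not one of grok's shipped patterns)
SYSLOG_LINE = re.compile(
    r"^(?P<timestamp>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) "
    r"(?P<program>[^\[:]+)(?:\[(?P<pid>\d+)\])?: "
    r"(?P<message>.*)$"
)


def parse_syslog(line: str) -> Optional[dict]:
    """Return the named fields of a syslog-style line, or None if it does not match."""
    match = SYSLOG_LINE.match(line)
    return match.groupdict() if match else None


# Made-up sample line
event = parse_syslog("Jul 23 18:14:07 hadoop02 sshd[1234]: Accepted password for root")
print(event)
```

A line that does not fit the pattern returns None here, which a pipeline could treat the way the drop filter treats unwanted events.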
