Log Service Data Processing: Quick Start (Hands-on SLB Log Processing)

Background

Here we take gateway data (Alibaba Cloud SLB access logs) stored in a single Logstore as an example, process the data, and distribute it to different Logstores.

Data

The data in the source Logstore (slb-log) consists of Alibaba Cloud SLB web access logs, in the following format:

__source__:  log_service
__tag__:__receive_time__:  1559799897
__topic__:  
body_bytes_sent:  740
client_ip:  1.2.3.4
host:  m.abcd.com
http_host:  m.abcd.com
http_referer:  -
http_x_forwarded_for:  -
http_x_real_ip:  -
read_request_time:  0
request_length:  0
request_method:  GET
request_time:  0.000
request_uri:  /category/abc/product_id?id=75&type=2&sam=123
scheme:  https
server_protocol:  HTTP/2.0
slb_vport:  443
slbid:  lb-1234
ssl_cipher:  ECDHE-RSA-AES128-GCM-SHA256
ssl_protocol:  TLSv1.2
status:  200
tcpinfo_rtt:  58775
time:  2019-06-06T13:44:50+08:00
upstream_addr:  1.2.3.4:80
upstream_response_time:  4.1234
upstream_status:  200
vip_addr:  1.2.3.4
write_response_time: 4.1234

Goals

We want to process the data in the following ways:

Distribution

  1. Copy all requests whose status is not 2XX or 3XX to the target Logstore slb-log-error (logs kept for 180 days) for further development and security analysis, and set __topic__ to slb_error.
  2. Distribute the less important requests for images and other static resources to the target Logstore slb-log-media (logs kept for 30 days), and set the topic to slb_media_request.
  3. Distribute all other request logs, used for further statistical analysis by operations, to the target Logstore slb-log-normal (logs kept for 90 days), and set the topic to slb_normal.
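
To make the routing concrete, here is a rough Python sketch of this distribution logic. It is only an illustration, not the processing rule language used later in this article; the helper name route_event and the static-extension whitelist are assumptions for the example (the actual rules below key on whether a file name can be extracted from the URI).

# Rough Python sketch of the intended distribution (illustration only).
import re

STATIC_RE = re.compile(r"/[\w.]+\.(?:icon|css|jpe?g|png|gif|js)(?:\?|$)", re.I)

def route_event(event):
    """Return the (target Logstore, topic) pairs an SLB log event should go to."""
    targets = []
    if re.fullmatch(r"4\d+|5\d+", event.get("status", "")):
        targets.append(("slb-log-error", "slb_error"))        # a copy goes here
    if STATIC_RE.search(event.get("request_uri", "")):
        targets.append(("slb-log-media", "slb_media_request"))
    else:
        targets.append(("slb-log-normal", "slb_normal"))
    return targets

print(route_event({"status": "502", "request_uri": "/x"}))
print(route_event({"status": "200", "request_uri": "/static/app.icon"}))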

Transformation

  1. For slb_media_request requests:
  • Extract the following fields:
object_file=app.icon
object_type = icon   # css, jpeg, js, etc.
  • Keep the following fields:
http_referer:  -
body_bytes_sent:  740
client_ip:  1.2.3.4
host:  m.abcd.com
request_time:  4.33
  2. For slb_normal requests:
  • Keep the following fields:
body_bytes_sent:  740
client_ip:  1.2.3.4
host:  m.abcd.com
http_referer:  -
http_x_real_ip:  -
request_length:  0
request_method:  GET
request_time:  4.33
request_uri:  /category/abc/product_id?id=75&type=2&sam=123
scheme:  https
slb_vport:  443
slbid:  lb-1234
status:  200
time:  2019-06-06T13:44:50+08:00
  • Extract the parameters in request_uri, prefixed with reqparam_:
reqparam_id: 75
reqparam_type: 2
reqparam_sam: 123
  • If http_x_real_ip is empty or -, fill it with the value of client_ip.
  • Extract the domain value from host:
domain:  abcd
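
As an illustration, the slb_normal field adjustments described above can be sketched in plain Python. The helper transform_normal and the use of urllib.parse are assumptions for this example, not the processing rule language used later:

# Rough Python sketch of the slb_normal transformations described above.
from urllib.parse import urlparse, parse_qsl

def transform_normal(event):
    # Extract the request_uri parameters and prefix them with reqparam_
    for key, value in parse_qsl(urlparse(event.get("request_uri", "")).query):
        event["reqparam_" + key] = value
    # If http_x_real_ip is empty or "-", fall back to client_ip
    if event.get("http_x_real_ip", "-") in ("", "-"):
        event["http_x_real_ip"] = event.get("client_ip", "")
    # Extract the second-level domain from host, e.g. m.abcd.com -> abcd
    parts = event.get("host", "").split(".")
    if len(parts) >= 2:
        event["domain"] = parts[-2]
    return event

sample = {"request_uri": "/category/abc/product_id?id=75&type=2&sam=123",
          "http_x_real_ip": "-", "client_ip": "1.2.3.4", "host": "m.abcd.com"}
print(transform_normal(sample))
# Adds reqparam_id=75, reqparam_type=2, reqparam_sam=123,
# sets http_x_real_ip=1.2.3.4 and domain=abcd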

Preparation

Prepare the target Logstores

  1. Create the following three Logstores:
slb-log-normal  # logs kept for 90 days, logical name target0
slb-log-error   # logs kept for 180 days, logical name target1
slb-log-media   # logs kept for 30 days, logical name target2
  2. Configure an index for each Logstore
    On the [Query] page of each Logstore, open the [Index] settings and configure an appropriate index for the fields that Logstore will store. To simplify this, you can also use the CLI's copy_logstore subcommand in CloudShell to copy the index from the source Logstore to the targets and then adjust it.

Prepare access keys

The current operation requires authorization to read the source Logstore and to write to the target Logstores. There are two ways to authorize:

  • Authorization with an AK (access key)
  • Authorization through a RAM role (to be supported in Phase II)

You need to prepare one or more AK key pairs for access (one per Logstore if needed):

Minimal RAM authorization for the source Logstore

{
  "Version": "1",
  "Statement": [
    {
      "Action": [
        "log:ListShards",
        "log:GetCursorOrData",
        "log:GetConsumerGroupCheckPoint",
        "log:UpdateConsumerGroup",
        "log:ConsumerGroupHeartBeat",
        "log:ConsumerGroupUpdateCheckPoint",
        "log:ListConsumerGroup",
        "log:CreateConsumerGroup"
      ],
      "Resource": [
        "acs:log:*:*:project/源project/logstore/slb-log",
    "acs:log:*:*:project/源project/logstore/slb-log/*"
      ],
      "Effect": "Allow"
    }
  ]
}

Minimal RAM authorization for the target Logstores

{
  "Statement": [
    {
      "Action": [
        "log:Post*"
      ],
      "Effect": "Allow",
      "Resource": [ "acs:log:*:*:project/目标Project/logstore/slb-log-error", "acs:log:*:*:project/目标Project/logstore/slb-log-media", "acs:log:*:*:project/目标Project/logstore/slb-log-normal"]
    }
  ],
  "Version": "1"
}

To simplify operations, you can also consider combining the two into a single authorization.
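
If you do combine them, a single policy document simply contains both statements. Below is a sketch that assembles the combined policy in Python; the action and resource lists are taken from the two policies above, and the project names are placeholders:

# Sketch of a single RAM policy combining read access to the source Logstore
# and write access to the three target Logstores (placeholders in <>).
import json

policy = {
    "Version": "1",
    "Statement": [
        {   # read/consume the source Logstore
            "Action": [
                "log:ListShards", "log:GetCursorOrData",
                "log:GetConsumerGroupCheckPoint", "log:UpdateConsumerGroup",
                "log:ConsumerGroupHeartBeat", "log:ConsumerGroupUpdateCheckPoint",
                "log:ListConsumerGroup", "log:CreateConsumerGroup"
            ],
            "Resource": [
                "acs:log:*:*:project/<source project>/logstore/slb-log",
                "acs:log:*:*:project/<source project>/logstore/slb-log/*"
            ],
            "Effect": "Allow"
        },
        {   # write to the three target Logstores
            "Action": ["log:Post*"],
            "Resource": [
                "acs:log:*:*:project/<target project>/logstore/slb-log-error",
                "acs:log:*:*:project/<target project>/logstore/slb-log-media",
                "acs:log:*:*:project/<target project>/logstore/slb-log-normal"
            ],
            "Effect": "Allow"
        }
    ]
}
print(json.dumps(policy, indent=2))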

Configure the processing task

Open the processing rules interface

  1. In the Log Service Logstore list, choose [Processing] for the source Logstore (slb-log).

image

  2. Enter the interactive processing interface
    and select the date and time range at #1 (for example, [Today]) to make sure data can be seen:

image

Copy failed requests to slb-log-error

Enter the following rules in the rule edit box (for syntax details, see the later sections):

TRANSFORM_ANY_1 = {"status": r"4\d+|5\d+" }, COUTPUT(name='target1', topic="slb_error")
DROP_EVENT_1 = ANY 

Then click [Preview Data]. A prompt will pop up asking for the AK key pair used to access the source Logstore; enter the key prepared earlier:
image

Shortly afterwards, you can see the results in the [Data Processing] tab: all requests with a non-2XX/3XX status are output to the target target1, with the topic set to slb_error:
image

Extract static-resource request logs and output them to slb-log-media

Update the rules in the rule edit box as follows (for syntax details, see the later sections):

#TRANSFORM_ANY_1 = {"status": r"4\d+|5\d+" }, COUTPUT(name='target1', topic="slb_error")

SET_EVENT_object_file = regex(v("request_uri"), r"/([\.\w]+\.\w+)(?:\?.+|$)?", gi=0)
SET_EVENT_object_type = regex(v("object_file"), r"^[\w\.]+\.(\w+)$", gi=0)

TRANSFORM_EVENT_2 = NO_EMPTY("object_file"), 
            [ KEEP_F([F_META, r"object_\w+|request_uri|client_ip|host|request_time" ]),
                 OUTPUT(name="target2", topic="slb_media_request"),
              ]
  
DROP_EVENT_1 = ANY 

Click [Preview Data]. Shortly afterwards, you can see in the [Data Processing] tab that static-resource requests are output to the target target2, keeping only the specific fields required earlier plus the two new fields object_file and object_type, with the topic set to slb_media_request:
image

Adjust fields and output normal requests to slb-log-normal

Update the rules in the rule edit box as follows (for syntax details, see the later sections):

#TRANSFORM_ANY_1 = {"status": r"4\d+|5\d+" }, COUTPUT(name='target1', topic="slb_error")

#SET_EVENT_object_file = regex(v("request_uri"), r"/([\.\w]+\.\w+)(?:\?.+|$)?", gi=0)
#SET_EVENT_object_type = regex(v("object_file"), r"^[\w\.]+\.(\w+)$", gi=0)

#TRANSFORM_EVENT_2 = NO_EMPTY("object_file"), 
#            [ KEEP_F([F_META, r"object_\w+|request_uri|client_ip|host|request_time" ]),
#                 OUTPUT(name="target2", topic="slb_media_request"),
#             ]

SET_EVENT___topic__ = 'slb_normal'
EXTRACT_EVENT_request_uri = KV(prefix="reqparam_")
SET_EVENT_http_x_real_ip = op_if(op_eq(v("http_x_real_ip"), '-'), v("client_ip"), v("http_x_real_ip"))

KEEP_FIELDS_1 = [F_META, r"body_bytes_sent|client_ip|host|http_referer|http_x_real_ip|request_length",
                         "request_method|request_time|request_uri|scheme|slb_vport|slbid|status"]

#DROP_EVENT_1 = ANY 

Click [Preview Data]. Shortly afterwards, you can see in the [Data Processing] tab that the non-media requests are output to the target target0, keeping only the specific fields required earlier, with http_x_real_ip no longer set to "-" and the topic set to slb_normal:
image

Save Configuration

Final processing rules

After confirming that the rules are correct, remove the commented-out parts; what remains is the complete version:

TRANSFORM_ANY_1 = {"status": r"4\d+|5\d+" }, COUTPUT(name='target1', topic="slb_error")

SET_EVENT_object_file = regex(v("request_uri"), r"/([\.\w]+\.\w+)(?:\?.+|$)?", gi=0)
SET_EVENT_object_type = regex(v("object_file"), r"^[\w\.]+\.(\w+)$", gi=0)

TRANSFORM_EVENT_2 = NO_EMPTY("object_file"), 
            [ KEEP_F([F_META, r"object_\w+|request_uri|client_ip|host|request_time" ]),
                 OUTPUT(name="target2", topic="slb_media_request"),
             ]

SET_EVENT___topic__ = 'slb_normal'
EXTRACT_EVENT_request_uri = KV(prefix="reqparam_")
SET_EVENT_http_x_real_ip = op_if(op_eq(v("http_x_real_ip"), '-'), v("client_ip"), v("http_x_real_ip"))

KEEP_FIELDS_1 = [F_META, r"body_bytes_sent|client_ip|host|http_referer|http_x_real_ip|request_length",
                         "request_method|request_time|request_uri|scheme|slb_vport|slbid|status"]

Configure the targets

Click [Save Processing Configuration]. In the configuration, following the earlier requirements, set the AK key for the source Logstore (already configured during preview) as well as the project name, Logstore name, and write AK key for each of the three target Logstores. Note that the logical names of the three targets must match the names used in the rules.

image

Configure the processing range

For the processing range, you can choose [All], [From a specific time], or a specific range, depending on the situation. Here we choose [From a specific time] and select the current moment. After saving, data newly written to the source Logstore will automatically have the rules applied and be written to the three configured targets.
image

Processing task management and monitoring

Task management

Click [Data Processing] in the left navigation bar of the Log Service project page to see the saved processing tasks; you can modify, stop, or restart them as needed.
image

Status insight

To check the status of a data processing task, click [Rule Insight] above to see the task's current execution status and any error messages, so that you can make adjustments:
image
image

Detailed syntax explanation

First preview operation

Rules

TRANSFORM_ANY_1 = {"status": r"4\d+|5\d+" }, COUTPUT(name='target1', topic="slb_error")
DROP_EVENT_1 = ANY 

Explanation

  1. The rules here have the form TRANSFORM_ANY_1 = condition, operation:
  • In Python syntax, prefixing a string with r avoids having to write double backslashes.
  • Operation: COUTPUT(name='target1', topic="slb_error") copies the data, outputs the copy to the target target1, and sets the topic to slb_error.
  2. The rule DROP_EVENT_1 = ANY drops all remaining logs; it is added here only for the preview and will not appear in the configuration that is actually saved.
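
As a quick illustration of the r prefix and of which status values the condition catches, here is a small Python snippet; it assumes the condition performs a full regular-expression match on the field value:

# Illustration of the r prefix and the status regex (assumption: the
# condition does a full regex match on the field value).
import re

assert re.fullmatch(r"4\d+|5\d+", "404")            # raw string: \d stays as-is
assert re.fullmatch("4\\d+|5\\d+", "404")           # plain string needs \\d
assert re.fullmatch(r"4\d+|5\d+", "200") is None    # 2XX responses do not match
assert re.fullmatch(r"4\d+|5\d+", "301") is None    # 3XX responses do not match
print("only 4XX/5XX responses are copied to target1 (slb-log-error)")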

Second preview operation

Rules

#TRANSFORM_ANY_1 = {"status": r"4\d+|5\d+" }, COUTPUT(name='target1', topic="slb_error")

SET_EVENT_object_file = regex(v("request_uri"), r"/([\.\w]+\.\w+)(?:\?.+|$)?", gi=0)
SET_EVENT_object_type = regex(v("object_file"), r"^[\w\.]+\.(\w+)$", gi=0)

TRANSFORM_EVENT_2 = NO_EMPTY("object_file"), 
            [ KEEP_F([F_META, r"object_\w+|request_uri|client_ip|host|request_time" ]),
                 OUTPUT(name="target2", topic="slb_media_request"),
             ]

DROP_EVENT_1 = ANY 

Detailed explanation

  1. SET_EVENT_<field name> sets a new field to the return value of the expression function that follows: the field object_file is extracted from request_uri, and object_type is then extracted from object_file; if object_file does not exist, object_type will not exist either.
  2. TRANSFORM_EVENT_2 = condition, [operation1, operation2] means: for events whose object_file field is not empty, keep only the specified fields and output the event to target2, with no further processing afterwards.
  3. The rule from the first step is commented out with # (Python syntax); the last line DROP_EVENT_1 = ANY is kept to make the preview easier to read.
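
The two regular expressions can be tried out in plain Python. This is only an illustration, assuming that regex(..., gi=0) returns the first capture group; the sample URI is made up for the example:

# Illustration of the object_file / object_type extraction (assumes the DSL's
# regex(..., gi=0) returns the first capture group; the sample URI is made up).
import re

request_uri = "/static/app.icon?size=big"

m = re.search(r"/([\.\w]+\.\w+)(?:\?.+|$)?", request_uri)
object_file = m.group(1) if m else ""            # "app.icon"

m = re.search(r"^[\w\.]+\.(\w+)$", object_file)
object_type = m.group(1) if m else ""            # "icon"

print(object_file, object_type)
# A URI such as /category/abc/product_id?id=75 yields no object_file,
# so the event is not treated as a static-resource request.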

Third preview operation

Rules

#TRANSFORM_ANY_1 = {"status": r"4\d+|5\d+" }, COUTPUT(name='target1', topic="slb_error")

#SET_EVENT_object_file = regex(v("request_uri"), r"/([\.\w]+\.\w+)(?:\?.+|$)?", gi=0)
#SET_EVENT_object_type = regex(v("object_file"), r"^[\w\.]+\.(\w+)$", gi=0)

#TRANSFORM_EVENT_2 = NO_EMPTY("object_file"), 
#            [ KEEP_F([F_META, r"object_\w+|request_uri|client_ip|host|request_time" ]),
#                 OUTPUT(name="target2", topic="slb_media_request"),
#             ]

SET_EVENT___topic__ = 'slb_normal'
EXTRACT_EVENT_request_uri = KV(prefix="reqparam_")
SET_EVENT_http_x_real_ip = op_if(op_eq(v("http_x_real_ip"), '-'), v("client_ip"), v("http_x_real_ip"))

KEEP_FIELDS_1 = [F_META, r"body_bytes_sent|client_ip|host|http_referer|http_x_real_ip|request_length",
                         "request_method|request_time|request_uri|scheme|slb_vport|slbid|status"]

Detailed explanation

  1. Here the event's default __topic__ is set to slb_normal.
  2. EXTRACT_EVENT_<field name> = <field operation> applies the KV operation to the value of request_uri, automatically extracting the key-value parameters in it and placing them into the event with the reqparam_ prefix.
  3. A further expression function then sets the value of the field http_x_real_ip. Finally, only the specified fields are kept.
  4. No OUTPUT (or similar) operation is used here, because by default events are output to the first configured target, target0.
  5. The rules from the previous two steps are commented out with # (Python syntax) to keep the preview clear; they will be restored in the saved configuration.

Further reference

You are welcome to scan the QR code to join the official DingTalk group (11775223) for timely updates and real-time support directly from Alibaba Cloud engineers:
image

Source: yq.aliyun.com/articles/704936