Background
In this example, we take gateway data (Alibaba Cloud SLB access logs) collected into one Logstore, process it, and distribute it to the appropriate dedicated Logstores.
Data
The source Logstore (slb-log) contains Alibaba Cloud SLB web access logs. An example entry looks like this:
__source__: log_service
__tag__:__receive_time__: 1559799897
__topic__:
body_bytes_sent: 740
client_ip: 1.2.3.4
host: m.abcd.com
http_host: m.abcd.com
http_referer: -
http_x_forwarded_for: -
http_x_real_ip: -
read_request_time: 0
request_length: 0
request_method: GET
request_time: 0.000
request_uri: /category/abc/product_id?id=75&type=2&sam=123
scheme: https
server_protocol: HTTP/2.0
slb_vport: 443
slbid: lb-1234
ssl_cipher: ECDHE-RSA-AES128-GCM-SHA256
ssl_protocol: TLSv1.2
status: 200
tcpinfo_rtt: 58775
time: 2019-06-06T13:44:50+08:00
upstream_addr: 1.2.3.4:80
upstream_response_time: 4.1234
upstream_status: 200
vip_addr: 1.2.3.4
write_response_time: 4.1234
Goals
We want to apply the following processing to the data:
Distribution
- Copy all requests with a non-2XX/3XX status to the target Logstore slb-log-error, with logs retained for 180 days, for further R&D and security analysis; set __topic__ to slb_error.
- Distribute all less important requests for images and other static resources to the target Logstore slb-log-media, with logs retained for 30 days; set topic to slb_media_request.
- Distribute all other request logs, used for further statistical analysis by operations, to the target Logstore slb-log-normal, with logs retained for 90 days; set topic to slb_normal.
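The three distribution goals can be sketched in plain Python. This is a hypothetical illustration, not the actual processing DSL: the real rules copy error requests and continue processing them, while this sketch returns a single primary target for brevity.

```python
import re

# Rough equivalent of the "static resource" test: the URI path ends in a
# file-like segment such as /app.icon (possibly followed by a query string).
STATIC_RE = re.compile(r"/[\.\w]+\.\w+(?:\?|$)")

def route(event):
    # 4xx/5xx statuses go to the error Logstore
    if re.fullmatch(r"4\d+|5\d+", event["status"]):
        return "slb-log-error"
    # static resources (images, css, js, ...) go to the media Logstore
    if STATIC_RE.search(event["request_uri"]):
        return "slb-log-media"
    # everything else goes to the normal Logstore
    return "slb-log-normal"

print(route({"status": "502", "request_uri": "/x"}))             # slb-log-error
print(route({"status": "200", "request_uri": "/img/app.icon"}))  # slb-log-media
```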
Transformation
- For slb_media_request requests:
  - Extract the following fields:
object_file=app.icon
object_type = icon # css, jpeg, js, etc.
  - Keep only the following fields:
http_referer: -
body_bytes_sent: 740
client_ip: 1.2.3.4
host: m.abcd.com
request_time: 4.33
- For slb_normal requests:
  - Keep only the following fields:
body_bytes_sent: 740
client_ip: 1.2.3.4
host: m.abcd.com
http_referer: -
http_x_real_ip: -
request_length: 0
request_method: GET
request_time: 4.33
request_uri: /category/abc/product_id?id=75&type=2&sam=123
scheme: https
slb_vport: 443
slbid: lb-1234
status: 200
time: 2019-06-06T13:44:50+08:00
  - Extract the parameters from request_uri, prefixed with reqparam_:
reqparam_id: 75
reqparam_type: 2
reqparam_sam: 123
  - If http_x_real_ip is empty or -, fill it with the value of client_ip.
  - Extract the domain value from host:
domain: abcd
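The intended domain extraction can be illustrated with a small Python helper. This is a hypothetical sketch, assuming the desired value is the second-level label of the hostname (m.abcd.com yields abcd):

```python
# Hypothetical helper: extract the "domain" value from the "host" field,
# assuming we want the second-level label (m.abcd.com -> abcd).
def extract_domain(host):
    parts = host.split(".")
    return parts[-2] if len(parts) >= 2 else host

print(extract_domain("m.abcd.com"))  # abcd
```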
Preparation
Prepare the target Logstores
- Create the following three Logstores:
slb-log-normal # logs retained for 90 days; logical name target0
slb-log-error # logs retained for 180 days; logical name target1
slb-log-media # logs retained for 30 days; logical name target2
- Configure an index for each: on each Logstore's [Search] page, open the [Index] settings and configure appropriate indexes for the fields that Logstore will store. To simplify this, you can also use the CLI's copy_logstore subcommand in CloudShell to copy the index from the source Logstore to the targets and then adjust it.
Prepare access keys
The current operation requires authorization to read the source Logstore and write the target Logstores:
- Authorization via AK access keys
- Authorization via RAM roles (will be supported in phase II)
You need to prepare one or more AK key pairs (one per Logstore) for access.
Minimum RAM policy for the source Logstore:
{
"Version": "1",
"Statement": [
{
"Action": [
"log:ListShards",
"log:GetCursorOrData",
"log:GetConsumerGroupCheckPoint",
"log:UpdateConsumerGroup",
"log:ConsumerGroupHeartBeat",
"log:ConsumerGroupUpdateCheckPoint",
"log:ListConsumerGroup",
"log:CreateConsumerGroup"
],
"Resource": [
"acs:log:*:*:project/<source-project>/logstore/slb-log",
"acs:log:*:*:project/<source-project>/logstore/slb-log/*"
],
"Effect": "Allow"
}
]
}
Minimum RAM policy for the target Logstores:
{
"Statement": [
{
"Action": [
"log:Post*"
],
"Effect": "Allow",
"Resource": [
"acs:log:*:*:project/<target-project>/logstore/slb-log-error",
"acs:log:*:*:project/<target-project>/logstore/slb-log-media",
"acs:log:*:*:project/<target-project>/logstore/slb-log-normal"
]
}
],
"Version": "1"
}
To simplify authorization, the two policies can also be combined into one.
Configure the processing task
Open the processing rule interface
- In the Logstore list of the Log Service console, choose [Process] for the source Logstore (slb-log)
- In the interactive processing interface, select a time range at #1, for example [Today], to make sure data is visible:
Copy failed requests to slb-log-error
Enter the following rule in the rule edit box (for syntax details, see the later sections):
TRANSFORM_ANY_1 = {"status": r"4\d+|5\d+" }, COUTPUT(name='target1', topic="slb_error")
DROP_EVENT_1 = ANY
Then click [Preview Data]. A dialog will prompt for the AK key pair used to read the source Logstore; enter the first key pair prepared earlier. Shortly afterwards, in the [Processing Results] tab you can see that all requests with a non-2XX/3XX status are output to the target target1, with topic set to slb_error:
Extract static-resource request logs and output them to slb-log-media
Update the rule in the edit box to the following (for syntax details, see the later sections):
#TRANSFORM_ANY_1 = {"status": r"4\d+|5\d+" }, COUTPUT(name='target1', topic="slb_error")
SET_EVENT_object_file = regex(v("request_uri"), r"/([\.\w]+\.\w+)(?:\?.+|$)?", gi=0)
SET_EVENT_object_type = regex(v("object_file"), r"^[\w\.]+\.(\w+)$", gi=0)
TRANSFORM_EVENT_2 = NO_EMPTY("object_file"),
[ KEEP_F([F_META, r"object_\w+|request_uri|client_ip|host|request_time" ]),
OUTPUT(name="target2", topic="slb_media_request"),
]
DROP_EVENT_1 = ANY
Click [Preview Data]. Shortly afterwards, in the [Processing Results] tab you can see that static-resource requests are output to the target target2; as required, only the specified fields are kept, two new fields object_file and object_type are extracted, and topic is set to slb_media_request:
Adapt fields and output normal request logs to slb-log-normal
Update the rule in the edit box as follows (for syntax details, see the later sections):
#TRANSFORM_ANY_1 = {"status": r"4\d+|5\d+" }, COUTPUT(name='target1', topic="slb_error")
#SET_EVENT_object_file = regex(v("request_uri"), r"/([\.\w]+\.\w+)(?:\?.+|$)?", gi=0)
#SET_EVENT_object_type = regex(v("object_file"), r"^[\w\.]+\.(\w+)$", gi=0)
#TRANSFORM_EVENT_2 = NO_EMPTY("object_file"),
# [ KEEP_F([F_META, r"object_\w+|request_uri|client_ip|host|request_time" ]),
# OUTPUT(name="target2", topic="slb_media_request"),
# ]
SET_EVENT___topic__ = 'slb_normal'
EXTRACT_EVENT_request_uri = KV(prefix="reqparam_")
SET_EVENT_http_x_real_ip = op_if(op_eq(v("http_x_real_ip"), '-'), v("client_ip"), v("http_x_real_ip"))
KEEP_FIELDS_1 = [F_META, r"body_bytes_sent|client_ip|host|http_referer|http_x_real_ip|request_length",
"request_method|request_time|request_uri|scheme|slb_vport|slbid|status"]
#DROP_EVENT_1 = ANY
Click [Preview Data]. Shortly afterwards, in the [Processing Results] tab you can see that non-media requests are output to the target target0; as required, only the specified fields are kept, the field http_x_real_ip no longer takes the value -, and topic is set to slb_normal:
Save Configuration
Final processing rule
After confirming the rules are correct, remove the commented-out parts to get the complete version:
TRANSFORM_ANY_1 = {"status": r"4\d+|5\d+" }, COUTPUT(name='target1', topic="slb_error")
SET_EVENT_object_file = regex(v("request_uri"), r"/([\.\w]+\.\w+)(?:\?.+|$)?", gi=0)
SET_EVENT_object_type = regex(v("object_file"), r"^[\w\.]+\.(\w+)$", gi=0)
TRANSFORM_EVENT_2 = NO_EMPTY("object_file"),
[ KEEP_F([F_META, r"object_\w+|request_uri|client_ip|host|request_time" ]),
OUTPUT(name="target2", topic="slb_media_request"),
]
SET_EVENT___topic__ = 'slb_normal'
EXTRACT_EVENT_request_uri = KV(prefix="reqparam_")
SET_EVENT_http_x_real_ip = op_if(op_eq(v("http_x_real_ip"), '-'), v("client_ip"), v("http_x_real_ip"))
KEEP_FIELDS_1 = [F_META, r"body_bytes_sent|client_ip|host|http_referer|http_x_real_ip|request_length",
"request_method|request_time|request_uri|scheme|slb_vport|slbid|status"]
Configure the targets
Click [Save Processing Configuration]. In the configuration, per the earlier requirements, set the source Logstore's access key (already configured during preview), and for each of the three target Logstores set the project name, the Logstore name, and the AK key used for writing. Note that the logical names of the three targets must match the names used in the rules.
Configure the processing range
For the processing range you can choose [All], [From a specific time], or a specific range. Here choose [From a specific time] and select the current moment. After saving, new data written to the source Logstore will automatically be processed by the rules and written to the three configured targets.
Processing task management and monitoring
Task management
Click [Data Processing] in the left navigation of the Log Service project page to see the saved processing tasks; you can modify, stop, or restart them as needed.
Status insight
To inspect the status of data processing, click [Rule Insight] above to see the current task's execution status and any error messages, so you can adjust accordingly:
Detailed syntax explanation
First preview operation
Rule
TRANSFORM_ANY_1 = {"status": r"4\d+|5\d+" }, COUTPUT(name='target1', topic="slb_error")
DROP_EVENT_1 = ANY
Explanation
- The rule here has the form TRANSFORM_ANY_1 = condition, operation.
- In Python syntax, prefixing a string with r avoids having to write every \ twice.
- The operation COUTPUT(name='target1', topic="slb_error") copies the data, outputs the copy to the target target1, and sets topic to slb_error.
- The rule DROP_EVENT_1 = ANY drops all remaining logs; it is added here only for the preview and will not appear in the saved configuration.
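The condition can be exercised in plain Python; this sketch assumes the field condition uses full-match semantics:

```python
import re

# The TRANSFORM_ANY_1 condition: status matches 4xx or 5xx
# (assumes the DSL's field condition fully matches the field value).
STATUS_RE = re.compile(r"4\d+|5\d+")

def is_error(status):
    return STATUS_RE.fullmatch(status) is not None

print([s for s in ("200", "301", "404", "502") if is_error(s)])  # ['404', '502']
```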
Second preview operation
Rule
#TRANSFORM_ANY_1 = {"status": r"4\d+|5\d+" }, COUTPUT(name='target1', topic="slb_error")
SET_EVENT_object_file = regex(v("request_uri"), r"/([\.\w]+\.\w+)(?:\?.+|$)?", gi=0)
SET_EVENT_object_type = regex(v("object_file"), r"^[\w\.]+\.(\w+)$", gi=0)
TRANSFORM_EVENT_2 = NO_EMPTY("object_file"),
[ KEEP_F([F_META, r"object_\w+|request_uri|client_ip|host|request_time" ]),
OUTPUT(name="target2", topic="slb_media_request"),
]
DROP_EVENT_1 = ANY
Detailed explanation
- SET_EVENT_<field name> sets a new field using the return value of the expression function that follows: the field object_file is extracted from request_uri, and object_type is then extracted from it; if object_file does not exist, object_type will not exist either.
- TRANSFORM_EVENT_2 = condition, [operation 1, operation 2]: for events whose object_file field is not empty, keep only the specified fields and output them to target2; they are not processed further.
- The rule from the first step is commented out with Python-style #, and the last line DROP_EVENT_1 = ANY is kept to make the preview easier to read.
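The two regular expressions can be tried out in plain Python (a sketch of the extraction logic, not the DSL's regex function itself):

```python
import re

# Re-implementation sketch of the two SET_EVENT_* extractions, to show
# which requests count as static-resource ("media") requests.
FILE_RE = re.compile(r"/([\.\w]+\.\w+)(?:\?.+|$)?")
TYPE_RE = re.compile(r"^[\w\.]+\.(\w+)$")

def extract_media_fields(request_uri):
    m = FILE_RE.search(request_uri)
    if not m:
        return {}  # no file-like path segment: not a media request
    fields = {"object_file": m.group(1)}
    t = TYPE_RE.match(fields["object_file"])
    if t:
        fields["object_type"] = t.group(1)  # the file extension
    return fields

print(extract_media_fields("/static/app.icon?id=1"))
# {'object_file': 'app.icon', 'object_type': 'icon'}
print(extract_media_fields("/category/abc/product_id?id=75&type=2&sam=123"))
# {} -> handled by the slb_normal branch instead
```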
Third preview operation
Rule
#TRANSFORM_ANY_1 = {"status": r"4\d+|5\d+" }, COUTPUT(name='target1', topic="slb_error")
#SET_EVENT_object_file = regex(v("request_uri"), r"/([\.\w]+\.\w+)(?:\?.+|$)?", gi=0)
#SET_EVENT_object_type = regex(v("object_file"), r"^[\w\.]+\.(\w+)$", gi=0)
#TRANSFORM_EVENT_2 = NO_EMPTY("object_file"),
# [ KEEP_F([F_META, r"object_\w+|request_uri|client_ip|host|request_time" ]),
# OUTPUT(name="target2", topic="slb_media_request"),
# ]
SET_EVENT___topic__ = 'slb_normal'
EXTRACT_EVENT_request_uri = KV(prefix="reqparam_")
SET_EVENT_http_x_real_ip = op_if(op_eq(v("http_x_real_ip"), '-'), v("client_ip"), v("http_x_real_ip"))
KEEP_FIELDS_1 = [F_META, r"body_bytes_sent|client_ip|host|http_referer|http_x_real_ip|request_length",
"request_method|request_time|request_uri|scheme|slb_vport|slbid|status"]
Detailed explanation
- Here the event's default __topic__ is set to slb_normal.
- EXTRACT_EVENT_<field name> = field operation: the value of request_uri is passed through the KV operation, which automatically extracts its key-value pairs and places them into the event with the prefix reqparam_.
- An expression function then sets the value of http_x_real_ip; finally, only the specified fields are kept.
- No OUTPUT or similar operation is used, because by default events are output to the first configured target, target0.
- The rules of the two previous steps are commented out with Python-style # so they do not affect the preview; they will be restored afterwards.
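The third rule's behavior can be sketched in plain Python (a hypothetical re-implementation, not the SLS DSL itself): KV-extract the request_uri parameters with a reqparam_ prefix, and fill http_x_real_ip from client_ip when it is empty or -.

```python
from urllib.parse import urlparse, parse_qsl

def normalize(event):
    # KV extraction with prefix, like EXTRACT_EVENT_request_uri = KV(prefix="reqparam_")
    query = urlparse(event["request_uri"]).query
    for key, value in parse_qsl(query):
        event["reqparam_" + key] = value
    # fallback, like the op_if/op_eq expression on http_x_real_ip
    if event.get("http_x_real_ip", "-") in ("", "-"):
        event["http_x_real_ip"] = event["client_ip"]
    return event

event = {
    "request_uri": "/category/abc/product_id?id=75&type=2&sam=123",
    "client_ip": "1.2.3.4",
    "http_x_real_ip": "-",
}
print(normalize(event))
```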
Further reference
You are welcome to scan the QR code to join the official DingTalk group (11775223) for real-time updates and direct support from Alibaba Cloud engineers: