Collecting and analyzing Nginx logs in real time to automatically ban risky IPs
Article Address: blog.piaoruiqing.com/2019/11/17/...
Foreword
This article shares a practical solution for automatically collecting and analyzing Nginx logs and banning risky IPs in real time.
After reading this article you will have:
- A log collection solution.
- A simple IP risk assessment scheme.
- An IP banning strategy and implementation.
To follow along, you should:
- Be familiar with programming.
- Be familiar with common Linux commands.
- Know the basics of Docker.
Background
While analyzing the Nginx access log, I found a large number of invalid requests returning 404, whose URLs were random paths containing sensitive words. These requests have recently become more and more frequent, and after manually banning a batch of IPs, new IPs soon showed up.
This led to the idea of banning IPs automatically by analyzing the Nginx logs in real time.
Requirements
No. | Requirement | Remark |
---|---|---|
1 | Nginx log collection | There are many options; I chose the one that best fits a personal server: Filebeat + Redis |
2 | Real-time log analysis | Consume the logs from Redis in real time and parse out the data to be analyzed |
3 | IP risk assessment | Assess IP risk along multiple dimensions: number of visits, IP attribution, usage type, etc. |
4 | Real-time ban | Ban risky IPs for a duration that depends on their risk |
Analysis
A brief summary of the features seen in the logs:
No. | Feature | Description | Remark |
---|---|---|---|
1 | Frequent access | Several to dozens of requests per second | Normal traffic can be bursty too, but the bursts do not last long |
2 | Sustained requests | Long duration | Same as above |
3 | Mostly 404 | Most requested URLs do not exist and contain sensitive words such as admin, login, phpMyAdmin, backup | Rare in normal traffic |
4 | Abnormal IP | The ASN gives some clues: such requests generally do not come from ordinary personal users' IPs | The usage type is generally COM (commercial), DCH (data center / web hosting / transit), SES (search engine spider), etc. |
Note: the IP analysis uses the free version of the ip2location database, described in detail later.
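Feature 3 in the table can be turned into a simple check. The following is a minimal sketch, not the author's implementation: the word list reuses the examples from the table, and the function names and the entry fields (`status`, `request_uri`, as in the JSON log format used later in this article) are assumptions for illustration.

```javascript
// Sensitive words taken from the examples in the table above; extend as needed.
const SENSITIVE_WORDS = ['admin', 'login', 'phpmyadmin', 'backup'];

// Returns true when the URI contains any sensitive word (case-insensitive).
function hasSensitiveWord(requestUri) {
  const uri = String(requestUri).toLowerCase();
  return SENSITIVE_WORDS.some((word) => uri.includes(word));
}

// A request is treated as suspicious here when it is a 404 for a sensitive-looking path.
function isSuspiciousRequest(entry) {
  return entry.status === 404 && hasSensitiveWord(entry.request_uri);
}
```

A plain substring match like this will also flag legitimate paths that happen to contain one of the words, which is why it should only be one input to the overall score rather than a ban trigger on its own.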
Solution
Log collection
Source: the author's website is deployed with Docker, with Nginx as the only entry point recording all access logs.
Collection: because resources are limited, I chose the lightweight log collector Filebeat to collect the Nginx logs and write them to Redis.
Risk assessment
The Monitor service scores risk based on URL, IP, history, and other factors, and calculates the final risk factor.
IP ban
When Monitor finds a dangerous IP (one whose risk factor exceeds a threshold), it calls the Actuator to ban it; the ban duration is calculated from the risk factor.
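How the ban duration follows from the risk factor is not spelled out here. As one possible sketch (the threshold, the base duration, the cap, and the name `banSeconds` are all hypothetical), the duration could grow with how far the risk factor exceeds the threshold:

```javascript
// Hypothetical tuning values; none of them come from the author's setup.
const RISK_THRESHOLD = 20;      // ban only when the risk factor exceeds this
const BASE_BAN_SECONDS = 600;   // 10 minutes per point above the threshold
const MAX_BAN_SECONDS = 86400;  // never ban longer than one day

// Returns the ban duration in seconds, or 0 when the IP should not be banned.
function banSeconds(riskFactor) {
  if (riskFactor <= RISK_THRESHOLD) {
    return 0;
  }
  const excess = riskFactor - RISK_THRESHOLD;
  return Math.min(Math.ceil(excess * BASE_BAN_SECONDS), MAX_BAN_SECONDS);
}
```

Capping the duration keeps a false positive from locking a real user out indefinitely.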
Implementation
Log collection
Filebeat is very simple to use. I deploy it with Docker Swarm; the deployment file is as follows (other services are omitted to keep the listing short):
```yaml
version: '3.5'

services:
  filebeat:
    image: docker.elastic.co/beats/filebeat:7.4.2
    deploy:
      resources:
        limits:
          memory: 64M
      restart_policy:
        condition: on-failure
    volumes:
      - $PWD/filebeat/filebeat.yml:/usr/share/filebeat/filebeat.yml:ro
      - $PWD/filebeat/data:/usr/share/filebeat/data:rw
      - $PWD/nginx/logs:/logs/nginx:ro
    environment:
      TZ: Asia/Shanghai
    depends_on:
      - nginx
```
- image: specifies the image and version.
- deploy.resources.limits.memory: memory limit.
- $PWD/filebeat/filebeat.yml:/usr/share/filebeat/filebeat.yml:ro: mounts the Filebeat configuration file. $PWD is the current directory, i.e. the directory where docker stack deploy is executed. ro means read-only.
- $PWD/filebeat/data:/usr/share/filebeat/data:rw: the data directory must be persisted; otherwise the record of how far each log has been read is lost when the container is redeployed. rw means read-write.
- $PWD/nginx/logs:/logs/nginx:ro: maps the Nginx log directory into Filebeat.
- environment.TZ: time zone.
The content of filebeat.yml is as follows:
```yaml
filebeat.inputs:
- type: log
  enabled: true
  paths:
    - /logs/nginx/access.log
  json.keys_under_root: true
  json.overwrite_keys: true

output.redis:
  hosts: ["redis-server"]
  password: "{your redis password}"
  key: "filebeat:nginx:accesslog"
  db: 0
  timeout: 5
```
- filebeat.inputs: defines the inputs.
- paths: the log file paths.
- json.keys_under_root: places the parsed JSON keys at the root of the event (if not set, all of the data is nested under a json key). Note: this requires the Nginx log format to be JSON; the author's configuration, for reference:

```nginx
log_format main_json escape=json
  '{'
    '"@timestamp":"$time_iso8601",'
    '"http_host":"$http_host",'
    '"remote_addr":"$remote_addr",'
    '"request_uri":"$request_uri",'
    '"request_method":"$request_method",'
    '"server_protocol":"$server_protocol",'
    '"status":$status,'
    '"request_time":"$request_time",'
    '"body_bytes_sent":$body_bytes_sent,'
    '"http_referer":"$http_referer",'
    '"http_user_agent":"$http_user_agent",'
    '"http_x_forwarded_for":"$http_x_forwarded_for"'
  '}';
```

- json.overwrite_keys: overwrites keys generated by Filebeat; used here to overwrite the @timestamp field.
- output.redis: defines the output.
After a successful deployment, the collected log entries can be viewed in Redis.
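Each element of the Redis list is a JSON string produced by Filebeat. A consumer could extract the fields it needs roughly as follows. This is a sketch that assumes the main_json log format shown above with json.keys_under_root enabled; `parseAccessLog` is a hypothetical helper, and fetching the string from Redis is left out.

```javascript
// Parses one Filebeat event (a JSON string) into the fields used for risk assessment.
// Returns null for lines that are not valid JSON or lack a client address.
function parseAccessLog(line) {
  let event;
  try {
    event = JSON.parse(line);
  } catch (err) {
    return null;
  }
  if (!event || typeof event.remote_addr !== 'string') {
    return null;
  }
  return {
    time: event['@timestamp'],
    ip: event.remote_addr,
    uri: event.request_uri,
    status: Number(event.status),
    userAgent: event.http_user_agent,
  };
}

// Example event, shortened to the relevant keys (values are made up).
const sample = '{"@timestamp":"2019-11-17T10:00:00+08:00","remote_addr":"203.0.113.7",' +
  '"request_uri":"/phpMyAdmin/","status":404,"http_user_agent":"curl/7.64.0"}';
console.log(parseAccessLog(sample));
```

Returning null instead of throwing lets the consumer skip malformed entries without stopping the loop.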
This article was published on Pu Ruiqing's blog. Non-commercial reproduction is allowed, provided that the original author Pu Ruiqing and the link blog.piaoruiqing.com are retained. For authorization or cooperation inquiries, please contact piaoruiqing@gmail.com.
Risk assessment
The Monitor service is written in Java, deployed with Docker, and interacts with the Actuator service over HTTP.
Risk assessment combines several dimensions:
No. | Dimension | Strategy |
---|---|---|
1 | IP attribution | The users of a Chinese website are generally located in China, so an IP attributed to another country deserves extra vigilance. |
2 | Usage type | Look up the usage type of the IP; DCH (data center / web hosting / transit), SES (search engine spider) and similar types raise the risk score. |
3 | Accessed resource | Requests for resources that do not exist, with paths containing sensitive words such as admin, login, phpMyAdmin, backup, raise the risk score. |
4 | Frequency and duration of access | Frequent and sustained requests raise the score. |
5 | Historical score | The historical score is factored into the current score. |
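For the frequency dimension, one simple option is a sliding-window counter per IP. The sketch below is an in-memory illustration only; the class name `RequestCounter` and the 60-second default window are assumptions, and a real deployment would also need to evict idle IPs.

```javascript
// Counts requests per IP inside a sliding time window (in-memory sketch).
class RequestCounter {
  constructor(windowMs = 60000) {
    this.windowMs = windowMs;
    this.hits = new Map(); // ip -> array of request timestamps in ms, oldest first
  }

  // Records one request and returns how many requests this IP made within the window.
  record(ip, nowMs = Date.now()) {
    const cutoff = nowMs - this.windowMs;
    const recent = (this.hits.get(ip) || []).filter((t) => t > cutoff);
    recent.push(nowMs);
    this.hits.set(ip, recent);
    return recent.length;
  }
}
```

The returned count can then be mapped to a score, for example by comparing it against the rate a normal visitor would plausibly reach.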
Getting IP attribution
IP attribution is easy to obtain: many sites offer free lookup services, such as IpInfo, and free IP databases such as ip2location can be downloaded.
I use the free version of the ip2location database.
ip_from and ip_to are the first and last IP of a range, stored in decimal format. In MySQL, the inet_aton('your ip') function converts an IP to decimal:
```sql
SET @a := INET_ATON('172.217.6.78');
SELECT * FROM ip2location_db11 WHERE ip_from <= @a AND ip_to >= @a LIMIT 1;
```
ip_from | ip_to | country_code | country_name | region_name | city_name | latitude | longitude | zip_code | time_zone |
---|---|---|---|---|---|---|---|---|---|
2899902464 | 2899910655 | US | United States | California | Mountain View | 37.405992 | -122.07852 | 94043 | -07:00 |
- The table is large, so adding LIMIT 1 is recommended.
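The same conversion that inet_aton performs can be done in application code before querying. A minimal sketch for IPv4 only (`ipv4ToInt` is a name chosen here for illustration):

```javascript
// Converts a dotted-quad IPv4 string to its decimal value, like MySQL's inet_aton
// does for full four-part addresses.
function ipv4ToInt(ip) {
  const parts = String(ip).split('.');
  if (parts.length !== 4) {
    throw new Error(`not an IPv4 address: ${ip}`);
  }
  return parts.reduce((acc, part) => {
    if (!/^\d{1,3}$/.test(part) || Number(part) > 255) {
      throw new Error(`not an IPv4 address: ${ip}`);
    }
    // Multiply instead of bit-shifting so the result stays an unsigned value.
    return acc * 256 + Number(part);
  }, 0);
}

console.log(ipv4ToInt('172.217.6.78')); // 2899904078
```

The result falls inside the ip_from/ip_to range of the row returned by the query above.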
Getting the AS, ASN and usage type
Most free lookup sites do not return the ASN or the usage type. Free ASN databases exist too, but they still lack the usage type. So I took an indirect route.
ip2location offers two free databases: IP2Location™ LITE IP-ASN and IP2Proxy™ LITE.
IP2Location™ LITE IP-ASN: a database for looking up the autonomous system and autonomous system number (ASN) of an IP.
IP2Proxy™ LITE: a database of IP addresses used as open proxies. It covers all public IPv4 and IPv6 proxy addresses, with country, region, city, ISP, domain, usage type, ASN, and last-seen information.
IP2Location™ LITE IP-ASN cannot return the usage type of an IP, and IP2Proxy™ LITE holds less data and may not contain a given IP. Combining the two, however, gives a rough guess of an IP's usage type:
- First, look up the ASN of the IP in IP2Location™ LITE IP-ASN:

```sql
SET @a := INET_ATON('172.217.6.78');
SELECT * FROM ip2location_asn WHERE ip_from <= @a AND ip_to >= @a LIMIT 1;
```

ip_from | ip_to | cidr | asn | as |
---|---|---|---|---|
2899904000 | 2899904255 | 172.217.6.0/24 | 15169 | Google LLC |

- Then, in IP2Proxy™ LITE, query the two records with the same ASN that are closest to the IP, one after it and one before it:

```sql
SET @a := INET_ATON('172.217.6.78');
SELECT * FROM ip2proxy_px8 WHERE ip_from >= @a AND asn = 15169 ORDER BY ip_from ASC LIMIT 1;
SELECT * FROM ip2proxy_px8 WHERE ip_from <= @a AND asn = 15169 ORDER BY ip_from DESC LIMIT 1;
```

ip_from | ip_to | proxy_type | country_code | country_name | region_name | city_name | isp | domain | usage_type | asn | as | last_seen |
---|---|---|---|---|---|---|---|---|---|---|---|---|
2899904131 | 2899904131 | PUB | US | United States | California | Mountain View | Google LLC | google.com | DCH | 15169 | Google LLC | 30 |

ip_from | ip_to | proxy_type | country_code | country_name | region_name | city_name | isp | domain | usage_type | asn | as | last_seen |
---|---|---|---|---|---|---|---|---|---|---|---|---|
2899904015 | 2899904015 | PUB | US | United States | California | Mountain View | Google LLC | google.com | DCH | 15169 | Google LLC | 30 |

- Finally, compute the absolute difference between the current IP and the IP of each proxy record:

IP | proxy IP | abs(IP - proxy IP) |
---|---|---|
2899904078 | 2899904131 | 53 |
2899904078 | 2899904015 | 63 |

If the absolute difference is small, the IP is assumed to have the same usage type as the proxy IP. What counts as "small" can be adjusted to the situation, for example an absolute difference of at most 65535.
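The three steps can be condensed into a small function once the neighbouring proxy records have been fetched. This is only a sketch: `guessUsageType` and the record shape are assumptions, and the default limit of 65535 is the example value mentioned above.

```javascript
// Guesses the usage type of an IP from the nearest IP2Proxy records of the same ASN.
// ip: decimal IPv4 value; neighbors: rows with ip_from and usage_type;
// maxDistance: how far apart two addresses may be and still count as "close".
function guessUsageType(ip, neighbors, maxDistance = 65535) {
  let best = null;
  for (const row of neighbors) {
    const distance = Math.abs(ip - row.ip_from);
    if (distance <= maxDistance && (best === null || distance < best.distance)) {
      best = { distance, usageType: row.usage_type };
    }
  }
  return best ? best.usageType : null; // null: no record is close enough to guess
}

const neighbors = [
  { ip_from: 2899904131, usage_type: 'DCH' }, // distance 53
  { ip_from: 2899904015, usage_type: 'DCH' }, // distance 63
];
console.log(guessUsageType(2899904078, neighbors)); // DCH
```

Because the result is a guess, it is safer to feed it into the score with a lower weight than a usage type read directly from a database.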
Overall score
The scoring rules can be adjusted to the actual scenario.
No. | Scoring item | Scoring rule (1-10) |
---|---|---|
1 | IP attribution | For example: domestic 5 points, foreign 10 points; can be subdivided by region |
2 | Usage type | For example: ISP / MOB 2 points, COM 5 points, DCH 10 points |
3 | Accessed resource | For example: a 404 scores 5 points, a path containing sensitive words scores 10 points |
4 | Frequency and duration of access | Score calculated from the average number of requests over a period of time |
5 | Historical score | |
Items 1-5 above can simply be added up, or combined as a weighted sum.
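As an illustration of the weighted variant, the sketch below combines the five item scores. The weights and the name `riskScore` are hypothetical; the item scores are assumed to have been computed already on the 1-10 scale from the table.

```javascript
// Hypothetical weights for the five items; adjust to the actual scenario.
const WEIGHTS = {
  location: 1,
  usage: 1.5,
  resource: 2,
  frequency: 2,
  history: 1,
};

// Returns the weighted sum of the item scores; missing items count as 0.
function riskScore(scores) {
  return Object.keys(WEIGHTS).reduce(
    (total, item) => total + WEIGHTS[item] * (scores[item] || 0),
    0
  );
}

// Foreign data-center IP probing a sensitive 404 path, no history yet.
console.log(riskScore({ location: 10, usage: 10, resource: 10, frequency: 3 })); // 51
```

Setting every weight to 1 reduces this to the plain sum.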
IP ban
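I use **iptables + ipset** to ban IPs. The Actuator service is written in Node.js and runs on the host; the Monitor running in Docker interacts with it over HTTP. The IP-banning part of the code is as follows: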
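```javascript
'use strict';
const net = require('net');
const express = require('express');
const shell = require('shelljs');
const router = express.Router();

// POST /blacklist/:name/:ip?timeout=<seconds>
router.post('/blacklist/:name/:ip', function (req, res) {
  const name = req.params.name;
  const ip = req.params.ip;
  const timeout = Number(req.query.timeout);
  // Validate every value before it reaches the shell to prevent command injection.
  if (!/^[\w-]+$/.test(name) || net.isIP(ip) === 0 || !Number.isInteger(timeout) || timeout < 0) {
    return res.status(400).send('invalid parameters\n');
  }
  // -exist: do not fail when the entry is already in the set.
  const cmd = `ipset -exist add ${name} ${ip} timeout ${timeout}`;
  console.log(cmd);
  const result = shell.exec(cmd);
  if (result.code !== 0) {
    return res.status(500).send('failed\n');
  }
  res.send('ok\n');
});

module.exports = router;
```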
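- name: name of the blacklist.
- timeout: timeout, in seconds.
There are still quite a few stubborn IPs that frequently scan my site. They are now blocked automatically within seconds of being detected, and the results so far are satisfactory.
Conclusion
- Crawlers, bots, vulnerability scanners and the like cause unnecessary overhead for a website and can even bring risk, so they should not be ignored. Absolute security is hard to achieve, but you can at least be more secure than others.
- Banning is a relatively blunt instrument and must be kept in proportion: false positives will cost the site users.
- Besides banning, you can also consider diverting the trouble elsewhere: 302-redirect risky IPs to a gitpage (backup site), so that even a false positive does not leave the user unable to access the site.
- There are thousands of roads, but safety comes first.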