PostgreSQL 拒绝服务DDOS攻击与防范

标签

PostgreSQL , ddos , 拒绝服务 , 锁 , SLOT


背景

连接数据库的过程中,需要数据库有足够的SLOT(连接槽,通过max_connections配置),认证。如果把连接槽位占用,或者在认证过程加锁(使得认证过程被锁),则可以制造DDOS攻击。

占用连接槽的攻击与防范方法:

《PostgreSQL 连接攻击(类似DDoS)》

认证过程加锁的攻击方法,本文会提到。

https://paquier.xyz/postgresql-2/postgres-12-dos-prevention/

https://www.postgresql.org/message-id/[email protected]

BUG #15182: Canceling authentication due to timeout aka Denial of Service Attack  
From:	PG Bug reporting form <noreply(at)postgresql(dot)org>  
To:	pgsql-bugs(at)lists(dot)postgresql(dot)org  
Cc:	lalbin(at)scharp(dot)org  
Subject:	BUG #15182: Canceling authentication due to timeout aka Denial of Service Attack  
Date:	2018-04-30 20:41:11  
Message-ID:	[email protected]  
Views:	Raw Message | Whole Thread | Download mbox  
Thread:	  
Lists:	pgsql-bugs pgsql-hackers  
The following bug has been logged on the website:  
  
  
  
Bug reference:      15182  
Logged by:          Lloyd Albin  
Email address:      lalbin(at)scharp(dot)org  
PostgreSQL version: 10.3  
Operating system:   OpenSUSE  
Description:          
  
  
  
Over the last several weeks our developers caused a Denial of Service Attack  
against ourselves by accident. When looking at the log files, I noticed that  
we had authentication timeouts during these time periods. In researching the  
problem I found this is due to locks being held on shared system catalog  
items, aka system catalog items that are shared between all databases on the  
same cluster/server. This can be caused by beginning a long running  
transaction that queries pg_stat_activity, pg_roles, pg_database, etc and  
then another connection that runs either a REINDEX DATABASE, REINDEX SYSTEM,  
or VACUUM FULL. This issue is of particular importance to database resellers  
who use the same cluster/server for multiple clients, as two clients can  
cause this issue to happen inadvertently or a single client can either cause  
it to happen maliciously or inadvertently. Note: The large cloud providers  
give each of their clients their own cluster/server so this will not affect  
across cloud clients but can affect an individual client. The problem is  
that traditional hosting companies will have all clients from one or more  
web servers share the same PostgreSQL cluster/server. This means that one or  
two clients could inadvertently stop all the other clients from being able  
to connect to their databases until the first client does either a COMMIT or  
ROLLBACK of their transaction which they could hold open for hours, which is  
what happened to us internally.  
  
  
  
In Connection 1 we need to BEGIN a transaction and then query a shared  
system item; pg_authid, pg_database, etc; or a view that depends on a shared  
system item; pg_stat_activity, pg_roles, etc. Our developers were accessing  
pg_roles.  
  
  
  
Connection 1 (Any database, Any User)  
BEGIN;  
SELECT * FROM pg_stat_activity;  
  
  
  
Connection 2 (Any database will do as long as you are the database owner)  
REINDEX DATABASE postgres;  
  
  
  
Connection 3 (Any Database, Any User)  
psql -h sqltest-alt -d sandbox  
  
  
  
All future Connection 3's will hang for however long the transaction in  
Connection 1 runs. In our case this was hours and denied everybody else the  
ability to log into the server until Connection 1 was committed. psql will  
just hang for hours, even overnight in my testing, but our apps would get  
the "Canceling authentication due to timeout" after 1 minute.  
  
  
  
Connection 2 can also do any of these commands to also cause the same  
issue:  
REINDEX SYSTEM postgres;  
VACUUM FULL pg_authid;  
vacuumdb -f -h sqltest-alt -d lloyd -U lalbin  
  
  
  
Even worse is that the VACUUM FULL pg_authid; can be started by an  
unprivileged user and it will wait for the AccessShareLock by connection 1  
to be released before returning the error that you don't have permission to  
perform this action, so even an unprivileged user can cause this to happen.  
The privilege check needs to happen before the waiting for the  
AccessExclusiveLock happens.  
  
  
  
This bug report has been simplified and shorted drastically. To read the  
full information about this issue please see my blog post:  
http://lloyd.thealbins.com/Canceling%20authentication%20due%20to%20timeout  
  
  
  
Lloyd Albin  
Database Administrator  
Statistical Center for HIV/AIDS Research and Prevention (SCHARP)  
Fred Hutchinson Cancer Research Center  

复现方法如上。

防范

1、对于连接占用DDOS攻击的防范(1,设置认证超时参数。2,不要在公网监听。3,设置网络层防火墙。)

2、对于锁攻击(通常是无意识攻击),建议在操作大锁的SQL前,加锁超时,或者语句超时(尽量减少等待时长)。 (lock_timeout, statement_timeout都可以)

参考

《PostgreSQL 锁等待监控 珍藏级SQL - 谁堵塞了谁》

《PostgreSQL 设置单条SQL的执行超时 - 防雪崩》

《如何防止数据库雪崩(泛洪 flood)》

https://paquier.xyz/postgresql-2/postgres-12-dos-prevention/

《PostgreSQL 连接攻击(类似DDoS)》

猜你喜欢

转载自yq.aliyun.com/articles/700362