Article Directory
- 1. Background
- 2. Requirements
- 3. Implementation idea
- 4. Introduction to SCAN
- 5. Use Jedis to implement Key scanning
  - 5.1 Basic helper classes
  - 5.2 Batch processing stop switch
  - 5.3 Scan result callback function
  - 5.4 Realize the Key scan of a single Redis node
  - 5.5 Realize the scanning of Redis cluster
  - 5.6 Create a sample code of JedisCluster
  - 5.7 Parse the IP and port number corresponding to Jedis
  - 5.8 Scan result callback example
  - 5.9 Test code
- 6. Lettuce implementation code
- 7. Brief summary
- Reference links
1. Background
The company has an outsourced project whose data is stored in a Redis cluster; the data volume is between 200 GB and 300 GB.
2. Requirements
To reduce costs, we need to clean up the portion of the data that meets certain conditions.
3. Implementation idea
Since this is a Redis cluster, the work can be roughly divided into the following steps:
- Using the Redis node information, get all Redis nodes in the cluster.
- For each node, run the SCAN command in a loop with a MATCH pattern and a page size of 100, that is, commands of the form scan <cursor> match *** count 100.
- For each key returned, execute the callback method / specific business logic.
- Move on to the next Redis node.
4. Introduction to SCAN
SCAN is a command supported by Redis. It can be used to:
- Page through all keys in the current database of the current Redis node.
- Page through the keys that match a specific pattern.
A complete traversal has a time complexity of O(N) (each individual call is O(1)), and the main use case is iterating over all keys.
A similar command is KEYS, but KEYS has serious performance problems because it walks the entire keyspace in a single blocking call, and in some environments administrators disable it outright.
The syntax of the SCAN command is:
SCAN cursor [MATCH pattern] [COUNT count] [TYPE type]
where:
- cursor is the cursor value. A traversal starts with 0; each call returns the cursor to pass to the next call, and a returned cursor of 0 means the traversal is complete.
- MATCH pattern is a simple glob-style pattern in which the wildcard * matches any sequence of characters.
- COUNT count is similar to the pageSize of a paged query. It is a hint for how many elements to examine per call rather than a strict limit on the number returned; the default value is 10.
- TYPE type filters by data type so that only keys of that type are returned. The type of a particular key can be checked with TYPE xxxKey. Common values include string, list, set, zset, hash and stream.
Example:
scan 0
scan 0 MATCH cnc:*
scan 0 MATCH *cnc:* COUNT 100
scan 0 MATCH *cnc:* COUNT 100 TYPE zset
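To make the cursor contract concrete, here is a minimal in-memory sketch in Java. It is a toy model and not how Redis works internally: the class ToyScan, its Page type and its offset-based cursor are assumptions made for illustration (real Redis cursors are opaque values), and only the * wildcard is handled. What it does mirror is the calling convention: start at cursor 0, pass back whatever cursor the previous call returned, and stop when 0 comes back. It also shows that a page can be empty before the traversal is finished, because the pattern is applied to the elements examined in that call.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Toy in-memory model of the SCAN calling convention (hypothetical, for illustration only).
class ToyScan {
    // One page of results plus the cursor for the next call; "0" means the traversal is finished.
    static final class Page {
        final String nextCursor;
        final List<String> keys;

        Page(String nextCursor, List<String> keys) {
            this.nextCursor = nextCursor;
            this.keys = keys;
        }
    }

    private final List<String> allKeys;

    ToyScan(List<String> allKeys) {
        this.allKeys = new ArrayList<>(allKeys);
    }

    // Translate a pattern that only uses '*' into a regex; other glob features are not handled.
    static Pattern globToRegex(String glob) {
        String[] parts = glob.split("\\*", -1);
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                regex.append(".*");
            }
            regex.append(Pattern.quote(parts[i]));
        }
        return Pattern.compile(regex.toString(), Pattern.DOTALL);
    }

    // Examine up to 'count' keys starting at the cursor; return the matches and the next cursor.
    Page scan(String cursor, String matchPattern, int count) {
        int start = Integer.parseInt(cursor);
        int end = Math.min(start + count, allKeys.size());
        Pattern pattern = globToRegex(matchPattern);
        List<String> matched = new ArrayList<>();
        for (int i = start; i < end; i++) {
            String key = allKeys.get(i);
            if (pattern.matcher(key).matches()) {
                matched.add(key);
            }
        }
        String next = end >= allKeys.size() ? "0" : String.valueOf(end);
        return new Page(next, matched);
    }

    // Drive a full traversal: keep calling scan until the returned cursor is "0" again.
    static List<String> scanAll(ToyScan node, String matchPattern, int count) {
        List<String> result = new ArrayList<>();
        String cursor = "0";
        do {
            Page page = node.scan(cursor, matchPattern, count);
            result.addAll(page.keys);
            cursor = page.nextCursor;
        } while (!"0".equals(cursor));
        return result;
    }

    public static void main(String[] args) {
        ToyScan node = new ToyScan(Arrays.asList("cnc:1", "user:1", "cnc:2", "x:cnc:3", "user:2"));
        System.out.println(scanAll(node, "cnc:*", 2));
        System.out.println(scanAll(node, "*cnc:*", 2));
    }
}
```

The do/while shape is the same one used later with Jedis: the loop condition compares the returned cursor with the start value.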
In addition, SCAN has several variant commands for traversing the elements inside the collection stored under a given key.
For a very large collection that has not been split up, for example one with hundreds of thousands of elements, these variants can be useful:
- SSCAN pages through the members of a set;
- HSCAN pages through the fields of a hash;
- ZSCAN pages through the members of a zset.
The list data type needs no dedicated paging command, because paging is easy with lindex and lrange.
Example:
hscan xxxHashKey 0
5. Use Jedis to implement Key scanning
Since our project already uses the Jedis client library, the implementation is based on it directly.
The Jedis dependency can be looked up on mvnrepository.com. The site currently has an anti-bot check enabled; if the page fails to display, just refresh it.
For example:
<!-- https://mvnrepository.com/artifact/redis.clients/jedis -->
<dependency>
    <groupId>redis.clients</groupId>
    <artifactId>jedis</artifactId>
    <version>4.4.2</version>
</dependency>
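The methods provided by Jedis closely mirror the Redis commands and are easy to use.
Note that the sample code below uses classes such as BinaryJedis and Client and treats the cluster nodes as JedisPool instances, which matches the Jedis 3.x API; in Jedis 4.x these classes were removed or changed, so choose the version that matches the code you actually run.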
If the scan method is called directly on JedisCluster, an error message is reported:
Error: Cluster mode only supports SCAN command with MATCH pattern containing hash-tag ( curly-brackets enclosed string )
For the error message and details, see: Scan a Redis Cluster
So we need the approach described earlier: traverse the Redis nodes one by one and scan each of them.
5.1 Basic helper classes
Let's first create a basic helper class to organize and encapsulate the related logic and avoid cluttered code.
// Related imports are listed here
import com.alibaba.fastjson.JSON;
import org.apache.commons.lang3.StringUtils;
import org.apache.commons.pool2.impl.GenericObjectPoolConfig;
import org.springframework.boot.autoconfigure.data.redis.RedisProperties;
import redis.clients.jedis.*;
import java.lang.reflect.Field;
import java.util.*;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;
// Helper class for scanning Redis keys
public class RedisKeyScanHelper {
    // ...
}
5.2 Batch processing stop switch
Since key scanning is a time-consuming batch task, it is cleaner to have a control switch in case someone needs to intervene or terminate it midway. Otherwise the only option is to kill the process and restart.
// A switch that controls whether to stop
public static AtomicBoolean stopFlag = new AtomicBoolean(false);
Before each batch, the code that performs the scan can check the switch and either exit or throw an exception.
Sample code:
if (stopFlag.get()) {
    return; // exit
    // Alternatively, throw a business exception:
    // throw new RuntimeException("Stop signal received");
}
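As a minimal, self-contained sketch of how such a switch behaves, the following example runs a batch loop that checks the flag before every batch while another thread flips it. The class StopSwitchDemo and the method runBatches are hypothetical names used only for this illustration, and the flag is passed in as a parameter here (instead of being a static field) purely to keep the example easy to test.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.IntConsumer;

// Hypothetical demo of a cooperative stop switch for a long-running batch job.
class StopSwitchDemo {

    // Runs at most totalBatches batches and checks the flag before each one.
    // Returns the number of batches that actually ran.
    static int runBatches(AtomicBoolean stopFlag, int totalBatches, IntConsumer batchWork) {
        int executed = 0;
        for (int i = 0; i < totalBatches; i++) {
            if (stopFlag.get()) {
                break; // stop signal received: leave the loop before starting a new batch
            }
            batchWork.accept(i);
            executed++;
        }
        return executed;
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicBoolean stopFlag = new AtomicBoolean(false);
        Thread worker = new Thread(() -> {
            int done = runBatches(stopFlag, Integer.MAX_VALUE, i -> {
                try {
                    Thread.sleep(10); // stands in for one batch of real work
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            System.out.println("Stopped after " + done + " batches");
        });
        worker.start();
        Thread.sleep(100);
        stopFlag.set(true); // in a real service this could be triggered by an admin operation
        worker.join();
    }
}
```

The switch is cooperative: a batch that has already started always finishes, so the batch size also bounds how quickly the job reacts to the stop signal.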
5.3 Scan result callback function
Since we are encapsulating a helper class, a callback is a convenient design.
Its advantage is that the downstream business logic is separated out, which avoids tight coupling.
First, define an interface.
// Callback for scan results
public interface ScanResultCallBack {
    // The method parameters can be customized as needed
    public void process(String key, Jedis jedis);
}
Since we also want to count the amount of data on each node while scanning, the callback method takes two parameters:
- String key is the scanned key;
- Jedis jedis is the Redis node on which that key is located.
As the code shows, the callback is invoked only when a key is scanned.
If you need callbacks at other points, for example:
- when a Redis node is discovered
- when a connection to Redis succeeds
- when cluster information is obtained
- during key traversal
- when a key of a specific type is scanned
- when a value of a certain kind is found for a certain type
- when an exception occurs
you can customize the logic as needed.
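One possible way to add such extra hooks, sketched here under assumptions, is an interface with a single mandatory method and several default methods that do nothing unless overridden. The names ScanListener, ScanListenerDemo, onNodeStart, onNodeEnd and onError are hypothetical and not part of the article's code or of Jedis; the node type is a generic parameter N so that the example runs without a Redis client (in the article's setting N would be Jedis).

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical extension of the callback idea: extra hooks with default implementations.
interface ScanListener<N> {
    // Called for every scanned key (the only mandatory method, so lambdas still work).
    void onKey(String key, N node);

    // Called once before a node is scanned.
    default void onNodeStart(N node) {
    }

    // Called once after a node has been scanned, with the number of keys processed successfully.
    default void onNodeEnd(N node, long processedKeys) {
    }

    // Called when processing a key throws; the default just reports the error.
    default void onError(String key, N node, Exception error) {
        System.err.println("Error on key " + key + ": " + error.getMessage());
    }
}

class ScanListenerDemo {
    // Drives the hooks over an in-memory list of keys that stands in for one node.
    static <N> long scanNode(N node, List<String> keys, ScanListener<N> listener) {
        listener.onNodeStart(node);
        long processed = 0;
        for (String key : keys) {
            try {
                listener.onKey(key, node);
                processed++;
            } catch (Exception e) {
                // One failing key must not abort the whole batch
                listener.onError(key, node, e);
            }
        }
        listener.onNodeEnd(node, processed);
        return processed;
    }

    public static void main(String[] args) {
        // Only onKey is supplied; all other hooks fall back to their defaults.
        long n = scanNode("node-a", Arrays.asList("cnc:1", "cnc:2"),
                (key, node) -> System.out.println(node + " -> " + key));
        System.out.println("processed=" + n);
    }
}
```

Default methods keep simple callers simple: a lambda is enough for the common case, while a full implementation can override whichever hooks it needs.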
5.4 Realize the Key scan of a single Redis node
The Jedis class wraps the scan command, so we can use it directly.
The corresponding method is:
// Traverse and scan the keys inside a single Redis node
public static void scanRedisNode(Jedis jedis, ScanResultCallBack callBack) {
    // Number of keys requested per scan call
    final Integer pageSize = 100;
    ScanParams scanParams = new ScanParams()
            //.match("*")
            .count(pageSize);
    // Cursor: use the start constant from ScanParams
    String cursor = ScanParams.SCAN_POINTER_START;
    do {
        if (stopFlag.get()) {
            return; // exit
            // Alternatively, throw a business exception:
            // throw new RuntimeException("Stop signal received");
        }
        // Execute the scan
        ScanResult<String> scanResult = jedis.scan(cursor, scanParams);
        // Get the list of keys in this page
        List<String> keys = scanResult.getResult();
        for (String key : keys) {
            // Null checks
            if (Objects.isNull(key)) {
                continue;
            }
            if (Objects.isNull(callBack)) {
                continue;
            }
            // Invoke the callback
            try {
                callBack.process(key, jedis);
            } catch (Exception e) {
                String message = ("Callback failed: key=" + key + ";" + e.getMessage());
                System.out.println(message);
                // Rethrow or print the stack trace as needed
                // throw new RuntimeException(message, e);
            }
        }
        // Set the cursor for the next scan
        cursor = scanResult.getCursor();
        // Keep looping as long as the returned cursor is not the start value 0
    } while (!cursor.equals(ScanParams.SCAN_POINTER_START));
}
Because the method is declared static, the callback is passed in as a method parameter.
If it were an instance method, the callback could instead be set as a field, for example through dependency injection.
5.5 Realize the scanning of Redis cluster
In a Redis cluster there are master nodes (master) and slave nodes (slave), so we need to determine the role of each Redis node.
// Determine the role of a Redis node
public static String role(Jedis jedis) {
    try {
        // info replication
        String replicationInfo = jedis.info("replication");
        // The output could be parsed line by line;
        // here we simply check for the substring
        if (replicationInfo.contains("role:master")) {
            // Master node
            return "master";
        }
        if (replicationInfo.contains("role:slave")) {
            // Slave node
            return "slave";
        }
    } catch (Exception ignore) {
        // Treat any failure as "role unknown"
    }
    return "";
}
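The comment in role() mentions that the INFO output could also be parsed line by line. A minimal sketch of that alternative follows; it works on the raw text only, so it runs without a Redis connection. The class name ReplicationInfoParser and the sample text in main are assumptions for illustration; the format relied on is the usual key:value lines with # section headers.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical line-by-line parser for the text returned by "INFO replication".
class ReplicationInfoParser {

    // Parses "key:value" lines into a map; blank lines and '#' section headers are skipped.
    static Map<String, String> parse(String info) {
        Map<String, String> result = new LinkedHashMap<>();
        if (info == null) {
            return result;
        }
        for (String line : info.split("\\r?\\n")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue;
            }
            int idx = trimmed.indexOf(':');
            if (idx <= 0) {
                continue; // not a key:value line
            }
            result.put(trimmed.substring(0, idx), trimmed.substring(idx + 1));
        }
        return result;
    }

    // Returns the value of the "role" field, or an empty string if it is missing.
    static String role(String info) {
        return parse(info).getOrDefault("role", "");
    }

    public static void main(String[] args) {
        // Sample text made up for this example
        String sample = "# Replication\r\nrole:master\r\nconnected_slaves:1\r\n";
        System.out.println("role=" + role(sample));
        System.out.println("fields=" + parse(sample));
    }
}
```

Compared with a substring check, this reads the role from the exact field and also makes other fields such as connected_slaves available if the batch job needs them.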
JedisCluster is Jedis's wrapper around a cluster. It has many constructors, which can be chosen as needed during development.
// Scan the keys of a Redis cluster
public static void scanRedisCluster(JedisCluster cluster, ScanResultCallBack callBack) {
    // Get all nodes of the cluster
    Map<String, JedisPool> clusterNodes = cluster.getClusterNodes();
    Set<String> keySet = clusterNodes.keySet();
    for (String nodeKey : keySet) {
        JedisPool jedisPool = clusterNodes.get(nodeKey);
        // try-with-resources returns the connection to the pool
        try (Jedis jedis = jedisPool.getResource()) {
            System.out.println("scanRedisCluster: found Redis node: " + hostPort(jedis));
        }
    }
    // Iterate over the Redis nodes
    for (String nodeKey : keySet) {
        if (stopFlag.get()) {
            return; // exit
            // Alternatively, throw a business exception:
            // throw new RuntimeException("Stop signal received");
        }
        JedisPool jedisPool = clusterNodes.get(nodeKey);
        try (Jedis jedis = jedisPool.getResource()) {
            // Determine the node role
            String role = role(jedis);
            if (!"master".equals(role)) {
                System.out.println("scanRedisCluster: skipping non-master node: " + hostPort(jedis) + "; role=" + role);
                continue;
            }
            System.out.println("scanRedisCluster: start scanning master node: " + hostPort(jedis) + "; role=" + role);
            // Scan this node
            scanRedisNode(jedis, callBack);
        } catch (Exception e) {
            String message = ("Failed to scan node: " + nodeKey + ";" + e.getMessage());
            System.out.println(message);
            // Rethrow or print the stack trace as needed
            // throw new RuntimeException(message, e);
        }
    }
}
The implementation logic is not complicated. Whether the System.out.println calls should be replaced with logger output depends on the project.
Generally speaking, a batch job should tolerate all kinds of unexpected situations and, at the same time, keep some way of notifying the outside world when something goes wrong.
5.6 Create a sample code of JedisCluster
JedisCluster is Jedis's wrapper around a cluster. It has many constructors, which can be chosen according to the situation during development.
Here are two sample methods for creating a JedisCluster.
// Create a Jedis cluster from Spring Boot's RedisProperties
public static JedisCluster createJedisCluster(RedisProperties properties) {
    Set<HostAndPort> jedisClusterNode = new HashSet<>();
    //
    String clientName = properties.getClientName();
    String password = properties.getPassword();
    //
    RedisProperties.Cluster clusterProperties = properties.getCluster();
    List<String> nodeStrList = clusterProperties.getNodes();
    //
    // System.out.println("createJedisCluster: nodeStrList=" + JSON.toJSON(nodeStrList));
    //
    for (String str : nodeStrList) {
        if (StringUtils.isEmpty(str)) {
            continue;
        }
        // Each entry is "host" or "host:port"; the port defaults to 6379
        String host = str;
        int port = 6379;
        if (str.contains(":")) {
            String[] arrays = str.split(":");
            host = arrays[0];
            port = Integer.parseInt(arrays[1]);
        }
        // In fact, a single reachable node is enough
        HostAndPort hostAndPort = new HostAndPort(host, port);
        jedisClusterNode.add(hostAndPort);
    }
    return createJedisCluster(jedisClusterNode, password, clientName);
}
// Create a Jedis cluster from a set of nodes
public static JedisCluster createJedisCluster(Set<HostAndPort> nodes, String password, String clientName) {
    //
    int DEFAULT_MAX_ATTEMPTS = 5;
    int DEFAULT_TIMEOUT = 2000;
    //
    int connectionTimeout = DEFAULT_TIMEOUT;
    int soTimeout = DEFAULT_TIMEOUT;
    int maxAttempts = DEFAULT_MAX_ATTEMPTS;
    GenericObjectPoolConfig poolConfig = new GenericObjectPoolConfig();
    JedisCluster jedisCluster = new JedisCluster(nodes, connectionTimeout,
            soTimeout, maxAttempts, password, clientName, poolConfig);
    return jedisCluster;
    // Constructor used above, for reference:
    // public JedisCluster(Set<HostAndPort> jedisClusterNode, int connectionTimeout, int soTimeout,
    //         int maxAttempts, String password, String clientName, final GenericObjectPoolConfig poolConfig) {
    //     super(jedisClusterNode, connectionTimeout, soTimeout, maxAttempts, password, clientName, poolConfig);
    // }
}
The logic is not complicated; the key question is which constructor to use.
In practice, choose the constructor that fits the parameters you have and the setup of your Redis cluster.
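The host:port parsing in the first method can be made a little more defensive. The following sketch is one way to do it, under assumptions: NodeAddressParser and its Address holder are hypothetical names (Address merely stands in for Jedis's HostAndPort so the example runs without the library), the default port 6379 is taken from the code above, and bare IPv6 literals are deliberately not handled.

```java
// Hypothetical, library-free sketch of parsing "host" or "host:port" node strings.
class NodeAddressParser {
    static final int DEFAULT_PORT = 6379;

    // Simple immutable holder standing in for a host/port pair.
    static final class Address {
        final String host;
        final int port;

        Address(String host, int port) {
            this.host = host;
            this.port = port;
        }

        @Override
        public String toString() {
            return host + ":" + port;
        }
    }

    // Parses one node string; throws IllegalArgumentException on malformed input.
    static Address parse(String text) {
        if (text == null || text.trim().isEmpty()) {
            throw new IllegalArgumentException("Empty node address");
        }
        String trimmed = text.trim();
        int idx = trimmed.lastIndexOf(':');
        if (idx < 0) {
            return new Address(trimmed, DEFAULT_PORT); // no port given
        }
        String host = trimmed.substring(0, idx);
        String portText = trimmed.substring(idx + 1);
        if (host.isEmpty()) {
            throw new IllegalArgumentException("Missing host in: " + text);
        }
        int port;
        try {
            port = Integer.parseInt(portText);
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("Invalid port in: " + text, e);
        }
        if (port < 1 || port > 65535) {
            throw new IllegalArgumentException("Port out of range in: " + text);
        }
        return new Address(host, port);
    }

    public static void main(String[] args) {
        System.out.println(parse("10.0.0.1:7000"));
        System.out.println(parse("redis-node"));
    }
}
```

Failing fast on a malformed entry is a design choice; for a batch job it is usually easier to fix the configuration than to debug a cluster client built from a partially parsed node list.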
5.7 Parse the IP and port number corresponding to Jedis
Our business logic needs the IP and port of each node, so we create a method that extracts them via reflection:
// Use reflection to get the IP and port of a Redis node
public static HostAndPort hostPort(Jedis jedis) {
    if (Objects.isNull(jedis)) {
        return null;
    }
    // To read a private field, use the class that declares it directly;
    // alternatively, walk all superclasses and interfaces to find it.
    Class<BinaryJedis> clazzJedis = BinaryJedis.class;
    Class<Connection> clazzClient = Connection.class;
    try {
        // Look up the fields
        Field clientField = clazzJedis.getDeclaredField("client");
        Field jedisSocketFactoryField = clazzClient.getDeclaredField("jedisSocketFactory");
        // Make the fields accessible for reading
        clientField.setAccessible(true);
        jedisSocketFactoryField.setAccessible(true);
        // Read the field values via reflection
        Client client = (Client) clientField.get(jedis);
        JedisSocketFactory jedisSocketFactory =
                (JedisSocketFactory) jedisSocketFactoryField.get(client);
        // Assemble the HostAndPort
        String host = jedisSocketFactory.getHost();
        int port = jedisSocketFactory.getPort();
        HostAndPort hostAndPort = new HostAndPort(host, port);
        // HostAndPort implements toString(), which is convenient
        return hostAndPort;
    } catch (Exception e) {
        String message = ("Failed to resolve HostAndPort of Jedis; errorMsg: " + e.getMessage());
        System.out.println(message);
        return null;
    }
}
If other attributes are needed, they can be obtained in a similar way.
5.8 Scan result callback example
A simple callback example is provided here, and the specific code can be rewritten according to requirements.
// Implementation of the scan result callback
public static class ScanResultCallBackImpl implements ScanResultCallBack {
    // Cache of the mapping from Jedis instance to IP:port
    private Map<Jedis, String> hostMap = new HashMap<>();
    // Simple per-node statistics: IP:port -> number of keys
    private Map<String, AtomicLong> countMap = new HashMap<>();

    // Resolve the IP:port of a Jedis instance and cache it
    private String parseHostPort(Jedis jedis) {
        if (hostMap.containsKey(jedis)) {
            return hostMap.get(jedis);
        }
        HostAndPort hostAndPort = RedisKeyScanHelper.hostPort(jedis);
        if (Objects.nonNull(hostAndPort)) {
            hostMap.put(jedis, hostAndPort.toString());
            return hostMap.get(jedis);
        }
        return "UNKNOWN";
    }

    // Increment the counter for a node and return the new value
    private long incrementCount(String hostAndPort) {
        AtomicLong counter = countMap.getOrDefault(hostAndPort, new AtomicLong());
        long curCount = counter.incrementAndGet();
        countMap.put(hostAndPort, counter);
        return curCount;
    }

    // Callback entry point
    @Override
    public void process(String key, Jedis jedis) {
        String hostAndPort = parseHostPort(jedis);
        // System.out.println("Scanned key: " + key + "; node: " + hostAndPort);
        // Count
        long curCount = incrementCount(hostAndPort);
        // Sample one key out of every 10000
        if (curCount % 10000L == 1L) {
            String type = jedis.type(key);
            // Typical values: `string`, `list`, `set`, `zset`, `hash` and `stream`
            if ("string".equals(type)) {
                String value = jedis.get(key);
                System.out.println("==callback sample " + curCount + "; scanned key=" + key +
                        "; value=" + value + "; node: " + hostAndPort);
            } else {
                System.out.println("==callback sample " + curCount + "; scanned key=" + key +
                        "; type=" + type + "; node: " + hostAndPort);
            }
        }
        // Check whether the key meets some criterion
        if (key.startsWith("cnc:")) {
            // doSomething
            // For example, perform some operation on keys
            // or values that have a certain characteristic
        }
    }

    // Expose some information through toString()
    @Override
    public String toString() {
        StringBuilder builder = new StringBuilder();
        builder.append("ClusterHostList: " + JSON.toJSON(hostMap.values())).append("\n");
        builder.append("countMap: " + JSON.toJSON(countMap)).append("\n");
        return builder.toString();
    }
}
Overriding toString() is a simple way to expose this information if you don't want to implement other methods.
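The counting logic above uses a plain HashMap, which is fine as long as the callback is only invoked from one thread. If nodes were ever scanned in parallel (an assumption, since the article scans them one after another), a thread-safe variant could look like the following sketch. NodeKeyCounter is a hypothetical name; the approach is the standard ConcurrentHashMap.computeIfAbsent idiom combined with AtomicLong.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical thread-safe per-node key counter.
class NodeKeyCounter {
    private final ConcurrentHashMap<String, AtomicLong> counts = new ConcurrentHashMap<>();

    // Atomically creates the counter on first use, increments it and returns the new value.
    long increment(String hostAndPort) {
        return counts.computeIfAbsent(hostAndPort, k -> new AtomicLong()).incrementAndGet();
    }

    // Returns the current count for a node, or 0 if the node has not been seen.
    long get(String hostAndPort) {
        AtomicLong counter = counts.get(hostAndPort);
        return counter == null ? 0L : counter.get();
    }

    // Returns a sorted copy of the current counts, convenient for printing a summary.
    Map<String, Long> snapshot() {
        Map<String, Long> copy = new TreeMap<>();
        counts.forEach((node, counter) -> copy.put(node, counter.get()));
        return copy;
    }

    public static void main(String[] args) {
        NodeKeyCounter counter = new NodeKeyCounter();
        counter.increment("10.0.0.1:7000");
        counter.increment("10.0.0.1:7000");
        counter.increment("10.0.0.2:7000");
        System.out.println(counter.snapshot());
    }
}
```

Because increment returns the new value, the sampling check on every 10000th key can stay exactly as in the callback above.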
5.9 Test code
Let's write a simple main method to test:
public static void main(String[] args) {
    // The IP and port of a single node are enough
    Set<HostAndPort> jedisClusterNode = new HashSet<>();
    jedisClusterNode.add(new HostAndPort("cluster-1.cnc.com", 7000));
    // Password; pass null if there is none
    String password = "Your_Password";
    String clientName = "RedisKeyScanner";
    // Create the Jedis cluster
    JedisCluster jedisCluster = createJedisCluster(jedisClusterNode, password, clientName);
    // Callback
    ScanResultCallBack callBack = new ScanResultCallBackImpl();
    // Start scanning
    scanRedisCluster(jedisCluster, callBack);
    // Print the summary
    System.out.println("Summary: " + callBack.toString());
}
To run the test, prepare a Redis cluster first and adjust the node address and password to match it.
6. Lettuce implementation code
TODO: I plan to look into Lettuce. If readers have a reference implementation, feel free to share and discuss.
7. Brief summary
The steps to traverse all keys in a Redis cluster are not complicated:
- Connect to the Redis cluster
- Query all nodes
- Determine each node's role
- Scan all keys of each master node
- Run callbacks and collect statistics
Reference links
- Redis official documentation: SCAN command
- Scan a Redis Cluster
- List All Available Redis Keys
- Iron anchor's CSDN blog
Author: Iron Anchor
Date: June 14, 2023