Premise
I have recently been busy with business development, handover work, and games, and with bouts of hesitation and confusion cropping up from time to time, I let my studies lapse for a while. Now that the weather has turned cold, I am picking things up again for the next phase of study. I previously worked on some data-lookup projects that involved simulating requests. Because of anti-crawler measures, those requests needed a random User Agent, so I used Redis to implement a very simple UA pool.
Background
A recent requirement: the simulated request logic must put a User Agent in the header of every request, and that User Agent has to satisfy the following points:
- The User Agent obtained each time is random.
- The User Agent obtained each time does not repeat (within a short period).
- The User Agent obtained each time must carry the operating system information of a mainstream system (for example Linux, Windows, iOS, or Android).
These three points are all settled by the source of the UA data, so what we should really focus on is the concrete implementation. A brief analysis of the process follows.
When designing the UA pool, its data structure turns out to be very similar to a circular queue.
In the figure, assume that UAs drawn in different colors are completely different UAs. They are scattered by a shuffle algorithm and placed into the circular queue. After a UA is taken out, the cursor only needs to move forward or backward by one cell (the cursor could even be set to any element in the queue). The final approach is therefore to implement a distributed queue through middleware (just a queue, not a message queue).
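To make the cursor idea concrete, here is a minimal single-process sketch in Java. It is only an illustration of the data structure and is not distributed; the class name CircularUaQueue and its methods are hypothetical names I made up for this example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical in-memory illustration of a shuffled circular queue. */
final class CircularUaQueue {

    private final List<String> slots;
    private final AtomicLong cursor = new AtomicLong();

    CircularUaQueue(List<String> userAgents) {
        if (userAgents.isEmpty()) {
            throw new IllegalArgumentException("userAgents must not be empty");
        }
        // Copy the input, then shuffle it so the order in the ring is random.
        List<String> copy = new ArrayList<>(userAgents);
        Collections.shuffle(copy);
        this.slots = Collections.unmodifiableList(copy);
    }

    /** Returns the UA under the cursor, then moves the cursor one cell forward. */
    String next() {
        long position = cursor.getAndIncrement();
        // floorMod wraps the ever-growing cursor back into the ring.
        return slots.get((int) Math.floorMod(position, (long) slots.size()));
    }
}
```

Calling next() as many times as there are UAs returns each UA exactly once, after which the same sequence starts over. Several application instances would each hold their own cursor, which is why a shared middleware is still needed.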
Specific implementation
There is no doubt that a distributed, database-like middleware is needed to store the prepared UAs, and Redis is the first thing that comes to mind as a good fit. Next, a Redis data type has to be chosen, with several considerations in mind:
- It has queue properties.
- Ideally it supports random access.
- Enqueue, dequeue, and random access have low time complexity, since traffic to the UA-fetching interface will be relatively heavy.
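The Redis data type that supports these aspects is List: pushes and pops at either end are O(1), while random access through LINDEX is O(N). Note, however, that a List does not deduplicate by itself, so deduplication has to be implemented in the code logic. The process by which a client obtains a UA can then be pictured as follows: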
Combining the analysis above, the implementation involves the following steps:
- Prepare the UA data to import. It can be read from a data source or directly from a file.
- Because the set of UA data to import is generally not large, consider scattering it randomly first. If you develop in Java, Collections#shuffle() can be used directly as the shuffle algorithm, although any other algorithm that distributes the data randomly will do. This step is necessary in scenarios where the simulated party strictly checks the legitimacy of the UA.
- Import the UA data into the Redis list.
- Write a Lua script that performs RPOP + LPUSH to implement a distributed circular queue.
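The steps above can be sketched without Redis to check the logic first. The following is a rough in-memory stand-in, assuming a java.util.Deque plays the role of the Redis list; the class UaPoolSketch and its method names are hypothetical and exist only for this illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;

/** Hypothetical in-memory stand-in for the Redis-based UA pool. */
final class UaPoolSketch {

    /** Removes duplicates (a Redis List will not do this for us), then shuffles. */
    static List<String> prepare(List<String> rawUserAgents) {
        List<String> unique = new ArrayList<>(new LinkedHashSet<>(rawUserAgents));
        Collections.shuffle(unique);
        return unique;
    }

    /** Mimics LPUSH key v1 v2 ...: each value is pushed onto the head in turn. */
    static Deque<String> load(List<String> prepared) {
        Deque<String> list = new ArrayDeque<>();
        for (String ua : prepared) {
            list.addFirst(ua);
        }
        return list;
    }

    /** Mimics RPOP followed by LPUSH: take from the tail, put back at the head. */
    static String rotate(Deque<String> list) {
        String value = list.pollLast();
        if (value != null) {
            list.addFirst(value);
        }
        return value;
    }
}
```

Calling rotate repeatedly walks through the prepared UAs in order and then starts over, which is the behavior the Lua script is meant to provide atomically on the Redis side.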
Coding and testing examples
Add the dependency for Lettuce, the advanced Redis client:
<dependency>
    <groupId>io.lettuce</groupId>
    <artifactId>lettuce-core</artifactId>
    <version>5.2.1.RELEASE</version>
</dependency>
Write the Lua script for RPOP + LPUSH. The script is tentatively named L_RPOP_LPUSH.lua and placed in the resources/scripts/lua directory:
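-- Pop the tail element and push it back onto the head (circular rotation).
local key = KEYS[1]
local value = redis.call('RPOP', key)
-- RPOP yields false on an empty or missing list; skip LPUSH in that case.
if value then
    redis.call('LPUSH', key, value)
end
return value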
This script is very simple, but it already implements the circular queue. The rest of the test code is as follows:
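import com.google.common.collect.Lists;
import io.lettuce.core.RedisClient;
import io.lettuce.core.RedisURI;
import io.lettuce.core.ScriptOutputType;
import io.lettuce.core.api.StatefulRedisConnection;
import io.lettuce.core.api.sync.RedisCommands;
import org.junit.AfterClass;
import org.junit.BeforeClass;
import org.junit.Test;
import org.springframework.core.io.ClassPathResource;
import org.springframework.util.StreamUtils;

import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;
import java.util.stream.IntStream;

public class UaPoolTest {

    private static RedisCommands<String, String> COMMANDS;
    private static AtomicReference<String> LUA_SHA = new AtomicReference<>();
    private static final String KEY = "UA_POOL";

    @BeforeClass
    public static void beforeClass() throws Exception {
        // Initialize the Redis client
        RedisURI uri = RedisURI.builder().withHost("localhost").withPort(6379).build();
        RedisClient redisClient = RedisClient.create(uri);
        StatefulRedisConnection<String, String> connect = redisClient.connect();
        COMMANDS = connect.sync();
        // Build mock source data for the UA pool: 10 UAs named UA-0 ... UA-9
        List<String> uaList = Lists.newArrayList();
        IntStream.range(0, 10).forEach(e -> uaList.add(String.format("UA-%d", e)));
        // Shuffle
        Collections.shuffle(uaList);
        // Load the Lua script
        ClassPathResource resource = new ClassPathResource("/scripts/lua/L_RPOP_LPUSH.lua");
        String content = StreamUtils.copyToString(resource.getInputStream(), StandardCharsets.UTF_8);
        String sha = COMMANDS.scriptLoad(content);
        LUA_SHA.compareAndSet(null, sha);
        // Write the UA data into the Redis list; with a lot of data, consider
        // writing in batches so the Redis server is not blocked for a long time
        COMMANDS.lpush(KEY, uaList.toArray(new String[0]));
    }

    @AfterClass
    public static void afterClass() throws Exception {
        COMMANDS.del(KEY);
    }

    @Test
    public void testUaPool() {
        IntStream.range(1, 21).forEach(e -> {
            String result = COMMANDS.evalsha(LUA_SHA.get(), ScriptOutputType.VALUE, KEY);
            System.out.println(String.format("UA obtained on attempt %d: %s", e, result));
        });
    }
}
The result of one run is as follows:
UA obtained on attempt 1: UA-0
UA obtained on attempt 2: UA-8
UA obtained on attempt 3: UA-2
UA obtained on attempt 4: UA-4
UA obtained on attempt 5: UA-7
UA obtained on attempt 6: UA-5
UA obtained on attempt 7: UA-1
UA obtained on attempt 8: UA-3
UA obtained on attempt 9: UA-6
UA obtained on attempt 10: UA-9
UA obtained on attempt 11: UA-0
UA obtained on attempt 12: UA-8
UA obtained on attempt 13: UA-2
UA obtained on attempt 14: UA-4
UA obtained on attempt 15: UA-7
UA obtained on attempt 16: UA-5
UA obtained on attempt 17: UA-1
UA obtained on attempt 18: UA-3
UA obtained on attempt 19: UA-6
UA obtained on attempt 20: UA-9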
As you can see, the shuffle algorithm works reasonably well and the data is fairly well dispersed.
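The comment in the test code suggests writing in batches when there is a lot of UA data. Below is a minimal sketch of how the list could be split beforehand; the class UaBatches and the method partition are hypothetical helpers of my own, and a suitable batch size is left as an assumption to be tuned.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that splits UA data into batches before import. */
final class UaBatches {

    /** Splits the list into consecutive chunks of at most batchSize elements. */
    static List<List<String>> partition(List<String> userAgents, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<String>> batches = new ArrayList<>();
        for (int from = 0; from < userAgents.size(); from += batchSize) {
            int to = Math.min(from + batchSize, userAgents.size());
            // Copy the view so each batch is independent of the source list.
            batches.add(new ArrayList<>(userAgents.subList(from, to)));
        }
        return batches;
    }
}
```

Each batch could then be passed to COMMANDS.lpush(KEY, batch.toArray(new String[0])) in the same way the test class writes the whole list at once.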
Summary
In fact, designing the UA pool is not very difficult. A few points deserve attention:
- There are generally not many system versions among mainstream mobile devices or desktops, so the source UA data is not large. The simplest implementation can keep it in a file and write it into Redis in one pass.
- Take care to shuffle the UA data randomly so that UAs from the same type of device or system are not packed too densely, which avoids triggering risk-control rules while simulating requests.
- You need to be familiar with Lua syntax, since combining several Redis commands into one atomic operation here relies on a Lua script.
(End of this article, c-2-d ea-20191114)
Description link
- Github Page: http://throwable.club/2019/11/14/redis-in-action-ua-pool/
- Coding Page: http://throwable.coding.me/2019/11/14/redis-in-action-ua-pool/