Implementing a UA pool with Redis

Preface

Recently I have been busy with business development, changing jobs and gaming, and from time to time I went through spells of hesitation and confusion, so I set my studies aside for a while. Now that the weather has turned cold, I am picking up the next phase of study. I previously worked on some data search projects that involved simulating requests. Because of anti-crawling measures they needed random User Agents, so I used Redis to implement a very simple UA pool.

Background

A recent requirement involved request-simulation logic in which the User Agent in the header of every request has to satisfy the following:

  • Every User Agent obtained is random.
  • User Agents obtained within a short period must not repeat.
  • Every User Agent obtained must carry information about a mainstream operating system (for example Linux, Windows, iOS and Android).
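The third requirement depends mainly on the data source, but a filter at import time can keep unsuitable entries out. A minimal sketch, assuming a hypothetical `UaOsFilter` class and a hand-picked list of OS markers; it is plain substring matching, not a real User-Agent parser:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper: decides whether a UA string mentions one of the
// operating systems the requirements call mainstream. The token list is an
// assumption for illustration, not a complete UA parser.
final class UaOsFilter {

    // Assumed OS markers as they commonly appear inside UA strings (lower-cased).
    private static final List<String> MAINSTREAM_OS_TOKENS =
            Arrays.asList("windows", "linux", "android", "iphone", "ipad", "mac os x");

    private UaOsFilter() {
    }

    static boolean isMainstream(String ua) {
        if (ua == null || ua.trim().isEmpty()) {
            return false;
        }
        String lower = ua.toLowerCase(Locale.ROOT);
        // Accept the UA if any marker occurs anywhere in the string
        for (String token : MAINSTREAM_OS_TOKENS) {
            if (lower.contains(token)) {
                return true;
            }
        }
        return false;
    }
}
```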

Of these three points, the last one can be handled at the source of the UA data; what really needs attention is the implementation. A simple analysis of the process follows.

When designing the UA pool, its data structure turns out to be very similar to a circular queue:

In the figure above, assume that UAs of different colors are completely different UAs. They are scattered by a shuffling algorithm and placed in a circular queue. Each time a UA is taken out, the cursor simply moves forward or backward by one cell (the cursor could even be set to any element in the queue). In the end, what we need is a distributed queue implemented on top of middleware (just a queue, not a message queue).
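Within a single JVM, the cursor idea can be sketched with an `AtomicLong`; the class name `CursorRing` is hypothetical. It also shows why middleware is needed: the cursor lives in one process's memory, so several application instances cannot share it.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical single-process version of the circular queue in the figure:
// the elements never move, only a cursor advances one cell per fetch.
final class CursorRing {

    private final List<String> items;
    private final AtomicLong cursor = new AtomicLong();

    CursorRing(List<String> source) {
        if (source.isEmpty()) {
            throw new IllegalArgumentException("ring must not be empty");
        }
        // Defensive, unmodifiable copy; shuffling (if wanted) is done by the caller
        this.items = Collections.unmodifiableList(new ArrayList<>(source));
    }

    String next() {
        // getAndIncrement is atomic, so concurrent callers each get their own cell;
        // floorMod keeps the index valid even if the counter ever overflows
        long position = cursor.getAndIncrement();
        return items.get((int) Math.floorMod(position, (long) items.size()));
    }
}
```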

Specific implementation

There is no doubt that we need distributed storage middleware to hold the prepared UAs, and the first impression is that Redis would be a good fit. Next we need to choose the Redis data type, mainly considering the following aspects:

  • It has queue semantics.
  • Ideally it supports random access.
  • Enqueue, dequeue and random access have low time complexity, since the interface that serves UAs will receive a relatively large amount of traffic.

The Redis data type that fits these points is List (random access with LINDEX is O(N), but the pool is small). Note, however, that a List does not remove duplicates by itself, so deduplication has to be implemented in code. The process by which a client obtains a UA is then as follows:
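Since a List will happily store the same UA twice, the cleanup can happen before import. A sketch with a hypothetical `UaDedup` helper that trims lines, drops blanks and keeps the first occurrence of each UA:

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical import-side helper: a Redis List keeps duplicates, so the
// UA data is cleaned and de-duplicated in code before it is pushed.
final class UaDedup {

    private UaDedup() {
    }

    static List<String> distinct(List<String> rawLines) {
        // LinkedHashSet drops repeats but keeps the first-seen order
        Set<String> unique = new LinkedHashSet<>();
        for (String line : rawLines) {
            String ua = line.trim();
            if (!ua.isEmpty()) {
                unique.add(ua);
            }
        }
        return new ArrayList<>(unique);
    }
}
```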

Putting the analysis together, the implementation consists of the following steps:

  1. Prepare the UA data to import. It can be read from a data source, or directly from a file.
  2. The UA data set to import is generally not large, so first shuffle it randomly. In Java you can use the shuffling algorithm Collections#shuffle() directly. This algorithm scatters the data randomly, and the step is necessary in scenarios where the target side strictly examines whether the simulated UAs look legitimate.
  3. Import the UA data into a Redis List.
  4. Write a Lua script that performs RPOP + LPUSH to implement the distributed circular queue.
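Step 2 can be isolated into a small helper. A minimal sketch, assuming a hypothetical `UaShuffle` class; it wraps the JDK's `Collections.shuffle(List, Random)`, whose documented implementation is a Fisher-Yates style shuffle, and returns a copy so the source list stays as it was read:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical helper for step 2: returns a shuffled copy and leaves the input
// untouched. Passing a Random makes the order reproducible in tests.
final class UaShuffle {

    private UaShuffle() {
    }

    static List<String> shuffled(List<String> uas, Random random) {
        List<String> copy = new ArrayList<>(uas);
        // Fisher-Yates style shuffle performed in place on the copy
        Collections.shuffle(copy, random);
        return copy;
    }
}
```

The seeded `Random` only makes the order reproducible; the test code in this article calls `Collections.shuffle(uaList)`, which uses a default source of randomness.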

Coding and testing examples

Add the dependency on Lettuce, an advanced Redis client:

<dependency>
    <groupId>io.lettuce</groupId>
    <artifactId>lettuce-core</artifactId>
    <version>5.2.1.RELEASE</version>
</dependency>

Write the RPOP + LPUSH Lua script. The script is tentatively named L_RPOP_LPUSH.lua and placed in the resources/scripts/lua directory:

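local key = KEYS[1]
-- RPOP returns false (a Redis nil) when the list is empty or does not exist
local value = redis.call('RPOP', key)
if value then
    -- push the element back at the head so the list behaves as a ring
    redis.call('LPUSH', key, value)
end
return value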

The script is very simple, yet it implements the circular queue. The rest of the test code is as follows:

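import com.google.common.collect.Lists;
import io.lettuce.core.RedisClient;
import io.lettuce.core.RedisURI;
import io.lettuce.core.ScriptOutputType;
import io.lettuce.core.api.StatefulRedisConnection;
import io.lettuce.core.api.sync.RedisCommands;
import org.junit.AfterClass;
import org.junit.BeforeClass;
import org.junit.Test;
import org.springframework.core.io.ClassPathResource;
import org.springframework.util.StreamUtils;

import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;
import java.util.stream.IntStream;

public class UaPoolTest {

    private static RedisClient CLIENT;
    private static StatefulRedisConnection<String, String> CONNECTION;
    private static RedisCommands<String, String> COMMANDS;

    private static final AtomicReference<String> LUA_SHA = new AtomicReference<>();
    private static final String KEY = "UA_POOL";

    @BeforeClass
    public static void beforeClass() throws Exception {
        // Initialize the Redis client
        RedisURI uri = RedisURI.builder().withHost("localhost").withPort(6379).build();
        CLIENT = RedisClient.create(uri);
        CONNECTION = CLIENT.connect();
        COMMANDS = CONNECTION.sync();
        // Simulate the raw data of the UA pool: assume 10 UAs, named UA-0 ... UA-9
        List<String> uaList = Lists.newArrayList();
        IntStream.range(0, 10).forEach(e -> uaList.add(String.format("UA-%d", e)));
        // Shuffle
        Collections.shuffle(uaList);
        // Load the Lua script and keep its SHA1 digest for EVALSHA
        ClassPathResource resource = new ClassPathResource("/scripts/lua/L_RPOP_LPUSH.lua");
        String content = StreamUtils.copyToString(resource.getInputStream(), StandardCharsets.UTF_8);
        String sha = COMMANDS.scriptLoad(content);
        LUA_SHA.compareAndSet(null, sha);
        // Write the UAs into the Redis list; with a lot of data, consider writing
        // in batches so the Redis server is not blocked for a long time
        COMMANDS.lpush(KEY, uaList.toArray(new String[0]));
    }

    @AfterClass
    public static void afterClass() throws Exception {
        COMMANDS.del(KEY);
        // Release the connection and the client's resources
        CONNECTION.close();
        CLIENT.shutdown();
    }

    @Test
    public void testUaPool() {
        IntStream.range(1, 21).forEach(e -> {
            String result = COMMANDS.evalsha(LUA_SHA.get(), ScriptOutputType.VALUE, KEY);
            System.out.println(String.format("UA obtained on attempt %d: %s", e, result));
        });
    }
}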
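The comment in `beforeClass()` suggests writing large data sets in batches. A sketch of the splitting step, assuming a hypothetical `UaBatches` helper and a batch size you choose yourself:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for the batched write suggested in beforeClass():
// splits the UA list into consecutive chunks of at most batchSize elements.
final class UaBatches {

    private UaBatches() {
    }

    static List<List<String>> partition(List<String> uas, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<String>> batches = new ArrayList<>();
        for (int from = 0; from < uas.size(); from += batchSize) {
            int to = Math.min(from + batchSize, uas.size());
            // Copy the sub-list so each batch is independent of the source list
            batches.add(new ArrayList<>(uas.subList(from, to)));
        }
        return batches;
    }
}
```

Each batch would then be passed, in order, to `COMMANDS.lpush(KEY, batch.toArray(new String[0]))`. Pushing consecutive batches to the head in their original order leaves the list in the same head-to-tail order as one large LPUSH, so the RPOP side still sees the shuffled order. Guava, already used here for `Lists.newArrayList`, also offers `Lists.partition` for the same splitting.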

The output of one run is as follows:

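UA obtained on attempt 1: UA-0
UA obtained on attempt 2: UA-8
UA obtained on attempt 3: UA-2
UA obtained on attempt 4: UA-4
UA obtained on attempt 5: UA-7
UA obtained on attempt 6: UA-5
UA obtained on attempt 7: UA-1
UA obtained on attempt 8: UA-3
UA obtained on attempt 9: UA-6
UA obtained on attempt 10: UA-9
UA obtained on attempt 11: UA-0
UA obtained on attempt 12: UA-8
UA obtained on attempt 13: UA-2
UA obtained on attempt 14: UA-4
UA obtained on attempt 15: UA-7
UA obtained on attempt 16: UA-5
UA obtained on attempt 17: UA-1
UA obtained on attempt 18: UA-3
UA obtained on attempt 19: UA-6
UA obtained on attempt 20: UA-9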
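The order of this output can be explained without Redis. LPUSH inserts its arguments at the head one by one and the script pops from the tail, so UAs come out in the same order as the shuffled array passed to LPUSH, and that order repeats every ten fetches. The hypothetical `RotatingListModel` below mimics the list with an `ArrayDeque` purely to reason about ordering; it has none of the atomicity or sharing across clients that Redis provides:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical in-memory model of the Redis list and the Lua script, used only
// to reason about ordering: first element = head, last element = tail.
final class RotatingListModel {

    private final Deque<String> list = new ArrayDeque<>();

    // LPUSH key v1 v2 ... inserts each value at the head, left to right,
    // so the last argument ends up at the head.
    void lpush(String... values) {
        for (String value : values) {
            list.addFirst(value);
        }
    }

    // Same steps as L_RPOP_LPUSH.lua: pop from the tail, push back at the head.
    // Returns null for an empty list, where Redis would return nil.
    String rpopLpush() {
        String value = list.pollLast();
        if (value != null) {
            list.addFirst(value);
        }
        return value;
    }
}
```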

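As you can see, the shuffling works reasonably well and the data is fairly scattered. From the 11th fetch onward the same sequence repeats, as expected of a circular queue, so no UA repeats within a cycle of 10 fetches.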
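The second requirement (no repeats in the short term) can be checked mechanically on a recorded sequence. A sketch with a hypothetical `RepeatCheck` class: every window of `poolSize` consecutive fetches must contain `poolSize` distinct UAs.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical check for the "no short-term repeats" requirement: every run of
// poolSize consecutive fetches must contain poolSize distinct UAs.
final class RepeatCheck {

    private RepeatCheck() {
    }

    static boolean noRepeatWithinWindow(List<String> fetched, int poolSize) {
        for (int start = 0; start + poolSize <= fetched.size(); start++) {
            Set<String> window = new HashSet<>(fetched.subList(start, start + poolSize));
            // A duplicate inside the window shrinks the set below poolSize
            if (window.size() < poolSize) {
                return false;
            }
        }
        return true;
    }
}
```

Fed the 20 values from the run above with `poolSize = 10`, it returns `true`, because every window is a rotation of the same ten UAs.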

Summary

In fact, designing a UA pool is not very difficult, but a few points need attention:

  • Mainstream mobile and desktop operating system versions are generally not numerous, so the source UA data is small. The simplest implementation can store it in a file and load it into Redis in one go.
  • Shuffle the UA data randomly so that UAs of the same device type or operating system are not clustered too densely; otherwise certain risk-control rules may be triggered while requests are being simulated.
  • You need to be familiar with Lua syntax, since a Lua script lets several Redis commands, where one uses the result of another (here RPOP followed by LPUSH), run as one atomic operation. For this particular rotation, Redis also provides the single command RPOPLPUSH, which rotates a list atomically when the source and destination are the same key.
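For the second point above, one way to judge whether a shuffle left same-OS UAs too close together is to measure the longest run of neighbours with the same OS. A sketch, assuming a hypothetical `ClusterCheck` class; how a UA maps to an OS key is left to the caller, and the wrap-around from the last element back to the first in the circular queue is not considered:

```java
import java.util.List;
import java.util.Objects;
import java.util.function.Function;

// Hypothetical post-shuffle check: length of the longest run of adjacent UAs
// that share the same key (for example the OS family). The key function is
// supplied by the caller; classifying a UA is an assumption left open here.
final class ClusterCheck {

    private ClusterCheck() {
    }

    static int longestRun(List<String> uas, Function<String, String> keyOf) {
        int longest = 0;
        int current = 0;
        String previousKey = null;
        for (String ua : uas) {
            String key = keyOf.apply(ua);
            // Extend the run if the key matches the previous UA's key, else restart it
            current = (current > 0 && Objects.equals(key, previousKey)) ? current + 1 : 1;
            longest = Math.max(longest, current);
            previousKey = key;
        }
        return longest;
    }
}
```

A caller could, for example, reshuffle until `longestRun` drops below a threshold of its choosing; the threshold is an assumption, not something prescribed here.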

(End of article, c-2-d ea-20191114)

Links

  • Github Page: http://throwable.club/2019/11/14/redis-in-action-ua-pool/
  • Coding Page: http://throwable.coding.me/2019/11/14/redis-in-action-ua-pool/

Source: www.cnblogs.com/throwable/p/11955162.html