Interviewer: After splitting into multiple databases and tables (sharding), how do you handle the id primary key?

 

Interview question

After sharding databases and tables, how do you handle the id primary key?

Analysis of the interviewer's intent

In fact, this is a problem you will inevitably face once you split databases and tables: how do you generate ids? If data is split across multiple tables and each table's auto-increment counter starts from 1, ids will collide across tables, which is clearly unacceptable; you need globally unique ids. So this is an issue you must consider in a real production environment.

Analysis of the question

Database auto-increment id

Every time your system needs an id, it inserts a row with no business meaning into a dedicated table in a single database and takes the auto-increment id that database generates. It then uses that id when writing the actual data into the corresponding sharded database and table.
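A minimal in-memory sketch of this flow, assuming an `AtomicLong` stands in for the dedicated table's auto-increment column and that the target table is chosen by `id % shardCount`. Both are illustrative assumptions: the article does not specify the routing rule, and `CentralIdTable`, `ShardRouter` and the `order_N` table names are hypothetical. In production, `nextId()` would be a real `INSERT` into the single id database.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for the single "id table" in one database.
// In a real system nextId() would INSERT a placeholder row and read back
// the generated auto-increment key.
class CentralIdTable {
    private final AtomicLong autoIncrement = new AtomicLong(0);

    long nextId() {
        return autoIncrement.incrementAndGet(); // 1, 2, 3, ...
    }
}

class ShardRouter {
    // Assumed routing rule: pick the physical table by id modulo shard count.
    static int shardFor(long id, int shardCount) {
        return (int) (id % shardCount);
    }

    public static void main(String[] args) {
        CentralIdTable table = new CentralIdTable();
        for (int i = 0; i < 5; i++) {
            long id = table.nextId();
            System.out.println("id " + id + " -> order_" + shardFor(id, 4));
        }
    }
}
```

Every shard receives ids from the same counter, so no two rows anywhere share an id; the price is that every insert first makes a round trip to that one database.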

The advantage of this scheme is that it is simple and convenient, and anyone can use it. The drawback is that a single database generates all the auto-increment ids, so under high concurrency it becomes a bottleneck. If you want to improve it somewhat, you can put a dedicated service in front of it: each time, the service reads the current maximum id, reserves a batch of ids above it and hands them out in one go, then updates the stored maximum to the end of that batch. Either way, the scheme still depends on a single database.
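The batching idea can be sketched as a segment allocator. This assumes a hypothetical `MaxIdStore` in place of the single database row holding the current maximum id (in practice, for example, an `UPDATE` that adds the step plus a read of the new value inside one transaction); `SegmentIdService` and the `step` size are illustrative names, not from the original.

```java
// Hypothetical stand-in for the database row that stores the current max id.
class MaxIdStore {
    private long maxId = 0;
    private int roundTrips = 0; // counts simulated database calls

    // Simulates one transaction: read the max id, add step, write it back.
    synchronized long[] reserve(int step) {
        roundTrips++;
        long start = maxId + 1;
        maxId += step;
        return new long[] {start, maxId};
    }

    synchronized int roundTrips() {
        return roundTrips;
    }
}

// Hands out ids from a locally cached segment and only goes back to the
// store when the current segment is used up.
class SegmentIdService {
    private final MaxIdStore store;
    private final int step;
    private long next = 1;
    private long end = 0; // start with an empty segment

    SegmentIdService(MaxIdStore store, int step) {
        this.store = store;
        this.step = step;
    }

    synchronized long nextId() {
        if (next > end) {
            long[] segment = store.reserve(step);
            next = segment[0];
            end = segment[1];
        }
        return next++;
    }
}
```

With `step = 100`, handing out 250 ids costs three trips to the store instead of 250. The trade-off of this design is that ids cached by a service instance that restarts are never used, leaving gaps.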

When to use it: you split databases and tables for one of two reasons. Either the concurrency on a single database is too high, or the data in a single database is too large. If your concurrency is not high and you shard only because the data volume is too large, you can use this scheme, since peak concurrency may be at most a few hundred requests per second; a separate database and table dedicated to generating auto-increment primary keys is enough.

UUID

The benefit is that UUIDs are generated locally, with no dependence on the database. The downsides are that a UUID is long and performs poorly as a primary key, and that UUIDs are not ordered: inserting them into a B+ tree index causes excessive random writes and frequent changes to the tree structure, which degrades performance.

When to use it: if you need randomly generated file names, serial numbers and the like, UUIDs are fine, but do not use a UUID as the primary key.

UUID.randomUUID().toString().replace("-", "") -> a 32-character lowercase hexadecimal string
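A runnable version of the one-liner above, using only `java.util.UUID` from the JDK. `UuidDemo` and `compactUuid` are illustrative names. It shows the length problem concretely; the values are random, so no expected output is given.

```java
import java.util.UUID;

class UuidDemo {
    // Random (version 4) UUID with the dashes stripped: 32 lowercase hex characters.
    static String compactUuid() {
        return UUID.randomUUID().toString().replace("-", "");
    }

    public static void main(String[] args) {
        String id = compactUuid();
        System.out.println(id + " (" + id.length() + " chars)");
        // As a string key this takes at least 32 bytes, versus 8 bytes for a long id,
        // and consecutive values are random, so B+ tree inserts land on random pages.
        System.out.println(compactUuid());
    }
}
```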

Current system time

This simply uses the current time as the id. The problem is that under high concurrency, say thousands of requests per second, ids will repeat, which is clearly unacceptable, so this is basically never used on its own.
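A quick way to see the collision problem: a tight loop on one machine, standing in for a burst of requests, observes the same millisecond many times. `TimestampCollisionDemo` is an illustrative name, and the exact count depends on the machine, so no expected output is given.

```java
import java.util.HashSet;
import java.util.Set;

class TimestampCollisionDemo {
    // Calls System.currentTimeMillis() n times in a tight loop and
    // returns how many of the values were repeats of an earlier value.
    static int countDuplicates(int n) {
        Set<Long> seen = new HashSet<>();
        int duplicates = 0;
        for (int i = 0; i < n; i++) {
            if (!seen.add(System.currentTimeMillis())) {
                duplicates++;
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        System.out.println(countDuplicates(10_000) + " duplicates out of 10000 calls");
    }
}
```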

When to use it: if you do use this scheme, it is usually by concatenating the current time with several other business fields to form an id. If that is acceptable for your business, it works: concatenating other business field values with the current time gives a number that is unique as long as that combination never repeats within the same millisecond.
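A sketch of such a concatenated id, assuming two hypothetical business fields: a 2-digit business type code and a 6-digit user id. The field choice and widths are illustrative only. Uniqueness holds only if the same (time, type, user) combination never occurs twice within one millisecond.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

class TimeBasedId {
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyyMMddHHmmssSSS");

    // Hypothetical layout: 17-digit timestamp + 2-digit business type + 6-digit user id.
    static String build(LocalDateTime time, int bizType, long userId) {
        return time.format(FORMAT) + String.format("%02d%06d", bizType, userId);
    }

    public static void main(String[] args) {
        LocalDateTime t = LocalDateTime.of(2019, 7, 22, 10, 30, 15, 123_000_000);
        System.out.println(build(t, 3, 42)); // 2019072210301512303000042
    }
}
```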

Snowflake algorithm

 

The snowflake algorithm is a distributed id generation algorithm open-sourced by Twitter. It generates a 64-bit id of type long: 1 bit is unused, 41 bits store a timestamp in milliseconds, 10 bits store the worker machine id, and 12 bits store a sequence number.

  • 1 bit: unused. Why? In binary, if the first (sign) bit is 1 the number is negative, but the ids we generate are all positive, so the first bit is always 0.

  • 41 bit: the timestamp in milliseconds. 41 bits can represent values up to 2^41 - 1, i.e. 2^41 - 1 distinct millisecond values, which is about 69 years. (In the code below the timestamp is stored relative to a custom epoch, twepoch, so those 69 years count from that epoch rather than from 1970.)

  • 10 bit: the worker machine id, meaning the service can be deployed on up to 2^10 = 1024 machines. These 10 bits are split into 5 bits for the data center (room) id and 5 bits for the machine id, so there can be up to 2^5 = 32 data centers, each with up to 2^5 = 32 machines.

  • 12 bit: distinguishes ids generated within the same millisecond. The largest positive integer 12 bits can represent is 2^12 - 1 = 4095, so the sequence can distinguish 4096 different ids (0 to 4095) within one millisecond.
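The capacity figures in the list can be checked with a few lines of arithmetic. `SnowflakeCapacity` is an illustrative name; years are counted as 365 days.

```java
class SnowflakeCapacity {
    static final long MAX_TIMESTAMP = (1L << 41) - 1;            // largest 41-bit value, in ms
    static final long MS_PER_YEAR = 1000L * 60 * 60 * 24 * 365;  // 365-day years
    static final int MACHINE_IDS = 1 << 10;                      // 10 bits of machine id
    static final int MAX_SEQUENCE = (1 << 12) - 1;               // largest 12-bit value

    public static void main(String[] args) {
        System.out.println("timestamp range: ~" + MAX_TIMESTAMP / MS_PER_YEAR + " years"); // ~69 years
        System.out.println("machine ids: " + MACHINE_IDS + " (" + (1 << 5)
                + " data centers x " + (1 << 5) + " machines)");                // 1024 (32 x 32)
        System.out.println("ids per millisecond per machine: " + (MAX_SEQUENCE + 1)); // 4096
    }
}
```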

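Example id in binary (sign bit | 41-bit timestamp | 5-bit data center id | 5-bit machine id | 12-bit sequence):

0 | 0001100 10100010 10111110 10001001 01011100 00 | 10001 | 1 1001 | 0000 00000000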
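Decoding the example layout shows how each field is recovered with shifts and masks that match the 41/5/5/12 split. `SnowflakeDecode` is an illustrative name; the spaces and `|` characters in the layout are treated purely as separators.

```java
class SnowflakeDecode {
    // Parses the binary layout string; spaces and '|' are only visual separators.
    static long parse(String layout) {
        return Long.parseLong(layout.replace(" ", "").replace("|", ""), 2);
    }

    public static void main(String[] args) {
        long id = parse("0 | 0001100 10100010 10111110 10001001 01011100 00 | 10001 | 1 1001 | 0000 00000000");
        long sequence = id & 0xFFF;               // lowest 12 bits
        long workerId = (id >>> 12) & 0x1F;       // next 5 bits
        long datacenterId = (id >>> 17) & 0x1F;   // next 5 bits
        long timestampDelta = id >>> 22;          // top 41 bits (sign bit is 0), ms since the epoch
        System.out.println("datacenter=" + datacenterId + " worker=" + workerId
                + " sequence=" + sequence);       // datacenter=17 worker=25 sequence=0
        System.out.println("timestamp offset (ms) = " + timestampDelta);
    }
}
```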

public class IdWorker {

    private long workerId;
    private long datacenterId;
    private long sequence;

    public IdWorker(long workerId, long datacenterId, long sequence) {
        // Sanity check: the worker id and datacenter id passed in must each
        // fit in 5 bits, i.e. lie between 0 and 31 inclusive
        if (workerId > maxWorkerId || workerId < 0) {
            throw new IllegalArgumentException(
                    String.format("worker Id can't be greater than %d or less than 0", maxWorkerId));
        }
        if (datacenterId > maxDatacenterId || datacenterId < 0) {
            throw new IllegalArgumentException(
                    String.format("datacenter Id can't be greater than %d or less than 0", maxDatacenterId));
        }
        System.out.printf(
                "worker starting. timestamp left shift %d, datacenter id bits %d, worker id bits %d, sequence bits %d, workerid %d%n",
                timestampLeftShift, datacenterIdBits, workerIdBits, sequenceBits, workerId);

        this.workerId = workerId;
        this.datacenterId = datacenterId;
        this.sequence = sequence;
    }

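    // Custom epoch in milliseconds (2010-11-04 UTC, the value used in Twitter's original
    // implementation); timestamps are stored as an offset from it
    private long twepoch = 1288834974657L;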

    private long workerIdBits = 5L;
    private long datacenterIdBits = 5L;

    // Bit trick: -1L ^ (-1L << 5) = 31, the largest value 5 bits can hold,
    // so worker ids range from 0 to 31 (32 values)
    private long maxWorkerId = -1L ^ (-1L << workerIdBits);

    // Same idea: datacenter ids range from 0 to 31 (32 values)
    private long maxDatacenterId = -1L ^ (-1L << datacenterIdBits);
    private long sequenceBits = 12L;

    private long workerIdShift = sequenceBits;
    private long datacenterIdShift = sequenceBits + workerIdBits;
    private long timestampLeftShift = sequenceBits + workerIdBits + datacenterIdBits;
    private long sequenceMask = -1L ^ (-1L << sequenceBits);

    private long lastTimestamp = -1L;

    public long getWorkerId() {
        return workerId;
    }

    public long getDatacenterId() {
        return datacenterId;
    }

    public long getTimestamp() {
        return System.currentTimeMillis();
    }

    public synchronized long nextId() {
        // Current timestamp in milliseconds
        long timestamp = timeGen();

        if (timestamp < lastTimestamp) {
            System.err.printf("clock is moving backwards.  Rejecting requests until %d.%n", lastTimestamp);
            throw new RuntimeException(String.format(
                    "Clock moved backwards.  Refusing to generate id for %d milliseconds", lastTimestamp - timestamp));
        }

        if (lastTimestamp == timestamp) {
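            // Same millisecond as the previous id: increment the sequence.
            // Masking with sequenceMask keeps it within 0..4095, so it wraps to 0
            // after 4095; at most 4096 ids can be generated per millisecond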
            sequence = (sequence + 1) & sequenceMask;
            if (sequence == 0) {
                timestamp = tilNextMillis(lastTimestamp);
            }
        } else {
            sequence = 0;
        }

        // Remember the timestamp (ms) of the most recently generated id
        lastTimestamp = timestamp;

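        // Shift the timestamp (relative to twepoch) into the 41-bit field,
        // the datacenter id into its 5-bit field and the worker id into its 5-bit field;
        // the sequence occupies the lowest 12 bits.
        // OR-ing them together yields a 64-bit value, returned as a long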
        return ((timestamp - twepoch) << timestampLeftShift) | (datacenterId << datacenterIdShift)
                | (workerId << workerIdShift) | sequence;
    }

    private long tilNextMillis(long lastTimestamp) {
        long timestamp = timeGen();
        while (timestamp <= lastTimestamp) {
            timestamp = timeGen();
        }
        return timestamp;
    }

    private long timeGen() {
        return System.currentTimeMillis();
    }

    // --------------- Test ---------------
    public static void main(String[] args) {
        IdWorker worker = new IdWorker(1, 1, 1);
        for (int i = 0; i < 30; i++) {
            System.out.println(worker.nextId());
        }
    }

}

In short: the 41 bits hold the current timestamp in milliseconds (relative to twepoch); the next 5 bits hold the data center id you pass in (0 to 31); the following 5 bits hold the machine id you pass in (0 to 31); the remaining 12 bits are the sequence number. If an id is requested in the same millisecond as the previous one, the sequence is incremented, giving at most 4096 ids per millisecond.
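The same-millisecond rule can be observed deterministically by injecting the clock. This is a stripped-down sketch, not part of the original `IdWorker`: the hypothetical `SequenceDemo` class takes a `LongSupplier` clock and returns only the 12-bit sequence part, while mirroring the sequence handling of `nextId()`.

```java
import java.util.function.LongSupplier;

// Hypothetical stripped-down generator: same sequence rules as IdWorker.nextId(),
// but with an injected clock, returning only the 12-bit sequence part.
class SequenceDemo {
    private static final long SEQUENCE_MASK = (1L << 12) - 1; // 4095
    private final LongSupplier clock;
    private long lastTimestamp = -1L;
    private long sequence = 0L;

    SequenceDemo(LongSupplier clock) {
        this.clock = clock;
    }

    synchronized long nextSequence() {
        long ts = clock.getAsLong();
        if (ts < lastTimestamp) {
            throw new IllegalStateException("clock moved backwards");
        }
        if (ts == lastTimestamp) {
            sequence = (sequence + 1) & SEQUENCE_MASK; // 4095 + 1 wraps to 0
            if (sequence == 0) {
                // 4096 ids already issued in this millisecond: spin until the clock advances
                while (ts <= lastTimestamp) {
                    ts = clock.getAsLong();
                }
            }
        } else {
            sequence = 0; // new millisecond: restart the sequence
        }
        lastTimestamp = ts;
        return sequence;
    }

    public static void main(String[] args) {
        long[] now = {1000L};
        SequenceDemo demo = new SequenceDemo(() -> now[0]);
        System.out.println(demo.nextSequence()); // 0
        System.out.println(demo.nextSequence()); // 1 (same millisecond)
        now[0] = 1001L;
        System.out.println(demo.nextSequence()); // 0 (next millisecond)
    }
}
```

Injecting the clock is a design choice made here for testability; the original class calls `System.currentTimeMillis()` directly.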

So you can use this utility to build an id service: for each machine in each data center, initialize one such worker, with the sequence starting at 0. Each time a request arrives asking for an id for a given data center and machine, find the corresponding worker and let it generate the id.
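One reading of this is a service that keeps one worker per (data center, machine) pair and dispatches each request to the right one. The sketch below is hypothetical: `SnowflakeWorker` is a compact re-implementation using the same layout and epoch as `IdWorker`, and `IdService` with its string key is an illustrative design. In a real deployment each machine would usually run its own worker with its own ids.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Compact snowflake worker: same 41/5/5/12 layout and epoch as IdWorker.
class SnowflakeWorker {
    private static final long EPOCH = 1288834974657L;
    private final long datacenterId, workerId;
    private long lastTimestamp = -1L, sequence = 0L;

    SnowflakeWorker(long datacenterId, long workerId) {
        if (datacenterId < 0 || datacenterId > 31 || workerId < 0 || workerId > 31) {
            throw new IllegalArgumentException("ids must be in 0..31");
        }
        this.datacenterId = datacenterId;
        this.workerId = workerId;
    }

    synchronized long nextId() {
        long ts = System.currentTimeMillis();
        if (ts < lastTimestamp) {
            throw new IllegalStateException("clock moved backwards");
        }
        if (ts == lastTimestamp) {
            sequence = (sequence + 1) & 0xFFF;
            if (sequence == 0) { // sequence exhausted: wait for the next millisecond
                while (ts <= lastTimestamp) {
                    ts = System.currentTimeMillis();
                }
            }
        } else {
            sequence = 0;
        }
        lastTimestamp = ts;
        return ((ts - EPOCH) << 22) | (datacenterId << 17) | (workerId << 12) | sequence;
    }
}

// Hypothetical id service: one worker per (data center, machine), created on first use.
class IdService {
    private final Map<String, SnowflakeWorker> workers = new ConcurrentHashMap<>();

    long nextId(long datacenterId, long workerId) {
        String key = datacenterId + ":" + workerId;
        return workers.computeIfAbsent(key, k -> new SnowflakeWorker(datacenterId, workerId)).nextId();
    }
}
```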

With the snowflake algorithm you can build your own company's id service. You are not even obliged to use those bits for a data center id and a machine id: 5 bits + 5 bits are reserved, and you can put anything else with business meaning there.
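As one hypothetical example of repurposing those bits, the sketch below stores a single 10-bit business code (0 to 1023) where the data center and machine ids would normally go. `BizCodeLayout`, `compose` and `bizCodeOf` are illustrative names, and what the code means (a business line, a logical shard) is up to you.

```java
// Hypothetical variant: the 10 bits normally split into data center id + machine id
// carry one 10-bit business code instead (0..1023).
class BizCodeLayout {
    static final int CODE_BITS = 10;
    static final int SEQUENCE_BITS = 12;
    static final long MAX_CODE = (1L << CODE_BITS) - 1;          // 1023
    static final long SEQUENCE_MASK = (1L << SEQUENCE_BITS) - 1; // 4095

    static long compose(long timestampDelta, long bizCode, long sequence) {
        if (bizCode < 0 || bizCode > MAX_CODE) {
            throw new IllegalArgumentException("bizCode must be in 0..1023");
        }
        return (timestampDelta << (CODE_BITS + SEQUENCE_BITS))
                | (bizCode << SEQUENCE_BITS)
                | (sequence & SEQUENCE_MASK); // keep only the low 12 bits
    }

    static long bizCodeOf(long id) {
        return (id >>> SEQUENCE_BITS) & MAX_CODE;
    }

    public static void main(String[] args) {
        long id = compose(123_456_789L, 517, 7);
        System.out.println(id + " -> bizCode " + bizCodeOf(id)); // bizCode 517
    }
}
```

Because the code travels inside the id, it can be recovered from the id alone, for example to locate the shard that holds a row without a separate lookup.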

The snowflake algorithm is fairly reliable. If you really need distributed id generation under high concurrency, its performance should be quite good: a single worker can in theory issue up to 4096 ids per millisecond (about 4 million per second), so it is usually more than enough for scenarios with tens of thousands of requests per second.

 


Author: Yang Libin

Source: https://github.com/doocs/advanced-java


Origin: www.cnblogs.com/javafirst0/p/11225789.html