After sharding across multiple databases and tables, how do you handle primary key IDs?

Interview questions

After sharding across databases and tables, how do you handle the id primary key? (Uniqueness, ordering, etc.)

What the interviewer is looking for

This is a problem you are bound to face once you shard: how do you generate IDs? If the data is split across multiple tables and each table counts up from 1 on its own, IDs will collide, so that clearly will not work. You need a globally unique ID, ideally one that also supports ordering. Any real production environment has to solve this.

Analysis of the question

Database-based implementations

Database auto-increment ID

Every time your system needs an ID, it inserts a row with no real business meaning into a dedicated table in a single database and takes the auto-increment ID that the database returns. It then uses that ID to write the actual record into the appropriate shard (database and table).

The advantage of this approach is that it is simple and anyone can use it. The drawback is that a single database generates all the auto-increment IDs, so under high concurrency it becomes a bottleneck. If you want to improve on it, you can put a dedicated service in front of it: each time, the service reads the current maximum ID, reserves a batch of IDs above it and hands them out, then updates the stored maximum to the end of that batch. Either way, everything still depends on a single database.
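The batch ("segment") variant described above can be sketched roughly as follows. This is a minimal in-memory sketch under stated assumptions: the database row that stores the current maximum ID is simulated with a shared `AtomicLong`, and the class name `SegmentIdAllocator` is hypothetical. In a real deployment the reservation step would be an update of that single row, so every allocator still depends on one database.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: hands out IDs from a locally cached batch ("segment")
// and only touches the shared "database" row when the batch is used up.
final class SegmentIdAllocator {
    private final AtomicLong storedMaxId; // stand-in for the single DB row holding the current max ID
    private final int segmentSize;
    private long nextId = 1;              // next ID to hand out from the current segment
    private long segmentEnd = 0;          // last ID of the current segment (0 = no segment yet)

    SegmentIdAllocator(AtomicLong storedMaxId, int segmentSize) {
        if (segmentSize <= 0) {
            throw new IllegalArgumentException("segmentSize must be positive");
        }
        this.storedMaxId = storedMaxId;
        this.segmentSize = segmentSize;
    }

    synchronized long nextId() {
        if (nextId > segmentEnd) {
            // One "round trip": atomically raise the stored max and take the range (oldMax, newMax].
            long newMax = storedMaxId.addAndGet(segmentSize);
            nextId = newMax - segmentSize + 1;
            segmentEnd = newMax;
        }
        return nextId++;
    }

    public static void main(String[] args) {
        AtomicLong row = new AtomicLong(0);
        SegmentIdAllocator a = new SegmentIdAllocator(row, 3);
        SegmentIdAllocator b = new SegmentIdAllocator(row, 3);
        // a reserves 1..3, b reserves 4..6
        System.out.println(a.nextId() + " " + b.nextId() + " " + a.nextId()); // 1 4 2
        System.out.println("stored max = " + row.get());                     // stored max = 6
    }
}
```

With several allocator instances the IDs stay unique but are only roughly increasing across instances (1, 4, 2 above), which is the usual trade-off of handing out batches.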

Suitable scenario: you shard for one of two reasons, either a single database cannot handle the concurrency, or a single database holds too much data. If your concurrency is low and you shard only because the data volume is too large, this approach works: peak concurrency may be only a few hundred requests per second, which a single database and table with an auto-increment primary key can handle on its own.

Setting an increment step on a database sequence or auto-increment column

You can scale horizontally by configuring the step size of a database sequence or of a table's auto-increment column.

For example, with 8 service nodes, each node generates IDs from its own sequence; each sequence has a different starting value, and all of them increment by a step of 8.
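The fixed-step scheme can be illustrated with a small in-memory sketch. The class name `StepSequence` and the convention that node k (0-based) starts at k + 1 are assumptions for illustration; on the database side, MySQL offers the same effect through the `auto_increment_increment` and `auto_increment_offset` system variables.

```java
// Hypothetical sketch: node k of N starts at k + 1 and always advances by N,
// so the sequences of different nodes never overlap.
final class StepSequence {
    private final long step;
    private long next;

    StepSequence(int nodeIndex, int nodeCount) {
        if (nodeCount <= 0 || nodeIndex < 0 || nodeIndex >= nodeCount) {
            throw new IllegalArgumentException("need 0 <= nodeIndex < nodeCount");
        }
        this.step = nodeCount;
        this.next = nodeIndex + 1L; // node 0 -> 1, 9, 17, ...; node 1 -> 2, 10, 18, ...
    }

    synchronized long nextId() {
        long id = next;
        next += step;
        return id;
    }

    public static void main(String[] args) {
        StepSequence node0 = new StepSequence(0, 8);
        StepSequence node7 = new StepSequence(7, 8);
        System.out.println(node0.nextId() + ", " + node0.nextId()); // 1, 9
        System.out.println(node7.nextId() + ", " + node7.nextId()); // 8, 16
    }
}
```

The step is baked into every node, which is exactly why adding a ninth node later is awkward: all existing nodes would have to switch to a step of 9 without reusing values already handed out.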


Suitable scenario: when you need to prevent duplicate IDs (for example, user IDs), this approach is relatively simple to implement and can meet performance targets. The downside is that the number of service nodes is fixed and so is the step size, so adding service nodes later is painful.

UUID

The benefit is that UUIDs are generated locally, with no dependency on the database. The downsides: a UUID is long and takes up a lot of space, which makes it a poor-performing primary key. More importantly, UUIDs are not ordered, which causes excessive random writes into the B+ tree index (sequentially generated IDs can largely be written as sequential appends). Because inserts are not sequential appends, each insert requires reading the target B+ tree node into memory, inserting the record, and writing the whole node back to disk. When records are relatively large, performance degrades significantly.

Suitable scenario: when you need randomly generated values such as file names or serial numbers, UUIDs are fine, but you should not use a UUID as the primary key.

UUID.randomUUID().toString().replace("-", "") -> a 32-character lowercase hexadecimal string
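As a runnable version of the line above (the class name `CompactUuid` is hypothetical), the following shows what the compact form looks like: 32 hex characters encoding 128 bits, twice the width of a 64-bit BIGINT key, with no ordering between consecutive values.

```java
import java.util.UUID;

final class CompactUuid {
    // Random (version 4) UUID with the four dashes stripped: 32 lowercase hex characters.
    static String next() {
        return UUID.randomUUID().toString().replace("-", "");
    }

    public static void main(String[] args) {
        String id = next();
        System.out.println(id + " length=" + id.length());            // length=32
        System.out.println("version=" + UUID.randomUUID().version()); // version=4
    }
}
```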

Current system time

This simply uses the current time as the ID. The problem is that under high concurrency, say thousands of requests per second, duplicates will occur, so this is clearly unsuitable and is basically never considered on its own.

Suitable scenario: if you do use this approach, you typically concatenate the current time with several other business fields to form the ID. If that is acceptable for your business, it works: concatenating business field values with the current time can yield a globally unique number, as long as the combined fields cannot repeat within the same time unit.
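A minimal sketch of "current time plus business fields" might look like the following. The layout (17-digit timestamp, a caller-supplied business code, and a 4-digit rolling counter) and the class name `TimeBasedNumberGenerator` are hypothetical choices for illustration. The counter only separates IDs within one process, so across machines the business code would also have to include something machine-specific.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: yyyyMMddHHmmssSSS + business code + 4-digit rolling counter.
final class TimeBasedNumberGenerator {
    private static final DateTimeFormatter TIME = DateTimeFormatter.ofPattern("yyyyMMddHHmmssSSS");
    private final AtomicInteger counter = new AtomicInteger();

    String next(LocalDateTime now, String businessCode) {
        // The counter separates IDs created in the same millisecond by this process;
        // it wraps after 9999, so more than 10,000 IDs per millisecond with the same code would repeat.
        int seq = Math.floorMod(counter.getAndIncrement(), 10_000);
        return TIME.format(now) + businessCode + String.format("%04d", seq);
    }

    public static void main(String[] args) {
        TimeBasedNumberGenerator gen = new TimeBasedNumberGenerator();
        LocalDateTime t = LocalDateTime.of(2019, 9, 30, 10, 15, 30, 123_000_000);
        System.out.println(gen.next(t, "07")); // 20190930101530123070000
        System.out.println(gen.next(t, "07")); // 20190930101530123070001
    }
}
```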

Snowflake algorithm

Snowflake is a distributed ID generation algorithm open-sourced by Twitter and originally implemented in Scala. It produces a 64-bit long ID made up of 1 unused bit + 41 bits of milliseconds + 10 bits of worker machine ID + 12 bits of sequence number.

  • 1 bit: unused. Why? In binary, if the highest bit is 1 the number is negative, but the IDs we generate should be positive, so this bit is always 0.
  • 41 bits: the timestamp in milliseconds (measured from a custom epoch, twepoch in the code below). 41 bits can represent values up to 2^41 - 1, i.e. 2^41 - 1 milliseconds, which is about 69 years.
  • 10 bits: the worker machine ID, meaning the service can be deployed on up to 2^10 = 1024 machines. These 10 bits are split into 5 bits for the datacenter (machine room) ID and 5 bits for the machine ID, i.e. up to 2^5 = 32 datacenters, each with up to 2^5 = 32 machines.
  • 12 bits: the sequence number, which distinguishes IDs generated within the same millisecond. 12 bits can represent 2^12 = 4096 distinct values (0 to 2^12 - 1 = 4095), so one machine can generate up to 4096 different IDs per millisecond.
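Example ID layout (unused sign bit | 41-bit timestamp | 5-bit datacenter ID | 5-bit worker ID | 12-bit sequence):
0 | 0001100 10100010 10111110 10001001 01011100 00 | 10001 | 11001 | 0000 00000000
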
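To make the layout concrete, the sketch below parses the example bit string and splits it back into fields with shifts and masks. The class name `SnowflakeLayoutDemo` is hypothetical; the field widths (12/5/5) match the description above. Adding twepoch (1288834974657L in the IdWorker code) to the timestamp field would give a Unix time in milliseconds.

```java
// Hypothetical sketch: decode the example 64-bit layout into its fields.
final class SnowflakeLayoutDemo {
    static final int SEQUENCE_BITS = 12;
    static final int WORKER_BITS = 5;
    static final int DATACENTER_BITS = 5;

    /** Returns {timestampOffset, datacenterId, workerId, sequence}. */
    static long[] decode(long id) {
        long sequence = id & ((1L << SEQUENCE_BITS) - 1);
        long workerId = (id >>> SEQUENCE_BITS) & ((1L << WORKER_BITS) - 1);
        long datacenterId = (id >>> (SEQUENCE_BITS + WORKER_BITS)) & ((1L << DATACENTER_BITS) - 1);
        long timestampOffset = id >>> (SEQUENCE_BITS + WORKER_BITS + DATACENTER_BITS);
        return new long[] {timestampOffset, datacenterId, workerId, sequence};
    }

    public static void main(String[] args) {
        // The example bit string, field by field: sign, timestamp, datacenter, worker, sequence.
        String bits = "0"
                + "0001100" + "10100010" + "10111110" + "10001001" + "01011100" + "00"
                + "10001" + "11001" + "000000000000";
        long id = Long.parseLong(bits, 2); // leading 0 bit, so the value fits in a positive long
        long[] f = decode(id);
        System.out.println("id=" + id + " timestampOffset=" + f[0]
                + " datacenterId=" + f[1] + " workerId=" + f[2] + " sequence=" + f[3]);
        // datacenterId=17 workerId=25 sequence=0
    }
}
```
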
public class IdWorker {

    private long workerId;
    private long datacenterId;
    private long sequence;

    public IdWorker(long workerId, long datacenterId, long sequence) {
        // sanity check for workerId
        // Reject a datacenter ID or worker ID outside 0..31 (each has only 5 bits)
        if (workerId > maxWorkerId || workerId < 0) {
            throw new IllegalArgumentException(
                    String.format("worker Id can't be greater than %d or less than 0", maxWorkerId));
        }
        if (datacenterId > maxDatacenterId || datacenterId < 0) {
            throw new IllegalArgumentException(
                    String.format("datacenter Id can't be greater than %d or less than 0", maxDatacenterId));
        }
        System.out.printf(
                "worker starting. timestamp left shift %d, datacenter id bits %d, worker id bits %d, sequence bits %d, workerid %d%n",
                timestampLeftShift, datacenterIdBits, workerIdBits, sequenceBits, workerId);

        this.workerId = workerId;
        this.datacenterId = datacenterId;
        this.sequence = sequence;
    }

    private long twepoch = 1288834974657L;

    private long workerIdBits = 5L;
    private long datacenterIdBits = 5L;

    // Bit trick: -1L ^ (-1L << 5) = 31, so 5 bits allow worker IDs 0..31 (32 values)
    private long maxWorkerId = -1L ^ (-1L << workerIdBits);

    // Same idea: 5 bits allow datacenter IDs 0..31
    private long maxDatacenterId = -1L ^ (-1L << datacenterIdBits);
    private long sequenceBits = 12L;

    private long workerIdShift = sequenceBits;
    private long datacenterIdShift = sequenceBits + workerIdBits;
    private long timestampLeftShift = sequenceBits + workerIdBits + datacenterIdBits;
    private long sequenceMask = -1L ^ (-1L << sequenceBits);

    private long lastTimestamp = -1L;

    public long getWorkerId() {
        return workerId;
    }

    public long getDatacenterId() {
        return datacenterId;
    }

    public long getTimestamp() {
        return System.currentTimeMillis();
    }

    public synchronized long nextId() {
        // Current timestamp in milliseconds
        long timestamp = timeGen();

        if (timestamp < lastTimestamp) {
            System.err.printf("clock is moving backwards.  Rejecting requests until %d.%n", lastTimestamp);
            throw new RuntimeException(String.format(
                    "Clock moved backwards.  Refusing to generate id for %d milliseconds", lastTimestamp - timestamp));
        }

        if (lastTimestamp == timestamp) {
            // At most 4096 IDs per millisecond: masking with sequenceMask keeps the sequence in 0..4095,
            // so after 4095 it wraps to 0, which triggers waiting for the next millisecond below
            sequence = (sequence + 1) & sequenceMask;
            if (sequence == 0) {
                timestamp = tilNextMillis(lastTimestamp);
            }
        } else {
            sequence = 0;
        }

        // Remember the timestamp (ms) of the most recently generated ID
        lastTimestamp = timestamp;

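        // Shift the timestamp offset left into the 41-bit field,
        // shift the datacenter ID into its 5-bit field,
        // shift the worker ID into its 5-bit field, and put the sequence in the last 12 bits;
        // OR-ing them together gives one 64-bit number, i.e. a Java long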
        return ((timestamp - twepoch) << timestampLeftShift) | (datacenterId << datacenterIdShift)
                | (workerId << workerIdShift) | sequence;
    }

    private long tilNextMillis(long lastTimestamp) {
        long timestamp = timeGen();
        while (timestamp <= lastTimestamp) {
            timestamp = timeGen();
        }
        return timestamp;
    }

    private long timeGen() {
        return System.currentTimeMillis();
    }

    // --------------- test ---------------
    public static void main(String[] args) {
        IdWorker worker = new IdWorker(1, 1, 1);
        for (int i = 0; i < 30; i++) {
            System.out.println(worker.nextId());
        }
    }

}

What does it all mean? The 41 bits hold the current timestamp in milliseconds (relative to twepoch). One group of 5 bits holds the datacenter ID you pass in (at most 31), the other 5 bits hold the machine ID you pass in (at most 31), and the remaining 12 bits are the sequence number: if an ID is requested within the same millisecond as the previous one, the sequence is incremented, allowing up to 4096 IDs per millisecond.

So you can build a service around this utility: on each machine in each datacenter, initialize one such worker with that machine's datacenter ID and machine ID, with the sequence starting at 0. Each time that machine receives a request to generate an ID, its worker generates it.

With the snowflake algorithm you can build your own company's ID service. The datacenter ID and machine ID are up to you: you have 5 + 5 bits reserved, and you can put anything with business meaning into them.

Of course, you can also use a layout of 1 unused bit + 41 bits of milliseconds + 12 bits of sequence + 10 bits of worker machine ID, or otherwise swap the order of the fields. How you combine them depends on your own business needs.
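Here is a sketch of that alternative layout, with the sequence placed above the 10-bit worker ID. The class and method names are hypothetical; range checks keep each field within its bit width.

```java
// Hypothetical alternative layout: 1 unused bit | 41-bit timestamp | 12-bit sequence | 10-bit worker ID.
final class AltSnowflakeLayout {
    static final int WORKER_BITS = 10;
    static final int SEQUENCE_BITS = 12;
    static final int TIMESTAMP_BITS = 41;

    static long pack(long timestampOffset, long sequence, long workerId) {
        if (timestampOffset < 0 || timestampOffset >= (1L << TIMESTAMP_BITS)) {
            throw new IllegalArgumentException("timestamp out of range");
        }
        if (sequence < 0 || sequence >= (1L << SEQUENCE_BITS)) {
            throw new IllegalArgumentException("sequence out of range");
        }
        if (workerId < 0 || workerId >= (1L << WORKER_BITS)) {
            throw new IllegalArgumentException("workerId out of range");
        }
        return (timestampOffset << (SEQUENCE_BITS + WORKER_BITS)) | (sequence << WORKER_BITS) | workerId;
    }

    static long timestampOf(long id) { return id >>> (SEQUENCE_BITS + WORKER_BITS); }
    static long sequenceOf(long id)  { return (id >>> WORKER_BITS) & ((1L << SEQUENCE_BITS) - 1); }
    static long workerOf(long id)    { return id & ((1L << WORKER_BITS) - 1); }

    public static void main(String[] args) {
        long id = pack(123_456_789L, 7, 1000);
        System.out.println(timestampOf(id) + " " + sequenceOf(id) + " " + workerOf(id)); // 123456789 7 1000
    }
}
```

Because the timestamp still occupies the top bits, IDs remain ordered by time; swapping the lower fields only changes how IDs generated in the same millisecond compare with each other.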

The snowflake algorithm is fairly reliable, so if you really need distributed ID generation under high concurrency, it should perform well; it is usually enough for scenarios with tens of thousands of requests per second.

Original WeChat official account article:
https://mp.weixin.qq.com/s/mt8bVpM57SsI-nvTRKxSKg

Source: https://www.cnblogs.com/midoujava/p/11610492.html

=========================================================

If you only need uniqueness, each module can define its own rules and an algorithm that generates numbers according to those rules.

For example, in a simple example I wrote earlier, an XML file served as the configuration for sequence number generation: it specifies a fixed format, and the program generates strings in that format according to an algorithm, similar to a date-formatting function.
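The XML configuration itself is not shown, so the sketch below simply assumes three hypothetical rule fields such a config might hold (a prefix, a date pattern, and a sequence width) and generates numbers from them, restarting the sequence whenever the formatted date changes. State is kept in memory only, so the numbers are unique within one process and reset on restart.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Hypothetical rule-driven generator: prefix + formatted date + zero-padded daily sequence.
final class RuleBasedNumberGenerator {
    private final String prefix;
    private final DateTimeFormatter dateFormat;
    private final int sequenceWidth;
    private String currentDate = "";
    private long sequence = 0;

    RuleBasedNumberGenerator(String prefix, String datePattern, int sequenceWidth) {
        if (sequenceWidth <= 0 || sequenceWidth > 18) {
            throw new IllegalArgumentException("sequenceWidth must be 1..18");
        }
        this.prefix = prefix;
        this.dateFormat = DateTimeFormatter.ofPattern(datePattern);
        this.sequenceWidth = sequenceWidth;
    }

    synchronized String next(LocalDate day) {
        String date = dateFormat.format(day);
        if (!date.equals(currentDate)) { // new period: restart the sequence
            currentDate = date;
            sequence = 0;
        }
        sequence++;
        String seq = String.format("%0" + sequenceWidth + "d", sequence);
        if (seq.length() > sequenceWidth) {
            throw new IllegalStateException("sequence exhausted for " + date);
        }
        return prefix + date + seq;
    }

    public static void main(String[] args) {
        RuleBasedNumberGenerator gen = new RuleBasedNumberGenerator("ORD", "yyyyMMdd", 5);
        System.out.println(gen.next(LocalDate.of(2019, 10, 1))); // ORD2019100100001
        System.out.println(gen.next(LocalDate.of(2019, 10, 1))); // ORD2019100100002
        System.out.println(gen.next(LocalDate.of(2019, 10, 2))); // ORD2019100200001
    }
}
```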

But often you also need to aggregate and merge data from multiple sources and sort records by the time they were generated. Then you must consider precision (hours, minutes, seconds, centiseconds, milliseconds, and so on), and when multiple servers have to keep their clocks synchronized, you also depend on network latency and similar factors.

All of this, including how much fault tolerance you can accept, should be decided by your business needs.

Source: www.cnblogs.com/mq0036/p/11612530.html