UUIDs and how they are generated

What is a UUID?



UUID is an acronym for Universally Unique Identifier: a machine-generated identifier that is unique within a certain scope (anywhere from a particular namespace up to the whole world). A UUID has the following properties:
 

  • Machine-generated by a defined algorithm

To ensure uniqueness, the specification defines elements such as the MAC address, a timestamp, a namespace, and random or pseudo-random number sequences, and the UUID is generated algorithmically from these elements. The complexity that guarantees a UUID's uniqueness also means it can only practically be generated by a computer.

  • Not assigned by hand

UUIDs should not be assigned manually; doing so risks duplicate UUIDs. Because of their complexity, an ordinary person cannot tell from a UUID alone which object it is associated with.

  • Duplicates are unlikely within a given scope

The main purpose of the generation algorithms defined in the specification is to guarantee uniqueness. That uniqueness is limited, however: it holds only within a certain scope, and the scope depends on the UUID version (see the UUID versions below).

A UUID is a 128-bit (16-byte) number, usually written as a 36-character string, for example:

3F2504E0-4F89-11D3-9A0C-0305E82C3301

where the letters are hexadecimal digits and are case-insensitive.
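As a quick illustration, Python's standard uuid module can parse the example above and show how the 128-bit value, its 16 bytes, and its 36-character text form relate. This is a minimal sketch using only documented attributes of uuid.UUID:

```python
import uuid

# Parse the example UUID; parsing is case-insensitive.
u = uuid.UUID("3F2504E0-4F89-11D3-9A0C-0305E82C3301")
lower = uuid.UUID("3f2504e0-4f89-11d3-9a0c-0305e82c3301")

print(u == lower)    # True: both spellings denote the same 128-bit value
print(str(u))        # 3f2504e0-4f89-11d3-9a0c-0305e82c3301 (canonical, lowercase)
print(len(str(u)))   # 36 characters including the 4 hyphens
print(len(u.hex))    # 32 hex digits without hyphens
print(len(u.bytes))  # 16 bytes = 128 bits
```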
 

The Universally Unique IDentifier (UUID) is defined by an RFC specification (RFC 4122). It is a 128-bit number that can also be written as 32 hexadecimal characters, separated into groups by "-":

- timestamp + UUID version number: spread across the first three groups, 16 hex characters in total (60 bits + 4 bits);
- clock sequence and reserved (variant) field: 4 hex characters (for the RFC 4122 variant, 14 bits + 2 bits);
- node identifier: 12 hex characters (48 bits).
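The field layout can be checked with Python's uuid module, whose `fields` attribute returns the six raw fields. The sketch below reassembles the 60-bit timestamp and the clock sequence by hand, assuming the RFC 4122 variant (2 variant bits, 14 clock-sequence bits), and compares them with the module's own `time` and `clock_seq` properties:

```python
import uuid

u = uuid.UUID("3F2504E0-4F89-11D3-9A0C-0305E82C3301")

# The raw fields behind the hyphen-separated groups.
time_low, time_mid, time_hi_version, clock_seq_hi_variant, clock_seq_low, node = u.fields

# 60-bit timestamp: low 12 bits of time_hi_version, then time_mid, then time_low.
timestamp = ((time_hi_version & 0x0FFF) << 48) | (time_mid << 32) | time_low
version = time_hi_version >> 12  # top 4 bits of the third group

# 14-bit clock sequence; the top 2 bits of clock_seq_hi_variant hold the variant.
clock_seq = ((clock_seq_hi_variant & 0x3F) << 8) | clock_seq_low

print(version, u.version)           # 1 1
print(timestamp == u.time)          # True
print(clock_seq == u.clock_seq)     # True
print(f"{node:012x}")               # 0305e82c3301 (48-bit node)
print(u.variant == uuid.RFC_4122)   # True
```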

GUID (Globally Unique Identifier) is an alias for UUID; in practice, though, GUID usually refers to Microsoft's implementation of the UUID.



UUID versions



There are several UUID versions; each uses a different algorithm and suits a different scope.

First, a special case: the Nil UUID. It is rarely used and consists entirely of zeros:

00000000-0000-0000-0000-000000000000
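In Python, a minimal way to build the Nil UUID is to construct a UUID from the integer 0. Because the variant bits are also zero, it is not an RFC 4122 UUID, so the module reports no version for it:

```python
import uuid

nil = uuid.UUID(int=0)  # all 128 bits set to zero
print(nil)              # 00000000-0000-0000-0000-000000000000
print(nil.version)      # None: version bits are only meaningful for RFC 4122 UUIDs
```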



UUID Version 1: time-based UUID
 

Because the timestamp has a full 60 bits, it can afford to be generous: it counts 100-nanosecond intervals starting from October 15, 1582, and is good for about 3,650 years. (The bits really are spent lavishly. Why 1582? That is the date the Gregorian calendar took effect.)

The 48-bit node identifier is generally a MAC address; if the machine has several network cards, any one of them will do. If there is no network card, random numbers are used instead, or as much other information as possible (the host name and so on) is gathered and hashed together.

The clock sequence exists only to avoid duplicates when the node identifier changes (for example, the network card is replaced) or the system clock misbehaves (for example, it is set back after a restart). It is initialized to a random value to avoid collisions.
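To see these fields in a real Version 1 UUID, the sketch below decodes the creation time from `uuid.uuid1()`. It relies on the well-known offset between the UUID epoch (1582-10-15) and the Unix epoch (1970-01-01), expressed in 100-ns intervals; the helper name `uuid1_datetime` is hypothetical:

```python
import datetime
import uuid

# Offset from 1582-10-15 to 1970-01-01 in 100-nanosecond intervals.
GREGORIAN_TO_UNIX_100NS = 0x01B21DD213814000


def uuid1_datetime(u: uuid.UUID) -> datetime.datetime:
    """Hypothetical helper: recover the UTC creation time of a version 1 UUID."""
    if u.version != 1:
        raise ValueError("not a version 1 UUID")
    unix_100ns = u.time - GREGORIAN_TO_UNIX_100NS
    epoch = datetime.datetime(1970, 1, 1, tzinfo=datetime.timezone.utc)
    return epoch + datetime.timedelta(microseconds=unix_100ns // 10)


u = uuid.uuid1()
print(uuid1_datetime(u))  # roughly the current UTC time
print(hex(u.node))        # 48-bit node: a MAC address, or a random number
print(u.clock_seq)        # 14-bit clock sequence

# Lifetime of a 60-bit counter of 100-ns ticks, in 365.25-day years.
print(2**60 / 10**7 / (365.25 * 24 * 3600))  # about 3653
```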

It seems, however, that Version 1 does not even consider running two processes on one machine, nor concurrent generation within the same timestamp, so hardly anyone implements Version 1 strictly. Let's look at some of its variants instead.

 

Version 1 variant - Hibernate

Hibernate's CustomVersionOneStrategy.java addresses the two problems of Version 1 described above:

- timestamp (6 bytes, 48 bits): millisecond resolution, counting from 1970; good for about 8,925 years.
- sequence number (2 bytes, 16 bits, maximum 65535): it is not reset to zero when the timestamp moves on; each field does its own thing. When the short overflows into negative values, it goes back to 0.
- machine identifier (4 bytes, 32 bits): the localhost IP address. An IPv4 address is exactly 4 bytes; an IPv6 address is 16 bytes, of which only the first 4 are taken.
- process identifier (4 bytes, 32 bits): the current timestamp shifted right by 8 bits and truncated to an int, on the bet that two processes will not start at exactly the same time.

Note that the machine identifier and process identifier together form a 64-bit long that practically never changes, so only the other long (timestamp + sequence) has to vary.
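The layout above can be sketched in Python as two 64-bit halves. This is a hypothetical illustration of the described field packing, not Hibernate's Java code: the names `next_id` and `HIGH_BITS` are made up, the fallback to 0 when the host name cannot be resolved is an assumption, and no RFC 4122 version/variant bits are set, so the result is not a standard UUID:

```python
import itertools
import socket
import time

# Hypothetical sketch of the layout described above; NOT Hibernate's code.
_counter = itertools.count()


def _ip_as_int() -> int:
    """Local IPv4 address as a 32-bit int (IPv4 is exactly 4 bytes)."""
    try:
        addr = socket.gethostbyname(socket.gethostname())
        return int.from_bytes(socket.inet_aton(addr), "big")
    except OSError:
        return 0  # assumption: fall back to 0 if the host name cannot be resolved


# "Process identifier": startup time in ms shifted right by 8 bits, kept to 32 bits.
_PROCESS_ID = (int(time.time() * 1000) >> 8) & 0xFFFFFFFF

# Machine + process: the 64-bit half that practically never changes.
HIGH_BITS = (_ip_as_int() << 32) | _PROCESS_ID


def next_id() -> tuple:
    """Return (high, low) 64-bit halves; only the low half changes between calls."""
    millis = int(time.time() * 1000) & ((1 << 48) - 1)  # 48-bit ms timestamp
    seq = next(_counter) & 0xFFFF  # 16-bit counter: wraps around, never reset per ms
    return HIGH_BITS, (millis << 16) | seq


a, b = next_id(), next_id()
print(a[0] == b[0], a[1] != b[1])  # True True
```

Keeping the constant half precomputed means each call only has to read the clock and bump a counter.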

 

Version 1 variant - MongoDB

MongoDB's ObjectId.java

- timestamp (4 bytes, 32 bits): second resolution, counting from 1970; good for about 136 years.

- increment counter (3 bytes, 24 bits, maximum about 16 million): an int that starts from a random number (clever) and keeps incrementing; as with Hibernate, it is not reset to zero when the second changes, each field does its own thing. Since there are only 3 bytes, the 4-byte int is truncated to its last 3 bytes.

- machine identifier (3 bytes, 24 bits): the MAC addresses of all network cards are concatenated and turned into a hashCode, which is likewise truncated to the last 3 bytes of the int. If no network card can be read, a random number is used instead.

- process identifier (2 bytes, 16 bits): the process ID obtained through JMX; if it cannot be obtained, a hash of the process name or a random number is used instead.

As you can see, MongoDB's choices for each field are somewhat more sensible than Hibernate's; for example, the timestamp has second resolution. The total length shrinks to 12 bytes (96 bits), which no longer fits neatly into 64-bit longs, so the ID is represented as a hexadecimal string or a byte array.

In addition, the increment counter in the Java driver seems to have a bug.
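A 12-byte ID in this style can be sketched in Python. This is a hypothetical illustration, not the MongoDB driver's code: it assumes the legacy byte order (timestamp, machine, process, counter), and it swaps in an MD5 hash of `uuid.getnode()` (which falls back to a random number when no MAC is available) in place of Java's hashCode of the MAC addresses:

```python
import hashlib
import os
import random
import struct
import threading
import time
import uuid


class ObjectIdSketch:
    """Hypothetical 12-byte ID following the field sizes described above."""

    _lock = threading.Lock()
    _counter = random.randrange(1 << 24)  # starts from a random value

    # Machine ID: hash of the MAC address (swapped-in MD5), truncated to 3 bytes.
    _machine = hashlib.md5(uuid.getnode().to_bytes(6, "big")).digest()[:3]
    # Process ID truncated to 2 bytes.
    _pid = os.getpid() & 0xFFFF

    @classmethod
    def generate(cls) -> bytes:
        with cls._lock:
            cls._counter = (cls._counter + 1) & 0xFFFFFF  # 24 bits, never reset
            counter = cls._counter
        return (
            struct.pack(">I", int(time.time()) & 0xFFFFFFFF)  # 4-byte seconds
            + cls._machine                                   # 3-byte machine ID
            + struct.pack(">H", cls._pid)                    # 2-byte process ID
            + counter.to_bytes(3, "big")                     # 3-byte counter
        )


oid = ObjectIdSketch.generate()
print(len(oid), oid.hex())  # 12 bytes, shown as 24 hex characters
```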

 

Twitter's ID generator: Snowflake

Snowflake is also an ID-issuing service, built on Thrift. Rather than a simple Redis increment, it works much like UUID Version 1.

Each ID is a single 64-bit long, so IdWorker divides the bits rigidly as follows:

- timestamp (41 bits; the top bit of the long is left unused as the sign bit): milliseconds since a custom epoch, which lasts longer than counting from 1970; 41 bits of milliseconds cover about 69 years.
- sequence (12 bits, 4096 values): incremented within the same millisecond and reset to 0 once the millisecond changes.
- datacenter ID (5 bits, up to 32 values): a configured value.
- worker ID (5 bits, up to 32 values): a configured value. Because it identifies an ID-issuing machine rather than every application process, 32 per datacenter is enough; worker IDs are also registered in ZooKeeper.

As you can see, because this is a dedicated ID service, the MAC-derived machine identifier and the process identifier can both be replaced by small configured IDs, which is what makes it possible to express the whole ID in a single long.

Also, with this service a client can fetch only one ID per call, not a batch, so the extra network latency is a concern.
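The bit allocation above can be sketched as a small in-process generator. This is a hypothetical Python illustration, not Twitter's Scala code: the class name and the 2012-01-01 epoch are assumptions, and clock rollback is simply reported as an error. The timestamp sits in the high bits (shifted left by 22), followed by 5 datacenter bits, 5 worker bits, and the 12-bit sequence:

```python
import threading
import time


class SnowflakeSketch:
    """Hypothetical Snowflake-style 64-bit ID generator (not Twitter's code)."""

    EPOCH_MS = 1_325_376_000_000  # hypothetical custom epoch: 2012-01-01 UTC
    MAX_SEQUENCE = (1 << 12) - 1  # 4095

    def __init__(self, datacenter_id: int, worker_id: int) -> None:
        if not (0 <= datacenter_id < 32 and 0 <= worker_id < 32):
            raise ValueError("datacenter_id and worker_id must be in [0, 31]")
        self.datacenter_id = datacenter_id
        self.worker_id = worker_id
        self.sequence = 0
        self.last_ms = -1
        self._lock = threading.Lock()

    @staticmethod
    def _now_ms() -> int:
        return time.time_ns() // 1_000_000

    def next_id(self) -> int:
        with self._lock:
            now = self._now_ms()
            if now < self.last_ms:
                raise RuntimeError("clock moved backwards")
            if now == self.last_ms:
                self.sequence = (self.sequence + 1) & self.MAX_SEQUENCE
                if self.sequence == 0:  # all 4096 IDs used in this millisecond
                    while now <= self.last_ms:  # spin until the next millisecond
                        now = self._now_ms()
            else:
                self.sequence = 0  # new millisecond: reset the sequence
            self.last_ms = now
            return (
                ((now - self.EPOCH_MS) << 22)  # timestamp above bit 22
                | (self.datacenter_id << 17)   # 5 bits: 17..21
                | (self.worker_id << 12)       # 5 bits: 12..16
                | self.sequence                # 12 bits: 0..11
            )


gen = SnowflakeSketch(datacenter_id=1, worker_id=2)
new_id = gen.next_id()
print(new_id, (new_id >> 17) & 31, (new_id >> 12) & 31)  # ..., 1, 2
```

Because the timestamp occupies the high bits, IDs from one worker increase over time, which is convenient for database indexes.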

 

UUID Version 2: DCE security UUID


DCE (Distributed Computing Environment) security UUIDs use the same algorithm as time-based UUIDs, except that the first 4 bytes of the timestamp are replaced by a POSIX UID or GID. This version is rarely used in practice.

UUID Version 3: name-based UUID (MD5)

A name-based UUID is obtained by computing the MD5 hash of a namespace and a name. This version guarantees that different names in the same namespace produce different UUIDs, that UUIDs from different namespaces are distinct, and that the same name in the same namespace always produces the same UUID.
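These guarantees can be observed directly with Python's `uuid.uuid3`. The name "example" below is an arbitrary illustration; `NAMESPACE_DNS` and `NAMESPACE_URL` are predefined namespaces in the uuid module:

```python
import uuid

same_1 = uuid.uuid3(uuid.NAMESPACE_DNS, "alaji")
same_2 = uuid.uuid3(uuid.NAMESPACE_DNS, "alaji")
other_name = uuid.uuid3(uuid.NAMESPACE_DNS, "example")
other_ns = uuid.uuid3(uuid.NAMESPACE_URL, "alaji")

print(same_1 == same_2)      # True: same namespace + same name -> same UUID
print(same_1 == other_name)  # False: different name
print(same_1 == other_ns)    # False: different namespace
print(same_1.version)        # 3
```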

UUID Version 4: random UUID

Generated from random or pseudo-random numbers. The probability of a duplicate can be calculated, but randomness is like a lottery: you cannot count on it to make you rich, yet bad luck tends to strike when you least expect it.
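The duplicate probability can be estimated with the birthday bound. A Version 4 UUID has 122 random bits (128 minus 4 version bits and 2 variant bits), and the chance of at least one duplicate among n IDs is approximately 1 - exp(-n(n-1) / (2 * 2^122)). A minimal sketch, assuming a perfectly uniform random source; the helper name is hypothetical:

```python
import math
import uuid

RANDOM_BITS = 122  # 128 bits minus 4 version bits and 2 variant bits


def collision_probability(n: int, bits: int = RANDOM_BITS) -> float:
    """Hypothetical helper: birthday-bound estimate of P(any duplicate among n IDs)."""
    space = 2 ** bits
    # 1 - exp(-n(n-1) / (2 * space)); expm1 keeps precision for tiny values
    return -math.expm1(-n * (n - 1) / (2 * space))


print(uuid.uuid4().version)              # 4
print(collision_probability(10**9))      # about 9.4e-20 for a billion UUIDs
print(collision_probability(2**61))      # about 0.39
```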

UUID Version 5: name-based UUID (SHA1)

Similar to the Version 3 algorithm, except that the hash is computed with SHA-1 (Secure Hash Algorithm 1).
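A name-based UUID can also be built by hand to see what the algorithm does: hash the namespace's 16 bytes followed by the UTF-8 name, keep the first 16 bytes of the digest, and overwrite the version and variant bits. In Python, passing `version=5` to `uuid.UUID` sets both. The helper name below is hypothetical:

```python
import hashlib
import uuid


def name_based_uuid_sha1(namespace: uuid.UUID, name: str) -> uuid.UUID:
    """Hypothetical helper: SHA-1(namespace bytes + name), first 16 of 20 bytes."""
    digest = hashlib.sha1(namespace.bytes + name.encode("utf-8")).digest()
    return uuid.UUID(bytes=digest[:16], version=5)  # also sets the RFC 4122 variant


v5 = name_based_uuid_sha1(uuid.NAMESPACE_DNS, "alaji")
print(v5 == uuid.uuid5(uuid.NAMESPACE_DNS, "alaji"))  # True
print(v5 == uuid.uuid3(uuid.NAMESPACE_DNS, "alaji"))  # False: MD5 gives another value
```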

Applying UUIDs

Looking at the different versions: Versions 1/2 suit distributed computing environments and offer a high degree of uniqueness; Versions 3/5 suit names that must be unique within a certain scope, and environments where the same UUID may need to be generated repeatedly; as for Version 4, my personal recommendation is to avoid it where possible (although it is the simplest and most convenient).

We generally recommend UUIDs for identifying objects or persistent data, except in the following cases, where UUIDs are best avoided:

  • Type-mapping objects, for example tables that contain only a code and a code name.
  • Manually maintained objects that are not generated by the system, such as part of the system's base data.

For objects whose names are naturally unique, such as system users, Version 3/5 is preferable. If users got Version 1 UUIDs and you accidentally deleted and then recreated a user, you would find that the person is the same but the user record is not, because it now has a different UUID. (Marking records as deleted is another solution, but it adds implementation complexity.)
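The sketch below illustrates the point with Python's uuid module. `USER_NAMESPACE`, the domain "users.example.com", and `user_id_v5` are hypothetical; any fixed namespace UUID would do. Recreating the user with the same name yields the same Version 5 ID, while Version 1 produces a new ID on every call:

```python
import uuid

# Hypothetical namespace for user accounts in one system; any fixed UUID works.
USER_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "users.example.com")


def user_id_v5(username: str) -> uuid.UUID:
    """Hypothetical helper: derive a stable user ID from the unique user name."""
    return uuid.uuid5(USER_NAMESPACE, username)


first = user_id_v5("alice")
# ... the user is deleted by mistake and then recreated ...
recreated = user_id_v5("alice")
print(first == recreated)            # True: the same user gets the same ID back

print(uuid.uuid1() == uuid.uuid1())  # False: a Version 1 ID changes on recreation
```

In a real system the name would usually be normalized first (for example, lowercased) so that trivially different spellings map to the same ID.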

Using UUIDs in Python

import uuid

# Generate a UUID from the host ID (MAC address) and the current time
print(uuid.uuid1())
# Generate a UUID from the MD5 hash of a namespace and a name
print(uuid.uuid3(uuid.NAMESPACE_DNS, 'alaji'))
# Generate a random UUID
print(uuid.uuid4())
# Generate a UUID from the SHA-1 hash of a namespace and a name
print(uuid.uuid5(uuid.NAMESPACE_DNS, 'alaji'))

# These functions return uuid.UUID objects; convert with str() when a string is needed

Sample output (the uuid1 and uuid4 values differ on every run):

98345eb4-da04-11e8-9133-54ab3a0caf8d
171898c7-77dd-3258-99f8-41664402c08a
31d7df8f-cb78-41aa-bd4b-b73db34f9358
a08d35bf-65b8-558f-beea-369294bebbd8

 

A UUID is a 128-bit globally unique identifier, usually written as 32 hexadecimal digits. It guarantees uniqueness across time and space. It is also known as a GUID; the full name is Universally Unique IDentifier, and Python calls it UUID.
It relies on the MAC address, timestamps, namespaces, and random or pseudo-random numbers to guarantee the uniqueness of the generated IDs.
There are five main UUID algorithms, that is, five ways to generate one:

  • uuid1() - timestamp-based. Generated from the current timestamp, a random number, and the MAC address. It guarantees global uniqueness, but using the MAC address also introduces a security (privacy) issue; you can use the LAN IP instead of the MAC.
  • uuid2() - based on DCE (Distributed Computing Environment); Python has no such function. Same algorithm as uuid1, except that the first 4 bytes of the timestamp are replaced by a POSIX UID. It is rarely used in practice.
  • uuid3() - based on the MD5 hash of a name. Computing the MD5 hash of the namespace and name ensures that different names in the same namespace, and names in different namespaces, get different UUIDs, while the same name in the same namespace always produces the same UUID.
  • uuid4() - based on random numbers. Generated from pseudo-random numbers, so there is a small, calculable probability of a duplicate.
  • uuid5() - based on the SHA-1 hash of a name. Same algorithm as uuid3, except that it uses SHA-1 (Secure Hash Algorithm 1).
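Besides the LAN IP, another way to keep the MAC address out of a uuid1() value is Python's documented `node` parameter, which accepts any 48-bit integer. The sketch below uses a random node with the multicast bit set, which RFC 4122 recommends so that a random node cannot collide with a real MAC address; an integer derived from a LAN IP could be passed the same way:

```python
import random
import uuid

# Random 48-bit node with the multicast bit (lowest bit of the first octet) set.
random_node = random.getrandbits(48) | (1 << 40)

u = uuid.uuid1(node=random_node)
print(u.version)              # 1
print(u.node == random_node)  # True: the real MAC address is not exposed
```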


 


 

 


Source: www.cnblogs.com/HUIWANG/p/11133765.html