What about unsigned types in Java?

https://www.cnblogs.com/yuanyq/p/java_unsigned_types.html

(byte) (127 + 1) == (byte) -128; // true

Languages such as C and C++ provide integer types of different lengths: char, short, int, and long. (Strictly speaking, char is not really intended as an integer type, but you can use it as one, and in practice many C programmers use char to store small integers.) On many platforms these types are 1, 2, 4, and 8 bytes long, respectively, although long is only 4 bytes on most 32-bit systems. Note that the sizes of these integer types differ from platform to platform. Java, by contrast, was designed to be cross-platform: no matter what platform it runs on, a Java byte is always 1 byte, a short 2 bytes, an int 4 bytes, and a long 8 bytes.

The integer types in C all come with a corresponding "unsigned" version, but Java has no such feature. I find Java's lack of unsigned types really uncomfortable. Think about it: a large number of hardware interfaces, network protocols, and file formats use unsigned types! (The char type in Java is different from the one in C. In Java, a char is 2 bytes and represents a Unicode value; in C, a char is 1 byte and typically represents an ASCII value. You can use a Java char as an unsigned short to represent integers from 0 to 2^16 - 1, but doing so can produce all kinds of odd behavior. For example, when you print the value, what is actually printed is the corresponding character rather than the string representation of the number.)
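The printing pitfall mentioned above can be sketched as follows. This is a minimal illustration, not code from the original article; the class name `CharAsUnsignedShort` and the helper `numericValue` are made up for the example.

```java
final class CharAsUnsignedShort {
    // Returns the numeric value (0..65535) held in a char.
    // Widening char to int never sign-extends, because char is unsigned.
    static int numericValue(char c) {
        return c;
    }

    public static void main(String[] args) {
        char value = 65;
        System.out.println(value);       // prints the character: A
        System.out.println((int) value); // prints the number: 65

        char max = (char) 0xFFFF;
        System.out.println(numericValue(max)); // prints 65535
    }
}
```

Casting to int before printing (or before string concatenation) is the simplest way to get the number rather than the character.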

So how do you deal with the lack of unsigned types in Java?

Well, you might not like the solution I'm about to give...

The answer is: use a signed type that is larger than the unsigned type you want to use.

For example: use short to handle unsigned bytes, use long to handle unsigned ints, and so on (you can even use char to handle unsigned shorts). Admittedly this seems wasteful, because you use twice the storage space, but there is no better way. In addition, note that reads and writes of long variables are not guaranteed to be atomic, so in a multi-threaded scenario you have to handle synchronization yourself.
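One possible way to handle the atomicity concern is to keep the unsigned 32-bit value in a `java.util.concurrent.atomic.AtomicLong`. The sketch below is only an illustration under that assumption; the class `UnsignedIntCounter` and its methods are hypothetical names, and a `volatile long` or a `synchronized` block would be other options.

```java
import java.util.concurrent.atomic.AtomicLong;

final class UnsignedIntCounter {
    // Mask that keeps only the low 32 bits of a long.
    private static final long MASK = 0xFFFFFFFFL;

    // The unsigned 32-bit value, stored in a larger signed type.
    private final AtomicLong value = new AtomicLong();

    long get() {
        return value.get();
    }

    // Atomically adds delta and wraps modulo 2^32,
    // the way an unsigned 32-bit integer would.
    long add(long delta) {
        return value.updateAndGet(v -> (v + delta) & MASK);
    }

    public static void main(String[] args) {
        UnsignedIntCounter counter = new UnsignedIntCounter();
        System.out.println(counter.add(0xFFFFFFFFL)); // prints 4294967295
        System.out.println(counter.add(2));           // prints 1 (wrapped around)
    }
}
```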

How to store and read data in unsigned form?

If someone sends you a bunch of bytes containing unsigned values over the Internet (or you read such bytes from a file), you need to do some extra processing to convert them into the larger Java types.

There is also the issue of endianness (byte order). For now we won't worry about it and will simply treat the data as being in "network byte order", that is, "big-endian" (most significant byte first), which is also the standard byte order in Java.

Read from network byte order

Suppose we start processing a byte array, and we want to read from it an unsigned byte, an unsigned short integer, and an unsigned integer.

short anUnsignedByte = 0; 
char anUnsignedShort = 0; 
long anUnsignedInt = 0; 
int firstByte = 0; 
int secondByte = 0; 
int thirdByte = 0; 
int fourthByte = 0; 
byte buf[] = getMeSomeData(); 
// Check to make sure we have enough bytes 
if(buf.length < (1 + 2 + 4)) doSomeErrorHandling(); 
int index = 0; 
firstByte = (0x000000FF & ((int)buf[index])); 
index++; 
anUnsignedByte = (short)firstByte; 

firstByte = (0x000000FF & ((int)buf[index])); 
secondByte = (0x000000FF & ((int)buf[index+1])); 
index = index+2; 
anUnsignedShort = (char) (firstByte << 8 | secondByte); 

firstByte = (0x000000FF & ((int)buf[index])); 
secondByte = (0x000000FF & ((int)buf[index+1])); 
thirdByte = (0x000000FF & ((int)buf[index+2])); 
fourthByte = (0x000000FF & ((int)buf[index+3])); 
index = index+4; 
anUnsignedInt = ((long) (firstByte << 24 
                        | secondByte << 16 
                        | thirdByte << 8 
                        | fourthByte)) 
                        & 0xFFFFFFFFL;

Okay, it looks a little complicated now. But it's actually very intuitive. First of all, you see a lot of things like this:

0x000000FF & (int)buf[index]

First, the signed byte is promoted to int, and then a bitwise AND is applied to that int, keeping only the lowest 8 bits. Because a Java byte is signed, when the unsigned value of a byte is greater than 127, the bit that represents the sign is set to 1 (strictly speaking, it is not a pure sign bit, because numbers are stored in two's complement). To Java, this is a negative number. When the negative byte is promoted to int, bits 0 to 7 are preserved and bits 8 to 31 are set to 1. The bitwise AND with 0x000000FF then clears bits 8 to 31. The code above can be written more briefly:

0xFF & (int)buf[index]

Java fills in the leading zeros of 0xFF automatically, and in Java the bitwise operator & causes the byte to be promoted to int automatically.
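The sign extension and masking described above can be seen in a small sketch. The class name `SignExtensionDemo` and the helper `toUnsigned` are illustrative names, not part of the original article.

```java
final class SignExtensionDemo {
    // Masks off the sign-extended upper bits so the result is 0..255.
    static int toUnsigned(byte b) {
        return b & 0xFF; // b is promoted to int before the AND
    }

    public static void main(String[] args) {
        byte raw = (byte) 0xF0; // bit pattern 1111 0000

        int promoted = raw;     // sign-extended: bits 8 to 31 become 1
        System.out.println(promoted);                      // prints -16
        System.out.println(Integer.toHexString(promoted)); // prints fffffff0

        System.out.println(toUnsigned(raw));               // prints 240
    }
}
```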

Next you see a lot of bitwise left-shift operators, <<. This operator shifts its left operand to the left by the number of bits given by its right operand. So if you have int foo = 0x000000FF, then foo << 8 gives 0x0000FF00 and foo << 16 gives 0x00FF0000.

The last is the bitwise OR operator, |. Suppose you have loaded the 2 bytes of an unsigned short into two ints, giving 0x00000012 and 0x00000034. You shift the first one left by 8 bits, so that you now have 0x00001200 and 0x00000034, and then you need to put them back together. That is what the bitwise OR is for: 0x00001200 | 0x00000034 gives 0x00001234, which can then be stored in a Java char.
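The shift-and-OR step can be sketched as a small helper. The names `CombineBytes` and `toUnsignedShort` are made up for this example.

```java
final class CombineBytes {
    // Combines a high byte and a low byte (big-endian order)
    // into an unsigned 16-bit value stored in a char.
    static char toUnsignedShort(byte high, byte low) {
        int hi = (high & 0xFF) << 8; // e.g. 0x12 becomes 0x00001200
        int lo = low & 0xFF;         // e.g. 0x34 stays   0x00000034
        return (char) (hi | lo);     // e.g.              0x00001234
    }

    public static void main(String[] args) {
        char value = toUnsignedShort((byte) 0x12, (byte) 0x34);
        System.out.println(Integer.toHexString(value)); // prints 1234
    }
}
```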

These are the basic operations. For an unsigned int, however, you need to store the result in a long. The other operations are the same as before, except that you need to promote the int to long and then AND it with 0xFFFFFFFFL. The trailing L tells Java to treat the constant as a long.
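To see why the final mask matters, here is a minimal sketch that assembles four bytes into a long with and without it. `UnsignedIntDemo` and its method names are hypothetical.

```java
final class UnsignedIntDemo {
    // Assembles four big-endian bytes into a signed 32-bit int.
    static int toSignedInt(byte b0, byte b1, byte b2, byte b3) {
        return ((b0 & 0xFF) << 24)
             | ((b1 & 0xFF) << 16)
             | ((b2 & 0xFF) << 8)
             |  (b3 & 0xFF);
    }

    // Same bits, but widened to long and masked so the value is 0..2^32-1.
    static long toUnsignedInt(byte b0, byte b1, byte b2, byte b3) {
        return toSignedInt(b0, b1, b2, b3) & 0xFFFFFFFFL;
    }

    public static void main(String[] args) {
        byte ff = (byte) 0xFF;
        long withoutMask = toSignedInt(ff, ff, ff, ff); // sign-extended on widening
        long withMask = toUnsignedInt(ff, ff, ff, ff);
        System.out.println(withoutMask); // prints -1
        System.out.println(withMask);    // prints 4294967295
    }
}
```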

Writing in network byte order

Suppose we now want to write the values we read in the previous step into a buffer. We read them in the order unsigned byte, unsigned short, unsigned int. Now, for whatever reason, we plan to write them in the order unsigned int, unsigned short, unsigned byte.

// The masked values are long or int, so an explicit cast to byte is required.
buf[0] = (byte) ((anUnsignedInt & 0xFF000000L) >> 24);
buf[1] = (byte) ((anUnsignedInt & 0x00FF0000L) >> 16);
buf[2] = (byte) ((anUnsignedInt & 0x0000FF00L) >> 8);
buf[3] = (byte) (anUnsignedInt & 0x000000FFL);

buf[4] = (byte) ((anUnsignedShort & 0xFF00) >> 8);
buf[5] = (byte) (anUnsignedShort & 0x00FF);

buf[6] = (byte) (anUnsignedByte & 0xFF);
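The write step can be wrapped in a self-contained method, as in the sketch below. The class `UnsignedWriter` and the method `write` are illustrative names; the variable names follow the ones used above, and the 7-byte layout (unsigned int, unsigned short, unsigned byte) matches the order described in the text.

```java
final class UnsignedWriter {
    // Writes an unsigned int, an unsigned short and an unsigned byte
    // into a new 7-byte buffer in network (big-endian) byte order.
    static byte[] write(long anUnsignedInt, char anUnsignedShort, short anUnsignedByte) {
        byte[] buf = new byte[7];

        // Narrowing to byte keeps only the lowest 8 bits after each shift.
        buf[0] = (byte) (anUnsignedInt >>> 24);
        buf[1] = (byte) (anUnsignedInt >>> 16);
        buf[2] = (byte) (anUnsignedInt >>> 8);
        buf[3] = (byte) anUnsignedInt;

        buf[4] = (byte) (anUnsignedShort >>> 8);
        buf[5] = (byte) anUnsignedShort;

        buf[6] = (byte) anUnsignedByte;
        return buf;
    }

    public static void main(String[] args) {
        byte[] buf = write(0x89ABCDEFL, (char) 0xFEDC, (short) 0xFF);
        for (byte b : buf) {
            // Mask again when printing so each byte shows as 00..FF.
            System.out.printf("%02X ", b & 0xFF);
        }
        System.out.println();
    }
}
```

Because the cast to byte discards everything above the low 8 bits, the explicit masks are optional here; they are kept in the listing above mainly for readability.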

What is endianness?

What does it mean? Do I need to pay attention? And, what is the network byte order?

The big-endian ("most significant byte first") byte order used in Java is also called "network byte order". Intel x86 processors are little-endian (unless you are running a Java program on them, in which case the program still sees big-endian order). Data files created on x86 systems are usually, but not necessarily, little-endian, while data files created by Java programs are usually, but not necessarily, big-endian. Any system can output data in whatever byte order it needs.
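If you want to check this on your own machine, the standard `java.nio` classes expose both the hardware's native order and the order Java buffers use by default. The sketch below uses only `ByteOrder.nativeOrder()` and `ByteBuffer.order()`; the class name `ByteOrderDemo` is made up for the example.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class ByteOrderDemo {
    // The byte order a freshly allocated ByteBuffer uses.
    static ByteOrder defaultBufferOrder() {
        return ByteBuffer.allocate(4).order();
    }

    public static void main(String[] args) {
        // Depends on the hardware, e.g. LITTLE_ENDIAN on x86.
        System.out.println("native order: " + ByteOrder.nativeOrder());

        // A new ByteBuffer is big-endian regardless of the hardware.
        System.out.println("buffer order: " + defaultBufferOrder());
    }
}
```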

What does endianness mean?

"Endianness" refers to the order in which a computer stores the bytes of a value in memory. There are two common modes: most significant byte first (big-endian) and least significant byte first (little-endian). Of course you need to pay attention to endianness; otherwise, if you read a little-endian data file as if it were big-endian, or vice versa, you will get nothing but garbage.

Any numeric value, however it is written, such as 500,000,007 or its hexadecimal form 0x1DCD6507, can be viewed as a string of digits. A string of digits has a beginning (the far left) and an end (the far right). In English, the first digit is the most significant one: in 500,000,007, the 5 actually means 500,000,000. The last digit is the least significant one: in 500,000,007, the 7 means just 7.

When we talk about endianness, we are referring to the order in which we write numbers. We always start writing from the high order, and then the second high order, until the lowest order. Isn’t that true?

In the example above, the value 500,000,007 is 0x1DCD6507 in hexadecimal. We split it into four separate bytes: 0x1D, 0xCD, 0x65, and 0x07, whose decimal values are 29, 205, 101, and 7. The most significant byte, 29, stands for 29 * 256 * 256 * 256 = 486539264; next comes 205, which stands for 205 * 256 * 256 = 13434880; then 101, which stands for 101 * 256 = 25856; and the last byte, 7, stands for 7 * 1 = 7. Added together:

486539264 + 13434880 + 25856 + 7 = 500,000,007

When the computer stores these 4 bytes in memory, suppose they are stored at addresses 2056, 2057, 2058, and 2059. The question is: which byte goes at which address? The computer may store 29 at address 2056, 205 at 2057, 101 at 2058, and 7 at 2059, just like the order in which you write the number down. We call this big-endian. Other computer architectures, however, may store 7 at 2056, 101 at 2057, 205 at 2058, and 29 at 2059. This order is called little-endian.
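The two layouts of 500,000,007 can be reproduced with `java.nio.ByteBuffer`, as in this sketch. `EndianLayout` and `layout` are illustrative names; the array index stands in for the memory addresses 2056 to 2059.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

final class EndianLayout {
    // Returns the four bytes of value, in storage order, as unsigned numbers.
    static int[] layout(int value, ByteOrder order) {
        byte[] raw = ByteBuffer.allocate(4).order(order).putInt(value).array();
        int[] unsigned = new int[raw.length];
        for (int i = 0; i < raw.length; i++) {
            unsigned[i] = raw[i] & 0xFF; // show 205 rather than -51
        }
        return unsigned;
    }

    public static void main(String[] args) {
        int value = 500_000_007; // 0x1DCD6507
        System.out.println(Arrays.toString(layout(value, ByteOrder.BIG_ENDIAN)));
        // prints [29, 205, 101, 7]
        System.out.println(Arrays.toString(layout(value, ByteOrder.LITTLE_ENDIAN)));
        // prints [7, 101, 205, 29]
    }
}
```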

The same is true for 2-byte and 8-byte storage methods. The most significant byte is called MSB, and the least significant byte is called LSB.

Okay, so why should I care about endianness?

This depends on the situation. Normally you don't need to care about this issue. No matter what platform you run a Java program on, its endianness is the same, so you don't need to care about endianness.

But what about when you want to process data generated by programs written in other languages? Then endianness is a big problem. You must make sure that you decode the data in the same byte order in which it was encoded, and vice versa. If you are lucky, you will find the byte order stated in the API or protocol specification or in the file format description. If not... good luck!

The most important thing is to have a clear understanding of the endianness your platform uses and the endianness of the data you need to process. If the two differ, you need to do extra work to get correct results. Also, if you need to deal with unsigned values, you need to make sure that each byte ends up in the correct position of the corresponding integer, short, or long.
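As a sketch of that extra processing, here is one way to read an unsigned 32-bit value from a buffer in either byte order. The class `UnsignedIntReader` and its method names are hypothetical; only the position each byte is shifted to changes between the two variants.

```java
final class UnsignedIntReader {
    // Big-endian: the first byte in the buffer is the most significant.
    static long readBigEndian(byte[] buf, int offset) {
        return ((buf[offset]     & 0xFFL) << 24)
             | ((buf[offset + 1] & 0xFFL) << 16)
             | ((buf[offset + 2] & 0xFFL) << 8)
             |  (buf[offset + 3] & 0xFFL);
    }

    // Little-endian: the first byte in the buffer is the least significant.
    static long readLittleEndian(byte[] buf, int offset) {
        return  (buf[offset]     & 0xFFL)
             | ((buf[offset + 1] & 0xFFL) << 8)
             | ((buf[offset + 2] & 0xFFL) << 16)
             | ((buf[offset + 3] & 0xFFL) << 24);
    }

    public static void main(String[] args) {
        byte[] littleEndianData = { 7, 101, (byte) 205, 29 };
        System.out.println(readLittleEndian(littleEndianData, 0)); // prints 500000007
        System.out.println(readBigEndian(littleEndianData, 0));    // prints 124112157 (wrong order)
    }
}
```

Masking with the long constant 0xFFL promotes each byte to long before shifting, so no separate 0xFFFFFFFFL mask is needed at the end.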

What is the network byte order?

When the IP protocol was designed, big-endian byte order was chosen as the network byte order. In an IP message, numeric values are stored in network byte order. The byte order used by the computer that generates the message is called the "host byte order", which may or may not be the same as the network byte order. Like network byte order, the byte order in Java is big-endian.

Why is there no unsigned type?

Why doesn't Java provide unsigned types? Good question! I have often found this very strange, especially since many network protocols were already using unsigned types at the time. In 1999 I spent a long time searching the Web (Google was not so good back then), because I always felt that this should not be the case. Then one day I found an interview with one of the inventors of Java (was it Gosling? I don't remember clearly; I wish I had saved the web page), in which the designer said something to the effect of: "Hey! Unsigned types complicate things. No one really needs unsigned types, so we threw them out."

Here is a page that records an interview with James Gosling to see if you can get some inspiration:

Origin blog.csdn.net/u010689853/article/details/110959465