What about unsigned types in Java?
https://www.cnblogs.com/yuanyq/p/java_unsigned_types.html
(byte) (127 + 1) == (byte) -128; // true
Languages such as C and C++ provide integer types of different lengths: char, short, int, and long. (Strictly speaking, char is meant for characters, but it behaves like an integer, and in practice many C programmers use it to store small integers.) On many systems these types are 1, 2, 4, and 8 bytes long respectively, but the byte lengths of these types differ between platforms; long, for example, is only 4 bytes on many 32-bit systems. Java, by contrast, was designed to be cross-platform: no matter what platform it runs on, a Java byte is always 1 byte, a short 2 bytes, an int 4 bytes, and a long 8 bytes.
The integer types in C all have corresponding unsigned versions, but Java has no such feature. I find Java's lack of unsigned types really uncomfortable; just think of how many hardware interfaces, network protocols, and file formats use unsigned types! (The char type in Java differs from the one in C. In Java, a char uses 2 bytes to represent a Unicode value, while in C a char uses 1 byte to represent an ASCII value. You can use a Java char as an unsigned short integer to represent integers from 0 to 2^16 - 1, but doing so can produce all kinds of weird results. For example, when you print such a value, what is actually printed is the corresponding character rather than the string representation of the number itself.)
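The printing pitfall just described can be seen in a minimal sketch. The class name CharAsUnsignedShort and the helper asNumber are made up for this illustration; the sketch only assumes standard Java behavior for char.

```java
class CharAsUnsignedShort {
    // Returns the decimal string for a char treated as an unsigned 16-bit number.
    // (Hypothetical helper, not part of any library.)
    static String asNumber(char value) {
        return Integer.toString(value); // char widens to int without sign extension
    }

    public static void main(String[] args) {
        char value = 65;                     // intended as the number 65
        System.out.println(value);           // prints the character: A
        System.out.println((int) value);     // prints the number: 65
        System.out.println(asNumber(value)); // prints the number: 65

        char max = (char) 0xFFFF;            // largest value a char can hold
        System.out.println((int) max);       // 65535
    }
}
```

Casting to int before printing is the simplest way to get the number rather than the character.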
So, how to deal with the lack of unsigned types in Java?
Well, you might not like the solution I'm about to give...
The answer is: use a signed type that is larger than the unsigned type you want to use.
For example: use short to hold unsigned bytes, use long to hold unsigned ints, and so on (you can even use char to hold unsigned shorts). Admittedly, this seems wasteful because you use twice the storage space, but there is no better way. Also keep in mind that reads and writes of long variables are not guaranteed to be atomic unless the variable is declared volatile, so in a multi-threaded scenario you have to handle synchronization yourself.
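As a rough sketch of what "use a larger signed type" looks like in practice, the following shows the maximum unsigned values fitting into the wider types. The wraparound masking in the two helper methods is an addition of this sketch rather than something the text above prescribes, and the names WiderSignedTypes, addUnsignedBytes, and addUnsignedInts are hypothetical.

```java
class WiderSignedTypes {
    // Adds two unsigned bytes held in shorts, wrapping at 256 like 8-bit arithmetic.
    static short addUnsignedBytes(short a, short b) {
        return (short) ((a + b) & 0xFF);
    }

    // Adds two unsigned 32-bit values held in longs, wrapping at 2^32.
    static long addUnsignedInts(long a, long b) {
        return (a + b) & 0xFFFFFFFFL;
    }

    public static void main(String[] args) {
        short maxUnsignedByte = 255;       // too large for a Java byte (max 127)
        long maxUnsignedInt = 4294967295L; // too large for a Java int (max 2147483647)

        System.out.println(addUnsignedBytes(maxUnsignedByte, (short) 1)); // 0
        System.out.println(addUnsignedInts(maxUnsignedInt, 1L));          // 0
        System.out.println(addUnsignedBytes((short) 200, (short) 100));   // 44
    }
}
```

Because the wider type does not overflow where the unsigned type would, any arithmetic that is supposed to wrap has to be masked by hand.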
How to store and read data in unsigned form?
If someone sends you a bunch of bytes containing unsigned values over the network (or you read them from a file), you need to do some extra processing to convert them into the larger Java types.
There is also the issue of endianness. For now we won't worry about it and will simply assume "network byte order", that is, big-endian (most significant byte first), which is also the standard byte order in Java.
Reading from network byte order
Suppose we start processing a byte array, and we want to read from it an unsigned byte, an unsigned short integer, and an unsigned integer.
short anUnsignedByte = 0;
char anUnsignedShort = 0;
long anUnsignedInt = 0;
int firstByte = 0;
int secondByte = 0;
int thirdByte = 0;
int fourthByte = 0;
byte[] buf = getMeSomeData();
// Check to make sure we have enough bytes
if (buf.length < (1 + 2 + 4)) doSomeErrorHandling();
int index = 0;
// Unsigned byte: mask off the sign-extended bits and store the result in a short
firstByte = (0x000000FF & ((int) buf[index]));
index++;
anUnsignedByte = (short) firstByte;
// Unsigned short: combine two bytes, high byte first, and store the result in a char
firstByte = (0x000000FF & ((int) buf[index]));
secondByte = (0x000000FF & ((int) buf[index + 1]));
index = index + 2;
anUnsignedShort = (char) (firstByte << 8 | secondByte);
// Unsigned int: combine four bytes, high byte first, and store the result in a long
firstByte = (0x000000FF & ((int) buf[index]));
secondByte = (0x000000FF & ((int) buf[index + 1]));
thirdByte = (0x000000FF & ((int) buf[index + 2]));
fourthByte = (0x000000FF & ((int) buf[index + 3]));
index = index + 4;
anUnsignedInt = ((long) (firstByte << 24
        | secondByte << 16
        | thirdByte << 8
        | fourthByte))
        & 0xFFFFFFFFL;
Okay, this looks a little complicated, but it is actually quite straightforward. First of all, you see a lot of expressions like this:
0x000000FF & (int)buf[index]
Here the signed byte is first promoted to int, and then a bitwise AND is performed on that int, keeping only its lowest 8 bits. Because a Java byte is signed, when the unsigned value of a byte is greater than 127, the highest bit is set to 1 (strictly speaking this is not a separate sign bit, because numbers are stored in two's complement), so Java treats it as a negative number. When that negative byte is promoted to int, bits 0 to 7 are preserved and bits 8 to 31 are all set to 1. The bitwise AND with 0x000000FF then clears the ones in bits 8 to 31. The code above can be written more briefly as:
0xFF & (int)buf[index]
Java fills in the leading zeros of 0xFF automatically, and the bitwise operator & automatically promotes the byte to int, so the explicit cast is optional as well.
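To make the sign extension visible, here is a small sketch that prints a byte before and after masking. The class name MaskDemo and the helper toUnsigned are invented for this example.

```java
class MaskDemo {
    // Interprets the 8 bits of a Java byte as an unsigned value in the range 0..255.
    static int toUnsigned(byte b) {
        return 0xFF & b; // b is promoted to int (sign-extended) before the AND
    }

    public static void main(String[] args) {
        byte raw = (byte) 0xC8; // bit pattern 1100 1000, unsigned value 200

        System.out.println(raw);                                  // -56
        System.out.println(Integer.toHexString(raw));             // ffffffc8
        System.out.println(Integer.toHexString(toUnsigned(raw))); // c8
        System.out.println(toUnsigned(raw));                      // 200
    }
}
```

The second line of output shows bits 8 to 31 filled with ones after promotion; the mask removes exactly those bits.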
Next you see a lot of the bitwise left shift operator <<. This operator shifts its left operand to the left by the number of bits given by its right operand. So if you have int foo = 0x000000FF, then foo << 8 gives 0x0000FF00 and foo << 16 gives 0x00FF0000.
Last is the bitwise OR operator |. Suppose you have loaded the 2 bytes of an unsigned short into two ints, giving 0x00000012 and 0x00000034. You shift the first one 8 bits to the left, which gives 0x00001200 and 0x00000034, and then you need to put them back together, which is what the bitwise OR is for: 0x00001200 | 0x00000034 gives 0x00001234, which is then cast to char so that it can be stored in Java.
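The shift and OR steps can be tried out with a short sketch. ShiftOrDemo and toUnsignedShort are hypothetical names chosen for this illustration; the byte values match the 0x12 and 0x34 example used in the text.

```java
class ShiftOrDemo {
    // Combines a high byte and a low byte (big-endian order) into an unsigned 16-bit value.
    static char toUnsignedShort(byte high, byte low) {
        int hi = 0xFF & high;         // e.g. 0x00000012
        int lo = 0xFF & low;          // e.g. 0x00000034
        return (char) (hi << 8 | lo); // 0x00001200 | 0x00000034 = 0x00001234
    }

    public static void main(String[] args) {
        int foo = 0x000000FF;
        System.out.println(Integer.toHexString(foo << 8));  // ff00
        System.out.println(Integer.toHexString(foo << 16)); // ff0000

        char value = toUnsignedShort((byte) 0x12, (byte) 0x34);
        System.out.println(Integer.toHexString(value));     // 1234
        System.out.println((int) value);                    // 4660

        // Both bytes have the high bit set; the result is still positive
        System.out.println((int) toUnsignedShort((byte) 0xFF, (byte) 0xFE)); // 65534
    }
}
```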
Those are the basic operations. An unsigned int, however, has to be stored in a long. The operations are similar to the previous ones, except that you need to widen the int to a long and then perform a bitwise AND with 0xFFFFFFFFL. The trailing L tells Java to treat this constant as a long.
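The following sketch shows why the final AND with 0xFFFFFFFFL matters: widening a negative int to long sign-extends it, just as widening a byte to int does. The names UnsignedIntDemo and toUnsignedInt are assumptions of this example.

```java
class UnsignedIntDemo {
    // Combines four bytes (big-endian order) into an unsigned 32-bit value held in a long.
    static long toUnsignedInt(byte b0, byte b1, byte b2, byte b3) {
        int bits = (0xFF & b0) << 24
                 | (0xFF & b1) << 16
                 | (0xFF & b2) << 8
                 | (0xFF & b3);
        return ((long) bits) & 0xFFFFFFFFL; // clear the sign-extended upper 32 bits
    }

    public static void main(String[] args) {
        int bits = 0xFFFFFFFE; // as a signed int this is -2

        System.out.println((long) bits);                 // -2 (sign-extended)
        System.out.println(((long) bits) & 0xFFFFFFFFL); // 4294967294

        byte ff = (byte) 0xFF;
        System.out.println(toUnsignedInt(ff, ff, ff, (byte) 0xFE)); // 4294967294
    }
}
```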
Writing in network byte order
Suppose we now want to write the values we read in the previous step to a buffer. We read them in the order unsigned byte, unsigned short, unsigned int. Now, for whatever reason, we plan to write them in the order unsigned int, unsigned short, unsigned byte.
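// Each result must be cast to byte explicitly; Java does not narrow long or int to byte implicitly
buf[0] = (byte) ((anUnsignedInt & 0xFF000000L) >> 24);
buf[1] = (byte) ((anUnsignedInt & 0x00FF0000L) >> 16);
buf[2] = (byte) ((anUnsignedInt & 0x0000FF00L) >> 8);
buf[3] = (byte) (anUnsignedInt & 0x000000FFL);
buf[4] = (byte) ((anUnsignedShort & 0xFF00) >> 8);
buf[5] = (byte) (anUnsignedShort & 0x00FF);
buf[6] = (byte) (anUnsignedByte & 0xFF);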
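A self-contained version of the write step might look like the sketch below. It wraps the assignments in a hypothetical write method that allocates its own 7-byte buffer; WriteUnsignedDemo and the sample values are assumptions made for the example.

```java
import java.util.Arrays;

class WriteUnsignedDemo {
    // Writes an unsigned int, an unsigned short and an unsigned byte in big-endian order.
    static byte[] write(long anUnsignedInt, char anUnsignedShort, short anUnsignedByte) {
        byte[] buf = new byte[4 + 2 + 1];
        buf[0] = (byte) ((anUnsignedInt & 0xFF000000L) >> 24);
        buf[1] = (byte) ((anUnsignedInt & 0x00FF0000L) >> 16);
        buf[2] = (byte) ((anUnsignedInt & 0x0000FF00L) >> 8);
        buf[3] = (byte) (anUnsignedInt & 0x000000FFL);
        buf[4] = (byte) ((anUnsignedShort & 0xFF00) >> 8);
        buf[5] = (byte) (anUnsignedShort & 0x00FF);
        buf[6] = (byte) (anUnsignedByte & 0xFF);
        return buf;
    }

    public static void main(String[] args) {
        byte[] buf = write(4294967294L, (char) 0x1234, (short) 200);
        // Bytes above 127 print as negative numbers because Java bytes are signed
        System.out.println(Arrays.toString(buf)); // [-1, -1, -1, -2, 18, 52, -56]
    }
}
```

The negative numbers in the output are the same bit patterns as 0xFF, 0xFE and 0xC8; only their signed interpretation differs.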
What is endianness?
What does it mean? Do I need to care about it? And what is network byte order?
The big-endian byte order used in Java is also called "network byte order". Intel x86 processors are little-endian (although Java programs running on them still read and write data big-endian by default). Data files created on x86 systems are usually (but not necessarily) little-endian, while data files created by Java programs are usually (but not necessarily) big-endian. Any system can output data in whatever byte order it needs.
What does endianness mean?
"Endianness" refers to the order in which a computer stores the bytes of a value in memory. There are two common modes: big-endian (most significant byte first) and little-endian (least significant byte first). You do need to pay attention to endianness; otherwise, if you read a little-endian data file as if it were big-endian, or vice versa, you will get garbage.
Any numerical value, however it is written, such as 500,000,007 or its hexadecimal form 0x1DCD6507, can be regarded as a string of digits. We can think of a string of digits as having a beginning (the far left) and an end (the far right). In English, the first digit is the most significant one; for example, the 5 in 500,000,007 actually means 500,000,000. The last digit is the least significant one; for example, the 7 in 500,000,007 means just 7.
When we talk about endianness, we are referring to the order in which we write numbers down. We always start with the most significant digit, then the next most significant, and so on down to the least significant. Isn't that right?
In the example above, the value 500,000,007 is 0x1DCD6507 in hexadecimal. We split it into four separate bytes: 0x1D, 0xCD, 0x65, and 0x07, whose decimal values are 29, 205, 101, and 7. The most significant byte, 29, stands for 29 * 256 * 256 * 256 = 486539264; next comes 205, which stands for 205 * 256 * 256 = 13434880; then 101, which stands for 101 * 256 = 25856; and the last byte, 7, stands for 7 * 1 = 7. Their sum is:
486539264 + 13434880 + 25856 + 7 = 500,000,007
Suppose the computer stores these 4 bytes in memory at addresses 2056, 2057, 2058, and 2059. The question is: which byte goes at which address? It may store 29 at address 2056, 205 at 2057, 101 at 2058, and 7 at 2059, in the same order in which you would write the number down; we call this big-endian. Other computer architectures, however, may store 7 at 2056, 101 at 2057, 205 at 2058, and 29 at 2059; this order is called little-endian.
The same applies to 2-byte and 8-byte values. The most significant byte is called the MSB, and the least significant byte is called the LSB.
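The two storage orders for 500,000,007 can be reproduced with a few shifts. This is only an illustrative sketch: EndianSplitDemo, bigEndianBytes, and littleEndianBytes are made-up names, and the arrays simply list the byte values in the order they would occupy addresses 2056 to 2059.

```java
import java.util.Arrays;

class EndianSplitDemo {
    // Splits a 32-bit value into its four bytes, most significant byte first.
    static int[] bigEndianBytes(int value) {
        return new int[] {
            (value >>> 24) & 0xFF,
            (value >>> 16) & 0xFF,
            (value >>> 8) & 0xFF,
            value & 0xFF
        };
    }

    // Same bytes, least significant byte first.
    static int[] littleEndianBytes(int value) {
        int[] be = bigEndianBytes(value);
        return new int[] { be[3], be[2], be[1], be[0] };
    }

    public static void main(String[] args) {
        int value = 500_000_007; // 0x1DCD6507

        System.out.println(Arrays.toString(bigEndianBytes(value)));    // [29, 205, 101, 7]
        System.out.println(Arrays.toString(littleEndianBytes(value))); // [7, 101, 205, 29]

        // Check the positional arithmetic from the text
        long sum = 29L * 256 * 256 * 256 + 205L * 256 * 256 + 101L * 256 + 7;
        System.out.println(sum); // 500000007
    }
}
```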
Okay, so why should I care about endianness?
This depends on the situation. Normally you don't need to care about this issue. No matter what platform you run a Java program on, its endianness is the same, so you don't need to care about endianness.
But what about when you need to process data produced by programs written in other languages? Then endianness becomes a big issue. You must make sure that you decode the data in the same byte order in which it was encoded, and vice versa. If you are lucky, you will find the byte order stated in the API or protocol specification or in the file format description. If not... good luck!
The most important thing is to be clear about the endianness you are using and the endianness of the data you need to process. If the two differ, you need to do extra work to get correct results. Also, if you need to handle unsigned values, you need to make sure that each byte ends up in the correct position of the corresponding integer, short, or long.
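As a sketch of that extra work, the example below decodes the same four bytes once as little-endian and once as big-endian, using the masking and shifting technique from earlier. The class and method names (LittleEndianRead, readUnsignedIntLE, readUnsignedIntBE) are hypothetical, and the input is assumed to be the little-endian encoding of 500,000,007.

```java
class LittleEndianRead {
    // Reads 4 bytes starting at offset as an unsigned int stored least significant byte first.
    static long readUnsignedIntLE(byte[] buf, int offset) {
        int bits = (0xFF & buf[offset])
                 | (0xFF & buf[offset + 1]) << 8
                 | (0xFF & buf[offset + 2]) << 16
                 | (0xFF & buf[offset + 3]) << 24;
        return ((long) bits) & 0xFFFFFFFFL;
    }

    // Reads the same 4 bytes as an unsigned int stored most significant byte first.
    static long readUnsignedIntBE(byte[] buf, int offset) {
        int bits = (0xFF & buf[offset]) << 24
                 | (0xFF & buf[offset + 1]) << 16
                 | (0xFF & buf[offset + 2]) << 8
                 | (0xFF & buf[offset + 3]);
        return ((long) bits) & 0xFFFFFFFFL;
    }

    public static void main(String[] args) {
        byte[] data = {0x07, 0x65, (byte) 0xCD, 0x1D}; // 500,000,007 written little-endian

        System.out.println(readUnsignedIntLE(data, 0));                   // 500000007
        System.out.println(Long.toHexString(readUnsignedIntBE(data, 0))); // 765cd1d
    }
}
```

Decoding with the wrong byte order does not fail; it silently yields a different number, which is why the byte order has to be known in advance.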
What is the network byte order?
When the IP protocol was designed, big-endian byte order was chosen as the network byte order. In IP packets, numeric values are stored in network byte order. The byte order used by the computer that generates a packet is called the "host byte order", which may or may not be the same as the network byte order. Like network byte order, the byte order in Java is big-endian.
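One way to observe Java's big-endian output is through the standard java.io.DataOutputStream class, whose writeInt method is documented to write the high byte first. The wrapper class NetworkOrderDemo and its writeInt helper are made up for this sketch.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

class NetworkOrderDemo {
    // Serializes an int with DataOutputStream and returns the resulting 4 bytes.
    static byte[] writeInt(int value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(value); // high byte first, regardless of the host CPU
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for an in-memory stream
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        // 500,000,007 = 0x1DCD6507; 0xCD prints as -51 because Java bytes are signed
        System.out.println(Arrays.toString(writeInt(500_000_007))); // [29, -51, 101, 7]
    }
}
```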
Why is there no unsigned type?
Why doesn't Java provide unsigned types? Good question! I have often found this very strange, especially since many network protocols were already using unsigned types at the time. Back in 1999 I spent a long time searching the Web (Google was not so good then), because I always felt it should not be this way. Then one day I came across an interview with one of the inventors of Java (was it Gosling? I don't remember exactly; I wish I had saved the web page), in which the designer said something to the effect of: "Hey! Unsigned types complicate things. No one really needs unsigned types, so we threw them out."
Here is a page that records an interview with James Gosling to see if you can get some inspiration: