endianness

I have read about big-endian and little-endian many times and always forget which is which. This write-up explains it well, so I am saving it here.

From: https://www.cs.umd.edu/class/sum2003/cmsc311/Notes/Data/endian.html


Big and Little Endian

Basic Memory Concepts

In order to understand the concept of big and little endian, you need to understand memory. Fortunately, we only need a very high-level abstraction for memory. You don't need to know all the little details of how memory works.

All you need to know about memory is that it's one large array. But one large array containing what? The array contains bytes. In computer organization, people don't use the term "index" to refer to the array locations. Instead, we use the term "address". "Address" and "index" mean the same thing, so if you're getting confused, just think of "address" as "index".

Each address stores one element of the memory "array". Each element is typically one byte. There are some memory configurations where each address stores something besides a byte. For example, you might store a nybble or a bit. However, those are exceedingly rare, so for now, we make the broad assumption that all memory addresses store bytes.

I will sometimes say that memory is byte-addressable. This is just a fancy way of saying that each address stores one byte. If I say memory is nybble-addressable, that means each memory address stores one nybble.

Storing Words in Memory

We've defined a word to mean 32 bits. This is the same as 4 bytes. Integers, single-precision floating point numbers, and MIPS instructions are all 32 bits long. How can we store these values into memory? After all, each memory address can store a single byte, not 4 bytes.

The answer is simple. We split the 32 bit quantity into 4 bytes. For example, suppose we have a 32 bit quantity, written in hexadecimal as 90AB12CD. Since each hex digit is 4 bits, we need 8 hex digits to represent the 32 bit value.

So, the 4 bytes are: 90, AB, 12, CD, where each byte requires 2 hex digits.
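
The split described above can be sketched in C with shifts and masks. This is only an illustration: split_word is a hypothetical helper name that does not appear in the original notes, and the bytes are listed most significant first to match the order 90, AB, 12, CD.

```c
#include <stdint.h>

/* Split a 32-bit word into its 4 bytes, most significant byte first.
   split_word is a hypothetical name used only for this sketch. */
void split_word(uint32_t word, uint8_t bytes[4])
{
    bytes[0] = (uint8_t)((word >> 24) & 0xFF); /* bits 31..24 */
    bytes[1] = (uint8_t)((word >> 16) & 0xFF); /* bits 23..16 */
    bytes[2] = (uint8_t)((word >> 8) & 0xFF);  /* bits 15..8  */
    bytes[3] = (uint8_t)(word & 0xFF);         /* bits 7..0   */
}
```

Calling split_word(0x90AB12CD, bytes) should leave 90, AB, 12, CD (hex) in bytes[0] through bytes[3]. Note that nothing has been said yet about where these bytes go in memory.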

It turns out there are two ways to store this in memory.

Big Endian

In big endian, you store the most significant byte in the smallest address. Here's how it would look:
Address Value
1000 90
1001 AB
1002 12
1003 CD
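
As a rough sketch, the big endian layout in the table can be produced explicitly in C, independent of the machine's own byte order. The names store_be32 and load_be32 are hypothetical, and mem stands in for the four addresses starting at 1000.

```c
#include <stdint.h>

/* Store value at mem[0..3] in big endian order:
   the most significant byte goes to the smallest address. */
void store_be32(uint8_t *mem, uint32_t value)
{
    mem[0] = (uint8_t)(value >> 24); /* address 1000 in the table */
    mem[1] = (uint8_t)(value >> 16); /* address 1001 */
    mem[2] = (uint8_t)(value >> 8);  /* address 1002 */
    mem[3] = (uint8_t)value;         /* address 1003 */
}

/* Rebuild the 32-bit value from 4 bytes stored in big endian order. */
uint32_t load_be32(const uint8_t *mem)
{
    return ((uint32_t)mem[0] << 24) | ((uint32_t)mem[1] << 16) |
           ((uint32_t)mem[2] << 8)  |  (uint32_t)mem[3];
}
```

Because the bytes are placed one at a time, this gives the same layout on any host.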

Little Endian

In little endian, you store the least significant byte in the smallest address. Here's how it would look:
Address Value
1000 CD
1001 12
1002 AB
1003 90
Notice that this is in the reverse order compared to big endian. To remember which is which, recall whether the least significant byte is stored first (thus, little endian) or the most significant byte is stored first (thus, big endian).
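
A comparable sketch for the little endian layout, again with hypothetical helper names (store_le32, load_le32) and with mem standing in for addresses 1000 through 1003:

```c
#include <stdint.h>

/* Store value at mem[0..3] in little endian order:
   the least significant byte goes to the smallest address. */
void store_le32(uint8_t *mem, uint32_t value)
{
    mem[0] = (uint8_t)value;         /* address 1000 in the table */
    mem[1] = (uint8_t)(value >> 8);  /* address 1001 */
    mem[2] = (uint8_t)(value >> 16); /* address 1002 */
    mem[3] = (uint8_t)(value >> 24); /* address 1003 */
}

/* Rebuild the 32-bit value from 4 bytes stored in little endian order. */
uint32_t load_le32(const uint8_t *mem)
{
    return  (uint32_t)mem[0]        | ((uint32_t)mem[1] << 8) |
           ((uint32_t)mem[2] << 16) | ((uint32_t)mem[3] << 24);
}
```

The only difference from a big endian store is which end of the value lands at the smallest address.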

Notice I wrote least significant "byte", not least significant "bit". I sometimes abbreviate these as LSB and MSB, with the capital 'B' referring to byte and a lowercase 'b' referring to bit. I only refer to most and least significant byte when it comes to endianness.

Which Way Makes Sense?

Different ISAs use different endianness. While one way may seem more natural to you (most people think big-endian is more natural), there is justification for either one.

For example, DEC machines such as the VAX and the Intel x86 processors used in IBM PCs are little endian, while Motorola 68000-series and Sun SPARC processors are big endian. MIPS processors allowed you to select a configuration where it would be big or little endian.

Why is endianness so important? Suppose you are storing int values to a file, then you send the file to a machine which uses the opposite endianness and read in the value. You'll run into problems because of endianness. You'll read in reversed values that won't make sense.

Endianness is also a big issue when sending numbers over the network. Again, if you send a value from a machine of one endianness to a machine of the opposite endianness, you'll have problems. This is even worse over the network, because you might not be able to determine the endianness of the machine that sent you the data.

The solution is to send 4 byte quantities using network byte order, which is defined to be big endian. If your machine has the same endianness as network byte order, then great, no change is needed. If not, then you must reverse the bytes.
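
The "reverse the bytes if needed" step might look roughly like the sketch below. All three function names are hypothetical, and the host check assumes the machine is either big or little endian. On POSIX systems the standard htonl() and ntohl() functions from <arpa/inet.h> do this conversion for you; the hand-written version is shown only to make the idea concrete.

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 if this machine stores the least significant byte
   at the smallest address, 0 otherwise. */
int host_is_little_endian(void)
{
    uint32_t probe = 1;
    uint8_t first;
    memcpy(&first, &probe, 1); /* copy the byte at the lowest address */
    return first == 1;
}

/* Reverse the order of the 4 bytes in a 32-bit value. */
uint32_t swap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

/* Convert a host-order value so that its bytes in memory are
   in network byte order (big endian). */
uint32_t to_network32(uint32_t host_value)
{
    return host_is_little_endian() ? swap32(host_value) : host_value;
}
```

Since reversing twice restores the original, the same swap serves for converting received values back to host order.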

History of Endian-ness

Where does this term "endian" come from? Jonathan Swift was a satirist (he poked fun at society through his writings). His most famous book is "Gulliver's Travels", and he talks about how certain people prefer to eat their hard boiled eggs from the little end first (thus, little endians), while others prefer to eat from the big end (thus, big endians) and how this led to various wars.

Of course, the point was to say that it was a silly thing to debate over, and yet, people argue over such trivialities all the time (for example, should braces line up in parallel or not? vi or emacs? UNIX or Windows?).

Misconceptions

Endianness only makes sense when you want to break a large value (such as a word) into several small ones. You must decide on an order to place it in memory.

However, if you have a 32 bit register storing a 32 bit value, it makes no sense to talk about endianness. The register is neither big endian nor little endian. It's just a register holding a 32 bit value. The rightmost bit is the least significant bit, and the leftmost bit is the most significant bit.

There's no reason to rearrange the bytes in a register in some other way.

Endianness only makes sense when you are breaking up a multi-byte quantity, and attempting to store the bytes at consecutive memory locations. In a register, it doesn't make sense. A register is simply a 32 bit quantity, b31...b0, and endianness does not apply to it.

With regard to endianness, you may argue there's a very natural way to store 4 bytes in 4 consecutive addresses, and that the other way looks strange. In particular, it looks "backwards". However, what's natural to you may not be natural to someone else. The fact of the matter is that the word is split in 4 bytes, and most people would agree that you need some order to place it in memory.

C-style strings

Once you start thinking about endianness, you begin to think it applies to everything. Before you see big or little endian, you may have had no idea it even existed. That's because it's reasonably well-hidden from you.

If you do bitwise/bitshift operations on an int, you don't notice the endianness. The machine arranges the multiple bytes so the least significant byte is still the least significant byte (e.g., bits b7 to b0) and the most significant byte is still the most significant byte (e.g., bits b31 to b24).

So, it's natural to wonder whether strings might be saved in some sort of strange order, depending on the machine.

This is where it's useful to think about all the facts you know about arrays. A C-style string, after all, is still an array of characters.

Here are some facts you should know about C-style strings and arrays.

  • C-style strings are stored in arrays of characters.
  • Each character requires one byte of memory, since characters are represented in ASCII (in the future, this could change, as Unicode becomes more popular).
  • In an array, the address of consecutive array elements increases. Thus, &arr[i] is less than &arr[i + 1].
  • What's not as obvious is that if something is stored in increasing addresses in memory, it's going to be stored in increasing "addresses" in a file. When you write to a file, you usually specify an address in memory, and the number of bytes you wish to write to the file starting at that address.
So, let's imagine some C-style string in memory. You have the word "cat". Let's pretend 'c' is stored at address 1000. Then 'a' is stored at 1001. 't' is at 1002. The null character '\0' is at 1003.

Since C-style strings are arrays of characters, they follow the rules of arrays. Unlike int or long, you can easily see the individual bytes of a C-style string, one byte at a time. You use array indexing to access the bytes (i.e., characters) of a string. You can't easily index the bytes of an int or long without playing some pointer tricks (using reinterpret_cast, for example, in C++). The individual bytes of an int are more or less hidden from you.
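
In C, the "pointer trick" for looking at the bytes of an int is usually a cast to unsigned char *. The sketch below uses a hypothetical helper, byte_at, to read the byte stored at a given offset from the start of a 32-bit word.

```c
#include <stddef.h>
#include <stdint.h>

/* Return the byte stored i addresses past the start of the word.
   Reading an object through unsigned char * is permitted in C. */
uint8_t byte_at(const uint32_t *word, size_t i)
{
    const unsigned char *p = (const unsigned char *)word;
    return p[i];
}
```

For a word holding 0x90AB12CD, byte_at(&word, 0) would be CD on a little endian machine and 90 on a big endian one. By contrast, for char s[] = "cat", s[0] is 'c' on every machine, with no cast needed.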

Now imagine writing out this string to a file using some sort of write() method. You specify a pointer to 'c', and the number of bytes you wish to write (in this case 4). The write() method proceeds byte by byte in the character string and writes it to the file, starting with 'c' and working to the null character.
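
As a small sketch of that write, using the standard C fwrite() in place of the unspecified write() method: the helper name roundtrip_bytes is hypothetical, and it uses a temporary file so the bytes can be read straight back and inspected.

```c
#include <stdio.h>

/* Write n bytes starting at s to a temporary file, then read them
   back into out. Returns the number of bytes read back (0 on failure). */
size_t roundtrip_bytes(const char *s, size_t n, unsigned char *out)
{
    FILE *f = tmpfile();
    if (f == NULL) {
        return 0;
    }
    size_t written = fwrite(s, 1, n, f); /* bytes go out in address order */
    rewind(f);
    size_t got = fread(out, 1, written, f);
    fclose(f);
    return got;
}
```

Calling roundtrip_bytes("cat", 4, out) should give back 'c', 'a', 't', '\0' in that order, matching addresses 1000 through 1003 in the example.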

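Given that explanation, is it clear whether endianness matters with C-style strings? Hopefully, it is clear: it does not. Each character is a single byte, and the bytes are written in order of increasing address on any machine.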

As an aside, since C++ strings are objects, they may have complicated inner structures, and so it's less obvious what a C++ string would look like when written out to a file. It's well-known what a C-style string looks like (a sequence of characters ending in a null character), which is why I've been careful to call them C-style strings.


Reposted from blog.csdn.net/u011627161/article/details/70185406