mmap vs regular I/O

https://www.quora.com/How-is-a-mmaped-file-I-O-different-from-a-regular-file-I-O-with-regard-to-the-kernel

In Linux, there's something called the page cache (the one you mention, an in-memory radix tree) which sits between your process and the disk while you're accessing a file. All file I/O operations have to go through the page cache.

So if you open the file "me.txt" and fetch its first 64 bytes via a read() call, Linux will first check whether the first 4K bytes of "me.txt" are present in the page cache. If they aren't, it reads those 4K bytes from disk into the page cache and finally copies the 64 bytes into the buffer specified by the read() system call.
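To make the flow concrete, here is a minimal sketch of that read() path in C (error handling abbreviated; "me.txt" is just the file from the example):

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void) {
        char buf[64];
        int fd = open("me.txt", O_RDONLY);
        if (fd < 0) { perror("open"); return 1; }

        /* The kernel first makes sure the first 4K page of "me.txt" is in
         * the page cache, then copies these 64 bytes into buf. */
        ssize_t n = read(fd, buf, sizeof buf);
        if (n < 0) { perror("read"); return 1; }

        close(fd);
        return 0;
    }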

Now let's look at how the usual mmap() works. Every process sees its own address space, and with mmap() you can choose to map a file (or a portion of it) into your address space. As an example, you'd ask the operating system to map the first 64K bytes of the file "me.txt" at address 0x1000. This maps your process's addresses from 0x1000 to 0x10FFF (64K is 0x10000 bytes) to the first 64K bytes of the file. So if you have a pointer char *p = (char *)0x1000; and you want the 10th byte in the file, you'd just access it via p[9].
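Here is a minimal sketch of such a mapping (note that mmap() treats a requested address only as a hint unless MAP_FIXED is given, so a portable program lets the kernel pick the address, as below):

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void) {
        int fd = open("me.txt", O_RDONLY);
        if (fd < 0) { perror("open"); return 1; }

        size_t len = 64 * 1024;    /* first 64K of the file */
        char *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
        if (p == MAP_FAILED) { perror("mmap"); return 1; }

        /* The 10th byte of the file, served straight from the page cache. */
        printf("%c\n", p[9]);

        munmap(p, len);
        close(fd);
        return 0;
    }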

Unless you specify MAP_POPULATE or a related flag when invoking mmap(), the system call just meddles with some kernel data structures internally so that these addresses are recorded as mapping the file, without yet pointing to any page frames[1] in memory. No data is fetched when you call mmap(). Let's assume each page frame is 4K in size. The first time you access an address in the range, a page fault occurs in the Linux VMM; that page is then fetched into memory and kept in the same page cache discussed above.
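If you do want the data brought in up front, the MAP_POPULATE flag mentioned above does exactly that (Linux-specific; a fragment reusing the variables from the previous sketch):

    /* MAP_POPULATE asks the kernel to read the file pages in and populate
     * the page tables at mmap() time, so later accesses avoid major page
     * faults. Defining _GNU_SOURCE (or _DEFAULT_SOURCE) before the
     * includes exposes the flag. */
    char *p = mmap(NULL, len, PROT_READ,
                   MAP_PRIVATE | MAP_POPULATE, fd, 0);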

Now that we've armed ourselves with the concepts behind read() and mmap() and how they are implemented internally, let's get down to how they perform in various scenarios.

To answer your question, I personally believe the two were designed to serve different use cases, as described below.

Let us first see the scenarios where mmap() does exceptionally well:

First off, you should have noticed that with mmap() you avoid an extra copy. In our first example, say you wanted to print the 64 bytes you fetched with the read() call: 4K bytes were first fetched from disk into the page cache, then 64 bytes were copied to the address you specified, and you print from there. But with mmap(), the moment you touch p[0] a page fault occurs, the first 4K bytes are fetched into memory, and you hand the page-cache-backed memory directly to your printf() routine (a rough illustration follows below).
mmap() also does exceptionally well when you have a huge file, you randomly access very small portions of it, and you largely keep revisiting the neighborhood of wherever you previously accessed. This is fast because you hit very few page faults when you keep touching the same areas of memory. In contrast to read(), you avoid the extra copy on every access, and thus gain a speedup.
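As a rough illustration of the copy being skipped (a fragment reusing fd and p from the sketches above, with write(2) standing in for the printing step):

    /* read() path: disk -> page cache -> buf (extra copy) -> stdout */
    char buf[64];
    if (read(fd, buf, sizeof buf) == 64)
        write(STDOUT_FILENO, buf, 64);

    /* mmap() path: touching the mapping faults the page in once, and the
     * bytes go to write() straight from the page-cache-backed mapping,
     * with no intermediate user-space buffer. */
    write(STDOUT_FILENO, p, 64);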

Let’s see some scenarios where read() does exceptionally well:

If multiple readers and writers (from various processes) keep acting on a single file, it is much easier to let the operating system handle atomicity (guaranteeing that an operation completes without a race condition) through read() and write(). With mmap() it is a real pain in the neck to maintain synchronization between processes that are reading and writing the same portion of a file.
read() calls usually don't have the overhead of page faulting. Page faulting is seriously expensive, so if you're faulting frequently you're going to slow everything down. As seen above, the main logic behind mmap() is to take a page fault whenever the contents of the file are not in memory. If you're doing sequential reads or writes, it is better to perform I/O via the usual read() and write() calls, since the copy overhead is negligible compared with the overhead of frequent page faults, and most of our file accesses fall into this category (a minimal sequential-read sketch follows below).
mmap() has a serious amount of setup/teardown overhead (it meddles with a lot of VMM state). Unless you're in it for the long run, mmap() doesn't sit well with your use case; only applications such as databases, which open a file and keep it open for a long time, prefer mmap().
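For that common sequential case, the plain read() loop is the natural tool; a minimal sketch of streaming a file to stdout:

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void) {
        char buf[4096];          /* one page-sized chunk at a time */
        int fd = open("me.txt", O_RDONLY);
        if (fd < 0) { perror("open"); return 1; }

        /* Sequential access: the kernel's readahead keeps the page cache
         * warm, and the copy into buf is cheap compared with taking a
         * page fault per 4K page under mmap(). */
        ssize_t n;
        while ((n = read(fd, buf, sizeof buf)) > 0)
            write(STDOUT_FILENO, buf, (size_t)n);

        close(fd);
        return 0;
    }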

I hope this has given you a chance to peek into the vast internals of the Linux VMM and to understand a bit about the tradeoffs between regular and memory-mapped I/O.

On a historical note, mmap() originally worked with the page cache while read() and write() calls interfaced with something called the buffer cache. Things were a mess! There were a lot of synchronization issues between the two caches, and a single file could reside in both caches at the same time. Finally it was decided to have a single cache generic to memory pages, and thus the page cache we have today was implemented.

[1] page frame: a unit of physical memory that the Linux VMM deals with. More info can be found in this SO question: is number of frame = number of pages(linux)?

PS: Feel free to suggest edits for grammatical and conceptual errors that may have crept into this very long answer of mine.

Hat tip: Sebas Sujeen for the A2A. PM me, I’ll return the credits if you’re running low.

Reposted from blog.csdn.net/maxlovezyy/article/details/80283125