New features in Linux memory management: understanding memory folios

1. What is a folio [ˈfoʊlioʊ]?

1.1 Definition of folio

Add memory folios, a new type to represent either order-0 pages or the head page of a compound page.

A folio can be regarded as a zero-overhead wrapper around struct page. A folio can be either a single (order-0) page or a compound page.

[Figure: schematic of struct page (image omitted); source: the article "The ultimate optimization around HugeTLB"]

The figure above is a schematic of struct page. Its 64 bytes hold flags, lru, mapping, index, private, _{ref,map}count, memcg_data and other information. For a compound page, these fields are kept in the head page, while the tail pages reuse the same space to store compound_{head,mapcount,order,nr,dtor} and similar information.

struct folio {
        /* private: don't document the anon union */
        union {
                struct {
        /* public: */
                        unsigned long flags;
                        struct list_head lru;
                        struct address_space *mapping;
                        pgoff_t index;
                        void *private;
                        atomic_t _mapcount;
                        atomic_t _refcount;
#ifdef CONFIG_MEMCG
                        unsigned long memcg_data;
#endif
        /* private: the union with struct page is transitional */
                };
                struct page page;
        };
};

In the definition of struct folio, flags, lru and the other fields have exactly the same layout as in struct page, so the folio can overlay the page in a union. This lets code write folio->flags directly instead of folio->page.flags.
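The overlay can be checked at compile time. The following userspace sketch uses simplified, hypothetical stand-ins named page_sketch and folio_sketch (list_head is replaced by two pointers and memcg_data is omitted) and asserts that each named folio field sits at the same offset as the corresponding page field. The kernel carries similar static_assert checks (the FOLIO_MATCH lines in include/linux/mm_types.h), but this code only illustrates the idea and does not copy them.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for struct page: only the fields folio mirrors. */
struct page_sketch {
        unsigned long flags;
        void *lru_next, *lru_prev;      /* stands in for struct list_head lru */
        void *mapping;
        unsigned long index;
        void *private;
        int _mapcount;
        int _refcount;
};

/* Simplified stand-in for struct folio: named fields overlaid on a page. */
struct folio_sketch {
        union {
                struct {
                        unsigned long flags;
                        void *lru_next, *lru_prev;
                        void *mapping;
                        unsigned long index;
                        void *private;
                        int _mapcount;
                        int _refcount;
                };
                struct page_sketch page;
        };
};

/* Compile-time check that a folio field aliases the same page field. */
#define FOLIO_MATCH_SKETCH(field)                               \
        static_assert(offsetof(struct page_sketch, field) ==    \
                      offsetof(struct folio_sketch, field), #field)

FOLIO_MATCH_SKETCH(flags);
FOLIO_MATCH_SKETCH(mapping);
FOLIO_MATCH_SKETCH(index);
FOLIO_MATCH_SKETCH(_refcount);
static_assert(sizeof(struct folio_sketch) == sizeof(struct page_sketch),
              "the folio wrapper adds no bytes");

/* Reads the flags word through the folio's own field name. */
static unsigned long folio_flags_sketch(const struct folio_sketch *folio)
{
        return folio->flags;    /* same bytes as folio->page.flags */
}
```

Writing f.page.flags and then reading folio_flags_sketch(&f) returns the same value, because both names refer to the same storage.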

#define page_folio(p)           (_Generic((p),                          \
        const struct page *:    (const struct folio *)_compound_head(p), \
        struct page *:          (struct folio *)_compound_head(p)))
#define nth_page(page,n) ((page) + (n))
#define folio_page(folio, n)    nth_page(&(folio)->page, n)

page_folio() may look confusing at first glance, but it is essentially equivalent to the following pseudo-code:

switch (typeof(p)) {   /* pseudo-code: C cannot switch on a type */
  case const struct page *:
    return (const struct folio *)_compound_head(p);
  case struct page *:
    return (struct folio *)_compound_head(p);
}

It's that simple.

_Generic is the generic selection feature of the C11 standard (section 6.5.1.1, "Generic selection"; see the N1570 draft at https://www.open-std.org/JTC1/sc22/wg14/www/docs/n1570.pdf). Its syntax is as follows:

Generic selection
Syntax
 generic-selection:
  _Generic ( assignment-expression , generic-assoc-list )
 generic-assoc-list:
  generic-association
  generic-assoc-list , generic-association
 generic-association:
  type-name : assignment-expression
  default : assignment-expression
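Here is a minimal userspace sketch of generic selection. type_name() and page_folio_sketch() are hypothetical macros written for this example. page_folio_sketch() keeps only the const-preserving cast of the real page_folio() and drops the _compound_head() call, so every page is treated as its own head.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical minimal stand-ins for the kernel structures. */
struct page  { unsigned long flags; };
struct folio { struct page page; };

/* Selects a string literal based on the static type of the operand. */
#define type_name(x) _Generic((x),                              \
        const struct page *:  "const struct page *",            \
        struct page *:        "struct page *",                  \
        const struct folio *: "const struct folio *",           \
        struct folio *:       "struct folio *",                 \
        default:              "other")

/* Const-preserving cast in the shape of page_folio(), without the head lookup. */
#define page_folio_sketch(p) (_Generic((p),                     \
        const struct page *: (const struct folio *)(p),         \
        struct page *:       (struct folio *)(p)))
```

Because the association is chosen from the static type of p, passing a const struct page * yields a const struct folio *, so the conversion cannot silently drop constness.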

Converting between pages and folios is just as straightforward. Whether p is a head page or a tail page, page_folio(p) returns the folio of its head page. In the other direction, &folio->page yields the head page, and folio_page(folio, n) yields the n-th page of the folio (a tail page for n > 0).
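The round trip can be sketched in userspace on an 8-page compound page. The structs are hypothetical minimal stand-ins, and prep_compound_sketch() is a made-up helper that tags the tail pages. compound_head_sketch() follows the same untagging logic as the kernel's compound_head(): a tail page stores the head page's address with bit 0 set, which is safe because struct page is at least word-aligned. As in the nth_page() macro quoted above, the n-th page is found with plain pointer arithmetic.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical minimal stand-ins; the real structs have many more fields. */
struct page {
        unsigned long flags;
        uintptr_t compound_head;        /* tail page: head address | 1; else 0 */
};
struct folio { struct page page; };

/* Marks pages[1..nr-1] as tails of pages[0], using bit 0 as the tail tag. */
static void prep_compound_sketch(struct page *pages, unsigned long nr)
{
        pages[0].compound_head = 0;
        for (unsigned long i = 1; i < nr; i++)
                pages[i].compound_head = (uintptr_t)&pages[0] | 1u;
}

/* Same logic as compound_head(): untag a tail page, otherwise return page. */
static struct page *compound_head_sketch(struct page *page)
{
        uintptr_t head = page->compound_head;

        if (head & 1)
                return (struct page *)(head - 1);
        return page;
}

/* Any page -> folio of its head page; folio -> n-th page. */
#define page_folio_sketch(p)        ((struct folio *)compound_head_sketch(p))
#define folio_page_sketch(folio, n) (&(folio)->page + (n))
```

page_folio_sketch(&pages[5]) and page_folio_sketch(&pages[0]) return the same folio, and folio_page_sketch(folio, 5) leads back to &pages[5].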

The question is: since a struct page can already represent either a base page or a compound page, why introduce folio at all?

1.2 What can folio do?

The folio type allows a function to declare that it's expecting only a head page. Almost incidentally, this allows us to remove various calls to VM_BUG_ON(PageTail(page)) and compound_head().

The reason is that struct page is overloaded: it can be a base page, a compound head page, or a compound tail page.

As mentioned above, page metadata such as page->mapping and page->index is stored in the head page (a base page can be treated as its own head page). On mm paths, however, every function that receives a struct page has to work out whether it is a head page or a tail page. Because the result is not cached from one call to the next, the same compound_head() lookup ends up being repeated many times along a single path.

Take mem_cgroup_move_account() as an example: a single call can execute compound_head() up to 7 times.

static inline struct page *compound_head(struct page *page)
{
        unsigned long head = READ_ONCE(page->compound_head);
        if (unlikely(head & 1))
                return (struct page *) (head - 1);
        return page;
}

Let's look at page_mapping(page) in detail. On entry, the function calls compound_head(page) so that it can read mapping and the other head-page fields. It then tests PageSwapCache(page). That helper also receives a struct page, so it has to call compound_head(page) again internally before it can read the page flags.

struct address_space *page_mapping(struct page *page)
{
        struct address_space *mapping;
        page = compound_head(page);
        /* This happens if someone calls flush_dcache_page on slab page */
        if (unlikely(PageSlab(page)))
                return NULL;
        if (unlikely(PageSwapCache(page))) {
                swp_entry_t entry;
                entry.val = page_private(page);
                return swap_address_space(entry);
        }
        mapping = page->mapping;
        if ((unsigned long)mapping & PAGE_MAPPING_ANON)
                return NULL;
        return (void *)((unsigned long)mapping & ~PAGE_MAPPING_FLAGS);
}
EXPORT_SYMBOL(page_mapping);

After the switch to folio, page_mapping(page) becomes folio_mapping(folio). Since a folio is by definition a head page, both compound_head(page) calls disappear. mem_cgroup_move_account is just the tip of the iceberg: mm paths are full of compound_head calls. The cumulative effect is not only lower execution overhead. The type also reminds developers that a folio is always a head page, which removes many head-or-tail checks.
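The saving can be illustrated with a toy model that counts compound_head() calls. Everything here is hypothetical and heavily simplified: the _sketch functions and flag bits are made up, and a swap-cache page returns NULL instead of its swap address space. The old-style lookup resolves the head page on entry and then again inside each page-based flag test. The folio-style lookup needs none, and a page-based compatibility wrapper pays for exactly one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical minimal page: flags, tail tag and a mapping pointer. */
struct page {
        unsigned long flags;
        uintptr_t compound_head;        /* tail page: head address | 1 */
        void *mapping;
};
struct folio { struct page page; };

#define PG_SLAB_SKETCH      (1UL << 0)  /* made-up flag bits */
#define PG_SWAPCACHE_SKETCH (1UL << 1)

static unsigned int compound_head_calls;        /* instrumentation counter */

static struct page *compound_head_sketch(struct page *page)
{
        compound_head_calls++;
        if (page->compound_head & 1)
                return (struct page *)(page->compound_head - 1);
        return page;
}

/* Page-based flag tests each resolve the head page themselves. */
static int PageSlab_sketch(struct page *page)
{
        return (compound_head_sketch(page)->flags & PG_SLAB_SKETCH) != 0;
}

static int PageSwapCache_sketch(struct page *page)
{
        return (compound_head_sketch(page)->flags & PG_SWAPCACHE_SKETCH) != 0;
}

/* Shaped like the old page_mapping(); swap cache returns NULL (simplified). */
static void *page_mapping_old_sketch(struct page *page)
{
        page = compound_head_sketch(page);
        if (PageSlab_sketch(page) || PageSwapCache_sketch(page))
                return NULL;
        return page->mapping;
}

/* Folio-based lookup: a folio is a head page, so no head lookups are needed. */
static void *folio_mapping_sketch(struct folio *folio)
{
        if (folio->page.flags & (PG_SLAB_SKETCH | PG_SWAPCACHE_SKETCH))
                return NULL;
        return folio->page.mapping;
}

/* Compatibility wrapper: one head lookup, then delegate to the folio version. */
static void *page_mapping_new_sketch(struct page *page)
{
        return folio_mapping_sketch((struct folio *)compound_head_sketch(page));
}
```

In this model, looking up the mapping of a tail page costs 3 compound_head() calls through the page-based path, 1 through the wrapper, and 0 when the caller already holds a folio. These counts describe the toy model only. They are not measurements of the kernel.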

1.3 Immediate value of folio

1) Remove redundant compound_head calls.

2) Make it obvious to developers that a folio is always a head page.

3) Fix potential bugs caused by passing tail pages where head pages are expected.

Here's an example where our current confusion between "any page"
and "head page" at least produces confusing behaviour, if not an
outright bug, isolate_migratepages_block():
        page = pfn_to_page(low_pfn);
        if (PageCompound(page) && !cc->alloc_contig) {
                const unsigned int order = compound_order(page);
                if (likely(order < MAX_ORDER))
                        low_pfn += (1UL << order) - 1;
                goto isolate_fail;
        }
compound_order() does not expect a tail page; it returns 0 unless it's
a head page.  I think what we actually want to do here is:
        if (!cc->alloc_contig) {
            struct page *head = compound_head(page);
            if (PageHead(head)) {
                const unsigned int order = compound_order(head);
                low_pfn |= (1UL << order) - 1;
                goto isolate_fail;
            }
        }
Not earth-shattering; not even necessarily a bug.  But it's an example
of the way the code reads is different from how the code is executed,
and that's potentially dangerous.  Having a different type for tail
and not-tail pages prevents the muddy thinking that can lead to
tail pages being passed to compound_order().
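The arithmetic behind this example can be checked with a small model. Assume, hypothetically, an order-4 compound page covering pfns 0x1000-0x100f that is naturally aligned, as buddy allocations are. As the quote describes, compound_order() is modelled as returning 0 for any pfn other than the head, and the MAX_ORDER check is omitted. The scanner increments low_pfn once more after the skip, so the returned value should be the last pfn of the compound page. All function names are made up for this sketch.

```c
#include <assert.h>

/* compound_order() model: the real order for the head pfn, 0 for tails. */
static unsigned int compound_order_sketch(unsigned long pfn,
                                          unsigned long head_pfn,
                                          unsigned int order)
{
        return pfn == head_pfn ? order : 0;
}

/* Original logic: low_pfn += (1 << compound_order(page)) - 1. */
static unsigned long skip_original(unsigned long low_pfn,
                                   unsigned long head_pfn, unsigned int order)
{
        unsigned int o = compound_order_sketch(low_pfn, head_pfn, order);

        return low_pfn + (1UL << o) - 1;
}

/* Proposed logic: take the order from the head, then round up within it. */
static unsigned long skip_proposed(unsigned long low_pfn,
                                   unsigned long head_pfn, unsigned int order)
{
        unsigned int o = compound_order_sketch(head_pfn, head_pfn, order);

        return low_pfn | ((1UL << o) - 1);      /* relies on natural alignment */
}
```

From a tail pfn such as 0x1003, the original code skips nothing (0x1003 + 2^0 - 1 = 0x1003), so the scanner moves on to 0x1004, which is another tail page. The proposed code yields 0x1003 | 0xf = 0x100f, the last pfn of the compound page. The OR form is only correct because the compound page is aligned to its own size.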

1.4 folio-5.16 has been merged

This converts just parts of the core MM and the page cache.

The willy/pagecache.git tree contains 209 commits in total. In the 5.16 merge window, the author, Matthew Wilcox (Oracle), first merged the foundational part of folio as the tag folio-5.16: 90 commits touching 74 files, with 2914 additions and 1703 deletions. Besides the folio definition and other infrastructure, this change focuses mainly on memcg, filemap and writeback.

The way folio-5.16 gradually replaces page with folio is worth describing. There are far too many mm paths to convert in one go. Replacing them all at once would require a top-down approach: change page allocation to return folios, then follow every path from there. That is unrealistic, because almost the entire mm directory would have to be modified.

folio-5.16 works bottom-up instead. A function on some mm path is switched from page to folio at its entry, and everything it does internally uses folio, forming a "closure". Its callers are then modified to pass a folio. Once all callers of a function have been converted, the closure has grown by one level. Some functions have too many callers to convert at once, so folio-5.16 provides a wrapper layer for them. page_mapping/folio_mapping is a good example.

First, the closure contains infrastructure such as folio_test_slab(folio) and folio_test_swapcache(folio); it then expands upward to folio_mapping. page_mapping has many callers. mem_cgroup_move_account can switch to folio_mapping directly, but page_evictable still uses page_mapping, which is now a thin wrapper around folio_mapping. For now, the closure stops expanding there.

struct address_space *folio_mapping(struct folio *folio)
{
        struct address_space *mapping;
        /* This happens if someone calls flush_dcache_page on slab page */
        if (unlikely(folio_test_slab(folio)))
                return NULL;
        if (unlikely(folio_test_swapcache(folio)))
                return swap_address_space(folio_swap_entry(folio));
        mapping = folio->mapping;
        if ((unsigned long)mapping & PAGE_MAPPING_ANON)
                return NULL;
        return (void *)((unsigned long)mapping & ~PAGE_MAPPING_FLAGS);
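}

struct address_space *page_mapping(struct page *page)
{
        return folio_mapping(page_folio(page));  /* compat wrapper */
}

/* Converted caller (abridged): calls folio_mapping() directly */
static int mem_cgroup_move_account(struct page *page /* , ... */)
{
        struct folio *folio = page_folio(page);
        /* ... */
        struct address_space *mapping = folio_mapping(folio);
        /* ... */
}

/* Unconverted caller (abridged): still goes through page_mapping() */
static inline bool page_evictable(struct page *page)
{
        bool ret;
        /* ... */
        ret = !mapping_unevictable(page_mapping(page)) && !PageMlocked(page);
        /* ... */
        return ret;
}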

2. Is that all there is to folio?

Many readers will probably react the way I did at this point: is that all? Is folio only about compound_head? So I went through LWN's "A discussion on folios" ( https://lwn.net/Articles/869942/ ) and the experts' discussion of folio at the LPC 2021 File Systems MC ( https://www.youtube.com/watch?v=U6HYrd85hQ8&t=1475s ). That is where I discovered that Matthew Wilcox's talk was titled not "The folio" but "Efficient buffered I/O". Things are not that simple.

What folio-5.16 merged this time is the fs-related code. A senior colleague in our group mentioned that "the maintainers in the linux-mm community do not agree to replace all pages with folio; anonymous pages and slab, in particular, cannot be converted in the short term." So I kept reading the linux-mm mailing list.

2.1 Community discussion of folio

2.1.1 Naming

First, Linus. He said he didn't hate the patch set, because it does solve the compound_head problem, but he didn't love it either, because the name "folio" sounds unintuitive. After several rounds of discussion about naming, the name folio stayed in the end.

2.1.2 Opinions of FS developers

Currently, the page cache is made up of 4K pages, and the places where it does use huge pages are read-only, such as the code huge page ( https://openanolis.cn/sig/Cloud-Kernel/doc/475049355931222178 ) feature. Why have transparent huge pages in the page cache not been implemented? See this LWN article ( https://lwn.net/Articles/686690/ ). One reason is that supporting read-write file THP would make the way buffer_head-based filesystems handle the page cache far too complicated.

  • buffer_head
    A buffer_head maps a piece of memory to its offset on the block device. Typically a block is also 4K, so one buffer_head corresponds exactly to one page. Some filesystems use smaller blocks, such as 1K or 512 bytes; a page then has 4 or 8 buffer_head structures describing where each part of it lives on disk.
    As a result, for multi-page reads and writes every page must look up its disk offsets through get_block, which is both inefficient and complicated.
  • iomap
    iomap originally came from XFS. It is extent-based and naturally supports multi-page operations: for a multi-page read or write, the mapping from all the pages to disk offsets can be obtained with a single translation.
    iomap also decouples the filesystem from the page cache; for example, sizes are expressed in bytes rather than in numbers of pages. Matthew Wilcox therefore suggests that any filesystem that still uses the page cache directly should consider switching to iomap or netfs_lib.
    There may be other ways besides folio to separate filesystems from the page cache, but scatter-gather, for example, is not acceptable because its abstraction is too complex.
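The block-size arithmetic and the difference in lookup counts can be expressed as a toy cost model. It is a simplification with stated assumptions: 4 KiB pages, one get_block-style lookup per block on the buffer_head path, and one lookup per extent on the iomap path. Real filesystems skip the lookup for buffers that are already mapped, and an extent lookup may cover less than the requested range. The function names are hypothetical.

```c
#include <assert.h>

#define PAGE_SIZE_SKETCH 4096UL         /* assumed 4 KiB pages */

/* Number of buffer_heads needed to describe one page. */
static unsigned long bh_per_page(unsigned long blocksize)
{
        return PAGE_SIZE_SKETCH / blocksize;
}

/* buffer_head path: one block-mapping lookup per block in the range. */
static unsigned long bh_lookups(unsigned long len, unsigned long blocksize)
{
        return (len + blocksize - 1) / blocksize;
}

/* Extent path: one lookup per extent of extent_len bytes in the range. */
static unsigned long extent_lookups(unsigned long len, unsigned long extent_len)
{
        return (len + extent_len - 1) / extent_len;
}
```

For a 64 KiB write with 1 KiB blocks, the model gives 64 lookups on the buffer_head path. A single lookup suffices when one extent covers the whole range.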

This is why folio was first adopted in XFS and AFS: XFS is built on iomap and AFS on netfs_lib, so neither depends on buffer_head. It is also why FS developers were keen to see folio merged: it lets them use larger pages in the page cache easily, which makes filesystem I/O more efficient. buffer_head still has some features that iomap lacks, but merging folio should push iomap forward, so that block-based filesystems can be switched over to it.

2.1.3 Opinions from MM developers

The biggest objection came from Johannes Weiner. He acknowledged the compound_head problem but felt that it did not justify such a large change. He also argued that anonymous pages do not need the filesystem-oriented optimizations that folio brings.

Unlike the filesystem side, this seems like a lot of churn for very little tangible value. And leaves us with an end result that nobody appears to be terribly excited about.

But the folio abstraction is too low-level to use JUST for file cache and NOT for anon. It's too close to the page layer itself and would duplicate too much of it to be maintainable side by side.

In the end, with strong support for folio from Kirill A. Shutemov, Michal Hocko and other senior developers, Johannes Weiner relented.

2.1.4 Reach consensus

By the end of the community discussion, the objections no longer applied to the folio-5.15 code, but it had missed the 5.15 merge window, so it was merged intact this time as folio-5.16.

2.2 The deep value of folio

I think the problem with folio is that everybody wants to read in her/his hopes and dreams into it and gets disappointed when see their somewhat related problem doesn't get magically fixed with folio.
Folio started as a way to relief pain from dealing with compound pages. It provides an unified view on base pages and compound pages. That's it.
It is required ground work for wider adoption of compound pages in page cache. But it also will be useful for anon THP and hugetlb.
Based on adoption rate and resulting code, the new abstraction has nice downstream effects. It may be suitable for more than it was intended for initially. That's great.
But if it doesn't solve your problem... well, sorry...
The patchset makes a nice step forward and cuts back on mess I created on the way to huge-tmpfs.
I would be glad to see the patchset upstream.
--Kirill A. Shutemov

Everyone knows about the "struct page related confusion", but nobody had solved it. Everyone has quietly put up with it for a long time, and the code is full of patterns like this:

if (PageCompound(page)) { /* do A */ }
else                    { /* do B */ }

Folio is not perfect. Perhaps because expectations were so high, a few people expressed disappointment with its final implementation. But most see folio as an important step in the right direction; after all, there is more work to be done in the future.

3. Folio follow-up work and others

3.1 folio development plan

For 5.17, we intend to convert various filesystems (XFS and AFS are ready; other filesystems may make it) and also convert more of the MM and page cache to folios. For 5.18, multi-page folios should be ready.

3.2 folio can also improve performance

The 80% win is real, but appears to be an artificial benchmark (postgres startup, which isn't a serious workload). Real workloads (eg building the kernel, running postgres in a steady state, etc) seem to benefit between 0-10%.

folio-5.16 removes a large number of compound_head calls, so it should improve performance in microbenchmarks with high sys time, though I have not tested this. Once multi-page folios are supported in 5.18, I/O efficiency should in theory improve as well; we will have to wait and see.

3.3 How should I use folio?

The most important thing for FS developers is to convert filesystems that still use buffer heads, at least the block-based ones, to use iomap for I/O. Other developers can adopt folio right away: when developing new features on 5.16 or later, use folio wherever it applies. It is mostly a matter of getting familiar with the API; the essentials, such as how memory is allocated and reclaimed, have not changed.


This article is original content from Alibaba Cloud and may not be reproduced without permission.


Source: my.oschina.net/yunqi/blog/10108651