How physical memory is organized and managed

Almost everyone has heard of memory management. But what exactly does it do? To answer that, we have to go back to when computers first appeared. Early machines had very little memory, only a few tens of K. This slowly grew to a few hundred K, later to 512M, and now to several G. It is precisely because memory has always been a scarce resource that so many memory management methods have been developed over the history of computing.

The ultimate goal of memory management is to use physical memory without waste. Linux has designed a variety of memory management methods for making good use of physical memory. Today we will discuss how Linux organizes physical memory. In layman's terms: how it manages the computer's memory sticks.

Linux uses a three-level structure of node, zone, and page to describe the entire physical memory.

 

node

There are currently two memory architectures for computer systems:

  • Non-uniform memory access (NUMA): memory is divided into nodes, and the time it takes to access a node depends on the distance between the CPU and that node. Each CPU has a local node, and accessing the local node is faster than accessing any other node.
  • Uniform memory access (UMA), commonly associated with SMP (symmetric multiprocessing): all processors take the same amount of time to access memory. It can also be understood as the whole of memory forming a single node.
  • NUMA is mostly used in servers. In the kernel it is enabled or disabled through CONFIG_NUMA.
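The idea that access cost depends on the distance between a CPU and a node can be sketched in a few lines of C. This is only an illustration: the two-node machine, the distance values, and the function name nearest_node are made up for the example and are not kernel code.

```c
#include <assert.h>

#define NR_NODES 2	/* hypothetical two-node machine */

/*
 * Illustrative distance table: a smaller value means faster access.
 * Row = node the CPU belongs to, column = node being accessed.
 */
static const int distance[NR_NODES][NR_NODES] = {
	{ 10, 20 },
	{ 20, 10 },
};

/* Return the node with the smallest distance from cpu_node. */
int nearest_node(int cpu_node)
{
	int best = 0;

	assert(cpu_node >= 0 && cpu_node < NR_NODES);
	for (int n = 1; n < NR_NODES; n++)
		if (distance[cpu_node][n] < distance[cpu_node][best])
			best = n;
	return best;
}
```

With this table each CPU's nearest node is its own local node. On a UMA machine the table would have a single entry, because there is only one node.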

zone

A zone is a region of physical memory: the whole of physical memory is divided into several zones, and each zone has a particular purpose.

First, look at how the kernel defines the zone types:

enum zone_type {
#ifdef CONFIG_ZONE_DMA
	/*
	 * ZONE_DMA is used when there are devices that are not able
	 * to do DMA to all of addressable memory (ZONE_NORMAL). Then we
	 * carve out the portion of memory that is needed for these devices.
	 * The range is arch specific.
	 *
	 * Some examples
	 *
	 * Architecture		Limit
	 * ---------------------------
	 * parisc, ia64, sparc	<4G
	 * s390			<2G
	 * arm			Various
	 * alpha		Unlimited or 0-16MB.
	 *
	 * i386, x86_64 and multiple other arches
	 * 			<16M.
	 */
	ZONE_DMA,
#endif
#ifdef CONFIG_ZONE_DMA32
	/*
	 * x86_64 needs two ZONE_DMAs because it supports devices that are
	 * only able to do DMA to the lower 16M but also 32 bit devices that
	 * can only do DMA areas below 4G.
	 */
	ZONE_DMA32,
#endif
	/*
	 * Normal addressable memory is in ZONE_NORMAL. DMA operations can be
	 * performed on pages in ZONE_NORMAL if the DMA devices support
	 * transfers to all addressable memory.
	 */
	ZONE_NORMAL,
#ifdef CONFIG_HIGHMEM
	/*
	 * A memory area that is only addressable by the kernel through
	 * mapping portions into its own address space. This is for example
	 * used by i386 to allow the kernel to address the memory beyond
	 * 900MB. The kernel will set up special mappings (page
	 * table entries on i386) for each page that the kernel needs to
	 * access.
	 */
	ZONE_HIGHMEM,
#endif
	ZONE_MOVABLE,
#ifdef CONFIG_ZONE_DEVICE
	ZONE_DEVICE,
#endif
	__MAX_NR_ZONES

};

The meaning of each zone is easiest to explain by looking at 32-bit and 64-bit systems separately.

 

32-bit system:

In a 32-bit system, assume that physical memory is 4G.

  • ZONE_DMA: on the x86 architecture some DMA devices can only access addresses below 16M, so ZONE_DMA was introduced. When such a device needs memory for DMA, the memory is allocated from ZONE_DMA.
  • ZONE_HIGHMEM: a product of the 32-bit era. In a 32-bit system the 4G virtual address space is split into 0-3G for user space and 3G-4G for kernel space. To keep kernel operations simple, physical memory should be linearly mapped into the kernel's virtual address space, but the kernel only has 1G of virtual space while physical memory is 4G, so not all of it can be linearly mapped. Instead, the kernel virtual range 3G to 3G+896M is linearly mapped to physical memory 0-896M, and the physical range 896M-4G, which has no permanent mapping, is called ZONE_HIGHMEM. The value 896M is the classic x86 value; the value for the ARM architecture is not covered here.
  • ZONE_NORMAL: the region 16M-896M.
  • The memory in ZONE_HIGHMEM is generally called high memory, and the memory below 896M is called low memory. Low memory is linearly mapped.

On my 32-bit Ubuntu machine, for example, there are DMA, Normal, and HighMem zones.
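The 32-bit layout above can be written down as a small C sketch. The boundaries (16M, 896M, 3G) come from the text, but the enum and helper names (sketch_zone_of, sketch_lowmem_virt) are hypothetical and only model the classic x86 split. The real kernel derives these limits per architecture.

```c
#include <assert.h>
#include <stdint.h>

/* Values for the classic 32-bit x86 layout described above. */
#define DMA_LIMIT	(16u << 20)	/* 16M */
#define LOWMEM_LIMIT	(896u << 20)	/* 896M */
#define KERNEL_BASE	0xC0000000u	/* 3G: start of kernel space */

enum sketch_zone { SKETCH_DMA, SKETCH_NORMAL, SKETCH_HIGHMEM };

/* Which zone a physical address falls into. */
enum sketch_zone sketch_zone_of(uint32_t phys)
{
	if (phys < DMA_LIMIT)
		return SKETCH_DMA;
	if (phys < LOWMEM_LIMIT)
		return SKETCH_NORMAL;
	return SKETCH_HIGHMEM;
}

/* Linear mapping: only valid for low memory (below 896M). */
uint32_t sketch_lowmem_virt(uint32_t phys)
{
	assert(phys < LOWMEM_LIMIT);
	return phys + KERNEL_BASE;
}
```

For low memory the virtual address is just the physical address plus a constant, which is why the kernel can convert between the two so cheaply. High memory has no such fixed offset and must be mapped on demand.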

64-bit system:

  • On 64-bit systems the virtual address space is large enough. For example, with a 39-bit virtual address width, user space and kernel space are the same size, 512G each.
  • Assuming physical memory is 4G, the whole 4G can be linearly mapped into the kernel virtual address range, so ZONE_HIGHMEM no longer exists on 64-bit machines.
  • On 64-bit x86 machines there may be ZONE_DMA and ZONE_DMA32 zones, used for DMA transfers.
  • For example, on my Ubuntu machine the zones can be seen through /proc/buddyinfo:

root@root-OptiPlex-7060:~$ cat /proc/buddyinfo 
Node 0, zone      DMA      3      3      1      1      3      2      0      0      1      1      3 
Node 0, zone    DMA32   4053    729    155    166    105     43    151      0      0      0      0 
Node 0, zone   Normal  33893   8921   6356   1472   1221    101     48     10      4      0      0 
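Each of the 11 numbers on a buddyinfo line is the count of free blocks of one order, from order 0 (a single page) up to order 10 (1024 contiguous pages). As a rough sketch of how to read such a line, the helper below adds up the free pages of one zone. The name free_pages_in_zone is made up for this example, and 4K pages are assumed.

```c
#include <assert.h>
#include <stddef.h>

#define NR_ORDERS	11	/* orders 0..10, matching the 11 columns */
#define PAGE_SIZE_KB	4	/* assumes 4K pages */

/* Total free pages in a zone: a block of order n holds 2^n pages. */
unsigned long free_pages_in_zone(const unsigned long counts[NR_ORDERS])
{
	unsigned long total = 0;

	assert(counts != NULL);
	for (size_t order = 0; order < NR_ORDERS; order++)
		total += counts[order] << order;
	return total;
}
```

Feeding in the DMA line above (3 3 1 1 3 2 0 0 1 1 3) gives 3973 free pages, or 15892K, which fits the fact that the DMA zone covers less than 16M.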

As another example, here is the output from one of my ARM64 mobile phones.

root:/ # cat /proc/buddyinfo
Node 0, zone   Normal     12      7    148     52    114     39     16      8      5      5    117
Node 0, zone  Movable    470   1135    880    340     35      8      4      2      3      0    653

You can see that ZONE_HIGHMEM no longer exists on 64-bit machines. Apart from the Movable zone, only ZONE_NORMAL is left on this phone.

ZONE_MOVABLE: used to fight memory fragmentation. When memory is fragmented and a large contiguous region is needed, the pages in ZONE_MOVABLE can be moved elsewhere to free up a large contiguous area.

 

page

A page represents one physical page. The kernel describes each physical page with a struct page.

struct page {
	unsigned long flags;		/* Atomic flags, some possibly
					 * updated asynchronously */
	/*
	 * Five words (20/40 bytes) are available in this union.
	 * WARNING: bit 0 of the first word is used for PageTail(). That
	 * means the other users of this union MUST NOT use the bit to
	 * avoid collision and false-positive PageTail().
	 */
	union {
		struct {	/* Page cache and anonymous pages */
			/**
			 * @lru: Pageout list, eg. active_list protected by
			 * zone_lru_lock.  Sometimes used as a generic list
			 * by the page owner.
			 */
			struct list_head lru;
			/* See page-flags.h for PAGE_MAPPING_FLAGS */
			struct address_space *mapping;
			pgoff_t index;		/* Our offset within mapping. */
			/**
			 * @private: Mapping-private opaque data.
			 * Usually used for buffer_heads if PagePrivate.
			 * Used for swp_entry_t if PageSwapCache.
			 * Indicates order in the buddy system if PageBuddy.
			 */
			unsigned long private;
		};
		struct {	/* slab, slob and slub */
			union {
				struct list_head slab_list;	/* uses lru */
				struct {	/* Partial pages */
					struct page *next;
#ifdef CONFIG_64BIT
					int pages;	/* Nr of pages left */
					int pobjects;	/* Approximate count */
#else
					short int pages;
					short int pobjects;
#endif
				};
			};
			struct kmem_cache *slab_cache; /* not slob */
			/* Double-word boundary */
			void *freelist;		/* first free object */
			union {
				void *s_mem;	/* slab: first object */
				unsigned long counters;		/* SLUB */
				struct {			/* SLUB */
					unsigned inuse:16;
					unsigned objects:15;
					unsigned frozen:1;
				};
			};
		};
		struct {	/* Tail pages of compound page */
			unsigned long compound_head;	/* Bit zero is set */

			/* First tail page only */
			unsigned char compound_dtor;
			unsigned char compound_order;
			atomic_t compound_mapcount;
		};
		struct {	/* Second tail page of compound page */
			unsigned long _compound_pad_1;	/* compound_head */
			unsigned long _compound_pad_2;
			struct list_head deferred_list;
		};
		struct {	/* Page table pages */
			unsigned long _pt_pad_1;	/* compound_head */
			pgtable_t pmd_huge_pte; /* protected by page->ptl */
			unsigned long _pt_pad_2;	/* mapping */
			union {
				struct mm_struct *pt_mm; /* x86 pgds only */
				atomic_t pt_frag_refcount; /* powerpc */
			};
#if ALLOC_SPLIT_PTLOCKS
			spinlock_t *ptl;
#else
			spinlock_t ptl;
#endif
		};
		struct {	/* ZONE_DEVICE pages */
			/** @pgmap: Points to the hosting device page map. */
			struct dev_pagemap *pgmap;
			unsigned long hmm_data;
			unsigned long _zd_pad_1;	/* uses mapping */
		};

		/** @rcu_head: You can use this to free a page by RCU. */
		struct rcu_head rcu_head;
	};

	union {		/* This union is 4 bytes in size. */
		/*
		 * If the page can be mapped to userspace, encodes the number
		 * of times this page is referenced by a page table.
		 */
		atomic_t _mapcount;

		/*
		 * If the page is neither PageSlab nor mappable to userspace,
		 * the value stored here may help determine what this page
		 * is used for.  See page-flags.h for a list of page types
		 * which are currently stored here.
		 */
		unsigned int page_type;

		unsigned int active;		/* SLAB */
		int units;			/* SLOB */
	};

	/* Usage count. *DO NOT USE DIRECTLY*. See page_ref.h */
	atomic_t _refcount;

#ifdef CONFIG_MEMCG
	struct mem_cgroup *mem_cgroup;
#endif

	/*
	 * On machines where all RAM is mapped into kernel address space,
	 * we can simply calculate the virtual address. On machines with
	 * highmem some memory is mapped into kernel virtual memory
	 * dynamically, so we need a place to store that address.
	 * Note that this field could be 16 bits on x86 ... ;)
	 *
	 * Architectures with slow multiplication can define
	 * WANT_PAGE_VIRTUAL in asm/page.h
	 */
#if defined(WANT_PAGE_VIRTUAL)
	void *virtual;			/* Kernel virtual address (NULL if
					   not kmapped, ie. highmem) */
#endif /* WANT_PAGE_VIRTUAL */

#ifdef LAST_CPUPID_NOT_IN_PAGE_FLAGS
	int _last_cpupid;
#endif
};

It can be seen that struct page consists largely of unions, purely to save space. Because there are so many physical pages, just as many struct page instances are needed to describe them, and those instances themselves consume memory. So struct page overlays the fields for its different uses in unions, at the cost of poor readability.
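To see why overlaying fields saves space, here is a minimal sketch that compares two layouts. All type and field names are hypothetical stand-ins, far simpler than the real struct page. The point is only that a page is used for one purpose at a time, so the per-purpose fields can share storage.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-use fields; a page serves one purpose at a time. */
struct cache_use { void *lru_next, *lru_prev, *mapping; unsigned long index; };
struct slab_use  { void *slab_cache, *freelist; unsigned long counters; };

/* Layout 1: every use gets its own storage. */
struct page_as_struct {
	unsigned long flags;
	struct cache_use cache;
	struct slab_use slab;
};

/* Layout 2: all uses share the same storage through a union. */
struct page_as_union {
	unsigned long flags;
	union {
		struct cache_use cache;
		struct slab_use slab;
	};
};

/* Bytes saved per page by overlapping the fields. */
size_t bytes_saved_per_page(void)
{
	assert(sizeof(struct page_as_union) <= sizeof(struct page_as_struct));
	return sizeof(struct page_as_struct) - sizeof(struct page_as_union);
}
```

With 4G of memory and 4K pages there are 1,048,576 page descriptors, so every byte saved per descriptor saves 1M in total.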

 

Page Frame

To describe a physical page, the kernel uses a struct page. Assuming a page size of 4K, the kernel divides the whole of physical memory into 4K physical pages, and each such 4K region is called a page frame.

Page Frame Number (PFN)

Physical memory is divided into fixed-size blocks, for example 4K each. Each block is a page frame, and the index of a page frame is its PFN.

The relationship between physical address and pfn is: physical address >> PAGE_SHIFT = pfn
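As a small worked example of this relationship, assuming 4K pages (PAGE_SHIFT of 12), the conversions look like this. The helper names are chosen for the example and are not the kernel's own macros.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT	12			/* assumes 4K pages */
#define PAGE_SIZE	(1ULL << PAGE_SHIFT)

/* Physical address -> page frame number. */
uint64_t phys_to_pfn(uint64_t phys)
{
	return phys >> PAGE_SHIFT;
}

/* Page frame number -> physical address of the start of the frame. */
uint64_t pfn_to_phys(uint64_t pfn)
{
	assert(pfn <= (UINT64_MAX >> PAGE_SHIFT));	/* shift must not overflow */
	return pfn << PAGE_SHIFT;
}

/* Offset of a physical address inside its page frame. */
uint64_t offset_in_page(uint64_t phys)
{
	return phys & (PAGE_SIZE - 1);
}
```

For the physical address 0x12345678, the pfn is 0x12345 and the offset inside the page frame is 0x678.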

 

The relationship between pfn and page:

The kernel supports several memory models:

  • CONFIG_FLATMEM (flat memory model)
  • CONFIG_DISCONTIGMEM (discontiguous memory model)
  • CONFIG_SPARSEMEM_VMEMMAP (sparse memory model)

ARM64 currently uses the sparse model:

/* memmap is virtually contiguous.  */
#define __pfn_to_page(pfn)	(vmemmap + (pfn))
#define __page_to_pfn(page)	(unsigned long)((page) - vmemmap)

At boot, the kernel maps the whole array of struct page into the vmemmap area of the kernel virtual address space, so vmemmap can simply be thought of as the base address of that array. Then:

vmemmap + pfn is the address of the struct page for that page frame. The addition is C pointer arithmetic, so it advances by pfn entries of struct page, not by pfn bytes.
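The two macros can be tried out in user space with an ordinary array standing in for the vmemmap area. This is only a model: the one-field struct page, the array mem_map, and NR_PAGES are placeholders for the example, and the real vmemmap is a kernel virtual address region rather than a static array.

```c
#include <assert.h>
#include <stddef.h>

struct page { unsigned long flags; };	/* stand-in for the real struct page */

#define NR_PAGES 8
static struct page mem_map[NR_PAGES];	/* one entry per page frame */
static struct page *vmemmap = mem_map;	/* base address of the array */

/* Same arithmetic as __pfn_to_page: index the array by pfn. */
struct page *pfn_to_page(unsigned long pfn)
{
	assert(pfn < NR_PAGES);
	return vmemmap + pfn;
}

/* Same arithmetic as __page_to_pfn: pointer difference gives the index. */
unsigned long page_to_pfn(const struct page *page)
{
	assert(page >= vmemmap && page < vmemmap + NR_PAGES);
	return (unsigned long)(page - vmemmap);
}
```

Because the array is virtually contiguous, both directions are a single addition or subtraction, with no table lookup.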

 

To sum up:

Physical memory is divided into several nodes, each node contains several zones, and each zone is subdivided into pages.
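The three levels can be pictured with a heavily simplified sketch. The struct names and fields here (sketch_node, sketch_zone, start_pfn, nr_pages) are invented for illustration and leave out almost everything the real kernel structures contain. The sketch only shows a node holding zones and each zone covering a range of page frames.

```c
#include <assert.h>
#include <stddef.h>

struct sketch_zone {
	const char *name;
	unsigned long start_pfn;	/* first page frame in the zone */
	unsigned long nr_pages;		/* number of page frames it covers */
};

struct sketch_node {
	int id;
	size_t nr_zones;
	struct sketch_zone zones[3];
};

/* Find the zone of a node that contains pfn, or NULL if none does. */
const struct sketch_zone *zone_for_pfn(const struct sketch_node *node,
				       unsigned long pfn)
{
	assert(node != NULL);
	for (size_t i = 0; i < node->nr_zones; i++) {
		const struct sketch_zone *z = &node->zones[i];

		if (pfn >= z->start_pfn && pfn - z->start_pfn < z->nr_pages)
			return z;
	}
	return NULL;
}
```

For example, a node with a DMA zone covering page frames 0-4095 and a Normal zone starting at page frame 4096 would place page frame 100 in DMA and page frame 4096 in Normal.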

 

 


Origin blog.csdn.net/longwang155069/article/details/105428538