Linux Performance Optimization--Performance Tools: System Memory

3.0 Overview

This chapter provides an overview of system-level Linux memory performance tools. It discusses the memory statistics these tools can measure and how to collect those statistics. After reading this chapter, you will be able to:

  1. Understand basic metrics of system-level performance, including memory usage.
  2. Understand which tools can retrieve these system-level performance metrics.

3.1 Memory performance statistics

Each system-level Linux performance tool provides a different way of extracting similar statistics. No single tool displays all of the information, but several tools display some of the same statistics. The statistics themselves are described at the beginning of this chapter and are referenced later when the individual tools are introduced.

3.1.1 Memory subsystem and performance

In modern processors, writing information to or reading information from the memory subsystem generally takes longer than it takes the CPU to execute code or process that information. Typically, the CPU spends a considerable amount of time idle, waiting for instructions and data to be fetched from memory before it can execute or process them. Processors compensate for slow memory with several levels of cache. Tools such as oprofile can show where various processor cache misses occur.

3.1.2 Memory subsystem (virtual memory)

Any given Linux system has a certain amount of RAM, or physical memory. Linux addresses this physical memory in blocks called "pages": when allocating or moving memory around, the kernel works in units of pages rather than individual bytes. Some kernel statistics are reported as pages per second; because the page size varies with the architecture, it is useful to know how large a page is when interpreting those numbers. Listing 3.1 shows a small application that displays the number of bytes per page on the current architecture.
[Listing 3.1]
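The original listing is only available as an image; a minimal equivalent (a sketch, not necessarily the book's exact program) that prints the page size of the current architecture might look like this:

```python
import os

# Ask the kernel for the size, in bytes, of one page of memory.
# On IA32 this is typically 4096 (4KB); other architectures may differ.
page_size = os.sysconf("SC_PAGE_SIZE")
print(f"Page size: {page_size} bytes")
```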
For the IA32 architecture, the page size is 4KB. In some cases, tracking memory in page-sized blocks imposes significant bookkeeping overhead, so the kernel can instead manage memory in larger blocks called HugePages. These are 2048KB rather than 4KB, which greatly reduces the overhead of managing very large amounts of memory. Some applications, such as Oracle, use huge pages to keep large amounts of data in memory while minimizing the Linux kernel's management overhead. The downside is that a HugePage that cannot be completely filled wastes considerably more memory: a half-filled normal page wastes 2KB, while a half-filled HugePage wastes 1024KB.
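On systems with huge pages configured, the sizes involved can be read from /proc/meminfo. Here is a small sketch; the sample text and its numbers are illustrative, not taken from the original post:

```python
# Illustrative /proc/meminfo-style text (values are made up).
SAMPLE_MEMINFO = """\
MemTotal:        1035140 kB
MemFree:           14396 kB
HugePages_Total:       8
HugePages_Free:        6
Hugepagesize:       2048 kB
"""

def meminfo_field(text, name):
    """Return the integer value of a /proc/meminfo field (in its native unit)."""
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        if key.strip() == name:
            return int(rest.split()[0])
    raise KeyError(name)

huge_kb = meminfo_field(SAMPLE_MEMINFO, "Hugepagesize")
print(f"HugePage size: {huge_kb} kB; a half-filled HugePage wastes {huge_kb // 2} kB")
```

On a real system the same function can be applied to the contents of `/proc/meminfo`.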

The Linux kernel can gather these scattered physical pages and present applications with a well-organized virtual memory space.

3.1.2.1 Swapping (Insufficient physical memory)

The amount of physical memory in a system's RAM chips is fixed. The Linux kernel allows programs to run even when they require more memory than is physically available, by using the hard disk as temporary storage; this disk space is called swap space.

Although swapping is an excellent way to keep processes running, it is painfully slow. Applications using swap can be up to a thousand times slower than using physical memory. If the system is performing poorly, it is often useful to determine how much swap the system is using.

3.1.2.2 Buffer and cache (too much physical memory)

Conversely, if your system has more physical memory than the applications need, Linux caches recently used files in physical memory so that subsequent accesses to those files do not require the hard disk. This can significantly speed up applications that access the hard disk frequently, and it is obviously especially useful for applications that are launched repeatedly: the first launch must read from the hard disk, but if the application remains in the cache, later launches read from the much faster physical memory. This disk cache is different from the processor cache mentioned in the previous chapter. With the exception of oprofile, valgrind, and kcachegrind, most tools that report statistics on "cache" are referring to the disk cache.

In addition to the cache, Linux uses extra memory as buffers. To further optimize applications, Linux reserves memory for data that needs to be written back to the hard disk; this reserved memory is called a buffer. When an application wants to write data to disk, which usually takes a long time, Linux lets it store the data in a memory buffer and continue executing immediately; the buffer is then flushed to disk at some later point.

The use of cache (disk cache) and buffers leaves the system with seemingly very little free memory, which can be alarming, but it is not necessarily a bad thing. By default, Linux tries to use as much of your memory as possible, and that is a good thing: if Linux detects free memory, it caches applications and data in it to speed up future accesses. Since accessing memory is orders of magnitude faster than accessing the hard disk, this can significantly improve overall performance. If the system later needs that memory for something more important, the cached data is dropped and the memory is handed over; subsequent accesses to the previously cached objects must go back to the hard disk.
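Because buffers and disk cache can be reclaimed when needed, the memory that is effectively available is roughly the sum of free memory, buffers, and cache. A small sketch of that arithmetic (the numbers are made up for illustration, not taken from the original listings):

```python
def effectively_available_kb(mem_free, buffers, cached):
    """Rough estimate of reclaimable memory: buffers and disk cache
    can be given back if the system needs them for something more
    important, so they count as 'available' alongside free memory."""
    return mem_free + buffers + cached

# Illustrative values in kB, not from the original listings.
free_kb, buffers_kb, cached_kb = 14336, 4096, 642048
print(effectively_available_kb(free_kb, buffers_kb, cached_kb))  # 660480
```

Modern kernels expose a more accurate `MemAvailable` field in /proc/meminfo, but this sum is the classic approximation.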

3.1.2.3 Active and inactive memory

Active memory is memory currently being used by a process. Inactive memory is memory that has been allocated but has not been used recently. There is no essential difference between the two kinds of memory: when needed, Linux finds a process's least recently used pages and moves them from the active list to the inactive list. When it is time to choose which pages to swap out to the hard disk, the kernel picks them from the inactive list.

3.1.2.4 High-end and low-end memory

On 32-bit processors (such as IA32) with 1GB or more of physical memory, Linux must manage memory as two separate regions, high memory and low memory. High memory cannot be directly accessed by the Linux kernel; it must be mapped into the low-memory range before use. 64-bit processors (such as AMD64/EM64T, Alpha or Itanium) do not have this problem, because they can directly address all of the memory currently available to a system.

3.1.2.5 Kernel memory usage (slabs)

In addition to the memory that applications allocate, the Linux kernel itself consumes a certain amount of memory for bookkeeping purposes. This bookkeeping includes, for example, tracking data arriving from network and disk I/O, and tracking which processes are running and which are sleeping. To manage this bookkeeping, the kernel maintains a series of caches, each containing one or more slabs of memory. Each slab is a set of one or more objects. The amount of slab memory the kernel consumes depends on which parts of the Linux kernel are in use, and it can vary with the type of load on the machine.

3.2 Linux performance tools: CPU and memory

Let's now discuss performance tools that allow you to extract the memory performance information described earlier.

3.2.1 vmstat(I)

As seen before, vmstat can provide information about many different aspects of system performance, although its main purpose (as shown below) is to report virtual memory system statistics. In addition to the CPU performance statistics described in the previous chapter, it can also tell you the following:
1. How much swap space is being used.
2. How physical memory is being used.
3. How much memory is free.
As you can see, vmstat (through the statistics it displays) provides a wealth of information about system health and performance in a single line of text.

3.2.1.1 Memory-related system-wide options

In addition to providing CPU statistics, vmstat can report memory statistics when invoked with the following command line: vmstat [-a] [-s] [-m]. As before, you can run vmstat in two modes: sampling mode and average mode. The additional command-line options give you performance statistics about how the Linux kernel uses memory. Table 3-1 shows the options that vmstat accepts.
[Table 3-1]
Table 3-2 shows the memory statistics that vmstat can provide. As with the CPU statistics, when run in normal mode, the first line of vmstat output reports the averages since boot for the rate statistics (si and so) and the instantaneous values of the count statistics (swpd, free, buff, cache, active and inactive).
[Table 3-2]
For a given machine, vmstat can provide a good overview of the current state of its virtual storage system. While it doesn't provide a complete and detailed list of every available Linux performance statistic, the concise output it gives can indicate how system memory is being used overall.
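Because vmstat's default output is plain column-oriented text, it is easy to post-process when you want to track these statistics programmatically. The following is a sketch, with sample output whose values are illustrative rather than taken from the original listings:

```python
# Illustrative vmstat output: group header, column names, one data row.
SAMPLE_VMSTAT = """\
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 0  1 514004  14396   4092 642480    0    0    22    38  144  177  3  1 95  1
"""

def parse_vmstat(text):
    """Map vmstat column headers to the values in the first data row."""
    lines = text.strip().splitlines()
    headers = lines[1].split()          # second line holds the column names
    values = [int(v) for v in lines[2].split()]
    return dict(zip(headers, values))

stats = parse_vmstat(SAMPLE_VMSTAT)
print(stats["swpd"], stats["free"], stats["si"], stats["so"])
```

On a real system the same parsing can be applied to the captured output of `vmstat`.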

3.2.1.2 Usage examples

As in the previous chapter, Listing 3.2 shows that when vmstat is called without any command-line options, it displays the averages since system boot for the rate statistics (si and so) and the instantaneous values of the other statistics (swpd, free, buff and cache). In this example, we can see that the system has about 500MB of memory swapped to the hard disk, about 14MB of free memory, about 4MB in buffers holding data not yet flushed to the hard disk, and about 627MB in the disk cache holding data previously read from the hard disk.
[Listing 3.2]
In Listing 3.3, we ask vmstat to display information about the number of active and inactive pages. The number of inactive pages indicates how much memory could be swapped out to the hard disk if more memory were needed. In this example, we can see 1310MB of active memory and only 78MB considered inactive. The machine has a lot of memory, and most of it is in active use.
[Listing 3.3]
Next, in Listing 3.4, we see a different system, one that is swapping heavily. The si column shows that data was swapped in at rates of 480KB, 832KB, 764KB, 344KB and 512KB during successive sample periods. The so column shows that data was swapped out at rates of 9KB, 0KB, 916KB, 0KB, 1068KB, 444KB and 792KB. These results may indicate that the system does not have enough memory to handle all of the running processes. Heavy swapping in and out occurs when one process's memory is written out to make room for an application that was previously swapped to the hard disk. If two running programs together require more memory than the system can provide, the consequences can be dire: each process keeps evicting memory the other needs. When the first program needs a block of memory, it pushes out a block the second program is using; when the second program runs, it pushes out a block the first program is using and waits for its own memory to be loaded from the swap partition. Both applications stall, waiting for their memory to be read back from the swap partition before they can continue, and whenever one makes a little progress, it swaps out memory the other process is using and slows it down. This condition is called thrashing. When thrashing occurs, the system spends most of its time reading from and writing to the swap partition, and system performance drops sharply.
In this case, the swapping eventually stopped, most likely because the memory swapped to the hard disk was not immediately needed by the first process. That is swapping working as intended: memory contents that are not in use are written to the hard disk, and the memory is then given to the process that needs it.
[Listing 3.4]
Listing 3.5 uses the mode shown in the previous chapter, and as it demonstrates, vmstat can display many different system statistics. Looking at it now, we can see that some of the same statistics appear in this output mode, such as active, inactive, buffer, cache and used swap. However, some new statistics appear as well, such as total memory, which shows that the system has a total of 1516MB of memory, and total swap, which shows that it has a total of 2048MB of swap space. Knowing the system totals is helpful when trying to determine what percentage of swap and memory is currently in use. Another interesting statistic is pages paged in, the total number of pages read from the hard disk; it includes the pages read when an application starts as well as the pages the application itself reads in.
[Listing 3.5]
Finally, in Listing 3.6, we see that vmstat can provide information about how the Linux kernel allocates its memory. As mentioned before, the Linux kernel maintains a series of slabs to hold its dynamic data structures. vmstat displays each slab cache (Cache), showing how many of its elements are in use (Num), how many are allocated (Total), the size of each element (Size), and how many pages of memory (Pages) the slab uses. This information is helpful for tracking exactly how the kernel uses its memory.
[Listing 3.6]
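The `vmstat -m` output has exactly the four value columns described above, so it can be post-processed the same way. A sketch over illustrative sample lines (the cache names and numbers are made up, not taken from the original listing):

```python
# Illustrative `vmstat -m`-style output (values are made up).
SAMPLE_SLABS = """\
Cache                       Num  Total   Size  Pages
dentry_cache              67280  67800    192     21
inode_cache                 821    910    512      7
size-4096                    24     24   4096      1
"""

def largest_slab_caches(text, n=2):
    """Return the n slab caches using the most memory (Total * Size bytes)."""
    rows = []
    for line in text.strip().splitlines()[1:]:  # skip the header row
        name, num, total, size, pages = line.split()
        rows.append((name, int(total) * int(size)))
    rows.sort(key=lambda r: r[1], reverse=True)
    return rows[:n]

for name, bytes_used in largest_slab_caches(SAMPLE_SLABS):
    print(name, bytes_used)
```

Sorting the slab caches by total memory is a quick way to see where the kernel's memory is going.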
vmstat provides an easy way to extract a large amount of information about the Linux memory subsystem. Combined with the other statistics in its default output, it gives a picture of overall system health and resource usage.

3.2.2 top(2.x and 3.x)

As discussed in the previous chapter, top can also give system-level or process-specific performance statistics. By default, top displays a list sorted by process CPU consumption in descending order, but it can also be adjusted to sort by total memory usage so that you can track which process is using the most memory.

3.2.2.1 Options related to memory performance

top does not use any special command-line options to control how it displays memory statistics; it is simply invoked as: top
However, once it is running, top lets you toggle the display of system-wide memory information and sort the process list by memory usage. Sorting by memory consumption is very helpful when determining which process is consuming the most memory. Table 3-3 describes the memory-related toggles.
[Table 3-3]
Table 3-4 shows the system-wide and per-process memory statistics that top can provide. There are two versions of top in common use, 2.x and 3.x, and the names of their output statistics differ slightly; Table 3-4 gives the names used by both versions.
[Table 3-4]
top provides extensive memory information for different running processes. As discussed in subsequent chapters, you can use this information to determine exactly how your application allocates and uses memory.
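top's per-process figures are ultimately derived from the /proc filesystem. As a rough sketch (this is an illustration, not how top itself is implemented), the Vm* size fields can be pulled out of /proc/&lt;pid&gt;/status-style text like this, where the sample text is made up:

```python
# Illustrative /proc/<pid>/status-style text (values are made up).
SAMPLE_STATUS = """\
Name:   bash
VmSize:     5236 kB
VmRSS:      1528 kB
VmData:      340 kB
"""

def vm_sizes_kb(status_text):
    """Extract the Vm* memory fields (in kB) from status-style text."""
    sizes = {}
    for line in status_text.splitlines():
        key, _, rest = line.partition(":")
        if key.startswith("Vm"):
            sizes[key] = int(rest.split()[0])
    return sizes

print(vm_sizes_kb(SAMPLE_STATUS))
```

On a real Linux system the same function can be applied to the contents of /proc/self/status.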

3.2.2.2 Usage examples

Listing 3.7 is similar to the top example given in the previous chapter. However, in this example, note that approximately 84MB of memory is free out of a total physical memory of 1024MB.
[Listing 3.7]

Origin blog.csdn.net/x13262608581/article/details/133524252