Introduction to performance monitoring tools under Linux

You should develop the habit of using these tools. You need to know how to diagnose performance problems, of course, but you should also check key indicators regularly for changes that may signal trouble. You can also use these tools to measure the performance impact of a new application. Watch the performance indicators of a Linux system the way you watch the temperature gauge in a car. The tools introduced in this chapter are:
· top
· sar
· vmstat
· iostat
· free
    You can run these tools as a normal user. They all get their data from the /proc file system. The tools ship in two rpms: the procps rpm provides top, free, and vmstat, and the sysstat rpm provides sar and iostat.
    The top command is an excellent interactive utility for monitoring performance. It provides several summary lines about overall Linux performance, but its real strength is reporting process information. You can customize the process display extensively: add fields, sort the process list by different metrics, and even kill processes from within top.
    The sar utility can monitor just about everything. It has at least 15 separate report categories, including CPU, disk, network, process, and swap.
    The vmstat command reports extensive information about memory and swap usage. It also reports CPU and some I/O information. The iostat command reports storage input/output (I/O) statistics.
    These commands overlap in much of what they cover. This chapter discusses how to use them and explains the reports each one generates. It does not cover all 15 sar report syntaxes, only the most common ones.
3.1 top
    The top command is one of the most popular performance tools. Most system administrators run top to see how their Linux and UNIX systems are doing. The top utility provides an ideal way to monitor both individual processes and the overall performance of Linux. It is more accurate to call Linux processes tasks, but this chapter calls them processes because that is what the tool calls them. Both ordinary users and root can run top. Figure 3-1 shows typical top output from an idle system.
Figure 3-1 top output
    The top display has two parts. Roughly the first third shows overall information about Linux, and the rest shows information about individual processes. If the window is enlarged, more processes are displayed to fill the screen.
    Much of the general Linux information could be obtained with several other commands, but it is convenient to see it all on one screen from a single command. The first line shows the load average over the last 1, 5, and 15 minutes. The load average indicates how many processes are running on a CPU or waiting to run. The uptime command can also display the load average. Next comes the process summary, followed by CPU, memory, and swap information. The memory and swap information is similar to the output of the free command. Once we know the memory and CPU usage, the next question is which processes are using them.
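    The load averages that top and uptime print come from /proc/loadavg. The following is a minimal sketch of reading that file, assuming the usual five-field layout (three averages, a runnable/total pair, and the last PID). The function name and the sample line are illustrative and are not taken from the system discussed here.

```python
def parse_loadavg(text):
    """Parse the single line of text found in /proc/loadavg."""
    fields = text.split()
    one, five, fifteen = (float(value) for value in fields[:3])
    # The fourth field looks like "1/80": runnable entities / total entities.
    runnable, total = (int(value) for value in fields[3].split("/"))
    return {"1min": one, "5min": five, "15min": fifteen,
            "runnable": runnable, "total": total}


# Hypothetical sample line, used so the sketch runs without a live /proc.
SAMPLE = "0.20 0.18 0.12 1/80 11206\n"
load = parse_loadavg(SAMPLE)
print(load["1min"], load["5min"], load["15min"])
```

    On a live system the same function could be fed the contents of /proc/loadavg, for example parse_loadavg(open("/proc/loadavg").read()).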
    Most of the process information is also available from the ps command, but top presents it in a format that is easier to read. The most useful interactive command is h for help, which lists the other interactive commands of top.
3.1.1 Adding and removing fields
    Fields can be added to or removed from the display. Process output can be sorted by CPU, memory, or other metrics, which is an ideal way to see which process is hogging memory. The top syntax and interactive options differ between Linux distributions, so the help command is a quick way to list what is available. There are many interactive options, and it is worth spending some time experimenting with them.
    Figure 3-2 shows the help screen for Red Hat Enterprise Linux ES release 3.
Figure 3-2 The help screen of top
    The f command is used to add or remove fields from the top output. Figure 3-3 is a help screen for Red Hat Enterprise Linux ES release 3, showing what fields can be added.
Figure 3-3 Screen for adding/removing fields from top
    Figure 3-4 shows a help screen for SUSE Linux 9.0 top. It can be seen that the commands they provide are very different.
Figure 3-4 Help screen of SUSE top
3.1.2 Interpreting the output
    Let us look at what the top information means, using the following top output as an example:
    The first line of the top output shows the load average information:
    This output is similar to the output of uptime. It shows the current time, how long Linux has been running, the number of users, and the load average for the last 1, 5, and 15 minutes. Next comes the process summary:
    We see a total of 73 processes, of which 72 are sleeping and one is running; there are no zombie or stopped processes. A process becomes a zombie when it exits and its parent does not wait for it with wait(2) or waitpid(2). This usually happens when the parent is still running but has not collected the exit status of its child. Apart from their entry in the process table, zombie processes consume no resources. A stopped process is one that has been sent a STOP signal. For more information, see the signal(7) man page.
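    A summary like this can be rebuilt from the per-process stat files under /proc. The sketch below assumes the layout documented in proc(5), where the state letter follows the parenthesised command name. The sample lines and helper names are made up for illustration.

```python
from collections import Counter

# State letters as described in proc(5); only the common ones are named here.
STATE_NAMES = {"R": "running", "S": "sleeping", "D": "uninterruptible",
               "T": "stopped", "Z": "zombie"}


def process_state(stat_line):
    """Return the state letter from one /proc/<pid>/stat line."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split on the LAST ')' before reading the state.
    after_comm = stat_line.rsplit(")", 1)[1]
    return after_comm.split()[0]


def summarize(stat_lines):
    """Count processes per state, using readable names where known."""
    counts = Counter(process_state(line) for line in stat_lines)
    return {STATE_NAMES.get(code, code): n for code, n in counts.items()}


# Hypothetical, truncated stat lines; real lines carry many more fields.
SAMPLE_LINES = [
    "1 (init) S 0 1 1 0 -1 4194560",
    "812 (sshd) S 1 812 812 0 -1 4194560",
    "2045 (top) R 1990 2045 1990 34816 2045 4194304",
    "2100 (my (odd) job) Z 2045 2100 1990 34816 2045 4194308",
]
summary = summarize(SAMPLE_LINES)
print(summary)
```

    Splitting on the last closing parenthesis is a deliberate choice: a naive split on whitespace misreads any process whose name contains a space.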
    Next is the CPU information:
    The CPU line describes how the CPUs spend their cycles. The top command reports the percentage of CPU time spent in user mode, in kernel (system) mode, running niced processes, and idle. The iowait column shows the percentage of time the processor was waiting for I/O to complete while no process was running on it. The irq and softirq columns show the time spent servicing hardware and software interrupts. Linux kernels earlier than 2.6 do not report irq, softirq, and iowait.
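    These percentages are derived from cumulative tick counters on the aggregate cpu line of /proc/stat. A tool samples the counters twice and divides each delta by the total delta. The following is a rough sketch of that calculation, assuming the field order user, nice, system, idle, iowait, irq, softirq. The sample text is invented, and counters missing from older kernels are treated as zero.

```python
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq")


def read_cpu_ticks(stat_text):
    """Extract the aggregate 'cpu' counters from /proc/stat text."""
    for line in stat_text.splitlines():
        parts = line.split()
        if parts and parts[0] == "cpu":
            values = [int(value) for value in parts[1:1 + len(FIELDS)]]
            # Pad with zeros when a kernel reports fewer counters.
            values += [0] * (len(FIELDS) - len(values))
            return dict(zip(FIELDS, values))
    raise ValueError("no aggregate 'cpu' line found")


def cpu_percentages(before, after):
    """Turn two counter snapshots into percentages of elapsed CPU time."""
    deltas = {name: after[name] - before[name] for name in FIELDS}
    total = sum(deltas.values())
    if total == 0:
        return {name: 0.0 for name in FIELDS}
    return {name: 100.0 * delta / total for name, delta in deltas.items()}


# Hypothetical snapshots taken a few seconds apart.
BEFORE = "cpu  1000 50 300 8000 100 10 40\ncpu0 1000 50 300 8000 100 10 40\n"
AFTER = "cpu  1100 50 350 8800 140 10 50\ncpu0 1100 50 350 8800 140 10 50\n"
usage = cpu_percentages(read_cpu_ticks(BEFORE), read_cpu_ticks(AFTER))
print({name: round(value, 1) for name, value in usage.items()})
```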
    Next is the memory information:
    The first three metrics summarize memory usage: total available memory, used memory, and free memory. This is the information needed to judge whether Linux has enough memory.
    The next five metrics show how the used memory is allocated. The shrd field shows shared memory usage, and buff is the memory used by buffers. Memory allocated to the kernel or to user processes can be in one of three states: active, inactive dirty, or inactive clean. Active memory, shown as actv in top, has been used recently. Inactive dirty memory, shown as in_d, has not been used recently and may be reclaimed, but its contents must be written to disk first. That write-out is called "cleaning" and can be considered a fourth, temporary state of memory. Once cleaned, inactive dirty memory becomes inactive clean memory, shown as in_c. Understanding Virtual Memory in Red Hat Enterprise Linux 3, by Norm Murray and Neil Horman, is an excellent reference, available at http://people.redhat.com/nhorman/papers/rhel3_vm.pdf.
    Next is the swap information:
    The av field is the total amount of swap available, followed by the amount used and the amount free. The last field is the amount of memory the kernel is using for cache.
    The rest of the top display is process information:
    top displays as many processes as fit on the screen. The fields are described in the top(1) man page, and Table 3-1 summarizes them.
Table 3-1 top process fields
Field      Description
PID        Process ID number
USER       User name of the process owner
PRI        Priority of the process
SIZE       Size of the process, including its code, stack, and data areas, in kilobytes
RSS        Total amount of memory used by the process, in kilobytes
SHARE      Amount of shared memory used by the process
STAT       State of the process; usually R for running or S for sleeping
%CPU       Percentage of CPU used by this process since the last screen update
%MEM       Percentage of memory used by this process
TIME       Amount of CPU time used by this process since it started
CPU        CPU on which the process most recently executed
COMMAND    Command being executed
3.1.3 Saving customization
    A very nice feature of top is the ability to save the current configuration. Change the display as you like using the interactive commands, and then press w to save the view.
    top writes a .toprc file in the user's home directory to save the configuration, so that the same display options are used the next time the user starts top.
    top also looks for a default configuration file, /etc/toprc. This is a global configuration file that top reads whenever any user runs the utility. It can make top run in secure mode and can set the refresh delay. Secure mode prevents non-root users from killing processes or changing their nice values, and also from changing the refresh interval of top. A sample /etc/toprc for Red Hat Enterprise Linux ES release 3 is as follows:
    Here s means secure mode, and 3 sets a three-second refresh interval. Other distributions may use a different /etc/toprc format. The ability to kill processes is a very useful feature. If a user has a runaway process, it is easy to find and kill it from top: run top, use the u command to show only that user's processes, and then use k to kill the offender. top is not just an excellent performance monitoring tool; it can also improve performance by killing the processes that cause problems.
3.1.4 Batch mode
    top can also run in batch mode. Try running the following command:
    The -n 1 option tells top to show only one iteration, and the -b option makes it produce plain text output suitable for writing to a file or piping to another program such as less. Something like the following two-line script makes a good cron job:
    You can add it to crontab and collect the output every 15 minutes.
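    As an illustration only, a Python stand-in for such a cron job might look like the sketch below. It relies on the -b and -n options described above. The output directory, the file naming scheme, and the function names are assumptions made for this example and are not part of the original script.

```python
import subprocess
from datetime import datetime
from pathlib import Path


def top_batch_command(iterations=1):
    """Build the batch-mode top command line: text output, N iterations."""
    return ["top", "-b", "-n", str(iterations)]


def snapshot_path(directory, now):
    """Name the output file after the time of the sample (naming is arbitrary)."""
    return Path(directory) / now.strftime("top-%Y%m%d-%H%M.out")


def save_snapshot(directory="/var/tmp"):
    """Run top once in batch mode and store its output in a timestamped file."""
    target = snapshot_path(directory, datetime.now())
    result = subprocess.run(top_batch_command(), capture_output=True,
                            text=True, check=True)
    target.write_text(result.stdout)
    return target


# Example crontab entry (the script path is hypothetical), every 15 minutes:
#   */15 * * * * /usr/local/bin/top_snapshot.py
print(" ".join(top_batch_command()))
```

    Calling save_snapshot() from a script run by cron would leave one file per sample, which keeps old snapshots available for later comparison.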
    Batch mode makes it easy to do all of this without user intervention. All processes are listed, and the output is not refreshed every 5 seconds. If there is a .toprc configuration file in the user's home directory, it is used to format the display. The following is the output of top in batch mode on a multi-CPU Linux server. Note that not all 258 processes in the top output are shown.
    Now readers may understand why top is so popular. The interactive nature of top and the ability to easily customize the output make it an excellent tool for diagnosing problems.
 
3.2 sar
    sar is an excellent general-purpose performance monitoring tool that can report on almost everything Linux does. The sar command is provided in the sysstat rpm. The examples use sysstat version 5.0.5, which was one of the latest stable versions at the time of writing. For version and download information, visit the sysstat home page at http://perso.wanadoo.fr/sebastien.godard/.
    sar can display performance data for CPU, run queue, disk I/O, paging (swap), memory, CPU interrupts, networking, and more. The most important sar feature is the ability to create data files. Every Linux system should collect sar data from a cron job. The sar data files give system administrators historical performance information, and this is what sets sar apart from other performance tools. If a batch job takes twice as long as normal one night, you will not find out until the next morning (unless you are woken up). We need to be able to study performance data from 12 hours ago, and the sar data collector provides this capability. There are many report syntaxes, but we discuss data collection first.
3.2.1 sar data collector
    sar data collection is done by a binary executable and two scripts in /usr/lib/sa. The sar data collector is the binary executable /usr/lib/sa/sadc. The job of sadc is to write to the data collection files in /var/log/sa/. sadc accepts several options. The common syntax is:
    Here interval is the number of seconds between samples, iterations is the number of samples to take, and file name specifies the output file. A simple sadc invocation is /usr/lib/sa/sadc 360 5 /tmp/sadc.out. This command takes five samples at 360-second (six-minute) intervals and saves them in /tmp/sadc.out. We should collect samples on a regular schedule, so we need a script that cron can run. The samples should be stored somewhere meaningful, as with the top script in the previous section. The sysstat rpm provides the /usr/lib/sa/sa1 script to do all of this.
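    Because the interval is given in seconds, it is easy to misjudge how far apart the samples are. The small sketch below builds the sadc argument list from the interval, iterations, and file name described above and converts the interval to minutes. The helper names are invented for this example, and the sketch only builds the command; it does not run sadc.

```python
def sadc_command(interval, count, outfile, sadc="/usr/lib/sa/sadc"):
    """Build the argument list: sadc <interval> <iterations> <file name>."""
    return [sadc, str(interval), str(count), outfile]


def describe(interval, count):
    """Summarize a sampling plan, converting the interval to minutes."""
    minutes = interval / 60
    return f"{count} samples, one every {interval} s ({minutes:g} min)"


print(" ".join(sadc_command(360, 5, "/tmp/sadc.out")))
print(describe(360, 5))   # the example from the text: six-minute spacing
print(describe(300, 5))   # what five-minute spacing would need
```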
    The sa1(8) man page is much longer than the sa1 script itself. /usr/lib/sa/sa1 is a very simple script that runs sadc with the syntax sadc -F -L 1 1 /var/log/sa/sa##, where ## is the day of the month. Older versions of sa1 use the output of date +.%Y_%m_%d as the file suffix. The -F option forces sadc to create the output file if necessary. The -L option locks the output file before writing to it, which prevents corruption when two sadc processes run at the same time. Older versions of sadc do not have the -L option, so the sa1 script does its own locking. The only options to the sa1 script are the interval between samples and the number of samples to take. A cron file (/etc/cron.d/sysstat) is supplied with sysstat, and it differs between sysstat versions. The following is the entry from sysstat 5.0.5:
    As you can see, sadc starts taking samples as soon as the sysstat rpm is installed. The sysstat home page is http://perso.wanadoo.fr/sebastien.godard/. As of January 14, 2006, its Documentation link suggests the following crontab:
    The crontab example on Sebastien Godard's website recommends taking a sample every 10 minutes from 8 am to 6 pm on weekdays and one sample every hour otherwise. (Note that the crontab comment says 7 pm, but the entry actually specifies 18:00, which is 6 pm.) If there is enough disk space in /var, you may prefer to sample every 10 minutes around the clock, every day. If backups run slowly over the weekend, hourly sadc samples may not be much help.
    Let us now study the more popular report syntax.
3.2.2 CPU statistics
    sar -u output displays CPU information. The -u option is the default option of sar. The output shows the CPU usage in percentage. Table 3-2 explains the output.
Table 3-2 sar -u fields
Field      Description
CPU        CPU number
%user      Time spent running processes in user mode
%nice      Time spent running niced processes
%system    Time spent running processes in kernel (system) mode
%iowait    Time the processor waited for I/O to complete while no process was executing on this CPU
%idle      Time during which no process was executing on this CPU
    These fields should look familiar; they are the same as the CPU information in the top report. The following shows the output format:
    Here 5 10 causes sar to take 10 samples at 5-second intervals. The first column of any sar report is a timestamp.
    We can also examine the files created by sadc by using the -f option. The following is the output of sar -f /var/log/sa/sa21:
    
    On a multi-CPU Linux system, sar can also break the information down for each CPU, as the following sar -u -P ALL 5 5 output shows:
    
3.2.3 Disk I/O statistics
    sar is an excellent tool for studying disk I/O. The following is an example of sar disk I/O output.
    The -d option on the first line displays disk I/O information, and 5 2 are the interval and iteration count, just as with the sar data collector. Table 3-3 lists the fields and their descriptions.
Table 3-3 sar -d fields
Field       Description
DEV         Disk device
tps         Number of transfers (I/Os) per second
rd_sec/s    Number of 512-byte sectors read per second
wr_sec/s    Number of 512-byte sectors written per second
    512 is just a unit of measurement; it does not mean that all disk I/O uses 512-byte blocks. The DEV column is the disk device in dev#-# format, where the first # is the major number of the device and the second # is the minor number or a sequential number. For kernels later than 2.5, sar uses the minor number. For example, the sar -d output shows dev3-0 and dev3-1, which correspond to /dev/hda and /dev/hda1. Look at the following entries in /dev:
    /dev/hda has major number 3 and minor number 0, and hda1 has major number 3 and minor number 1.
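    The mapping between a device node and its sar label can be checked programmatically. The sketch below uses the os.major, os.minor, and os.makedev functions from the Python standard library (available on Unix-like systems). The label format follows the dev#-# convention described above, and the helper name is an assumption made for this example.

```python
import os


def sar_device_label(device_number):
    """Format a device number the way sar -d labels disks: dev<major>-<minor>."""
    return f"dev{os.major(device_number)}-{os.minor(device_number)}"


# Build device numbers from the major/minor pairs given in the text.
hda = os.makedev(3, 0)
hda1 = os.makedev(3, 1)
print(sar_device_label(hda))    # dev3-0
print(sar_device_label(hda1))   # dev3-1

# For a real device node, the number comes from stat(), for example:
#   sar_device_label(os.stat("/dev/hda").st_rdev)
```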
3.2.4 Network statistics
    sar provides four different syntax options to display network information. The -n option uses four different switches: DEV, EDEV, SOCK, and FULL. DEV displays network interface information, EDEV displays statistics about network errors, SOCK displays socket information, and FULL displays all three switches. They can be used individually or together. Table 3-4 shows the fields reported through the -n DEV option.
Table 3-4 sar -n DEV fields
Field       Description
IFACE       LAN interface
rxpck/s     Packets received per second
txpck/s     Packets transmitted per second
rxbyt/s     Bytes received per second
txbyt/s     Bytes transmitted per second
rxcmp/s     Compressed packets received per second
txcmp/s     Compressed packets transmitted per second
rxmcst/s    Multicast packets received per second
    The following is the sar output using the -n DEV option:
    
    Information about network errors can be displayed with sar -n EDEV. Table 3-5 lists the displayed fields.
    
Table 3-5 sar -n EDEV fields
Field       Description
IFACE       LAN interface
rxerr/s     Bad packets received per second
txerr/s     Bad packets transmitted per second
coll/s      Collisions per second
rxdrop/s    Received packets dropped per second because buffers were full
txdrop/s    Transmitted packets dropped per second because buffers were full
txcarr/s    Carrier errors per second while transmitting packets
rxfram/s    Frame alignment errors per second on received packets
rxfifo/s    FIFO overrun errors per second on received packets
txfifo/s    FIFO overrun errors per second on transmitted packets
    The SOCK argument displays IPC socket information. Table 3-6 lists the displayed fields and their meanings.
Table 3-6 sar -n SOCK fields
Field      Description
totsck     Total number of sockets in use
tcpsck     Number of TCP sockets in use
udpsck     Number of UDP sockets in use
rawsck     Number of raw sockets in use
ip-frag    Number of IP fragments in use

    sar can generate many other reports. It is worth reading the sar(1) man page carefully to see whether it offers other reports you need.
 
3.3 vmstat
    The vmstat command is another way to display Linux performance metrics. It reports a lot of information, which can be difficult to interpret.
    The output is divided into six categories: process, memory, swap, I/O, system, and CPU. As with iostat, the first sample is an average since the most recent reboot. The following is typical vmstat output:
    The -m option causes the memory fields to be displayed in megabytes. (This applies to the vmstat version used here; in current procps releases, -m reports slab information and -S M selects megabytes.) Like many other performance commands, vmstat takes sampling interval and count arguments.
    The process (procs) information has two columns. The r column is the number of runnable processes, and the b column is the number of blocked processes.
    The memory section has 4 fields that report how virtual memory is used. Table 3-7 lists these fields and their meanings.
Table 3-7 vmstat memory fields
Field    Description
swpd     Amount of swap space used
free     Amount of free RAM
buff     Amount of RAM used for buffers
cache    Amount of RAM used for the file system cache
    Next are the swap metrics. Swap is an old term that is clearly not going away. Swapping originally meant writing all the memory consumed by a process out to disk, or reading it all back in, which has a severe impact on performance. What Linux actually does is page memory out to disk in small chunks as needed. We should therefore probably stop talking about memory being swapped to disk and start talking about memory being paged to disk. Either way, Table 3-8 explains the relevant fields.
Table 3-8 vmstat swap fields
Field    Description
si       Amount of memory paged in from disk
so       Amount of memory paged out to disk
    After swap come the two I/O fields. They give a brief overview that helps determine whether Linux is busy doing a lot of disk I/O. vmstat provides only two fields here, showing the amount of data moving to and from disk (see Table 3-9).
Table 3-9 vmstat I/O fields
Field    Description
bi       Blocks read in from disk
bo       Blocks written out to disk
    The system fields summarize how busy the Linux kernel is managing processes. Table 3-10 lists the interrupt and context switch fields. A context switch occurs when a process is moved off or onto a CPU.
Table 3-10 vmstat system fields
Field    Description
in       System interrupts
cs       Process context switches
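    Rates of this kind are computed from cumulative counters. The following is a minimal sketch, assuming the intr and ctxt lines of /proc/stat as documented in proc(5), where the first number on the intr line is the total interrupt count. The sample snapshots and the five-second interval are made up for illustration.

```python
def read_counters(stat_text):
    """Pull the cumulative interrupt and context switch counts from /proc/stat text."""
    counters = {}
    for line in stat_text.splitlines():
        parts = line.split()
        if not parts:
            continue
        if parts[0] == "intr":
            counters["in"] = int(parts[1])   # first value is the grand total
        elif parts[0] == "ctxt":
            counters["cs"] = int(parts[1])
    return counters


def per_second(before, after, interval_seconds):
    """Convert two cumulative snapshots into per-second rates."""
    return {name: (after[name] - before[name]) / interval_seconds
            for name in before}


# Hypothetical snapshots taken 5 seconds apart.
BEFORE = "cpu  1000 50 300 8000\nintr 500000 120 9 0\nctxt 1200000\n"
AFTER = "cpu  1100 50 350 8800\nintr 505000 125 9 0\nctxt 1201500\n"
rates = per_second(read_counters(BEFORE), read_counters(AFTER), 5)
print(rates)
```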
    Finally, the CPU fields report CPU state as a percentage of total CPU time, as shown in Table 3-11.
Table 3-11 vmstat cpu fields
Field    Description
us       User mode
sy       Kernel mode
wa       Waiting for I/O
id       Idle
 
3.4 iostat
    The iostat command is another tool for studying disk throughput. Like sar, iostat accepts interval and count arguments. The output for the first interval contains metrics covering the entire time Linux has been running. Compared with other performance commands, this is perhaps the most distinctive feature of iostat. For example, the following is output from a system that is idle most of the time. It shows that the hda device has read about 9,158 MB (18,755,572 × 512 / 1,024 / 1,024) since boot. The Blk columns are in 512-byte blocks.
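    The conversion from 512-byte blocks to megabytes is simple arithmetic, sketched below with the block count quoted above. The function and constant names are chosen for this example.

```python
BLOCK_SIZE = 512  # bytes per block in the iostat Blk columns


def blocks_to_megabytes(blocks):
    """Convert a count of 512-byte blocks to megabytes (1 MB = 1,024 x 1,024 bytes)."""
    return blocks * BLOCK_SIZE / 1024 / 1024


megabytes = blocks_to_megabytes(18_755_572)
print(round(megabytes))  # 9158
```

    Dividing the block count by 2,048 gives the same result, because 2,048 blocks of 512 bytes make up one megabyte.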
    
    Without options, iostat only displays a set of metrics covering the entire time since startup.
    The CPU information contains basically the same fields as top. The iostat CPU output shows the percentage of time the CPU spent executing in user mode, executing niced processes, executing in kernel (system) mode, idle while processes were waiting for I/O to complete, and idle with no waiting processes. The CPU line is a summary across all CPUs.
    The disk information is similar to that provided by sar -d. The output includes the number of transfers per second (tps), the number of 512-byte blocks read per second (Blk_read/s), the number of 512-byte blocks written per second (Blk_wrtn/s), and the total number of 512-byte blocks read (Blk_read) and written (Blk_wrtn).
    iostat provides several switches for customizing the output. The most useful ones are:
        -c displays only the CPU line
        -d displays the disk line
        -k displays the disk output in kilobytes
        -t includes a timestamp in the output
        -x includes extended disk metrics in the output
    These options can be combined. The output of iostat -tk 5 2 is:
    
3.5 free
    The free command displays memory and swap information, much as the top command does. With no options, free displays the information in kilobytes:
    The free command has only a few options, and -mt is recommended. The -m switch displays the output in megabytes, and the -t switch adds a totals line.
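    A report of this kind can be approximated from /proc/meminfo. The sketch below assumes the MemTotal, MemFree, SwapTotal, and SwapFree entries, reported in kB, and treats "used" simply as total minus free, which is a simplification made for this example. The sample values and function names are invented.

```python
def parse_meminfo(text):
    """Map each /proc/meminfo entry name to its numeric value (kB for most entries)."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            values[name.strip()] = int(fields[0])
    return values


def free_report(meminfo, unit_kb=1024):
    """Build Mem, Swap, and Total rows in units of unit_kb kilobytes (1024 = MB)."""
    rows = {}
    for label, total_key, free_key in (("Mem", "MemTotal", "MemFree"),
                                       ("Swap", "SwapTotal", "SwapFree")):
        total, free = meminfo[total_key], meminfo[free_key]
        # Assumption of this sketch: used = total - free.
        rows[label] = {"total": total, "used": total - free, "free": free}
    rows["Total"] = {column: rows["Mem"][column] + rows["Swap"][column]
                     for column in ("total", "used", "free")}
    return {label: {column: kb // unit_kb for column, kb in row.items()}
            for label, row in rows.items()}


# Hypothetical /proc/meminfo excerpt.
SAMPLE = """MemTotal:      1048576 kB
MemFree:        262144 kB
SwapTotal:     2097152 kB
SwapFree:      2097152 kB
"""
report = free_report(parse_meminfo(SAMPLE))
for label, row in report.items():
    print(f"{label:<6}{row['total']:>8}{row['used']:>8}{row['free']:>8}")
```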
3.6 Summary
    As this chapter shows, the performance tools available in Linux overlap a great deal in the information they provide. For example, memory information can be displayed with top, vmstat, free, and sar. System administrators do not need to be expert in all of these tools. What matters is knowing how to find and interpret the performance information you need, not which tool you use to get it. We therefore recommend spending some time getting familiar with these tools and their output.


Origin blog.csdn.net/m0_48368237/article/details/114199044