System Monitoring on Linux and Collecting Metrics in Go

In production, it is often necessary to collect system performance data at regular intervals. Doing so helps catch hardware problems with the CPU, memory, or hard disk before they crash the system. Linux provides the following commands for this.

CPU monitoring: the top command

  1. Overview:
    The top command is a commonly used performance analysis tool on Linux. It displays the resource usage of each process in the system in real time, similar to the Windows Task Manager. The following sections describe how to use it.
    top provides a dynamic display: the view refreshes continuously, and it can also be refreshed on demand with a keystroke. If the command runs in the foreground, it occupies the terminal until the user exits. More precisely, top provides real-time monitoring of processor status. It lists the most CPU-intensive tasks in the system and can sort tasks by CPU usage, memory usage, and execution time. Many of its features can be changed through interactive commands or saved in a personal configuration file.

  2. Command format:
    top [parameters]

  3. Command function:
    Displays information about the processes currently running on the system, including process ID, memory usage, CPU usage, etc.

  4. Command parameters:
    -b batch mode (non-interactive output, suitable for scripts or for redirecting to a file)
    -c display the full command line instead of only the program name
    -d<seconds> set the refresh interval
    -i toggle the display of idle processes
    -s secure mode
    -S cumulative mode
    -u<user name> display only the processes of the specified user
    -p<PID> display only the specified process
    -n<count> number of refreshes before top exits

  5. Examples:
    Example 1: Display process information
    Command: top

Explanation of the output:
1. System uptime and load average:

The first line of top's output is similar to the output of the uptime command.
Its fields show the current time, how long the system has been running, the number of users currently logged in, and the load averages over the last 1, 5, and 15 minutes.
The uptime line can be toggled with the 'l' command.
22:46:38 - current system time
0 days, 3:59 - the system has been running for 3 hours and 59 minutes (without a restart)
3 users - 3 users are currently logged in
load average: 0.01, 0.02, 0.00 - the load averages over the last 1, 5, and 15 minutes, respectively
The kernel samples the number of active processes (running, or waiting in uninterruptible sleep) every 5 seconds and turns these samples into exponentially weighted moving averages. Dividing a load average by the number of logical CPUs gives the load per CPU. A sustained value above 1 means there is more work than the CPUs can keep up with, and a value above 5 indicates that the system is seriously overloaded.
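You can reproduce the per-CPU calculation above from the same data top reads. The following is a minimal sketch. It assumes Linux, where /proc/loadavg starts with the 1-, 5-, and 15-minute load averages. The function names are hypothetical.

```go
package main

import (
	"fmt"
	"os"
	"runtime"
	"strconv"
	"strings"
)

// parseLoadAvg (hypothetical helper) extracts the 1-, 5- and 15-minute load
// averages from the contents of /proc/loadavg, e.g. "0.01 0.02 0.00 1/146 7600".
func parseLoadAvg(s string) ([3]float64, error) {
	var loads [3]float64
	fields := strings.Fields(s)
	if len(fields) < 3 {
		return loads, fmt.Errorf("unexpected /proc/loadavg format: %q", s)
	}
	for i := 0; i < 3; i++ {
		v, err := strconv.ParseFloat(fields[i], 64)
		if err != nil {
			return loads, err
		}
		loads[i] = v
	}
	return loads, nil
}

// perCPU divides each load average by the number of logical CPUs.
func perCPU(loads [3]float64, cpus int) [3]float64 {
	var out [3]float64
	for i, l := range loads {
		out[i] = l / float64(cpus)
	}
	return out
}

func main() {
	data, err := os.ReadFile("/proc/loadavg") // Linux only
	if err != nil {
		fmt.Println("cannot read /proc/loadavg:", err)
		return
	}
	loads, err := parseLoadAvg(string(data))
	if err != nil {
		fmt.Println(err)
		return
	}
	// runtime.NumCPU reports the logical CPUs usable by this process.
	fmt.Printf("load: %v, per CPU: %v\n", loads, perCPU(loads, runtime.NumCPU()))
}
```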

2. Tasks:

Tasks - the system currently has 146 processes in total: 1 running, 145 sleeping, 0 stopped, and 0 zombie.
The second line summarizes tasks (processes). Processes can be in different states, so besides the total, this line shows how many processes are running, sleeping, stopped, or zombie. A zombie is a process that has exited but whose exit status has not yet been collected by its parent. This summary can be toggled with 't'.
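On Linux, you can derive these counts from the state letter in each /proc/<pid>/stat file. These are the same letters used in the S column described below. The following is a minimal sketch with hypothetical helper names. It reports raw counts per letter rather than reproducing top's exact grouping of states.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// stateFromStat (hypothetical helper) returns the one-letter state from the
// contents of /proc/<pid>/stat. The command name in parentheses may itself
// contain spaces or ')', so the state is read after the LAST ')'.
func stateFromStat(stat string) (byte, bool) {
	i := strings.LastIndexByte(stat, ')')
	if i < 0 {
		return 0, false
	}
	rest := strings.TrimLeft(stat[i+1:], " ")
	if rest == "" {
		return 0, false
	}
	return rest[0], true
}

// tallyStates counts processes per state letter; unparsable lines are skipped.
func tallyStates(stats []string) map[byte]int {
	counts := make(map[byte]int)
	for _, s := range stats {
		if st, ok := stateFromStat(s); ok {
			counts[st]++
		}
	}
	return counts
}

func main() {
	paths, _ := filepath.Glob("/proc/[0-9]*/stat") // Linux only
	var stats []string
	for _, p := range paths {
		data, err := os.ReadFile(p)
		if err != nil {
			continue // the process may have exited in the meantime
		}
		stats = append(stats, string(data))
	}
	c := tallyStates(stats)
	fmt.Printf("Tasks: %d total, R=%d S=%d D=%d T=%d Z=%d\n",
		len(stats), c['R'], c['S'], c['D'], c['T'], c['Z'])
}
```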

3. CPU status:

This line shows the percentage of CPU time spent in different modes:
us, user: time spent running user processes with unadjusted (default) priority
sy, system: time spent running the kernel
ni, nice: time spent running user processes whose priority has been adjusted
id, idle: time spent idle
wa, IO wait: time spent waiting for I/O to complete
hi: time spent handling hardware interrupts
si: time spent handling software interrupts
st: time stolen from this virtual machine by the hypervisor (when the VM runs under a hypervisor, the hypervisor also uses part of the physical CPU time)
This line can be toggled with the 't' command.
0.3% us - percentage of CPU time spent in user space
0.7% sy - percentage of CPU time spent in kernel space
0.0% ni - percentage of CPU time spent on processes whose priority has been changed
99.0% id - percentage of idle CPU time
0.0% wa - percentage of CPU time spent waiting for I/O
0.0% hi - percentage of CPU time spent on hardware IRQs
si - percentage of CPU time spent on software interrupts
This breakdown of CPU usage differs from the single usage figure that Windows shows. If user space and kernel space are unfamiliar, it is worth reading up on them first.

4. Memory usage:

The next two lines show memory usage, similar to the output of the 'free' command. The first line shows physical memory and the second line shows swap space.
The physical memory line shows total memory, used memory, free memory, and memory used for buffers. The swap line shows total, used, and free swap space. Its last field, cached, is the amount of physical memory used as page cache. Although it appears on the swap line, it refers to physical memory.
The memory lines can be toggled with the 'm' command.
1004348k total - total physical memory (about 1004M)
938408k used - memory in use (about 938M)
65940k free - free memory (about 65M)
44344k buffers - memory used for kernel buffers (about 44M)
Swap - swap space:
2031612k total - total swap space (about 2031M)
4k used - swap space in use (4k)
2031608k free - free swap space (about 2031M)
538676k cached - physical memory used as page cache (about 538M)
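On Linux, these figures come from /proc/meminfo. In the example above, used is simply total minus free (1004348k - 65940k = 938408k), so buffers and cache count as used memory. The following is a minimal sketch that reproduces both lines in that older procps style. parseMeminfo is a hypothetical helper name.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parseMeminfo (hypothetical helper) reads "Key:   value kB" lines, the
// format of /proc/meminfo, into a map of values (KiB for most keys).
func parseMeminfo(s string) map[string]uint64 {
	m := make(map[string]uint64)
	sc := bufio.NewScanner(strings.NewReader(s))
	for sc.Scan() {
		key, rest, ok := strings.Cut(sc.Text(), ":")
		if !ok {
			continue
		}
		fields := strings.Fields(rest)
		if len(fields) == 0 {
			continue
		}
		if v, err := strconv.ParseUint(fields[0], 10, 64); err == nil {
			m[key] = v
		}
	}
	return m
}

func main() {
	data, err := os.ReadFile("/proc/meminfo") // Linux only
	if err != nil {
		fmt.Println(err)
		return
	}
	m := parseMeminfo(string(data))
	// Old-style "used": everything that is not free, including buffers and cache.
	used := m["MemTotal"] - m["MemFree"]
	fmt.Printf("Mem:  %dk total, %dk used, %dk free, %dk buffers\n",
		m["MemTotal"], used, m["MemFree"], m["Buffers"])
	fmt.Printf("Swap: %dk total, %dk used, %dk free, %dk cached\n",
		m["SwapTotal"], m["SwapTotal"]-m["SwapFree"], m["SwapFree"], m["Cached"])
}
```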

5. Status monitoring of each process (task):

PID: Process ID, the unique identifier of the process
USER: The actual username of the process owner.
PR: The scheduling priority of the process. Some processes show 'rt' in this field, which means they run with real-time scheduling.
NI: The nice value of the process. Smaller values mean higher priority: negative values indicate high priority and positive values indicate low priority.
VIRT: The total amount of virtual memory used by the process, in KiB. VIRT = SWAP + RES
RES: Resident memory size, i.e. the non-swapped physical memory used by the process, in KiB. RES = CODE + DATA
SHR: The amount of shared memory used by the process, in KiB.
S: This is the state of the process. It has the following different values:
D - Uninterruptible sleep state.
R - Running
S - Sleeping
T - Traced or stopped
Z - Zombie
%CPU: The percentage of CPU time used by the task since the last update.
%MEM: The percentage of total physical memory used by the process.
TIME+: The total CPU time used by the task since it started, with a precision of one hundredth of a second.
COMMAND: The command name or command line used to start the process.
top can also show many fields that are not displayed by default, such as page faults and the effective group name and ID. They can be added with the 'f' interactive command.
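On Linux, you can read VIRT, RES, and SHR from /proc/<pid>/statm. Its first three fields are the total program size, the resident set size, and the shared pages, all counted in pages. %MEM is RES divided by total physical memory (MemTotal in /proc/meminfo). The following is a minimal sketch with hypothetical names that converts the page counts to KiB and computes that percentage.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// memFields (hypothetical type) holds the first three statm values in KiB.
type memFields struct {
	VirtKiB, ResKiB, ShrKiB uint64
}

// parseStatm converts "size resident shared text lib data dt" (in pages)
// into KiB, given the page size in bytes.
func parseStatm(s string, pageSize int) (memFields, error) {
	f := strings.Fields(s)
	if len(f) < 3 {
		return memFields{}, fmt.Errorf("unexpected statm format: %q", s)
	}
	var v [3]uint64
	for i := range v {
		n, err := strconv.ParseUint(f[i], 10, 64)
		if err != nil {
			return memFields{}, err
		}
		v[i] = n * uint64(pageSize) / 1024
	}
	return memFields{VirtKiB: v[0], ResKiB: v[1], ShrKiB: v[2]}, nil
}

// memPercent computes %MEM: resident memory as a share of total physical memory.
func memPercent(resKiB, memTotalKiB uint64) float64 {
	if memTotalKiB == 0 {
		return 0
	}
	return 100 * float64(resKiB) / float64(memTotalKiB)
}

func main() {
	data, err := os.ReadFile("/proc/self/statm") // Linux only; this program itself
	if err != nil {
		fmt.Println(err)
		return
	}
	m, err := parseStatm(string(data), os.Getpagesize())
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("VIRT=%dk RES=%dk SHR=%dk\n", m.VirtKiB, m.ResKiB, m.ShrKiB)
}
```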

6. Other usage skills:
6.1. Monitoring multi-core CPUs
In the default top view, press "1" to show a separate status line for each logical CPU.

6.2. Highlight the currently running process
Press "b" to turn the highlighting effect on or off.

In the author's example, the "top" process (PID 7600) is highlighted. It is the only running process, as the Tasks line (the second line of the view) shows. Press "y" to turn highlighting of running processes on or off.

6.3. Sorting of process fields
By default, top sorts processes by CPU usage. In the author's example, the java process with PID 7517 ranks first (0.7% CPU) and the java process with PID 3073 ranks second (0.3% CPU).

Press "x" to turn highlighting of the sort column on or off. With highlighting on, you can see that the default sort column of top is "%CPU".

6.4. Changing the sort column
Press "shift + >" or "shift + <" to move the sort column one field to the right or left. In the author's example, pressing "shift + >" once sorts the view by %MEM.
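Sorting by a column simply orders the process rows by that field in descending order. The following is a minimal sketch using Go's sort package. The proc type and sortBy function are hypothetical. The PIDs and %CPU values come from the example above, and the %MEM values are made up for illustration. Ties are broken by PID, which is a choice made here rather than documented top behavior.

```go
package main

import (
	"fmt"
	"sort"
)

// proc is a hypothetical row of top's process list.
type proc struct {
	PID     int
	Command string
	CPU     float64 // %CPU
	Mem     float64 // %MEM
}

// sortBy orders rows by the chosen column, highest first,
// breaking ties by ascending PID so the order is deterministic.
func sortBy(rows []proc, key func(proc) float64) {
	sort.Slice(rows, func(i, j int) bool {
		ki, kj := key(rows[i]), key(rows[j])
		if ki != kj {
			return ki > kj
		}
		return rows[i].PID < rows[j].PID
	})
}

func main() {
	rows := []proc{
		{PID: 3073, Command: "java", CPU: 0.3, Mem: 20.1}, // %MEM values are hypothetical
		{PID: 7517, Command: "java", CPU: 0.7, Mem: 5.2},
		{PID: 7600, Command: "top", CPU: 0.0, Mem: 0.1},
	}
	sortBy(rows, func(p proc) float64 { return p.CPU }) // default: by %CPU
	fmt.Println(rows)
	sortBy(rows, func(p proc) float64 { return p.Mem }) // after "shift + >": by %MEM
	fmt.Println(rows)
}
```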

    Example 2: Display the full command line
    Command: top -c

    Example 3: Display information for a specific process
    Command: top -p 7517

6. Interactive commands of top:
The following single-letter commands can be used while top is running. Some of them may be disabled if top is started with the -s (secure mode) option.
h Displays a help screen, giving some short command summaries
k Terminates a process.
i Ignore idle and zombie processes. This is a switch command.
q Exit the program
r Change the priority (nice value) of a process
S Switch to cumulative mode
s Change the delay between two refreshes, in seconds; fractional values such as 0.5 (500 ms) are allowed. Entering 0 makes top refresh continuously. The default is 5 s
f or F Add or delete items from the current display
o or O Change the order of displayed items
l Toggle the display of load average and uptime information
m Toggle the display of memory information
t Toggle the display of process and CPU status information
c Toggle between showing the command name and the full command line
M Sort by resident memory size
P Sort by CPU usage percentage
T Sort by time/cumulative time
W Write current settings into ~/.toprc file

How can this monitoring be done from Go? We can use the gopsutil package (github.com/shirou/gopsutil), a Go port of Python's psutil library. The examples below use only Go to collect the various metrics.

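// Packages used by the query functions below (assuming gopsutil v3).
// response is the author's own package that defines response.JsonSimpleRes;
// its import path is not given in the original.
import (
   "encoding/json"
   "fmt"
   "time"

   "github.com/shirou/gopsutil/v3/cpu"
   "github.com/shirou/gopsutil/v3/disk"
   "github.com/shirou/gopsutil/v3/host"
   "github.com/shirou/gopsutil/v3/mem"
   "github.com/shirou/gopsutil/v3/net"
   "github.com/shirou/gopsutil/v3/process"
)

// queryHost collects information about the local host.
func queryHost() (out *response.JsonSimpleRes, err error) {
   out = &response.JsonSimpleRes{}

   // host.Info() returns the hostname, uptime, boot time, OS, platform, etc.
   info, err := host.Info()
   if err != nil {
      return out, err
   }
   fmt.Println(info)
   // Example output (Windows):
   // {"hostname":"WIN-SP09TQCP1U8","uptime":25308,"bootTime":1558574107,"procs":175,"os":"windows","platform":"Microsoft Windows 10 Pro","platformFamily":"Standalone Workstation","platformVersion":"10.0.17134 Build 17134","kernelVersion":"","virtualizationSystem":"","virtualizationRole":"","hostid":...}

   // host.BootTime() returns the boot time as a Unix timestamp in seconds.
   timestamp, _ := host.BootTime()
   t := time.Unix(int64(timestamp), 0)
   fmt.Println(t.Local().Format("2006-01-02 15:04:05"))

   // Kernel version and platform information.
   kernelVersion, _ := host.KernelVersion()
   fmt.Println(kernelVersion)

   platform, family, platformVersion, _ := host.PlatformInformation()
   fmt.Println("platform:", platform)
   fmt.Println("family:", family, "version:", platformVersion)

   // host.Users() returns the users connected through terminals, one UserStat per user.
   users, _ := host.Users()
   for _, user := range users {
      data, _ := json.MarshalIndent(user, "", " ")
      fmt.Println(string(data))
   }

   return out, nil
}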


// queryCPU collects CPU information.
func queryCPU() (out *response.JsonSimpleRes, err error) {
   out = &response.JsonSimpleRes{}

   // cpu.Info() returns the model, core count, clock speed, etc. of each CPU.
   c, err := cpu.Info()
   if err != nil {
      return out, err
   }
   fmt.Println("cpu info:", c)
   // Output: [{"cpu":0,"cores":4,"modelName":"Intel(R) Core(TM) i5-2520M CPU @ 2.50GHz","mhz":2501,...]

   // cpu.Times(false) returns cumulative user/system/idle/... times summed over all CPUs.
   //   User CPU time: time a process spends executing in user mode once it has the CPU.
   //   System CPU time: time a process spends executing in kernel mode once it has the CPU.
   c1, _ := cpu.Times(false)
   fmt.Println("cpu1:", c1)
   // Output: [{"cpu":"cpu-total","user":1272.0,"system":1572.7,"idle":23092.3,"nice":0.0,"iowait":0.0,"irq":0.0,...}]

   // CPU usage measured over one second; wrap this in a for loop to refresh every second.
   c2, _ := cpu.Percent(time.Second, false)
   fmt.Println(c2)

   n, _ := cpu.Counts(true) // number of logical CPUs
   fmt.Println(n)           // 4
   n, _ = cpu.Counts(false) // number of physical cores
   fmt.Println(n)           // 2 means dual-core with hyper-threading; 4 means 4 cores without hyper-threading

   return out, nil
}


// queryMem collects memory information.
func queryMem() (out *response.JsonSimpleRes, err error) {
   out = &response.JsonSimpleRes{}

   // Physical memory and swap information.
   m1, err := mem.VirtualMemory()
   if err != nil {
      return out, err
   }
   fmt.Println("m1:", m1)
   // Output: {"total":8129818624,"available":4193423360,"used":3936395264,"usedPercent":48,"free":0,"active":0,"inactive":0,...}

   m2, err := mem.SwapMemory()
   if err != nil {
      return out, err
   }
   fmt.Println("m2:", m2)
   // Output: {"total":8666689536,"used":4716843008,"free":3949846528,"usedPercent":0.5442496801583825,"sin":0,"sout":0,...}

   // Total memory is 8129818624 bytes (about 8.1 GB), of which 3936395264 bytes
   // (about 3.9 GB, 48%) are in use. Swap is 8666689536 bytes (about 8.7 GB).
   return out, nil
}
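Raw byte counts like these are easier to read when converted to units. The following is a minimal helper with hypothetical names that formats a count in decimal (GB) or binary (GiB) units. Note that 8129818624 bytes is about 8.1 GB but only about 7.6 GiB, so reported memory sizes can differ depending on the unit convention.

```go
package main

import "fmt"

// humanize (hypothetical helper) scales b by powers of unit and appends the prefix.
func humanize(b, unit uint64, prefixes, suffix string) string {
	if b < unit {
		return fmt.Sprintf("%d B", b)
	}
	div, exp := unit, 0
	for n := b / unit; n >= unit; n /= unit {
		div *= unit
		exp++
	}
	return fmt.Sprintf("%.1f %c%s", float64(b)/float64(div), prefixes[exp], suffix)
}

// formatBytes uses decimal (SI) units: 1 GB = 1000^3 bytes.
func formatBytes(b uint64) string { return humanize(b, 1000, "kMGTPE", "B") }

// formatIBytes uses binary (IEC) units: 1 GiB = 1024^3 bytes.
func formatIBytes(b uint64) string { return humanize(b, 1024, "KMGTPE", "iB") }

func main() {
	// Values from the gopsutil memory example above.
	for _, b := range []uint64{8129818624, 3936395264, 8666689536} {
		fmt.Println(b, formatBytes(b), formatIBytes(b))
	}
}
```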

// queryDisk collects disk information.
func queryDisk() (out *response.JsonSimpleRes, err error) {
   out = &response.JsonSimpleRes{}

   // gopsutil can report disk partitions, disk usage, and disk I/O statistics.
   d1, err := disk.Partitions(true) // all partitions
   if err != nil {
      return out, err
   }
   fmt.Println("d1:", d1)
   // Output: [{"device":"C:","mountpoint":"C:","fstype":"NTFS","opts":"rw.compress"} {"device":"D:","mountpoint":"D:","fstype":"NTFS","opts":"rw.compress"} {"device":"E:","mountpoint":"E:","fstype":"NTFS","opts":"rw.compress"} ]

   d2, _ := disk.Usage("E:") // usage of the disk containing the given path ("E:" here; e.g. "/" on Linux)
   fmt.Println("d2:", d2)
   // Output: {"path":"E:","fstype":"","total":107380965376,"free":46790828032,"used":60590137344,"usedPercent":56.425398236866755,"inodesTotal":0,"inodesUsed":0,"inodesFree":0,"inodesUsedPercent":0}

   d3, _ := disk.IOCounters() // I/O statistics for all disks
   fmt.Println("d3:", d3)
   // Output: map[C::{"readCount":0,"mergedReadCount":0,"writeCount":0,"mergedWriteCount":0,"readBytes":0,"writeBytes":4096,"readTime":0,"writeTime":0,"iopsInProgress":0,"ioTime":0,"weightedIO":0,"name":"C:","serialNumber":"","label":""} ...]

   return out, nil
}
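On Linux and macOS, you can compute the same total/free/used/usedPercent figures without gopsutil by calling statfs(2) through the syscall package. The following is a minimal sketch with hypothetical names. Here, free means the space available to unprivileged users, and usedPercent is used / (used + free), the same ratio GNU df uses for its Use% column.

```go
//go:build linux || darwin

package main

import (
	"fmt"
	"syscall"
)

// usage (hypothetical type) mirrors the total/free/used/usedPercent fields above.
type usage struct {
	Total, Free, Used uint64
	UsedPercent       float64
}

// computeUsage derives usage from statfs block counts: blocks (total),
// bfree (free, including root-reserved) and bavail (free for normal users).
func computeUsage(blocks, bfree, bavail, bsize uint64) usage {
	u := usage{
		Total: blocks * bsize,
		Free:  bavail * bsize,
		Used:  (blocks - bfree) * bsize,
	}
	if u.Used+u.Free > 0 {
		u.UsedPercent = 100 * float64(u.Used) / float64(u.Used+u.Free)
	}
	return u
}

// diskUsage calls statfs(2) on the filesystem that contains path.
func diskUsage(path string) (usage, error) {
	var st syscall.Statfs_t
	if err := syscall.Statfs(path, &st); err != nil {
		return usage{}, err
	}
	return computeUsage(st.Blocks, st.Bfree, st.Bavail, uint64(st.Bsize)), nil
}

func main() {
	u, err := diskUsage("/")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("total=%d free=%d used=%d usedPercent=%.2f\n",
		u.Total, u.Free, u.Used, u.UsedPercent)
}
```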

// queryNet collects network information.
func queryNet() (out *response.JsonSimpleRes, err error) {
   out = &response.JsonSimpleRes{}

   // Current network connections; the kind can be "all", "tcp", "udp", "tcp4", "udp4", etc.
   n1, err := net.Connections("all")
   if err != nil {
      return out, err
   }
   fmt.Println("n1:", n1)
   // Output: [{"fd":0,"family":2,"type":1,"localaddr":{"ip":"0.0.0.0","port":135},"remoteaddr":{"ip":"0.0.0.0","port":0},"status":"LISTEN","uids":null,"pid":668} {"fd":0,"family":2,"type":1,"localaddr":{"ip":"0.0.0.0","port":445},"remoteaddr":{"ip":"0.0.0.0","port":0},"status":"LISTEN","uids":null,"pid":4} {"fd":0,"family":2,"type":1,"localaddr":{"ip":"0.0.0.0","port":1801},"remoteaddr":{"ip":"0.0.0.0","port":0},"status":"LISTEN","uids":null,"pid":3860} ...]

   // Bytes and packets sent/received; false sums all interfaces into one entry.
   n2, _ := net.IOCounters(false)
   fmt.Println("n2:", n2)
   // Output: [{"name":"all","bytesSent":6516450,"bytesRecv":36991210,"packetsSent":21767,"packetsRecv":33990,"errin":0,"errout":0,"dropin":0,"dropout":0,"fifoin":0,"fifoout":0}]

   return out, nil
}

// queryProcess collects process information.
func queryProcess() (out *response.JsonSimpleRes, err error) {
   out = &response.JsonSimpleRes{}

   // PIDs of all current processes.
   p1, err := process.Pids()
   if err != nil {
      return out, err
   }
   fmt.Println("p1:", p1)
   // Output: [0 4 96 464 636 740 748 816 852 880 976 348 564 668 912 1048 1120 1184 1268 1288 ...]

   ifExists, _ := process.PidExists(10086) // whether a process with this PID exists
   fmt.Println("ifExists:", ifExists)

   return out, nil
}

Author: Fu Cheng Nebula


Origin blog.csdn.net/ekcchina/article/details/131519124