Tips for troubleshooting poor vSphere performance
https://blog.csdn.net/VirtualMan_/article/details/105171430
Preface
Sooner or later, VMs experience interruptions, run into performance problems, or stop responding. Every virtualization engineer has met these problems at least once. Because a virtualized environment is a complex system, many different factors can affect VM performance, and tracking down the cause can take a lot of time.
This article tries to determine what causes performance problems in a VMware infrastructure and how to avoid them.
What is needed for proper troubleshooting?
The first requirement is documentation. Documentation is the secret to solving problems in a vSphere environment. Of course, you could trust your memory 100% to hold everything you need to know, such as login credentials and other essential details. But when a server suddenly fails or an ESXi host is overloaded, the last thing you want is to be trying to remember the password for the host or for vCenter.
In addition, any existing documentation (such as the design of the vSphere cluster) can be of great help. If you don't know how the entire system is configured, troubleshooting slows down considerably. Nobody really likes keeping records, but when the time comes you will thank yourself for having them. Now, let's look at what this information should contain:
ESXi host
Hostname/IP address
ESXi host version and patch level
Root password (keep it in a safe place)
Storage and interface IP addresses
Host hardware description
Storage configuration (iSCSI, etc.)
Network adapter (manufacturer, driver version, etc.)
Storage switch
IP address used
Firmware version
Credentials (kept in a safe place)
VLAN settings
Storage array
IP address of the SAN management port
Firmware level
LUN configuration, RAID level, number of drives, size, drive firmware
Login name and password of the SAN array management interface
Vendor-specific SAN management tools
As with any information you gather, the more documentation the better. Sadly, many administrators ignore this rule. Out-of-date documentation is also of little use, so update it whenever something changes.
What to do first?
1. Review VMware's performance best practices
Start with a troubleshooting plan. Classify possible problems by area (VMware Tools, CPU, and so on) and by scope of impact (from a severe effect on performance to a minimal one). Following the best practices can greatly improve the infrastructure.
2. VMware Tools?
Ensure that VMware Tools is installed, up to date, and running on each VM. The VMware Tools package is essentially a set of virtual device drivers that affects the performance of the virtual machine (usually for the better).
Verify that VMware Tools is installed:
Select a host in the vSphere Web Client
Go to the Virtual Machines tab
Add the VMware Tools Status column
Check the status. If it is good (VMware Tools is running and current), look elsewhere for a way to improve performance.
If the status is Not running or Out of date, install or upgrade VMware Tools.
If VMware Tools is not running, the cause may lie in the guest operating system, and the installation needs to be repaired. Typically, either the Linux kernel was updated or VMware Tools was disabled in Windows for some reason.
If the current VMware Tools version is out of date, upgrade it from the vSphere Web Client menu. This usually happens after the latest update is installed on the ESX/ESXi host. After the upgrade, remember to keep VMware Tools up to date. In general, the vSphere Web Client makes it easy to check VMware Tools:
The virtual machine list shows the VMware Tools status of each VM.
If the column is missing, add it by right-clicking the column header and selecting it.
You can also use PowerCLI to check whether the VMware Tools package is present and what its current status is. Most of the attributes related to VMware Tools are located under .guest.extensiondata.
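The decision described above can be sketched as a small function. This is Python rather than PowerCLI, and it only classifies status strings; it does not connect to vCenter. It assumes the strings come from the vSphere API guest properties toolsRunningStatus and toolsVersionStatus2. The function name and the returned action labels are hypothetical, and treating the "old" version states as needing an upgrade is an assumption.

```python
# Sketch only: maps VMware Tools status strings to the action suggested in the text.
# Assumption: inputs are the vSphere API values of guest.toolsRunningStatus
# and guest.toolsVersionStatus2; the action labels are made up for illustration.

RUNNING_STATES = {"guestToolsRunning", "guestToolsExecutingScripts"}
OUTDATED_STATES = {"guestToolsNeedUpgrade", "guestToolsTooOld", "guestToolsSupportedOld"}


def tools_action(running_status: str, version_status: str) -> str:
    """Return 'install', 'repair', 'upgrade' or 'ok' for one VM."""
    if version_status == "guestToolsNotInstalled":
        return "install"   # nothing to repair or upgrade yet
    if running_status not in RUNNING_STATES:
        return "repair"    # installed but not running: look at the guest OS
    if version_status in OUTDATED_STATES:
        return "upgrade"   # running, but behind the host's version
    return "ok"            # look elsewhere for the performance problem


if __name__ == "__main__":
    print(tools_action("guestToolsNotRunning", "guestToolsCurrent"))
```

Checking "not installed" first keeps a VM without VMware Tools from being reported as needing a repair.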
Fundamental issues
1. Insufficient virtual machine resources
A VM needs sufficient resources to run efficiently. However, you may be surprised how many VMs are not allocated enough resources for the requirements of the guest operating system and the applications running on it. Despite the many benefits of virtualization, there is always some overhead that has to be accounted for. What does a VM do when it runs out of memory? Naturally, it starts swapping to disk more often. If the underlying storage is already under pressure, performance takes a huge hit. Therefore, whenever possible, use reservations, resource pools, DRS, and any other method that ensures the right amount of resources is allocated to each VM.
2. Performance Monitoring
Performance monitoring is built into the vSphere Client. It is an essential tool for investigating performance-related issues. Combine it with alarms wherever possible, so that you learn about performance bottlenecks in advance.
Keep in mind that when you connect directly to a local ESXi host, you can only access the "Performance" tab. If you need more detailed information, use VMware vCenter.
Important: the Performance and Advanced Performance views are effective and informative diagnostic tools. Used properly, they make it easy to find the weak points of the system.
Let us take resource pool CPU usage as an example. To check the details:
Select the resource pool, go to Performance, switch to Advanced and select CPU
Look at Usage in MHz
Compare the resource pool limit with the current usage value. If usage is close to the limit, the pool may be short of resources, and the next step is to check the CPU ready value of each VM in the pool.
CPU ready time verification:
Select a virtual machine, go to Performance, select Advanced, switch to CPU (if you are troubleshooting the performance of a specific VM, start with that VM)
Assess the ready time of all VM objects. An "object" here is an individual vCPU of the VM. You may need to change the chart properties under "Chart Options..."
Does the minimum or average ready value of any vCPU exceed 2000 milliseconds? If so, the ready time explains the problem: the VM is short of processor resources because of the limit set on the resource pool.
Now repeat the same check for the remaining virtual machines in the pool.
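The 2000 ms threshold is easier to interpret as a percentage. The chart reports ready time as a summation in milliseconds per sample interval, so dividing by the interval length gives the share of time a vCPU spent waiting. The sketch below assumes the real-time chart's 20-second sample interval, under which 2000 ms corresponds to 10%. The function names and the sample values are hypothetical.

```python
# Sketch only: converts CPU ready summation values (ms) to a percentage
# and lists the vCPUs that exceed the 2000 ms threshold from the text.
# Assumption: values come from the real-time chart (20-second samples).

def cpu_ready_percent(ready_ms: float, interval_s: float = 20.0) -> float:
    """Share of the sample interval that the vCPU spent ready but not running."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return ready_ms * 100.0 / (interval_s * 1000.0)


def vcpus_over_threshold(ready_ms_by_vcpu: dict, threshold_ms: float = 2000.0) -> list:
    """Names of vCPU objects whose ready value exceeds the threshold."""
    return sorted(name for name, ms in ready_ms_by_vcpu.items() if ms > threshold_ms)


if __name__ == "__main__":
    sample = {"vcpu0": 2500.0, "vcpu1": 800.0}  # made-up chart values
    for name in vcpus_over_threshold(sample):
        print(name, cpu_ready_percent(sample[name]), "% ready")
```

For longer chart intervals, pass the matching interval_s, because the same millisecond value then represents a much smaller percentage.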
Host CPU usage verification
Select the host, go to Performance, select Advanced, select CPU
Look at Usage in MHz
Is usage above 75%, or does it reach 90%? If so, the host is short of processor resources. As described below, verify whether the VMs on this host are accumulating CPU ready time. If average CPU usage does not exceed 75%, move on to the next check.
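The threshold check above can be written down as a short function. This sketch assumes you have exported the "Usage in MHz" samples and know the host's total CPU capacity in MHz. Applying the 75% limit to the average and the 90% limit to the peak is one interpretation of the text, and the function name and sample numbers are hypothetical.

```python
# Sketch only: flags a host as CPU-constrained using the 75% / 90% limits
# from the text. Assumption: 75% applies to the average, 90% to the peak.
from statistics import mean


def host_cpu_constrained(usage_mhz: list, capacity_mhz: float,
                         avg_limit: float = 75.0, peak_limit: float = 90.0) -> bool:
    """True if average usage exceeds avg_limit or any sample exceeds peak_limit."""
    if not usage_mhz or capacity_mhz <= 0:
        raise ValueError("need at least one sample and a positive capacity")
    percents = [u * 100.0 / capacity_mhz for u in usage_mhz]
    return mean(percents) > avg_limit or max(percents) > peak_limit


if __name__ == "__main__":
    samples = [16000.0, 17000.0, 15500.0]  # made-up samples in MHz
    print(host_cpu_constrained(samples, capacity_mhz=20000.0))
```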
CPU ready time verification
If you are troubleshooting the performance of a specific virtual machine, start with that virtual machine. Otherwise, select a host, go to "Virtual Machines", sort the list by the "Host CPU - MHz" column, and look at the first one or two VMs in the list.
To evaluate CPU ready time, select a VM, go to Performance, switch to Advanced, switch to CPU
Assess the ready time of all VM objects. An "object" here is an individual vCPU of the VM. You may need to change the chart properties under "Chart Options..."
Does the minimum or average ready value of any vCPU exceed 2000 milliseconds? If so, the ready time explains the problem: the VM is short of processor resources, for example because of a limit set on its resource pool.
Potential problem parameters that need to be verified:
Guest CPU usage
Active VM memory swapping
VM swap wait
VM memory compression
Overloaded storage device
Dropped received packets
Dropped transmitted packets
Only one vCPU used in an SMP VM
VM CPU readiness and average load in the host
Slow or overloaded storage system
Maximum load of storage system
Peak network data transmission
Low VM processor usage
VM memory swap in the past
Resource pool memory requirements are high
High host memory requirements
High guest memory requirements
High timer interrupt rate
NUMA settings
High response time for virtual machine snapshots
Disk subsystem problems:
In short, storage system problems come down to the following:
- Storage system overload
What causes storage system overload? The main causes are simple: either a wrong configuration (number and type of devices, RAID level, cache, and so on) or a very high load. There is no universal solution, so here are a few things you may already know:
When building a storage system, consider not only capacity but also performance
With virtualization, the load pattern can also change (from sequential to random)
Use the storage system's own tools to monitor disk performance, together with esxtop
There is also a tool called vscsiStats
If certain applications are given more memory, they can reduce their disk overhead
- Storage system is slow
Follow the above list
- Storage system latency
Shares
https://docs.vmware.com/en/VMware-vSphere/6.7/com.vmware.vsphere.html.hostclient.doc/GUID-3AB5A86D-5AFF-4A18-A758-08A529C7A9F9.html
Limit IOPS
https://kb.vmware.com/s/article/1038241
CongestionThreshold (Storage IO Control).
https://kb.vmware.com/s/article/1019687
- Bad disk
Check the disks and network storage regularly, and replace them immediately if they fail or become obsolete. Be aware, however, that in some cases, especially after a disk has failed, starting a check puts additional load on the RAID array and can push the remaining disks to the same fate, destroying the entire RAID.
- ESXi system
Use separate disks for the ESXi host OS, swap partitions, and VMs stored on local data storage. In addition, consider using RAID to improve read and write performance.
- Snapshot
Delete all unused or redundant snapshots. This is not optional. Keep in mind that the more snapshots you have, the greater the disk overhead for I/O activity.
- Encryption
Use disk encryption only when necessary! Encryption will cause increased overhead, and increased overhead will cause performance degradation.
Tips:
(1) Deploy vRealize Operations Manager to conduct a more in-depth assessment of the environment
(2) Ask yourself a question: Does the VM really behave abnormally?
(3) Latest update and latest version
Updates and the latest version often resolve performance issues by fixing bugs, improving drivers and code. However, sometimes the latest version can make the situation worse! Therefore, please stay vigilant and test until you are sure. Or at least let others try it before making a decision.
(4) Antivirus software
Installing anti-virus software on ESXi is not recommended, because ESXi has a small footprint and built-in security features. If you must run AV against ESXi storage, exclude VM files (such as VMDK files) from the scan plan, especially during peak usage hours.
(5) Is CPU power management enabled?
If CPU power management is enabled on the ESXi server, it may introduce latency, which in turn delays applications and workloads and degrades performance. If you suspect this is the source of the problem, check the manufacturer's documentation on how to disable CPU power management. If disabling it makes no difference, re-enable it and perform a health check.
(6) Battery for BIOS and SCSI controller
If possible, check the BIOS battery of the ESXi host and the battery of the SCSI or other storage controllers. The controller cache usually needs additional power to work, and the battery on the controller board usually provides it. Although the manual described the battery as a backup power solution, I have found that an undervoltage battery caused the controller to work incorrectly, and the only fix was to replace the battery.
A few final suggestions:
Perform health checks on all physical components of the storage system, including iSCSI switches, networks, and fiber-optic cables.
Check the switch logs to ensure that no errors or other events have occurred in the storage system or on the switch itself.
Ping the iSCSI target from the VMkernel address to make sure there is no problem connecting to it.
Perform a health check on the SAN itself: make sure there are no failed disks, storage controller failover events, or any other errors that may affect performance.
Check the available disk space on each LUN connected to the ESXi host.
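For the ping step in the list above, ESXi provides the vmkping command, which sends the ping through a chosen VMkernel interface. The sketch below only builds the command line so it can be pasted into an ESXi shell session; it does not run anything. The interface name vmk1 and the target address are placeholders, and the helper name is hypothetical.

```python
# Sketch only: builds a vmkping command line for testing iSCSI connectivity
# from a specific VMkernel interface. Nothing is executed here.
# Assumption: vmk1 and 192.0.2.10 are placeholders for your own values.
import shlex


def vmkping_command(vmk_interface: str, target_ip: str, count: int = 3) -> str:
    """Return a shell-safe vmkping command string."""
    if count < 1:
        raise ValueError("count must be at least 1")
    args = ["vmkping", "-I", vmk_interface, "-c", str(count), target_ip]
    return " ".join(shlex.quote(a) for a in args)


if __name__ == "__main__":
    print(vmkping_command("vmk1", "192.0.2.10"))
```

Running the printed command on the host for each iSCSI VMkernel interface shows which paths to the target actually respond.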
Summary
Troubleshooting VMware vSphere ESXi seems a bit complicated. However, with accurate documentation, a good understanding of the infrastructure, and some efficient built-in tools, any problem with the VM can be solved. Think about where and what is wrong, and then find out what caused the system to fail. You can also seek help from VMware or vendor technical support at any time.