Tips to solve the poor performance of vSphere

https://blog.csdn.net/VirtualMan_/article/details/105171430

Preface

Sooner or later, VMs suffer interruptions, performance problems, or stop responding altogether. Every virtualization engineer has run into these problems at least once. Because a virtualized environment is a complex system, many different factors can affect VM performance, and tracking down the cause can take a lot of time.

This article tries to identify what causes performance problems in a VMware infrastructure and how to avoid them.

What is needed for proper troubleshooting?

The first requirement is documentation. Good records are the key to solving problems in a vSphere environment. You may trust your memory completely to hold everything you need, such as login credentials, but when a server suddenly fails or an ESXi host is overloaded, the last thing you want is to be trying to recall the password for the host or for vCenter Server.

In addition, any existing documentation (such as the vSphere cluster design) can be a great help. If you don't know how the whole system is configured, troubleshooting slows down considerably. Nobody really likes keeping records, but when you need them you will thank yourself for having them. Here is what the records should contain:

ESXi host

Hostname/IP address

ESXi host version and patch level

Root password (keep it in a safe place)

Storage and interface IP addresses

Host hardware description

Storage configuration (iSCSI, etc.)

Network adapter (manufacturer, driver version, etc.)

Storage switch

IP address used

Firmware version

Credentials (kept in a safe place)

VLAN settings

Storage array

IP address of the SAN management port

Firmware level

LUN configuration, RAID level, number of drives, size, drive firmware

Login name and password of the SAN array management interface

Vendor-specific SAN management tools

As with any information gathering, the more documentation the better. Sadly, many administrators ignore this rule. Out-of-date documentation is of little use, so update it whenever the environment changes.
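One way to keep such records consistent and easy to check is to store them in a structured form. The following is a minimal Python sketch of an ESXi host record based on the list above. The field names, the example host, the vault reference, and the 90-day review threshold are all illustrative assumptions, not part of any VMware tool.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import date
from typing import List


@dataclass
class EsxiHostRecord:
    """One documentation entry per ESXi host (field names are illustrative)."""
    hostname: str
    ip_address: str
    esxi_version: str
    patch_level: str
    storage_ips: List[str] = field(default_factory=list)
    hardware: str = ""
    storage_config: str = ""
    network_adapters: List[str] = field(default_factory=list)
    # Pointer to an entry in a password vault; never store the password itself.
    root_password_ref: str = ""
    last_reviewed: str = ""  # ISO date, for example "2024-01-15"


def is_stale(record: EsxiHostRecord, today: date, max_age_days: int = 90) -> bool:
    """Return True if the record was never reviewed or the review is too old."""
    if not record.last_reviewed:
        return True
    reviewed = date.fromisoformat(record.last_reviewed)
    return (today - reviewed).days > max_age_days


# Hypothetical example host; every value below is made up for illustration.
example = EsxiHostRecord(
    hostname="esxi01.example.local",
    ip_address="192.0.2.10",
    esxi_version="6.7",
    patch_level="U3",
    storage_ips=["192.0.2.110", "192.0.2.111"],
    hardware="2 sockets, 16 cores, 256 GB RAM",
    storage_config="iSCSI",
    network_adapters=["vendor X 10GbE, driver 1.2.3"],
    root_password_ref="vault://infrastructure/esxi01",
    last_reviewed="2024-01-15",
)

print(json.dumps(asdict(example), indent=2))
print("stale:", is_stale(example, date.today()))
```

A record flagged as stale is a reminder to review it, which addresses the out-of-date documentation problem mentioned above.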

What to do first?

1. Take a closer look at VMware's performance best practices

First, have a troubleshooting plan. Classify possible problems by area (VMware Tools, CPU, and so on) and by impact (from a severe to a minimal effect on performance). Working through such a plan can greatly improve the infrastructure.

2. Check VMware Tools

Ensure that VMware Tools is installed, up to date, and running on every VM. The VMware Tools package is essentially a set of virtual device drivers that affect VM performance, usually for the better.

Verify that VMware Tools is installed:

  • Select a host in the vSphere Web Client

    Move to the Virtual Machines tab

    Add the VMware Tools Status column

    Check the status. If it shows that VMware Tools is running and current, look elsewhere for ways to improve performance.

    If it shows that VMware Tools is not running or is out of date, install or upgrade VMware Tools.

If VMware Tools is installed but not running, it needs to be repaired inside the guest operating system. Typical causes are a Linux kernel update or the VMware Tools service being disabled in Windows.

If the installed VMware Tools version is out of date, upgrade it from the vSphere Web Client menu. This usually happens after an update has been installed on the ESX/ESXi host. After that, remember to keep VMware Tools up to date.

The vSphere Web Client shows the VMware Tools status of each virtual machine in the VM list. If the column is not visible, right-click a column header and select it.

You can also use PowerCLI to check whether VMware Tools is present and what state it is in. Most of the VMware Tools properties are located under .guest.extensiondata.
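The same check can be scripted once the status values have been collected. The sketch below is in Python rather than PowerCLI and only shows the decision logic. The four status strings are the values of the vSphere API enum VirtualMachineToolsStatus; how they are collected (for example by reading guest.toolsStatus through an API client) is left out, and the VM names and action labels are made up for illustration.

```python
from typing import Dict

# Values of the vSphere API enum VirtualMachineToolsStatus, mapped to the
# follow-up action described in the text (the action labels are illustrative).
TOOLS_ACTIONS = {
    "toolsOk": "none",
    "toolsOld": "upgrade VMware Tools",
    "toolsNotRunning": "repair or start VMware Tools in the guest",
    "toolsNotInstalled": "install VMware Tools",
}


def tools_action(status: str) -> str:
    """Return the follow-up action for one VMware Tools status value."""
    return TOOLS_ACTIONS.get(status, "investigate unknown status")


def vms_needing_attention(inventory: Dict[str, str]) -> Dict[str, str]:
    """Map VM name to action for every VM whose tools are not OK."""
    return {
        name: tools_action(status)
        for name, status in sorted(inventory.items())
        if tools_action(status) != "none"
    }


# Hypothetical inventory: VM name -> reported tools status.
inventory = {
    "web01": "toolsOk",
    "db01": "toolsOld",
    "app01": "toolsNotRunning",
}

for vm, action in vms_needing_attention(inventory).items():
    print(f"{vm}: {action}")
```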

Fundamental issues

1. Insufficient virtual machine resources

A VM needs sufficient resources to run efficiently. You may be surprised how many VMs are not allocated enough resources for the guest operating system and the applications it runs. For all its benefits, virtualization always adds some overhead that has to be accounted for. What does a VM do when it runs out of memory? It starts swapping to disk more often, and if the underlying storage is already at its limit, performance takes a heavy hit. Therefore, whenever possible, use reservations, resource pools, DRS, and any other mechanism that ensures each VM is allocated the right amount of resources.

2. Performance Monitoring

Performance monitoring is built into the vSphere Client. It is an essential tool for investigating performance-related issues, and because it supports alerts, you can learn about bottlenecks in advance.

Keep in mind that when you are connected directly to a standalone ESXi host, you can only access the basic "Performance" tab. If you need more detailed information, use vCenter Server.

Important: the Performance and Advanced Performance views are more than informative diagnostic tools. Used properly, they make it easy to find the weak points of the system.

Take resource pool CPU usage as an example. To check the details:

  • Select the resource pool, move to Performance, switch to Advanced and select CPU

    Look at Usage in MHz

    Compare the resource pool limit with the current usage value. If usage is close to the limit, the pool may be short of resources, and the next step is to evaluate the CPU ready value of each VM in the pool.

CPU readiness verification:

  • Select a virtual machine, move to Performance, select Advanced, and switch to CPU (to troubleshoot the performance of a specific VM, start with that VM)

    Look at the Ready value of every VM object. Each "object" is an individual vCPU of the VM. You may need to change the chart properties under "Chart Options..." to display them.

    Does the maximum or average Ready value of any vCPU exceed 2000 milliseconds? If so, the ready time tells the story: the VM is waiting for CPU, and because of the limit set on the resource pool, processor resources are what it lacks.

    Repeat the same check for the remaining virtual machines in the pool.
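To put the 2000 ms figure in perspective, a CPU ready value from the charts can be converted to a percentage of the chart's sampling interval. The sketch below assumes the real-time chart, which samples every 20 seconds, so 2000 ms of ready time corresponds to 10%. The function names and sample data are illustrative, and flagging on the peak value is an assumption about how to apply the threshold.

```python
from statistics import mean
from typing import Dict, List, Tuple


def ready_percent(ready_ms: float, interval_s: float = 20.0) -> float:
    """Convert a CPU ready summation value (ms) to a percentage of the interval.

    interval_s is the chart sampling interval; 20 s is the real-time chart.
    """
    return ready_ms * 100.0 / (interval_s * 1000.0)


def vcpus_over_threshold(
    samples_ms: Dict[str, List[float]], threshold_ms: float = 2000.0
) -> Dict[str, Tuple[float, float]]:
    """Return {vcpu: (average_ms, peak_ms)} for vCPUs above the threshold."""
    flagged = {}
    for vcpu, values in sorted(samples_ms.items()):
        if not values:
            continue
        avg, peak = mean(values), max(values)
        # A peak above the threshold also covers the case of a high average.
        if peak > threshold_ms:
            flagged[vcpu] = (avg, peak)
    return flagged


# Hypothetical ready values (ms) read from the chart for two vCPUs.
samples = {
    "vcpu0": [500, 2500, 700],
    "vcpu1": [100, 200, 300],
}

for vcpu, (avg, peak) in vcpus_over_threshold(samples).items():
    print(f"{vcpu}: avg {ready_percent(avg):.1f}%, peak {ready_percent(peak):.1f}%")
```

For charts with a longer sampling interval, pass that interval as `interval_s`, since the same millisecond value then represents a much smaller share of time.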

Host CPU usage verification

  • Select the host, move to Performance, select Advanced, select CPU

    Look at Usage in MHz

    Is the average above 75%, or are there peaks of up to 90%? If so, the host is short of processor resources. Verify the CPU ready time of the VMs on this host as described below. If average CPU usage does not exceed 75%, continue with the next check.
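The host check can be expressed as a small calculation. This is a sketch under stated assumptions: the chart reports usage in MHz, so the total host capacity in MHz (cores multiplied by clock speed) has to be supplied to get a percentage, and the 75% and 90% figures are applied to the average and the peak respectively. The function name and sample values are illustrative.

```python
from statistics import mean
from typing import List


def host_cpu_verdict(usage_mhz: List[float], capacity_mhz: float) -> str:
    """Classify host CPU load from usage samples (MHz) and total capacity (MHz)."""
    if not usage_mhz or capacity_mhz <= 0:
        raise ValueError("need at least one sample and a positive capacity")
    avg_pct = mean(usage_mhz) * 100.0 / capacity_mhz
    peak_pct = max(usage_mhz) * 100.0 / capacity_mhz
    # Assumed reading of the thresholds: 75% for the average, 90% for peaks.
    if avg_pct > 75.0 or peak_pct > 90.0:
        return "cpu-constrained"
    return "ok"


# Hypothetical host: 16 cores at 2500 MHz = 40000 MHz of capacity.
capacity = 16 * 2500.0
print(host_cpu_verdict([20000, 22000, 21000], capacity))
print(host_cpu_verdict([32000, 33000, 31000], capacity))
```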

CPU readiness verification

  • If you want to troubleshoot the performance of a specific virtual machine, start with that virtual machine. Otherwise, select a host, move to "Virtual Machines",
    sort the list by the "Host CPU - MHz" column (click the column header), and start with the first one or two VMs in the list.

    To evaluate CPU ready time, select a VM, move to Performance, switch to Advanced, switch to CPU

    Look at the Ready value of every VM object. Each "object" is an individual vCPU of the VM. You may need to change the chart properties under "Chart Options..." to display them.

    Does the maximum or average Ready value of any vCPU exceed 2000 milliseconds? If so, the ready time tells the story: the VM is waiting for CPU, and processor resources are what it lacks.

Potential problem parameters that need to be verified:

  • Guest CPU usage

    Active VM memory swap

    VM swap wait

    VM memory compression

    Overloaded storage device

    Dropped receive packets

    Dropped transmit packets

    Only one vCPU in use in an SMP VM

    VM CPU readiness and average load in the host

    Slow or overloaded storage system

    Maximum load of storage system

    Peak network data transmission

    Low VM processor usage

    VM memory swap in the past

    Resource pool memory requirements are high

    High host memory requirements

    High guest memory requirements

    High timer interrupt rate

    NUMA settings

    High response time on virtual machines with snapshots

Disk subsystem problems:

In short, storage system problems come down to the following:

  1. Storage system overload

What causes storage system overload? The main causes are simple: either a wrong configuration (number and type of devices, RAID level, cache, and so on) or a very high load. There is no universal solution, only a list of things you may already know:

When building a storage system, consider performance as well as capacity.

Virtualization can also change the load pattern (from sequential to random).

Monitor storage system disk performance with the storage vendor's tools, together with esxtop.

The vscsiStats tool is also available.

Giving certain applications more memory can reduce their disk I/O.

  2. Storage system is slow

Follow the list above.

  3. Storage system latency

Review the following settings:

Shares
https://docs.vmware.com/en/VMware-vSphere/6.7/com.vmware.vsphere.html.hostclient.doc/GUID-3AB5A86D-5AFF-4A18-A758-08A529C7A9F9.html
Limit IOPS
https://kb.vmware.com/s/article/1038241
CongestionThreshold (Storage IO Control).
https://kb.vmware.com/s/article/1019687

  4. Bad disk

Check disks and network storage regularly, and replace them immediately if they fail or become obsolete. Be aware, however, that in some cases, especially after a disk has failed, starting a check puts additional load on the RAID set and can push the remaining disks to the same fate, destroying the entire array.

  5. ESXi system

Use separate disks for the ESXi host OS, swap, and VMs stored on local datastores. Also consider using RAID to improve read and write performance.

  6. Snapshots

Delete all unused or redundant snapshots. This is not optional: the more snapshots you have, the greater the disk overhead of I/O activity.

  7. Encryption

Use disk encryption only when necessary. Encryption adds overhead, and the added overhead degrades performance.
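For the snapshot item above, a simple report of old snapshots helps decide what to delete. The Python sketch below works on a list that has already been exported from the environment. Treating age as a sign that a snapshot is unused is an assumption, and the 7-day threshold, the VM names, and the snapshot names are placeholders.

```python
from datetime import datetime, timedelta
from typing import List, NamedTuple


class Snapshot(NamedTuple):
    vm: str
    name: str
    created: datetime


def old_snapshots(
    snapshots: List[Snapshot], now: datetime, max_age_days: int = 7
) -> List[Snapshot]:
    """Return snapshots older than max_age_days, oldest first."""
    cutoff = now - timedelta(days=max_age_days)
    return sorted(
        (s for s in snapshots if s.created < cutoff),
        key=lambda s: s.created,
    )


# Hypothetical snapshot export.
snapshots = [
    Snapshot("web01", "pre-patch", datetime(2024, 3, 9)),
    Snapshot("db01", "before-upgrade", datetime(2024, 1, 5)),
    Snapshot("app01", "test", datetime(2024, 2, 20)),
]

for snap in old_snapshots(snapshots, now=datetime(2024, 3, 10)):
    print(f"{snap.vm}: '{snap.name}' created {snap.created:%Y-%m-%d}")
```

The output is only a list of candidates. Whether a snapshot is really redundant still has to be confirmed with the VM owner before it is deleted.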

Tips:

(1) Deploy vRealize Operations Manager to conduct a more in-depth assessment of the environment

(2) Ask yourself a question: Does the VM really behave abnormally?

(3) Latest updates and versions

Updates and new versions often resolve performance issues by fixing bugs and improving drivers and code. Sometimes, however, a new version makes the situation worse. Stay vigilant and test until you are sure, or at least let others try it before you decide.

(4) Antivirus software

Installing antivirus software on ESXi is not recommended, because ESXi has a small footprint and built-in security features. If you must run AV on ESXi, exclude VM files (such as VMDK) from scheduled scans, especially during peak usage hours.

(5) Is CPU power management enabled?

If CPU power management is enabled on the ESXi server, it can add latency that slows down applications and workloads and degrades performance. If you suspect this is the source of the problem, consult the manufacturer's documentation on disabling CPU power management. If disabling it makes no difference, re-enable it and perform a health check.

(6) BIOS and SCSI controller batteries

If possible, check the BIOS battery of the ESXi host and the backup battery of the SCSI or other storage controller. The controller cache usually needs additional power to work, and a battery on the controller board usually provides it. Although the manual described it as a backup power solution, I have seen an undervoltage battery cause the controller to malfunction, which could only be fixed by replacing the battery.

A few final suggestions:

Perform health checks on all physical components of the storage system, including iSCSI switches, the network, and fiber cables.

Check the switch logs to ensure that no errors or other events have occurred in the storage system or on the switch itself.

Ping the iSCSI targets from the VMkernel address to confirm that iSCSI connectivity is working.

Perform a health check on the SAN itself: make sure there are no failed disks, storage controller failover events, or other errors that may affect performance.

Check the available disk space on each LUN connected to the ESXi host.
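The iSCSI ping check above is normally done with vmkping in the ESXi shell, using the -I option to select the VMkernel interface. The short Python sketch below only builds the command lines for a list of targets so that they can be reviewed and then run on the host. The interface name vmk1 and the target addresses are hypothetical.

```python
from typing import List


def vmkping_commands(
    targets: List[str], vmk_interface: str, count: int = 3
) -> List[List[str]]:
    """Build one vmkping command (as an argument list) per iSCSI target.

    -I selects the VMkernel interface, -c sets the number of packets.
    """
    return [
        ["vmkping", "-I", vmk_interface, "-c", str(count), target]
        for target in targets
    ]


# Hypothetical iSCSI portal addresses and VMkernel interface.
iscsi_targets = ["192.0.2.50", "192.0.2.51"]

for cmd in vmkping_commands(iscsi_targets, "vmk1"):
    print(" ".join(cmd))
```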

Summary

Troubleshooting VMware vSphere ESXi can seem complicated. However, with accurate documentation, a good understanding of the infrastructure, and the efficient built-in tools, VM problems can be solved. Work out where and what is wrong, and then find out what caused the system to fail. You can also seek help from VMware or vendor technical support at any time.

Origin blog.csdn.net/z136370204/article/details/113663160