Bug description
The load on a K8s worker node suddenly rose above 90, and the node went NotReady.
The node has 8 CPU cores and 10 GB of memory, so a load this high is clearly abnormal.
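To put the number in perspective, the load average can be divided by the core count. The following is a minimal sketch, assuming the standard Linux /proc/loadavg format; the sample string is hypothetical, with 90.00 standing in for the "90+" reading.

```python
def load_per_core(loadavg_text: str, cores: int) -> float:
    """Return the 1-minute load average divided by the number of cores."""
    # /proc/loadavg starts with the 1-, 5- and 15-minute load averages.
    one_minute = float(loadavg_text.split()[0])
    return one_minute / cores


# Hypothetical sample in /proc/loadavg format, not taken from the real node.
sample = "90.00 75.40 40.03 3/1250 12345"
print(load_per_core(sample, cores=8))  # prints 11.25
```

A value around 1.0 per core means the node is roughly saturated, so a value above 11 indicates a large backlog of tasks.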
Investigation
First, a check with htop:
CPU and memory usage were not high, so a disk IO problem was suspected.
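The reasoning here is that on Linux the load average counts not only runnable tasks but also tasks in uninterruptible sleep (state D), which are usually waiting on IO. One way to check this suspicion is to count processes by state. The sketch below parses lines in the /proc/<pid>/stat format; the sample lines are hypothetical and shortened to the first few fields.

```python
from collections import Counter


def parse_state(stat_line: str) -> str:
    """Extract the one-letter process state from a /proc/<pid>/stat line.

    The command name is wrapped in parentheses and may itself contain
    spaces or parentheses, so split after the last ')' instead of
    splitting the whole line on whitespace.
    """
    after_comm = stat_line.rsplit(")", 1)[1]
    return after_comm.split()[0]


def count_states(stat_lines):
    """Count how many processes are in each state."""
    return Counter(parse_state(line) for line in stat_lines)


# Hypothetical, truncated sample lines (pid, comm, state, ...).
samples = [
    "101 (nginx) S 1 101 101 0 -1",
    "202 (kworker/u16:3) D 2 0 0 0 -1",
    "303 (my (odd) name) D 1 303 303 0 -1",
    "404 (python3) R 1 404 404 0 -1",
]
counts = count_states(samples)
print(counts["D"], counts["R"])  # prints: 2 1
```

On a real node the lines would come from reading every /proc/[0-9]*/stat file. A large count of D-state processes alongside low CPU usage would be consistent with an IO bottleneck.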
Next, a look at the Prometheus monitoring:
The load turned out to be caused by memory usage first spiking to 100%, followed by a rise in disk IO.
Checking all the Pods on the node revealed a newly added Prometheus container, which was confirmed to have been deployed automatically by another monitoring tool.
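A check like this can be sketched by listing the Pods scheduled on the node and sorting them by creation time, so that recent additions appear first. The example assumes the JSON comes from `kubectl get pods --all-namespaces --field-selector spec.nodeName=<node-name> -o json`; the Pod names, namespaces and timestamps in the sample are hypothetical.

```python
import json


def newest_pods(pod_list_json: str, limit: int = 5):
    """Return (namespace, name, creationTimestamp) for the newest Pods."""
    items = json.loads(pod_list_json)["items"]
    # creationTimestamp is an RFC 3339 UTC string, so sorting the strings
    # lexicographically also sorts them chronologically.
    items.sort(key=lambda p: p["metadata"]["creationTimestamp"], reverse=True)
    return [
        (
            p["metadata"]["namespace"],
            p["metadata"]["name"],
            p["metadata"]["creationTimestamp"],
        )
        for p in items[:limit]
    ]


# Hypothetical sample shaped like `kubectl get pods -o json` output.
sample = json.dumps({
    "items": [
        {"metadata": {"namespace": "default", "name": "web-1",
                      "creationTimestamp": "2023-01-10T08:00:00Z"}},
        {"metadata": {"namespace": "monitoring", "name": "prometheus-0",
                      "creationTimestamp": "2023-03-02T14:30:00Z"}},
        {"metadata": {"namespace": "kube-system", "name": "kube-proxy-abcde",
                      "creationTimestamp": "2022-11-05T09:15:00Z"}},
    ]
})
for namespace, name, created in newest_pods(sample, limit=2):
    print(namespace, name, created)
```

With the sample data, the hypothetical `prometheus-0` Pod is listed first because it has the most recent creation time.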