Kubernetes Test Series - Performance Testing

Preface

For software, performance testing generally comprises two parts: producing an appropriate load, and making sound observations of the system under that load.

Defining the objects of observation

Every system's performance has boundaries. The cost, manpower, and time spent guaranteeing performance cannot be unlimited, nor can performance requirements cover everything.
Software systems are developed to serve customers better, and users have quality requirements for the system. The performance indicators that affect users' quality requirements are the core indicators; everything else is secondary (such as indicators that only programmers rack their brains over). Therefore, before performance testing, defining which indicators affect user quality requirements is essential groundwork, even if the indicators cannot be fully enumerated. Results obtained from blind performance testing are usually meaningless.

Cloud vendors typically use an SLA (Service Level Agreement) to summarize the quality requirements users place on a system. Unfortunately, owing to the complexity of Kubernetes, as of 2019-05-18 none of the major cloud vendors provides an SLA for it, though one may eventually be published.
The Kubernetes community does offer some SLIs (Service Level Indicators) and SLOs (Service Level Objectives) to guide performance testing and analysis; see the community's scalability SLI/SLO documentation. For example, one official SLO requires the 99th percentile latency of mutating API calls, measured over a cluster-day, to stay below 1 s. These indicators indirectly describe the service quality of a Kubernetes system and are the focus of performance testing.
In addition, indicators of the Master machines, such as CPU and Memory, and etcd indicators such as ioutil, also have a significant effect on Kubernetes service quality, and should therefore be included in the scope of observation during performance testing.
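
A minimal sketch of collecting these by hand, assuming ssh access to the master and that etcd serves its metrics on 127.0.0.1:2379 without TLS (adjust the host, port, and certificates to your deployment):

# Snapshot of Master CPU and Memory usage
ssh root@${MASTER_IP} 'top -b -n 1 | head -n 15'
# Disk utilization (%util), which is what etcd's ioutil reflects
ssh root@${MASTER_IP} 'iostat -x 1 5'
# etcd's own disk-latency metrics (WAL fsync, backend commit)
ssh root@${MASTER_IP} "curl -s http://127.0.0.1:2379/metrics | grep etcd_disk"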

Producing cluster load

  • Cheap load
    Community data shows that Kubernetes runs well in clusters of up to 5,000 Nodes and 150,000 Pods. Verifying that claim would require producing a load of at least 5,000 Nodes and 150,000 Pods. However, getting the right to use 5,000 real machines is expensive, and occupying that many machines long-term for Kubernetes development and testing would be unreasonable. The community therefore developed kubemark, Virtual Kubelet, and other tools to simulate real Nodes, so that a small number of machines can still produce a sufficiently large load. In Xiaomi's tests, 20 real machines could simulate a load of 3,000 Nodes and 100,000 Pods.
  • Varied load
    In real operating environments the actual loads vary widely, and different loads can affect system performance differently, so simulating a variety of load patterns during testing is necessary. To make that convenient, the configurability of a performance-testing program should be as strong as possible. In early Kubernetes code, the performance test parameters were hard-coded; changing them meant modifying the source, recompiling Kubernetes, and copying the result to the test environment, and the parameters could be changed only once per round. The testing process was very painful.
    The community therefore developed perf-tests/clusterloader2, which is highly configurable and ships with code for observing the corresponding performance indicators. Highly recommended. Test parameters can be changed through override files instead of recompilation, as the sketch below shows.
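
A hypothetical override file illustrates this configurability; the variable names below appear in perf-tests' example density config, but verify them against the config you actually use:

# Write a hypothetical override file for clusterloader2
cat > override.yaml <<EOF
NODES_PER_NAMESPACE: 100
PODS_PER_NODE: 30
EOF
# Applied at run time via --testoverrides=override.yaml, no recompilation needed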

Tool instructions

Prerequisites for Kubernetes performance testing:

  • A Kubernetes cluster that is up and running

This article describes two performance-testing tools: kubemark and perf-tests/clusterloader2.

kubemark

kubemark is a stripped-down kubelet: apart from not calling the CRI interface (that is, it does not call Docker and simply returns directly), it behaves basically the same as a real kubelet.

Compile

  • Compile the binary
    Download the Kubernetes source and run make WHAT='cmd/kubemark' (see the sketch after this list).
  • Build the image
    The script test/kubemark/start-kubemark.sh builds the docker image; it requires some code changes, which are not covered here.
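
A minimal sketch of the binary build, assuming Go and make are installed (the output path may differ between Kubernetes versions):

git clone https://github.com/kubernetes/kubernetes.git
cd kubernetes
make WHAT='cmd/kubemark'
ls _output/bin/kubemark   # the compiled binary usually lands here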

Usage steps

  • Mark the real Nodes
    1. Set a Taint
    Assuming the Node is named mytestnode, run the following command to set the Taint:
    kubectl taint nodes mytestnode role=real:NoSchedule
    The Taint prevents the stress test's Pods from being scheduled onto real Nodes.
    2. Set a Label
    Assuming the Node is named mytestnode, run the following command to set the Label:
    kubectl label nodes mytestnode role=real
  • Configure a kubeconfig
apiVersion: v1
kind: Config
users:
- name: kubelet
  user: {}
clusters:
- name: kubemark
  cluster:
    server: http://10.114.25.172:8083 # replace with your own APIServer address
contexts:
- context:
    cluster: kubemark
    user: kubelet
  name: kubemark-context
current-context: kubemark-context

If the kubeconfig above is saved to /home/mi/.kube/config, run the following command to create a Secret:

kubectl create secret generic kubeconfig --from-file=kubelet.kubeconfig=/home/mi/.kube/config
  • Create the kubemark Pods
    Run the following script to deploy kubemark:
curl -L https://gist.githubusercontent.com/Betula-L/fef068ef7e914aa4a52113ac81fc6517/raw/77abf3f9b234274e33435597dec079ef46300324/kubemark.yaml | kubectl apply -f -
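
Once the Pods are running, the simulated Nodes should register with the APIServer. A quick check, assuming the manifest in the gist names them with a hollow-node prefix (verify against the manifest you applied):

kubectl get nodes | grep hollow-node   # the simulated Nodes should appear and turn Ready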

Precautions

  • Keep the kubemark version the same as the master version whenever possible
  • The instructions above apply only to masters that have authentication and authorization disabled

clusterloader2

Operational requirements

  • The master node has the ssh service enabled
  • All Nodes (including hollow Nodes) are in the Ready state when the test starts

Compile

Download the perf-tests project, then either run ${perf-tests}/clusterloader2/run-e2e.sh or compile a binary with the following script:

CLUSTERLOADER_ROOT=${perf-tests}
cd ${CLUSTERLOADER_ROOT}/ && go build -o clusterloader './cmd/'

Usage steps

  • Set Environment Variables
# kube config for kubernetes api
KUBE_CONFIG=${HOME}/.kube/config

# Provider setting
# Supported provider for xiaomi: local, kubemark, lvm-local, lvm-kubemark
PROVIDER='kubemark'

# SSH config for metrics' collection
KUBE_SSH_KEY_PATH=$HOME/.ssh/id_rsa
MASTER_SSH_IP=10.142.43.51
MASTER_SSH_USER_NAME=root

# Clusterloader2 testing strategy config paths
# It supports setting up multiple test strategy. Each testing strategy is individual and serial.
TEST_CONFIG='configs/examples/density/config.yaml'

# Clusterloader2 testing override config paths
# It supports setting up multiple override config files. All of override config files will be applied to each testing strategy.
# OVERRIDE_CONFIG='testing/density/override/200-nodes.yaml'

# Log config
REPORT_DIR='./reports'
LOG_FILE='logs/tmp.log'

Fill in the contents of the script above, then use the source command to set the environment variables in your shell.
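
For example, assuming the variables above are saved to a file named env.sh:

source env.sh   # the variables are now visible to the run command below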

  • Run clusterloader2
$CLUSTERLOADER_BIN --kubeconfig=$KUBE_CONFIG \
    --provider=$PROVIDER \
    --masterip=$MASTER_SSH_IP --mastername=$MASTER_SSH_USER_NAME \
    --testconfig=$TEST_CONFIG \
    --report-dir=$REPORT_DIR \
    --alsologtostderr 2>&1 | tee $LOG_FILE
#   To apply override files, add --testoverrides="${OVERRIDE_CONFIG:-}" \ before the --alsologtostderr line

The command above runs clusterloader2, where CLUSTERLOADER_BIN is the path to the compiled clusterloader2 binary and the remaining parameters are the environment variables set earlier.
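
For example, if you compiled with the go build command above, the binary sits in the clusterloader2 directory:

CLUSTERLOADER_BIN='./clusterloader'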

Test introduction

perf-tests is only a performance-testing framework; the concrete performance-testing strategy has to be defined by the user through configuration files.
Taking the Kubernetes density test as an example, the following explains clusterloader2's testing process and its results.
Density testing strategy

  1. Start several observation programs
     In the test strategy configuration file, every Measurement is an observation program used for data collection. perf-tests provides over a dozen Measurements, of which the density test uses five: APIResponsiveness, SaturationPodStartupLatency, WaitForRunningSaturationRCs, SchedulingThroughput, and PodStartupLatency.
  2. Scheduling throughput test
     Kubernetes regards 30 Pods per Node as the normal machine load. Publishing #node*30 Pods in one shot tests the cluster's scheduling throughput, i.e. "after a full-scale system failure, how long recovery from zero back to the expected normal load takes". Scheduling throughput is defined as:
              scheduling throughput = maximum number of Pods that can be launched over a sufficiently long period / total launch time
     Notably, clusterloader2 does not simply create #node*30 Pods with a single rc, because Kubernetes makes some basic assumptions about how the system is used, including "no more than 3,000 Pods per namespace" and "no more than 110 Pods per node". To satisfy these assumptions, clusterloader2 does a lot of special handling when creating the rcs.
  3. Pod startup e2e latency test
     To measure Pod startup end-to-end (e2e) latency, the latency test is run on top of the load from the "scheduling throughput test", without deleting it first.
     The latency test strategy creates one Pod every 0.2 s and, from the timestamps of the Events generated while each Pod is created, computes the time spent in each phase of Pod startup and the total e2e latency. If the scheduling throughput is < 5 pods/s, the Pod creation interval should be set to something larger than 0.2 s.
     The latency test currently has a precision problem: Kubernetes stores Event timestamps in RFC3339 format, so the per-phase latencies are only precise to the second (e.g. 2019-05-18T10:00:00Z carries no sub-second digits), which makes them of little value for performance analysis.

(Figure: density testing strategy)

Test Results

  • stdout / stderr
    The overall performance test results and some of the measurement results are printed to stdout. If clusterloader2 is run as described in this article, the printed results are also saved to ${LOG_FILE}.
Scheduling throughput result:
Jan  9 13:55:47.876: INFO: E2E startup time for 10500 pods: 28m40.333662404s
Jan  9 13:55:47.876: INFO: Throughput (pods/s) during cluster saturation phase: 6.1034675
STEP: Printing Pod to Node allocation data
Jan  9 13:55:47.884: INFO: Density Pods: 10500 out of 10500 created, 10500 running, 0 pending, 0 waiting, 0 inactive, 0 terminating, 0 unknown, 0 runningButNotReady
(10500 Pods in 28m40s ≈ 1720 s gives 10500/1720 ≈ 6.1 pods/s, which matches the throughput definition above.)
Pod startup e2e latency result:
Jan  9 14:09:31.598: INFO: perc50: 1614, perc90: 2241, perc99: 2541
Jan  9 14:09:31.598: INFO: Approx throughput: 7486.052881923156 pods/min
  • Report folder
    Detailed performance test results are collected in clusterloader2's report folder; if you run it as suggested here, they are stored in ${REPORT_DIR}. Because the performance test produces a great many results, this article does not list them all. Currently the most important indicators for Kubernetes performance analysis are:
    • APIServer RESTful API response time - APIResponsiveness
    • Total Pod startup time - PodStartupLatency
    • Scheduler scheduling performance - SchedulingThroughput, SchedulingMetrics
    • etcd metrics - EtcdMetrics
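
As a hypothetical illustration of the report layout (actual file names vary with the clusterloader2 version and the Measurements enabled in the test config):

ls ${REPORT_DIR}
# APIResponsiveness_density_2019-05-18T10:00:00Z.json
# PodStartupLatency_density_2019-05-18T10:00:00Z.json
# SchedulingThroughput_density_2019-05-18T10:00:00Z.json
# SchedulingMetrics_density_2019-05-18T10:00:00Z.json
# EtcdMetrics_density_2019-05-18T10:00:00Z.json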
