[Tencent Cloud Finops Crane Training Camp] Cloud resource analysis and cost optimization platform based on Kubernetes


Foreword:

To help cloud-native users achieve extreme cost reduction while ensuring business stability, Tencent Cloud launched Crane (Cloud Resource Analytics and Economics), China's first open source cost-optimization project based on cloud-native technology. Crane follows the FinOps standard and aims to provide cloud-native users with a one-stop solution for cloud cost optimization.

1. Basic introduction

Crane is a FinOps-based cloud resource analysis and cost optimization platform. Its vision is to achieve extreme cost reduction while protecting the running quality of customer applications.

1. Main functions


1) Cost visualization and optimization evaluation

  • Provides a set of exporters that compute billing data for cluster cloud resources and store it in your monitoring system, such as Prometheus.
  • Multi-dimensional cost insight and optimization evaluation; multi-cloud billing is supported through Cloud Provider.
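To make the idea of a per-namespace cost breakdown concrete, here is a minimal sketch that multiplies resource amounts by unit prices and sums the result per namespace. It is not fadvisor's pricing model: the helper name cost_by_namespace, the input format, and the unit prices are all hypothetical.

```shell
# Hypothetical helper: estimate the cost of one hour per namespace.
# Input on stdin: "namespace cpu_cores memory_gib", one line per Pod.
# $1 = price per CPU core-hour, $2 = price per GiB-hour (made-up unit prices).
cost_by_namespace() {
  awk -v cpu_price="$1" -v mem_price="$2" '
    # Accumulate CPU cost plus memory cost for each namespace.
    {cost[$1] += $2 * cpu_price + $3 * mem_price}
    END {for (ns in cost) printf "%s %.4f\n", ns, cost[ns]}' | LC_ALL=C sort
}
```

For example, `printf 'default 1 2\ndefault 0.5 1\ncrane-system 2 4\n' | cost_by_namespace 0.03 0.004` prints `crane-system 0.0760` and `default 0.0570`. Output is sorted because awk iterates arrays in an unspecified order.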

2) Recommendation framework

  • Provides an extensible recommendation framework that supports the analysis of various cloud resources, with several built-in recommenders: resource recommendation, replica recommendation, HPA recommendation, and idle resource recommendation.

3) Prediction-based horizontal autoscaler

  • EffectiveHorizontalPodAutoscaler supports prediction-driven autoscaling;
  • It uses the community HPA for underlying scaling control and supports richer scaling trigger strategies (prediction, observation, periodic), which makes autoscaling more efficient and guarantees service quality.

4) Load-aware scheduler

  • The dynamic scheduler builds a simple but efficient model based on actual node utilization and filters out heavily loaded nodes to balance the cluster.
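The filtering step can be illustrated with a simple threshold check. This is a hypothetical sketch: the helper name, the input format, and the single CPU threshold are assumptions, while the actual dynamic scheduler evaluates node load metrics from the monitoring system against its own configurable policy.

```shell
# Hypothetical helper: keep only nodes whose CPU utilization is below a threshold.
# Input on stdin: "node_name cpu_utilization" per line, utilization as a 0-1 fraction.
# $1 = maximum allowed utilization, e.g. 0.65 (an illustrative value).
filter_schedulable_nodes() {
  # Print the names of nodes that are not considered highly loaded.
  awk -v max="$1" '$2 < max {print $1}'
}
```

For example, feeding `node-a 0.30`, `node-b 0.80`, and `node-c 0.64` with a threshold of 0.65 keeps `node-a` and `node-c` and drops the heavily loaded `node-b`.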

5) Topology-aware scheduler

  • Crane Scheduler works in conjunction with Crane Agent to support finer-grained topology-aware resource scheduling and multiple CPU core binding strategies, so that resources are used more reasonably and efficiently.

6) QoS-based co-location

  • QoS-related capabilities ensure the stability of Pods running on Kubernetes.
  • It provides interference detection and active avoidance based on multi-dimensional metrics, and supports precise operation and custom metric access;
  • It can overcommit elastic resources, enhanced by prediction algorithms, to reuse idle resources in the cluster while limiting their use;
  • It provides enhanced bypass cpuset management, improving resource utilization while binding cores.

2. Overall structure

The overall structure of Crane consists of the following components:


Craned is the core component of Crane; it manages the life cycle of CRDs and APIs. Craned is deployed via a Deployment and consists of two containers:

  • Craned: runs Operators to manage CRDs, provides a WebAPI for the Dashboard, and runs Predictors that provide the TimeSeries API;
  • Dashboard: a front-end project built on TDesign's Starter scaffolding that provides easy-to-use product features.

Fadvisor provides a set of exporters that compute billing data for cluster cloud resources and store it in your monitoring system, such as Prometheus.

  • Fadvisor supports multiple clouds through Cloud Provider.

The Metric Adapter implements a Custom Metric Apiserver.

  • The Metric Adapter reads CRD information and provides HPA metric data through the Custom/External Metric API.

The Crane Agent is deployed on the nodes of the cluster through a DaemonSet.

2. Implement cloud resource analysis and cost optimization platform based on Kubernetes

1. Preparations


1) Install Helm

[root@k8s-master01 ~]# wget https://get.helm.sh/helm-v3.7.2-linux-amd64.tar.gz
[root@k8s-master01 ~]# tar zxf helm-v3.7.2-linux-amd64.tar.gz
[root@k8s-master01 ~]# mv linux-amd64/helm /usr/local/bin
[root@k8s-master01 ~]# helm version
version.BuildInfo{Version:"v3.7.2", GitCommit:"663a896f4a815053445eec4153677ddc24a0a361", GitTreeState:"clean", GoVersion:"go1.16.10"}

2. Install the Prometheus and Grafana charts

1) Install Prometheus

[root@k8s-master01 ~]# helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
[root@k8s-master01 ~]# helm install prometheus -n crane-system --version 19.6.1 \
--set pushgateway.enabled=false \
--set alertmanager.enabled=false \
--set server.persistentVolume.enabled=false \
-f https://raw.githubusercontent.com/gocrane/helm-charts/main/integration/prometheus/override_values.yaml \
--create-namespace  prometheus-community/prometheus

2) Install Grafana

[root@k8s-master01 ~]# helm repo add grafana https://grafana.github.io/helm-charts
[root@k8s-master01 ~]# helm install grafana --version 6.11.0 \
-f https://raw.githubusercontent.com/gocrane/helm-charts/main/integration/grafana/override_values.yaml \
-n crane-system \
--create-namespace grafana/grafana

3. Install the Crane charts

1) Install Crane and Fadvisor

[root@k8s-master01 ~]# helm repo add crane https://gocrane.github.io/helm-charts
[root@k8s-master01 ~]# helm install crane -n crane-system --create-namespace crane/crane
[root@k8s-master01 ~]# helm install fadvisor -n crane-system --create-namespace crane/fadvisor

2) Verify that the installation was successful

[root@k8s-master01 ~]# kubectl get pod,deploy -n crane-system

3) Expose the Craned service through a NodePort and adjust the reverse proxy address in the ConfigMap

[root@k8s-master01 ~]# kubectl get service craned -n crane-system -o yaml > 1.yaml
[root@k8s-master01 ~]# sed -i 's/type: ClusterIP/type: NodePort/g' 1.yaml 
[root@k8s-master01 ~]# sed -i '/targetPort: 9090/a\    nodePort: 30080' 1.yaml
[root@k8s-master01 ~]# kubectl apply -f 1.yaml
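[root@k8s-master01 ~]# kubectl edit cm nginx-conf -n crane-system
# In the vim editor opened by kubectl edit, run the following substitution, then save and quit with :wq
:%s/craned.crane-system:8082/127.0.0.1:8082/g

# Delete the craned Pod so that it is recreated with the updated configuration
[root@k8s-master01 ~]# kubectl get pod -n crane-system | awk '/^craned/{print $1}' | xargs kubectl delete pod -n crane-system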
  • The Dashboard and Craned containers run in the same Pod, and by default the Dashboard container proxies requests to Craned through the Service name and port;
  • However, a Pod often cannot reach itself through its own Service (hairpin traffic), so 127.0.0.1 (the loopback interface) is used as the proxy address instead.
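If you prefer not to edit the ConfigMap interactively, the same substitution can be scripted. This is a minimal sketch: the helper name rewrite_upstream is hypothetical, it assumes the ConfigMap contains the literal address craned.crane-system:8082 (as the vim substitution above does), and the restart command assumes the Deployment is named craned.

```shell
# Hypothetical helper: replace the Service-based upstream address with the
# loopback address in whatever text arrives on stdin.
rewrite_upstream() {
  sed 's/craned\.crane-system:8082/127.0.0.1:8082/g'
}

# Non-interactive equivalent of "kubectl edit" plus the vim substitution:
# kubectl get cm nginx-conf -n crane-system -o yaml | rewrite_upstream | kubectl apply -f -
# Then restart the Pod so nginx picks up the new configuration:
# kubectl rollout restart deployment craned -n crane-system
```

Lines that do not mention the craned Service address pass through unchanged, so the rest of the nginx configuration is preserved.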


4. Use intelligent autoscaling with EffectiveHPA

Kubernetes HPA supports rich autoscaling capabilities: Kubernetes platform developers deploy services that implement custom metrics, and Kubernetes users configure built-in resource metrics or custom metrics to achieve custom horizontal autoscaling.


EffectiveHorizontalPodAutoscaler (EHPA for short) is the autoscaling product provided by Crane. It uses the community HPA for underlying scaling control and supports richer scaling trigger strategies (prediction, observation, periodic), making autoscaling more efficient while ensuring service quality.

  • Scale out in advance to ensure service quality: algorithms predict future traffic peaks and add capacity ahead of time, avoiding avalanches and stability failures caused by late scaling.
  • Reduce unnecessary scale-in: predicting future load avoids unnecessary scale-in, stabilizes workload resource usage, and eliminates misjudgments.
  • Cron support: cron-based scaling configuration handles exceptional traffic peaks such as large promotions.
  • Compatible with the community: the community HPA serves as the execution layer of scaling control, so the capability is fully compatible with the community.
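The effective-hpa.yaml example applied later in this walkthrough is fetched from the Crane repository. For reference, here is a sketch of what such a manifest can look like, written to a local file so it can be reviewed before applying. The field names follow Crane's published autoscaling.crane.io/v1alpha1 examples, while the replica bounds, prediction window, and the cron rule are illustrative assumptions; check every field against the CRD version installed in your cluster (for example with kubectl explain).

```shell
# Write a sample EffectiveHorizontalPodAutoscaler for the php-apache Deployment.
# Field names follow Crane's v1alpha1 examples; values are illustrative.
cat > ehpa-php-apache.yaml <<'EOF'
apiVersion: autoscaling.crane.io/v1alpha1
kind: EffectiveHorizontalPodAutoscaler
metadata:
  name: php-apache
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  minReplicas: 1
  maxReplicas: 10
  # Auto lets EHPA drive the underlying HPA; Preview only shows suggestions.
  scaleStrategy: Auto
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
  # Predict one hour ahead with the DSP algorithm on historical metrics.
  prediction:
    predictionWindowSeconds: 3600
    predictionAlgorithm:
      algorithmType: dsp
      dsp:
        sampleInterval: "60s"
        historyLength: "3d"
  # Hypothetical cron rule: hold 5 replicas during a daily morning peak.
  crons:
  - name: morning-peak
    timezone: "Asia/Shanghai"
    description: "scale up for the morning peak"
    start: "0 9 ? * *"
    end: "0 11 ? * *"
    targetReplicas: 5
EOF
# Apply it to the cluster after reviewing:
# kubectl apply -f ehpa-php-apache.yaml
```

Keeping the manifest in a file rather than piping it straight into kubectl makes it easy to diff against the upstream example before applying.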

1) Install Metrics Server

[root@k8s-master01 ~]# wget https://github.com/kubernetes-sigs/metrics-server/releases/download/v0.6.3/components.yaml
[root@k8s-master01 ~]# sed -i '/- args:/a\        - --metric-resolution=15s' components.yaml
[root@k8s-master01 ~]# sed -i 's@image:.*@image: docker.io/gocrane/metrics-server:v0.6.3@g' components.yaml
[root@k8s-master01 ~]# kubectl apply -f components.yaml

2) Create a test application

[root@k8s-master01 ~]# kubectl apply -f https://raw.githubusercontent.com/gocrane/crane/main/examples/autoscaling/php-apache.yaml
[root@k8s-master01 ~]# kubectl apply -f https://raw.githubusercontent.com/gocrane/crane/main/examples/analytics/nginx-deployment.yaml

3) Create the EffectiveHPA

[root@k8s-master01 ~]# kubectl apply -f https://raw.githubusercontent.com/gocrane/crane/main/examples/autoscaling/effective-hpa.yaml

4) Increase the load to check whether the application scales out normally

[root@k8s-master01 ~]# kubectl run -i --tty load-generator --rm --image=busybox:1.28.4 --restart=Never -- /bin/sh -c "while sleep 0.01; do wget -q -O- http://php-apache; done"


  • As the number of requests increases, CPU utilization keeps rising, and EffectiveHPA automatically scales out the number of instances.
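The scale-out observed here is ultimately executed by the community HPA that EHPA manages. As a minimal sketch of the standard Kubernetes HPA rule, desiredReplicas = ceil(currentReplicas × currentUtilization / targetUtilization), with no change when the ratio stays within the default tolerance of 0.1. The helper name hpa_desired_replicas is hypothetical, and the real controller additionally clamps the result to minReplicas/maxReplicas and applies stabilization windows.

```shell
# Hypothetical helper reproducing the core HPA replica calculation.
# $1 = current replicas, $2 = current average utilization (%), $3 = target utilization (%)
hpa_desired_replicas() {
  awk -v cur="$1" -v util="$2" -v target="$3" 'BEGIN {
    ratio = util / target
    # Within the default tolerance (|1 - ratio| <= 0.1) the HPA keeps the current count.
    d = 1 - ratio; if (d < 0) d = -d
    if (d <= 0.1) { print cur; exit }
    # Otherwise scale proportionally and round up.
    n = cur * ratio
    r = int(n); if (r < n) r++
    print r
  }'
}
```

For example, 4 replicas running at 80% CPU against a 50% target give 4 × 1.6 = 6.4, which rounds up to 7 replicas, while 52% against 50% is within tolerance and leaves the count unchanged.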

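5. Configure the cluster

Open http://192.168.1.1:30080 in a browser (replace 192.168.1.1 with the IP address of one of your cluster nodes) and configure the cluster in the Dashboard.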


3. Function verification

1. Cost display

Crane Dashboard provides a variety of charts showing the cost and resource usage of the cluster.

2. Resource recommendation


When Kubernetes users create application resources, they often set request and limit based on experience. The resource recommendation algorithm analyzes the actual usage of the application and recommends a more appropriate resource configuration, which you can refer to and adopt to improve the resource utilization of the cluster. The recommendation model uses VPA's sliding window (Moving Window) algorithm:

  • Obtain the workload's historical CPU and memory usage over the past week (configurable) from monitoring data.
  • The algorithm accounts for data recency: newer samples receive higher weights.
  • The recommended CPU value is calculated from the target percentile set by the user, and the recommended memory value is based on the maximum of the historical data.
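The steps above can be approximated with a recency-weighted percentile. The sketch below is a simplification, not Crane's or VPA's actual code: VPA keeps a decaying histogram with exponentially sized buckets, whereas this helper simply weights each raw sample by 0.5^(age/half-life), picks the CPU value at the target weighted percentile, and takes the maximum for memory. The helper name recommend_resources, the input format, and the half-life parameter are hypothetical.

```shell
# Hypothetical helper: recommend CPU and memory from samples on stdin.
# Input: one sample per line, "age_hours cpu_cores memory_mib".
# $1 = target CPU percentile (0-1), $2 = weight half-life in hours.
recommend_resources() {
  local pct="$1" half_life="$2" samples mem cpu
  samples=$(cat)
  # Memory: the maximum observed value.
  mem=$(printf '%s\n' "$samples" | awk 'NR == 1 || $3 > max {max = $3} END {print max}')
  # CPU: weight each sample by 0.5^(age/half_life), sort by value, then walk
  # the cumulative weight until it reaches pct of the total weight.
  cpu=$(printf '%s\n' "$samples" \
    | awk -v h="$half_life" '{print $2, 0.5 ^ ($1 / h)}' \
    | LC_ALL=C sort -n -k1,1 \
    | awk -v p="$pct" '
        {v[NR] = $1; w[NR] = $2; total += $2}
        END {
          acc = 0
          for (i = 1; i <= NR; i++) {
            acc += w[i]
            if (acc >= p * total) {print v[i]; exit}
          }
        }')
  echo "cpu=${cpu} memory_mib=${mem}"
}
```

With the samples `0 1 100`, `0 2 200`, and `72 10 150`, `recommend_resources 0.9 24` prints `cpu=2 memory_mib=200`: the 24-hour half-life shrinks the weight of the three-day-old 10-core spike to 0.125, so the 90th percentile lands at 2 cores. If all three samples were fresh, the same call would return 10 cores.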

1) Use resource recommendation

3. Replica recommendation

Kubernetes users often set the number of replicas based on experience when creating application resources. The replica recommendation algorithm analyzes the actual usage of the application and recommends a more appropriate replica count, which you can also refer to and adopt to improve the resource utilization of the cluster.

The basic algorithm uses the workload's historical CPU load: it finds the CPU usage of the lowest-load hour in the past seven days and calculates the recommended number of replicas from a 50% (configurable) target utilization and the workload's CPU request.
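The calculation can be sketched as replicas = ceil(usage / (target utilization × per-Pod CPU request)). This is a minimal illustration under stated assumptions: the helper name recommend_replicas is hypothetical, usage is the workload's total CPU usage in cores for the selected hour, and the floor of one replica is added here rather than taken from the Crane docs.

```shell
# Hypothetical helper: recommended replica count for a workload.
# $1 = workload CPU usage in cores (e.g. the lowest-load hour over 7 days)
# $2 = target CPU utilization as a fraction (0.5 = 50%)
# $3 = CPU request of one Pod in cores
recommend_replicas() {
  awk -v usage="$1" -v target="$2" -v request="$3" 'BEGIN {
    n = usage / (target * request)
    # Round up so each Pod stays at or below the target utilization.
    r = int(n); if (r < n) r++
    # Assumption: never recommend fewer than one replica.
    if (r < 1) r = 1
    print r
  }'
}
```

For example, 1.2 cores of usage with a 50% target and 0.5-core requests gives 1.2 / 0.25 = 4.8, which rounds up to 5 replicas.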


4. Summary

Overall, installation is relatively simple, since Helm makes it easy to deploy every component. Once installed, Crane is accessed through its Dashboard, whose interface is simple, clear, and easy to use. Functionally, it provides real-time cost monitoring, application resource usage, resource recommendation, and intelligent autoscaling.

  • Real-time cost monitoring: Crane provides hourly, daily, weekly, and monthly cost overviews and can break down costs by cluster, namespace, and workload.
  • Resource monitoring: resource monitoring and visualization are built on top of Prometheus and Grafana.
  • Resource recommendation: uses the VPA algorithm model to analyze the actual usage of the application and then calculate a more suitable resource configuration for it.
  • Intelligent autoscaling: uses the community HPA for underlying scaling control and supports richer trigger strategies, making autoscaling more efficient and guaranteeing service quality.

Therefore, we can use the cost calculation, resource utilization, and other functions provided by Crane to judge whether the resource capacity of cloud servers should be expanded or reduced, avoiding both resource shortages and resource waste, and achieving real cost reduction and efficiency gains.


About the Tencent Cloud Finops Crane Training Camp:

The Finops Crane training camp is aimed mainly at developers. It seeks to improve developers' hands-on skills in container deployment and Kubernetes, while also attracting contributors to the Crane open source project and encouraging developers to submit issues, report bugs, and give feedback. It offers a series of technical activities such as online live streams, hands-on labs, team formation, and an award-winning essay contest. Through the event, developers can gain an in-depth understanding of the Finops Crane open source project and make substantial progress in cloud-native skills.

To reward developers, the camp also offers point-earning tasks and gifts that the points can be exchanged for.


Origin blog.csdn.net/weixin_46902396/article/details/130564335