[Tencent Cloud FinOps Crane Training Camp] A hands-on record of Crane, a powerful tool for reducing costs and improving efficiency

Foreword

Recently I was fortunate to take part in the public hands-on sessions of the Tencent Cloud FinOps Crane Training Camp.

So here is a hands-on record for everyone.

1. What is Crane?

The figure below gives a detailed introduction to Crane:

[Figure: introduction to Crane]

Put simply, as cloud-native technology continues to develop and spread, Crane is the first cost optimization project based on cloud-native technology in China: a certified open source solution for reducing costs and improving efficiency.

The original intention of the Crane project is to help enterprises better manage and expand their Kubernetes clusters, so as to achieve more efficient cloud native application management.

Here is the event link for everyone: Crane event page

If you are interested in Crane, you are welcome to explore and study it on GitHub!
Direct link: https://github.com/gocrane/crane

Crane is already running at scale across Tencent's in-house businesses: it is deployed on hundreds of K8s clusters, the CPU cores under its management have reached the one-million level, and it has delivered measurable interim results in reducing costs and improving efficiency.

For example, by using FinOps Crane, one Tencent department tripled its resource utilization while keeping its business stable. After another Tencent in-house business adopted Crane, it saved a total of 400,000 CPU cores within one month, equivalent to cost savings of more than 10 million yuan per month.

Next, let's look at what Crane actually does, across three stages!

What are Crane's main functions?

  1. Supports the use of multiple container orchestration engines to manage the life cycle of container applications, including Docker and Kubernetes.
  2. Provides an easy-to-use way to build and publish container images, and supports the use of templates and plug-ins to customize the build process.
  3. Provides integrated monitoring and logging functions to help developers discover and solve application problems in a timely manner.
  4. Combines multiple containers into a complete application, with flexible network and storage options for better management and scaling of the application.
  5. Uses a concise and clear configuration file format to manage the configuration of container applications, and supports environment variables and parameters for dynamic configuration.
  6. Provides cost visualization and optimization evaluation with multi-dimensional cost insights, and supports multi-cloud billing through Cloud Provider.
  7. Enhanced QoS: QoS-related capabilities keep Pods on Kubernetes running stably. Crane offers interference detection and active avoidance based on multi-dimensional metrics, with support for precise operations and custom metric access; prediction-enhanced overselling of elastic resources, which reuses and limits idle resources in the cluster; and enhanced bypass cpuset management, which improves resource utilization while binding cores.

What is FinOps?

FinOps is a methodology for managing cloud computing costs that combines the knowledge of finance and technology teams with the aim of reducing cloud computing costs and improving business efficiency and agility by optimizing resource usage. The goal of FinOps is to ensure the balance between cost and business value of cloud computing through transparency, accountability and collaboration.

A FinOps framework typically includes the following aspects:

  • Cost visibility: Monitor and analyze cloud resource usage and break costs down by department and project, so that everyone can see what cloud computing actually costs.
  • Accountability: Allocate cloud computing costs to each department or project so that every team is accountable for its resource usage and spending, which encourages more responsible behavior.
  • Cost optimization: Minimize cloud computing costs by analyzing usage and historical data, and by using automated tools and best practices to optimize resource usage.
  • Collaboration and culture: Establish a shared culture and practices that encourage teams to collaborate on managing cloud computing costs.

What is Prometheus?

Prometheus is an open source monitoring and alerting system, widely used to measure, collect, and aggregate metrics in large-scale distributed systems. It uses its own query language, PromQL, to query and analyze the collected data, and provides a graphical web interface and an API.

What is Grafana?

Grafana is a popular open source data visualization and monitoring platform. It can connect multiple data sources, including Prometheus, InfluxDB, Elasticsearch, MySQL, etc., and use tools such as charts, dashboards, and alerts for data visualization and analysis.

2. The problem we have to face: the challenge of resource efficiency on the cloud!

As the chart below from Tencent shows, a 2022 report on the state of the cloud computing market found that 32% of cloud spending is wasted.

[Figure: share of cloud spending that is wasted]

So what are the cost management challenges in the post-cloud-native era? Here are a few in detail:

  • Decentralization: With the vigorous development of cloud-native applications centered on Kubernetes, the traditional centralized financial budget and IT management models are transforming into business-oriented distributed decision-making.
  • Rising costs: According to the NCF survey, as businesses grow rapidly, enterprise cloud spending is increasing at an annual rate of 24%. (This cannot be ignored: enterprise cloud costs rise year after year, which is exactly what drives the push to reduce costs and improve efficiency!)
  • Dynamic changes: The cloud-native dynamic environment and elastic capabilities cause cloud costs to change continuously with business loads.
  • Serious waste: After migrating to the cloud, teams often lack any awareness of resource optimization and still manage resources with a traditional allocation mindset, which leads to serious waste. This is another key reason why reducing costs and improving efficiency is necessary.

3. Cost optimization challenges in cloud-native scenarios

First, here are some common cost optimization ideas and methods:

[Figure: common cost optimization ideas and methods]

You also need to understand what the cost components are:

[Figure: cost components]

So in general, what is the core of cloud cost management?

It is to minimize the resources requested while still guaranteeing that the business runs properly.

4. Shortcomings of native K8s capabilities

In the cloud-native era, many workloads are managed directly by K8s, but K8s has some shortcomings of its own, as shown in the following figure:

[Figure: shortcomings of native K8s capabilities]

5. Crane Intelligent Scheduling Helps Cost Optimization

First, let's take a look at Crane's architecture and its components:

  • Craned
    Craned is the core component of Crane. It manages the lifecycle of the CRDs and the API. Craned is deployed through a Deployment and consists of two containers:
    Craned: runs Operators to manage the CRDs, provides a WebApi to the Dashboard, and its Predictors provide the TimeSeries API.
    Dashboard: a front-end project built on TDesign's Starter scaffolding that provides easy-to-use product features.

  • Fadvisor
    Fadvisor provides a set of Exporters that collect billing data for the cluster's cloud resources and store it in your monitoring system, such as Prometheus. Fadvisor supports multi-cloud billing APIs through Cloud Provider.

  • Metric Adapter
    Metric Adapter implements a Custom Metric Apiserver. It reads information from the CRDs and provides HPA metric data through the Custom/External Metric API.

  • Crane Agent
    Crane Agent is deployed on the nodes of the cluster through a DaemonSet.

[Figure: Crane architecture diagram]

So how does Crane go about reducing costs? The approach is as follows:

• Pull usage data from the monitoring system
• Pull workload configuration from the Kubernetes platform
• Pull resource unit prices from cloud vendor billing APIs
• Analyze the cost components with multiple algorithms and give optimization suggestions
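
The four steps above can be sketched in a few lines of Python. This is only an illustration with made-up numbers, not Crane's implementation: the `Workload` type, the unit price, and the 20% headroom are all hypothetical, and in a real setup the usage would come from the monitoring system, the requests from the Kubernetes platform, and the price from a cloud billing API.

```python
import math
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    cpu_request_cores: float     # from the workload configuration (hypothetical value)
    cpu_peak_usage_cores: float  # from the monitoring system (hypothetical value)


def suggest(workload: Workload, price_per_core_month: float, headroom: float = 0.2) -> dict:
    """Compare requested CPU with observed peak usage and estimate the monthly saving."""
    # Recommended request = observed peak plus a safety margin, rounded up to 0.1 core.
    recommended = math.ceil(workload.cpu_peak_usage_cores * (1 + headroom) * 10) / 10
    # Never recommend more than what is already requested.
    recommended = min(recommended, workload.cpu_request_cores)
    saved_cores = workload.cpu_request_cores - recommended
    return {
        "workload": workload.name,
        "recommended_cores": recommended,
        "monthly_saving": round(saved_cores * price_per_core_month, 2),
    }


result = suggest(
    Workload("web", cpu_request_cores=4.0, cpu_peak_usage_cores=1.0),
    price_per_core_month=50.0,  # hypothetical unit price
)
print(result)
```

The point of the sketch is the data flow: usage, configuration, and price are joined per workload, and the gap between what is requested and what is actually needed becomes the optimization suggestion.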

Scenario 1: Cost visualization

With Crane, we can display and forecast costs, and analyze trends based on those forecasts.

[Figure: cost display and forecast]

Crane can also identify costs and waste, presenting cost data integrated with the billing API.

[Figure: cost and waste identification]

It is also the first cloud-native carbon emission calculator in China.

[Figures: carbon emission calculation]

In addition, costs can be aggregated flexibly along dimensions such as department, project, and application type.

[Figure: cost aggregation by dimension]
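
Aggregating costs by department, project, or application type amounts to a group-by over labeled cost records. The following is a minimal sketch of that idea, assuming a hypothetical list of per-workload monthly costs; the record fields and numbers are invented for illustration and are not Crane's data model.

```python
from collections import defaultdict

# Hypothetical per-workload monthly costs, each tagged with a few labels.
costs = [
    {"department": "search", "project": "web", "app_type": "online", "cost": 120.0},
    {"department": "search", "project": "index", "app_type": "batch", "cost": 80.0},
    {"department": "ads", "project": "web", "app_type": "online", "cost": 50.0},
]


def aggregate(records, dimension):
    """Sum the cost of all records that share the same value for `dimension`."""
    totals = defaultdict(float)
    for record in records:
        totals[record[dimension]] += record["cost"]
    return dict(totals)


by_department = aggregate(costs, "department")
by_project = aggregate(costs, "project")
print(by_department)
print(by_project)
```

Because the dimension is just a key name, the same function serves any label the records carry.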

Scenario 2: Specification recommendation

First, Crane's recommendations are declarative: it produces recommended values for the specified namespaces and workload types.

It also supports configurable algorithms, for example:

  • Scenarios that need dynamic adjustment: exponential buckets + half-life
  • Static recommendation scenarios: uniform buckets + no half-life
  • Margin or target utilization
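
To make "buckets + half-life" more concrete, here is a small sketch of a half-life-weighted histogram percentile with a margin on top. It is a simplified illustration, not Crane's algorithm: the function names, bucket sizes, percentile, and sample data are all assumptions. Both runs use exponential buckets; the only difference is whether older samples decay.

```python
def decayed_weight(age_hours: float, half_life_hours: float) -> float:
    """A sample loses half of its weight every `half_life_hours`."""
    return 0.5 ** (age_hours / half_life_hours)


def exponential_bucket_bounds(first: float, ratio: float, count: int) -> list:
    """Upper bounds of buckets whose width grows by `ratio` at each step."""
    bounds, width, upper = [], first, 0.0
    for _ in range(count):
        upper += width
        bounds.append(upper)
        width *= ratio
    return bounds


def recommend(samples, half_life_hours, percentile, margin, bounds):
    """samples: list of (age_hours, cpu_cores). Returns a recommended request in cores."""
    weights = [0.0] * len(bounds)
    for age, value in samples:
        # Put the sample into the first bucket whose upper bound covers it.
        index = next((i for i, b in enumerate(bounds) if value <= b), len(bounds) - 1)
        weights[index] += decayed_weight(age, half_life_hours)
    threshold = percentile * sum(weights)
    cumulative = 0.0
    for upper, weight in zip(bounds, weights):
        cumulative += weight
        if cumulative >= threshold:
            # Add the safety margin on top of the percentile bucket.
            return upper * (1 + margin)
    return bounds[-1] * (1 + margin)


bounds = exponential_bucket_bounds(first=0.1, ratio=2.0, count=6)
# (age in hours, CPU cores used): two recent quiet samples and one old spike.
samples = [(0, 0.25), (24, 0.25), (48, 2.0)]

dynamic = recommend(samples, half_life_hours=24, percentile=0.8, margin=0.15, bounds=bounds)
static = recommend(samples, half_life_hours=float("inf"), percentile=0.8, margin=0.15, bounds=bounds)
print(round(dynamic, 3), round(static, 3))
```

With a half-life, the two-day-old spike carries little weight and the recommendation follows recent usage; without one, every sample counts equally and the spike pulls the recommendation up.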

First, the rolling upgrade can be completed with one click in the console.

[Figure: one-click rolling upgrade in the console]

Then shift cost control left and apply it automatically.

[Figure: cost shift-left and automatic control]

Finally, you can see the specification recommendation result!

[Figure: specification recommendation result]

Scenario 3: Intelligent prediction and automatic scaling

Crane can scale out ahead of time: a time series algorithm (such as FFT, the fast Fourier transform) predicts the load, and Crane takes the maximum value within the prediction window so that capacity is added in advance. Scaling can also be driven by Custom Metrics, with metric fallback protection.

Crane also supports Cron configuration to handle predictable traffic peaks such as big promotions and holidays.
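
The "predict, then take the maximum of the prediction window" idea can be sketched as follows. This is a toy illustration under stated assumptions, not Crane's predictor: it uses a plain DFT from the standard library instead of a real FFT, forecasts by simply repeating the last dominant period, and all function names and the load numbers are hypothetical.

```python
import cmath
import math


def dominant_period(series):
    """Return the period (in samples) of the strongest non-constant frequency, via a plain DFT."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]
    best_k, best_power = 1, -1.0
    for k in range(1, n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(centered))
        if abs(coeff) > best_power:
            best_k, best_power = k, abs(coeff)
    return round(n / best_k)


def predict_window(series, window):
    """Forecast the next `window` points by repeating the last full dominant period."""
    period = dominant_period(series)
    last_cycle = series[-period:]
    return [last_cycle[i % period] for i in range(window)]


def replicas_needed(series, window, target_per_pod):
    """Size for the maximum predicted load in the window, so capacity is ready in advance."""
    peak = max(predict_window(series, window))
    return max(1, math.ceil(peak / target_per_pod))


# Hypothetical load history (requests per second) with a repeating 4-sample pattern.
history = [10, 20, 80, 20] * 6

period = dominant_period(history)
ahead = replicas_needed(history, window=3, target_per_pod=25)
reactive = max(1, math.ceil(history[-1] / 25))
print(period, ahead, reactive)
```

In this example the current load alone would justify a single replica, while the prediction window already contains the next peak, so the replica count is raised before the peak arrives.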

[Figures: predictive scaling and Cron configuration]

Finally, the predicted scaling effect looks like this:

[Figure: predicted scaling effect]

6. Achievements

• Rolled out at scale across Tencent's in-house businesses
• Deployed to hundreds of Kubernetes clusters
• CPU cores under management have reached the one-million level
• Within 2 months of full rollout, the overall total core count dropped by 25%

• Adopted by NetEase News
• Manages CPU cores on the scale of tens of thousands (W)
• Within 2 months of full rollout, the overall total core count dropped by 11%
• Decommissioned thirty 56-core physical machines, saving 100,000 yuan (10W) per month

7. Actual deployment and use

  • Install kubectl
curl.exe -LO "https://dl.k8s.io/release/v1.27.1/bin/windows/amd64/kubectl.exe"
  • Install Helm
choco install kubernetes-helm 
  • Install kind (Docker is generally installed along with it)
choco install kind 
  • Install Crane and its associated Prometheus Grafana dependencies
curl -sf https://raw.githubusercontent.com/gocrane/crane/main/hack/local-env-setup.sh | sh -
  • View Pod status (the Pods need some time to start)
$ export KUBECONFIG=${HOME}/.kube/config_crane
$ kubectl get pod -n crane-system

NAME                                             READY   STATUS    RESTARTS       AGE
craned-8cks5c939f-kkfhv                          2/2     Running   0              3m36s
fadvisor-5b92cdfvq98b6-xqhfs                     1/1     Running   0              3m47s
grafana-33pov56f6y54-4jks6                       1/1     Running   0              3m29s
metric-adapter-967c6d57f-so9vv                   1/1     Running   0              4m11s
prometheus-kube-state-metrics-7k9d6s8fdvc-qql9c  1/1     Running   0              3m42s
prometheus-server-pvl64f4b7-47dvu                2/2     Running   0              3m38s
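
Since the Pods take a while to start, it can help to check the table above programmatically. The helper below is a hypothetical convenience, not part of Crane or kubectl: it parses the plain-text output of `kubectl get pod` and lists the Pods that are not fully ready yet. The sample text is invented for illustration.

```python
def not_ready_pods(kubectl_output: str) -> list:
    """Return names of pods whose READY column is not n/n or whose STATUS is not Running."""
    pending = []
    lines = [line for line in kubectl_output.strip().splitlines() if line.strip()]
    for line in lines[1:]:  # skip the header row
        name, ready, status = line.split()[:3]
        ready_count, total = ready.split("/")
        if ready_count != total or status != "Running":
            pending.append(name)
    return pending


# Hypothetical output captured while one Pod is still starting.
sample = """
NAME                        READY   STATUS              RESTARTS   AGE
craned-8cks5c939f-kkfhv     2/2     Running             0          3m36s
grafana-33pov56f6y54-4jks6  0/1     ContainerCreating   0          10s
"""
print(not_ready_pods(sample))
```

To use it against a live cluster, the text could be obtained with `subprocess.run(["kubectl", "get", "pod", "-n", "crane-system"], capture_output=True, text=True).stdout`; an empty result means every Pod is up.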

  • Visit the Crane Dashboard (while the port-forward is running, open http://localhost:9090 in a browser)
kubectl -n crane-system port-forward service/craned 9090:9090
  • Note: whenever you work in a new terminal, run the following command first to set the environment variable
export KUBECONFIG=${HOME}/.kube/config_crane

8. Suggested use cases

In general, Crane is a multi-purpose tool. It can be used in DevOps automation pipelines to help development teams build, test, and deploy applications efficiently. It can also be used to build and manage cloud-native applications, which are typically distributed, scalable, and highly available. Crane is also suited to deploying and managing large-scale clusters, such as those run by container orchestration engines like Kubernetes. For containerized microservices, Crane is likewise a good management tool that helps developers deploy, upgrade, and manage a microservices architecture.

Summary

About the Tencent Cloud FinOps Crane Training Camp:

The FinOps Crane training camp is aimed mainly at developers. It is designed to improve developers' hands-on skills in container deployment and K8s, while recruiting contributors to the Crane open source project and encouraging developers to submit issues and bug reports. Through a series of technical activities, such as team-based hands-on experiments and a prize essay contest, the camp helps developers gain an in-depth understanding of the FinOps Crane open source project and make substantial gains in cloud-native skills.

To reward developers, there are tasks for earning points and gifts that the points can be redeemed for.

Event introduction: https://marketing.csdn.net/p/038ae30af2357473fc5431b63e4e1a78

Open source project: https://github.com/gocrane/crane

Origin blog.csdn.net/weixin_51484460/article/details/130755339