Huawei Cloud Yaoyun Server L instance evaluation | Trial of the host security product Elkeid on Huawei Cloud

1. Background: What is host security?

As cloud technology has developed, major banks, enterprises large and small, and governments now use it routinely. In these fields, security is obviously a critical factor. As the place where data is stored, cloud hosts face constant intrusion attempts by attackers. So what is host security?

Host security refers to ensuring the confidentiality, integrity, and availability of the host during data storage and processing. It covers the security of the hardware, firmware, and system software themselves, together with a series of additional security technologies and security management measures, which together establish a complete host security environment.

After more than 20 years of construction, China's information security has achieved certain results in anti-virus, network, and border security. However, security construction for the host environment is relatively weak, even though the host is the most important and the last line of defense for information security.

In fact, as I personally understand it, host security most often takes the form of a host-based intrusion detection system (HIDS), which mainly provides intrusion detection, behavior auditing, attack source tracing, asset inventory, compliance baseline checking, and other capabilities.

Today we are going to try out Elkeid on Huawei Cloud. Elkeid is an open source HIDS project from ByteDance.

2. Host Security Elkeid

1. Introduction to Elkeid

Elkeid open source version project address: https://github.com/bytedance/Elkeid

Elkeid is an open source project in the security field. The name means Yaoguang/Pojun, one of the seven stars of the Big Dipper. The problem it solves is host security.

Elkeid is a CWPP (Cloud Workload Protection Platform) product of Volcano Engine, designed to meet the security needs of modern enterprises with complex technical architectures. Elkeid originated from ByteDance's internal best practices and natively integrates anti-intrusion capabilities for multiple workload types such as servers, containers, and Serverless. It covers server and container anti-intrusion, container cluster anti-intrusion, runtime application self-protection (RASP), threat tracing and hunting, workload asset inventory, workload vulnerability discovery, exposure analysis, and other capabilities, and it provides an open policy engine to help enterprises secure workloads on and off the cloud with one integrated solution.

Elkeid is a new host security solution developed in-house by ByteDance's security and risk control team. It has several distinctive features.

  • One is its large scale: it can support millions of servers within ByteDance. (This also gives away a little about ByteDance's internal server scale.)
  • Another is that Elkeid uses kernel-mode technology to collect most indicators and information. On the one hand, this greatly improves performance; on the other hand, it collects more and richer data, which greatly enhances detection capabilities.

At present, the deployment scale of the full version of Elkeid has reached the million level, and its stability, performance, data collection, detection, and traceability capabilities have all been verified in production and have performed well.

Product advantages

  • Backend architecture solution for millions of agents
  • Distributed, decentralized, cluster high availability
  • Simple deployment, few dependencies, and easy maintenance

Note: This product is not fully open source. As of now (September 2023), the Elkeidup automated deployment tool and the Hub are not open source, nor is the front end.

2. Elkeid Server

Elkeid Server must be used together with the data collection layer on the endpoint (Elkeid Driver/Agent) to monitor, manage, and update policies for Agents at large scale. It adapts to various complex network environments and can be deployed on a single machine or as a cluster. For specific deployment plans, see the repository.

Open source address: https://github.com/bytedance/Elkeid/tree/main/server

3. Elkeid Server Architecture

Elkeid Server generally contains 4 modules:

  • AgentCenter: Responsible for communicating with the Agent, collecting Agent data, lightly processing it, and aggregating it into the message queue cluster (Kafka cluster). It is also responsible for managing the Agent, including Agent upgrades, configuration modifications, task distribution, etc.

  • ServiceDiscovery: Each backend service module must regularly register and synchronize its service information with the ServiceDiscovery center, so that the instances of each service module are visible to one another and can communicate directly.

  • Manager: Responsible for managing the entire backend and providing the related query and management interfaces.

  • Real-time/offline computing module: Consumes the data that the Server has collected into the message queue and performs real-time and offline analysis and detection. (This part is not open source yet.)

Simply put:

  • AgentCenter collects Agent data

  • Manager manages AgentCenter and the computing modules

  • ServiceDiscovery links all these services and nodes together

  • Real-time/offline computing module analyzes and detects on this data

The Elkeid backend generally contains 5 modules:

  1. AgentCenter (AC) is responsible for communicating with the Agent, collecting Agent data, lightly processing it, and aggregating it into the message queue cluster. It is also responsible for managing the Agent, including Agent upgrades, configuration modifications, task distribution, etc. AC also exposes HTTP interfaces, through which Manager manages and monitors AC and the Agents.

  2. ServiceDiscovery (SD): each backend service module must regularly register and synchronize its service information with the SD center, so that the instances of each service module are visible to one another and can communicate directly. Since SD maintains the status information of each registered service, it performs load balancing when a service user requests service discovery. For example, when the Agent requests a list of AC instances, SD directly returns the AC instance with the lowest load.

  3. Manager is responsible for managing the entire backend and providing the related query and management interfaces. This includes managing AC clusters, monitoring AC status, controlling AC service-related parameters, managing all Agents through AC, collecting Agent running status, and issuing tasks to Agents. Manager also manages the real-time and offline computing clusters.

  4. Elkeid Console: Elkeid front-end part.

  5. Elkeid HUB: the Elkeid HIDS rule engine.

To put it simply, AgentCenter collects Agent data, Elkeid HUB analyzes and detects on this data, Manager manages AgentCenter and the computing modules, ServiceDiscovery links all these services and nodes together, and alarms, asset data, and so on can be viewed through the Elkeid Console.

Elkeid AgentCenter (hereinafter referred to as AC)

On the one hand, the AC needs to collect data from the Agent and perform preliminary processing, and then write the processed data to the Kafka cluster (for subsequent consumption by the analysis module). On the other hand, it needs to issue instructions to the Agent. This communication is two-way.

At the same time, AC also provides HTTP interfaces to the outside world. Manager manages and monitors AC and Agent through these HTTP interfaces.

Related technology introduction:

  • Communication efficiency: With millions of Agents, the pressure that such a volume of data puts on the backend cannot be underestimated. Communication and processing efficiency are mainly determined by the communication protocol and the encoding method. After comparing various communication methods, we finally chose gRPC bidirectional streaming.

    • On the one hand, gRPC is designed and developed on the HTTP/2 protocol standard. Compared with other RPC frameworks, gRPC brings more powerful features, such as bidirectional streaming and header compression. These suit our current needs well, and the communication efficiency is also very high.

    • On the other hand, we use Protobuf as the encoding method. Protobuf has a standard IDL and IDL compiler, and the serialized data is concise and compact; in addition, its encoding and decoding speed is among the best of the many serialization protocols.

  • Communication security: The Agent is an application with root privileges, and the Server can issue instructions to the Agent. If the communication link were taken over, the result would be disastrous. Here we use two-way (mutual) SSL verification. On the one hand, it ensures that the Agent and Server will not communicate with unknown peers. On the other hand, SSL also ensures that the data is encrypted in transit.

Summary: gRPC bidirectional streaming + two-way SSL verification.

Elkeid Service Discovery (hereinafter referred to as SD)

There are roughly two design ideas for the service discovery/load balancing mechanism, one is a centralized proxy and the other is a terminal proxy.

In centralized proxy methods such as F5 and nginx, the request is first sent to the centralized proxy point, and the proxy then forwards it according to some load balancing algorithm. The service response also goes to the proxy point first and is then forwarded to the requesting end. Although this method is the most common, it has drawbacks: forwarding back and forth through the proxy increases request and response latency, and this solution cannot be used in a large-scale deployment environment.

Related technology introduction:

  • The service provider regularly sends registration information to the SD, including the service name, instance ip, port, load status, etc., so that the SD maintains the status of all the service instances.

  • Data must be synchronized between the nodes of the SD service. For example, if the registration information from the first point is sent to NodeA, NodeA needs to synchronize it to NodeB, so that a request that reaches NodeB also finds the corresponding registration information.

  • The service user requests the service discovery/registration center through the service name to obtain a list of instance IPs and ports that can be used under the corresponding service name, and then can directly access it.

  • Data is synchronized between SD nodes in batches by broadcast, which keeps performance at a reasonable level. At any given moment the data on different nodes may be inconsistent, but it becomes consistent eventually. This corresponds to AP in the CAP theorem for distributed systems and has no impact in large-scale Agent and AC scenarios. In addition, no consistency middleware (etcd, zookeeper, etc.) is used, which makes SD easy to deploy and maintain.

  • Since SD maintains the status information of each registered service, when the service user requests service discovery, SD will perform load balancing. For example, when the Agent requests a list of AC instances, SD directly returns the AC instance with the smallest load pressure.
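
The load-balancing step in the last point can be sketched in a few lines of shell. This is only an illustration, not Elkeid's actual implementation: the registry format (one `<ip:port> <load>` pair per line), the addresses, the load figures, and the function name `pick_least_loaded` are all assumptions made for the example.

```shell
# Hypothetical registry snapshot: each line is "<ip:port> <load>", roughly what
# an SD node might hold after service instances have registered themselves.
# pick_least_loaded reads such lines on stdin and prints the address of the
# instance with the smallest load value.
pick_least_loaded() {
  sort -k2,2n | head -n 1 | cut -d' ' -f1
}

# Example: three AC instances with made-up addresses and load figures.
printf '%s\n' \
  '10.0.0.1:6751 42' \
  '10.0.0.2:6751 7' \
  '10.0.0.3:6751 19' | pick_least_loaded
```

With the sample data the function prints 10.0.0.2:6751, the entry with the lowest load. Because the registry is only eventually consistent, a real selection like this works on a slightly stale view of the load, which the text above considers acceptable at this scale.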

Elkeid Manager

Manager manages the entire backend and provides various query and management interfaces. It includes managing AC clusters, monitoring AC status, controlling AC service-related parameters, managing all Agents through AC, collecting Agent running status, and issuing tasks to Agents. At the same time, the manager also manages real-time and offline computing clusters.

Since Manager manages every backend cluster, any management operation is an operation on a cluster. An interface request to Manager is forwarded by Manager to all nodes in the target cluster, and the responses from all nodes are then collected and processed. To improve the response speed and stability of this process, Manager implements a simple distributed task management system internally.
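
The forward-to-all-nodes-and-collect pattern described here can be illustrated with a small shell sketch. It is not Manager's real task system: `call_node` and `fan_out` are hypothetical names, the node names are made up, and `call_node` only echoes a reply where a real caller would send a request to one instance.

```shell
# call_node is a stand-in for the real per-node request (for example an HTTP
# call to one instance). Here it only echoes a reply so that the sketch runs
# without any backend.
call_node() {
  printf 'ok from %s' "$1"
}

# fan_out sends the same request to every node in parallel, waits for all of
# them to finish, then prints one "<node> => <reply>" line per node, in the
# order the nodes were given.
fan_out() (
  tmpdir=$(mktemp -d) || exit 1
  i=0
  for node in "$@"; do
    i=$((i + 1))
    call_node "$node" > "$tmpdir/$i" 2>&1 &
  done
  wait
  i=0
  for node in "$@"; do
    i=$((i + 1))
    printf '%s => %s\n' "$node" "$(cat "$tmpdir/$i")"
  done
  rm -rf "$tmpdir"
)

fan_out ac-node-1 ac-node-2 ac-node-3
```

Running the requests in the background and collecting the replies afterwards keeps the total time close to that of the slowest node instead of the sum over all nodes, which is the response-speed concern the paragraph mentions.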

3. Install, deploy and try out Elkeid on Huawei Cloud

1. Huawei cloud host preparation

  1. Purchase a Huawei Cloud host for the evaluation.
  2. Create a new security group and open all ports to facilitate testing.
  3. Change the host's security group, selecting the new security group that opens all ports.
  4. After opening all the ports, we can log in to the Huawei Cloud host via ssh.

2. Elkeid installation and configuration

Official reference: http://elkeid.bytedance.com/docs/elkeidup/deploy-zh_CN.html
Following the official documentation is basically enough.

Because resources in the evaluation environment are limited, I will use the official docker installation to simply try out Elkeid.
0. Install docker first

yum install docker

1. Quick deployment of stand-alone docker (recommended for stand-alone test environment)
Note: Prefer CentOS 7.x or Debian 9/10 as the host. The services in the container rely on systemd, and systemd uses cgroup. If the systemd versions inside and outside the container differ too much, systemd in the container runs abnormally and the corresponding services cannot start.

1.1. Import image

# The image downloaded from the release is split into volumes; merge the parts first
wget https://github.com/bytedance/Elkeid/releases/download/v1.9.1.4/elkeidup_image_v1.9.1.tar.gz.00
wget https://github.com/bytedance/Elkeid/releases/download/v1.9.1.4/elkeidup_image_v1.9.1.tar.gz.01
wget https://github.com/bytedance/Elkeid/releases/download/v1.9.1.4/elkeidup_image_v1.9.1.tar.gz.02
wget https://github.com/bytedance/Elkeid/releases/download/v1.9.1.4/elkeidup_image_v1.9.1.tar.gz.03
cat elkeidup_image_v1.9.1.tar.gz.* > elkeidup_image_v1.9.1.tar.gz
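
The `cat ... > ...` merge above silently produces a broken archive if one of the parts failed to download. As a hedged alternative, the sketch below checks that every numbered part exists before concatenating them in order. The function name `merge_parts` and its two arguments are assumptions for this example; the `.00`, `.01`, ... suffix scheme is taken from the file names above.

```shell
# merge_parts <output-file> <part-count>
# Concatenates <output-file>.00, <output-file>.01, ... in numeric order.
# If any part is missing, it reports the part, removes the partial result,
# and returns non-zero without creating <output-file>.
merge_parts() (
  out=$1
  count=$2
  i=0
  : > "$out.tmp"
  while [ "$i" -lt "$count" ]; do
    part=$(printf '%s.%02d' "$out" "$i")
    if [ ! -f "$part" ]; then
      echo "missing part: $part" >&2
      rm -f "$out.tmp"
      exit 1
    fi
    cat "$part" >> "$out.tmp"
    i=$((i + 1))
  done
  mv "$out.tmp" "$out"
)

# Usage for the four parts downloaded above (not executed here):
#   merge_parts elkeidup_image_v1.9.1.tar.gz 4
```

If the merged file is a gzip archive, as its name suggests, running `gzip -t` on it afterwards is a cheap integrity check before `docker load`.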

In my own test, downloading directly on the Huawei Cloud host was very slow. I recommend downloading the files with Thunder and then uploading them to the host yourself.

# Import the image
docker load -i elkeidup_image_v1.9.1.tar.gz

1.2. Run the container

docker run -d --name elkeid_community \
  --restart=unless-stopped \
  -v /sys/fs/cgroup:/sys/fs/cgroup:ro \
  -p 8071:8071 -p 8072:8072 -p 8080:8080 \
  -p 8081:8081 -p 8082:8082 -p 8089:8080 -p 8090:8090 \
  --privileged \
  elkeid/all-in-one:v1.9.1
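
Since `docker run -d` returns immediately, it can help to poll until Docker reports the container as running before moving on. The sketch below is one way to do that. `retry` and `container_running` are names made up for this example, and the attempt count and delay are arbitrary; `docker inspect -f '{{.State.Running}}'` is the standard Docker CLI way to read a container's state.

```shell
# retry <attempts> <delay-seconds> <command...>
# Runs the command until it succeeds, at most <attempts> times, sleeping
# <delay-seconds> between tries. Returns non-zero if every attempt failed.
retry() (
  attempts=$1
  delay=$2
  shift 2
  n=1
  until "$@"; do
    if [ "$n" -ge "$attempts" ]; then
      exit 1
    fi
    n=$((n + 1))
    sleep "$delay"
  done
  exit 0
)

# Succeeds when Docker reports the elkeid_community container as running.
container_running() {
  [ "$(docker inspect -f '{{.State.Running}}' elkeid_community 2>/dev/null)" = "true" ]
}

# Usage (not executed here, since it needs a Docker host):
#   retry 30 2 container_running && echo "container is up"
```

A running container only means that systemd inside it has started. It does not by itself prove that every Elkeid service in the container is ready.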

1.3. Set the external IP
Use the host's local IP here; 127.0.0.1 cannot be used.

docker exec -it elkeid_community bash

cd /root/.elkeidup/

# This command is interactive
./elkeidup public {ip}


./elkeidup agent init
./elkeidup agent build
./elkeidup agent policy create

cat ~/.elkeidup/elkeid_passwd
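
Because `./elkeidup public {ip}` must not be given 127.0.0.1, the address can be sanity-checked before it is passed in. The helper below is a minimal sketch with a hypothetical name, `is_usable_ip`, and made-up example addresses. It is a syntactic check only: it does not verify that each field is at most 255 or that the address is actually assigned to the host.

```shell
# is_usable_ip <address>
# Succeeds for a dotted-quad IPv4 string that is not in 127.0.0.0/8.
is_usable_ip() {
  case "$1" in
    127.*) return 1 ;;
  esac
  printf '%s\n' "$1" | grep -Eq '^([0-9]{1,3}\.){3}[0-9]{1,3}$'
}

# Example with a made-up private address and the loopback address.
for ip in 192.168.0.10 127.0.0.1; do
  if is_usable_ip "$ip"; then
    echo "$ip: ok"
  else
    echo "$ip: rejected"
  fi
done
```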

Installing Elkeid on a 2C2G machine failed with an error; my first guess was that the memory was too small. I tried again on a 2C4G machine and it still reported the same error. Because ByteDance's elkeidup is not open source, the cause is unknown (most likely it is a resource size problem; the same installation works without problems in my local test).
This evaluation stops here for the time being.
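
If memory really is the limiting factor, a pre-flight check before installing would at least fail early. The sketch below reads `MemTotal` from `/proc/meminfo` on Linux. The function names are hypothetical, and the 8192 MiB threshold in the usage line is an arbitrary placeholder, not a documented Elkeid requirement.

```shell
# mem_total_mib [meminfo-file]
# Prints total memory in MiB, read from the MemTotal line (reported in kB).
# Defaults to /proc/meminfo; a file argument makes the function easy to test.
mem_total_mib() {
  awk '/^MemTotal:/ { print int($2 / 1024) }' "${1:-/proc/meminfo}"
}

# has_min_memory <required-mib> [meminfo-file]
# Succeeds if the machine has at least <required-mib> MiB of RAM.
has_min_memory() {
  [ "$(mem_total_mib "$2")" -ge "$1" ]
}

# Usage on a Linux host (not executed here; the threshold is a placeholder):
#   has_min_memory 8192 || echo "less than 8 GiB of RAM, installation may fail"
```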

3. Problem records during docker installation process

Docker error: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?

This error indicates that the client cannot connect to the Docker daemon. Common reasons include:

  1. Docker daemon is not running.
    You can use sudo systemctl status docker to check whether the docker service is running. If it is not running, you can use sudo systemctl start docker to start it.
sudo systemctl start docker
sudo systemctl enable docker

Docker error: No chain/target/match by that name.

Complete error message:

[root@dev elkeid]# docker run -d --name elkeid_community \
>   --restart=unless-stopped \
>   -v /sys/fs/cgroup:/sys/fs/cgroup:ro \
>   -p 8071:8071 -p 8072:8072 -p 8080:8080 \
>   -p 8081:8081 -p 8082:8082 -p 8089:8080  -p 8090:8090\
>   --privileged \
>   elkeid/all-in-one:v1.9.1
5f24d42cfccf965d9a3ce6b7e5323049ff4fb9ef09d3ad1541b2391b28e3385b
/usr/bin/docker-current: Error response from daemon: driver failed programming external connectivity on endpoint elkeid_community (fd39db5e047de94bbba8f65474cf54ea9605062b36e630e4c46de56fe782c3e9):  (iptables failed: iptables --wait -t nat -A DOCKER -p tcp -d 0/0 --dport 8090 -j DNAT --to-destination 172.17.0.2:8090 ! -i docker0: iptables: No chain/target/match by that name.
 (exit status 1)).

Problem analysis: When the container was started with docker run, Docker failed to set up the container's port mapping rules.
Specifically, adding a DNAT rule with iptables failed, with iptables reporting that the specified chain, target, or match does not exist.
This problem is common on some minimal Linux distributions, which do not have iptables or the iptables nat module installed by default.

Solution:

  1. Make sure that iptables and the nat module of iptables are installed on the host
yum install iptables-services -y
  2. Enable the nat function of iptables
modprobe iptable_nat
  3. Restart the docker service
systemctl restart docker
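
The failing rule in the error message tried to append to the DOCKER chain of the nat table, a chain that Docker creates when the daemon starts. A quick way to tell whether that chain is present is to look for its declaration in the output of `iptables -t nat -S`. The helper name `has_docker_chain` is made up for this sketch.

```shell
# has_docker_chain reads `iptables -t nat -S` output on stdin and succeeds
# if the user-defined DOCKER chain is declared there ("-N DOCKER").
has_docker_chain() {
  grep -q '^-N DOCKER$'
}

# Usage on the host (needs root, so it is not executed here):
#   iptables -t nat -S | has_docker_chain || systemctl restart docker
```

If the chain is missing, restarting the docker service as in step 3 lets the daemon recreate it.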

4. Use of Elkeid

After a successful installation, the file /root/.elkeidup/elkeid_passwd in the container records the passwords and related URLs of each component.

Visit elkeid_console and follow the commands on the installation configuration interface to install and deploy the Agent.

4. Reference

[Elkeid Strategy] Fight against hackers: How to use Elkeid to build intrusion detection capabilities
Reference URL: https://www.zhihu.com/column/c_1411384767867162624
How does Bytedance build an open source project from 0 to 1?
Reference URL: https://www.51cto.com/article/711324.html

Origin blog.csdn.net/inthat/article/details/132701074