Is Your Data Safe? Hadoop Security Vulnerability Exposed Again | Hackers Exploit an Unauthorized Access Vulnerability in the Hadoop Yarn Resource Management System

Abstract: On April 30, Alibaba Cloud discovered that Russian hackers had exploited an unauthorized access vulnerability in the REST API of the Hadoop Yarn resource management system. Hadoop is a distributed system framework from the Apache Foundation that performs distributed processing with the well-known MapReduce programming model. Yarn is the resource management system for Hadoop clusters.

The incident stemmed mainly from improper configuration of the Hadoop Yarn resource management system, which left it open to unauthorized access that attackers then abused. Without authenticating, an attacker can submit tasks through the REST API to execute arbitrary commands and ultimately take full control of the server.

Exploit Reconstruction and Trend Assessment

1. By comparing this incident with the earlier Redis and CouchDB incidents, Alibaba Cloud security experts observed that Hadoop, as a distributed computing framework, is easier to compromise because:

  • Hadoop has many components and functions, and a security issue in any of them enlarges the attack surface;

  • An attack on a single weak point can spread quickly to all nodes because of the framework's distributed nature.

2. Gray- and black-market operators are changing how they monetize intrusions. Instead of compromising general-purpose cloud hosts (web servers, database servers, and so on) to mine cryptocurrency, they are moving toward attacking dedicated computing platforms (distributed computing platforms such as Hadoop) in order to steal more computing power for mining. The sample analyzed here suggests that attacks on dedicated computing power are still in an early testing stage; as cryptocurrencies continue to grow, the risk of this type of attack will become more and more prominent.

In general, the pursuit of profit drives change and escalation in the gray and black markets. As the cryptocurrency market heats up, the gray industry of intrusion-based mining will expand with it. Mining, the most effective means of monetization, has an ever-growing demand for computing power, which will inevitably shift attackers' targets toward higher-end computing products and services.

Therefore, Alibaba Cloud security experts suggest that both providers of SaaS-based computing power products and the end users of that computing power pay more attention to security, in order to guarantee the long-term healthy and stable development of their business.

Security advice

The Alibaba Cloud platform blocks such large-scale attacks by default, which reduces the direct impact of the vulnerability on users.

Enterprises that want to avoid Hadoop security vulnerabilities altogether are advised to store and process their data with Alibaba Cloud MaxCompute (with "zero" security vulnerabilities for more than 8 years).

Alibaba Cloud Data Plus MaxCompute (formerly ODPS) was designed for multi-tenancy from the start, and ensuring the data security of tenants is one of its essential functions. In the security design and implementation of the system, MaxCompute engineers follow proven security design principles (such as the Saltzer-Schroeder principles). The design and implementation of common cryptographic algorithms and security protocols also follow the relevant industry standards (such as the PKCS and FIPS series) and adhere to security best practices.


Here, we analyze the security features of MaxCompute from the following aspects, to give readers who care about MaxCompute data security a basic understanding. For more product information, please visit https://www.aliyun.com/product/odps.

1. API authentication

Authentication is the secure entry point to a service. MaxCompute authentication is implemented with industry-standard API authentication mechanisms, such as HmacSHA1 request signing. MaxCompute also provides both HTTP and HTTPS endpoints to meet users' different requirements for authentication security. The HTTP endpoint transmits data in clear text, so HmacSHA1 authentication can guarantee only the authenticity and integrity of a request; it suits users who are not very sensitive about data security. The HTTPS endpoint provides additional protection, such as channel encryption and resistance to replay attacks, and suits users who are sensitive about data security.
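As a rough illustration of HMAC-based request authentication, the Python sketch below signs a request with HmacSHA1 and verifies it on the receiving side. The string-to-sign layout, the function names, and the sample key and path are assumptions made for illustration; they are not MaxCompute's actual signing format.

```python
import base64
import hashlib
import hmac


def sign_request(secret_key: str, method: str, path: str, date: str) -> str:
    """Return a base64-encoded HMAC-SHA1 signature for a request."""
    # Assumed layout: method, path and date joined by newlines.
    string_to_sign = "\n".join([method, path, date])
    digest = hmac.new(
        secret_key.encode("utf-8"),
        string_to_sign.encode("utf-8"),
        hashlib.sha1,
    ).digest()
    return base64.b64encode(digest).decode("ascii")


def verify_request(secret_key: str, method: str, path: str, date: str,
                   signature: str) -> bool:
    """Recompute the signature and compare it in constant time."""
    expected = sign_request(secret_key, method, path, date)
    return hmac.compare_digest(expected, signature)


date = "2030-01-01T00:00:00Z"
sig = sign_request("demo-secret", "GET", "/projects/demo/tables", date)

# The untouched request verifies; a request with a modified path does not.
print(verify_request("demo-secret", "GET", "/projects/demo/tables", date, sig))
print(verify_request("demo-secret", "GET", "/projects/other/tables", date, sig))
```

The signature shows that the request came from a holder of the key and was not altered, but it does not hide the request contents, which matches the distinction drawn above between the HTTP and HTTPS endpoints.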

2. Access Control

When you create a project space, you become its owner. A project space has only one owner, and only the owner has full control over it: you can upload and download data and submit SQL for data processing. No other user can access your project space without your authorization. Note that the MaxCompute platform has no super administrator role, so MaxCompute development, testing, and O&M staff do not have permission to see user data. Some may ask whether user data can be accessed through the operation and maintenance management console behind MaxCompute. It cannot. O&M staff can use the console only after obtaining internal authorization, and only for management operations such as stopping a malicious user job; the console itself has no permission to operate on user data.

MaxCompute is aimed at enterprise-level users, so it provides a wealth of user management and authorization functions within the project space; interested readers can refer to the relevant chapters of the MaxCompute User Manual. Access control in MaxCompute is very fine-grained. For example, you can authorize a user to read only some of the columns of a table, and you can require that access happen only within a certain time range and from specified IP addresses. In other words, a business owner can allow employees to access data only from inside the company during normal business hours, and not from home after work.
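The kind of conditional, column-level authorization described above can be pictured with a small Python sketch. The `ColumnGrant` class, the `is_allowed` function, and the sample table, hours, and network are all hypothetical; this is not MaxCompute's policy syntax, only a model of the checks involved.

```python
from dataclasses import dataclass
from datetime import time
from ipaddress import ip_address, ip_network


@dataclass(frozen=True)
class ColumnGrant:
    """Hypothetical grant: readable columns plus time and network conditions."""
    table: str
    columns: frozenset
    start: time
    end: time
    network: str  # CIDR block that requests must come from


def is_allowed(grant, table, columns, at, source_ip):
    """Return True only if every condition of the grant is satisfied."""
    return (
        table == grant.table
        and set(columns) <= grant.columns
        and grant.start <= at <= grant.end
        and ip_address(source_ip) in ip_network(grant.network)
    )


grant = ColumnGrant("orders", frozenset({"order_id", "amount"}),
                    time(9, 0), time(18, 0), "10.0.0.0/8")

# Granted column, office hours, company network: allowed.
print(is_allowed(grant, "orders", ["order_id"], time(10, 30), "10.1.2.3"))
# Column outside the grant: denied.
print(is_allowed(grant, "orders", ["customer"], time(10, 30), "10.1.2.3"))
# Outside business hours: denied.
print(is_allowed(grant, "orders", ["order_id"], time(22, 0), "10.1.2.3"))
# Address outside the company network: denied.
print(is_allowed(grant, "orders", ["order_id"], time(10, 30), "203.0.113.7"))
```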

3. Data flow control

MaxCompute was originally designed to support data sharing (or data exchange) scenarios. Therefore, as long as authorization is in place, users can easily access data across project spaces. For example, after obtaining the corresponding access rights, a job in project space A can directly process data in project space B without first copying that data from project space B to project space A.

Data protection has two meanings: preventing unauthorized data access, and preventing authorized users from misusing data. Many commercial systems do not provide the second guarantee. On the MaxCompute platform, however, this concern is especially pressing: users want to keep control over their own data, but once they authorize others to read it, those others may copy it, which amounts to losing control over the data.

MaxCompute prevents data replication across project spaces by supporting data flow control. If you want to ensure that all data in the project space is not allowed to flow out, then you can turn on the data flow control settings of the project space. You can also set data protection policies for project spaces to limit which data can flow out to which project spaces.
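A minimal sketch of such a flow check is shown below, assuming a per-project switch and a list of permitted target projects. The `Project` class, its field names, and `may_flow` are hypothetical names chosen for illustration, not MaxCompute settings.

```python
from dataclasses import dataclass, field


@dataclass
class Project:
    """Hypothetical project space with a data flow control switch."""
    name: str
    protected: bool = False  # True: data may not leave by default
    trusted_targets: set = field(default_factory=set)  # permitted exceptions


def may_flow(source: Project, target_name: str) -> bool:
    """Decide whether data may be written from `source` into `target_name`."""
    if target_name == source.name:
        return True  # the data stays inside the project space
    if not source.protected:
        return True  # flow control is switched off
    return target_name in source.trusted_targets


open_project = Project("A")
locked_project = Project("B", protected=True, trusted_targets={"C"})

print(may_flow(open_project, "B"))    # flow control off, so data may leave
print(may_flow(locked_project, "A"))  # protected and A is not a permitted target
print(may_flow(locked_project, "C"))  # C is explicitly permitted
```

Under this model a user who is authorized to read data in a protected project can still process it there, but cannot write a copy into another project unless the owner has listed that project as a permitted target.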

4. Isolation of User Jobs

MaxCompute lets users submit various types of jobs (such as SQL/XLib/MR). To ensure that jobs from different users do not interfere with each other at runtime, MaxCompute runs the worker processes of user jobs in the Feitian Container sandbox. If a user job contains Java code (such as a UDF), a strict Java sandbox policy is set when the worker process in the Feitian Container sandbox starts the JVM, limiting the runtime permissions of the UDF.

5. Run jobs with least privilege

The principle of least privilege is a basic guiding principle for system security and fault-tolerant design: each task should run with just enough privileges to do its work, no more and no less.

A MaxCompute job generally runs as follows. SQL/XLib/MR jobs submitted by users are scheduled to run on a computing cluster, and each running job generally corresponds to a group of worker processes running in parallel. The data is read and processed on the cluster, and the results are eventually written back to the data cluster. An example shows how MaxCompute follows the principle of least privilege. Assume that user Alice is authorized to read two tables, t1 and t2, in a project space, but an SQL job she submits only needs to read t1. In MaxCompute, the worker processes for this SQL job can read only the underlying data files of t1 while running, and have no further data access rights.
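The Alice example can be written down as a small Python sketch: the access a job's workers receive is limited to the tables the job actually reads, even when the submitting user holds broader grants. The `job_capability` function is a hypothetical name used for illustration and does not describe how MaxCompute issues capabilities internally.

```python
def job_capability(user_grants: set, tables_read_by_job: set) -> set:
    """Return the set of tables a job's worker processes may read.

    Raises PermissionError if the job needs a table the user cannot read.
    """
    missing = tables_read_by_job - user_grants
    if missing:
        raise PermissionError(f"not authorized for: {sorted(missing)}")
    # Workers get only what this job needs, not everything the user holds.
    return set(tables_read_by_job)


alice_grants = {"t1", "t2"}

# Alice's SQL job reads only t1, so its workers can read only t1.
capability = job_capability(alice_grants, {"t1"})
print(capability)  # {'t1'}
print("t2" in capability)  # False
```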

Least-privilege execution in MaxCompute relies on the Kerberos authentication and Capability access control provided by the underlying Feitian distributed operating system. Kerberos authentication handles identity authentication between Feitian's underlying service modules, and Capability handles access control between them. These mechanisms are completely orthogonal to the authentication and access control provided by the upper MaxCompute layer, and are completely transparent to MaxCompute users.

6. Data Access Audit

MaxCompute also keeps accurate, fine-grained records of data access operations and stores them for a long time. The MaxCompute platform relies on many functional service modules, which together can be called the underlying service stack. MaxCompute collects operation records from the entire stack, from the data access logs at the upper table and column level down to the data operation logs of the underlying distributed file system. Every data access request processed by the underlying distributed file system can be traced back to the user, job, and project space that initiated it at the top level.

With operation audits at every layer of the service stack, even if an internal attacker (an engineer, or a hacker who has penetrated the internal system) tries to access user data directly on the underlying distributed file system, bypassing the MaxCompute service, the access will show up in the operation logs. Through data access auditing, users can therefore know precisely whether their data on MaxCompute has been accessed without authorization.
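The tracing idea can be sketched in Python as a join between two logs, assuming each file system access record carries the identifier of the job that issued it. The record fields (`job_id`, `user`, `project`, `path`) and the function name are hypothetical; the source does not describe the actual log format.

```python
def trace_file_accesses(fs_log, job_log):
    """Attach user and project to each file access; flag untraceable ones."""
    jobs = {entry["job_id"]: entry for entry in job_log}
    traced, suspicious = [], []
    for access in fs_log:
        job = jobs.get(access.get("job_id"))
        if job is None:
            # No top-level job explains this access: it bypassed the service.
            suspicious.append(access)
        else:
            traced.append({**access,
                           "user": job["user"],
                           "project": job["project"]})
    return traced, suspicious


job_log = [{"job_id": "j1", "user": "alice", "project": "A"}]
fs_log = [
    {"path": "/data/A/t1/part-0", "job_id": "j1"},
    {"path": "/data/A/t2/part-0", "job_id": None},  # direct file system access
]

traced, suspicious = trace_file_accesses(fs_log, job_log)
print(traced)
print(suspicious)
```

In this model, a read that cannot be matched to any user job is exactly the kind of access the audit is meant to surface.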

7. Risk control

In addition to defense mechanisms at different levels, MaxCompute provides a security monitoring system that watches user jobs and security-related activity on user data, such as AccessKey abuse, improper security configuration of project spaces, user code that violates security policies at runtime, and abnormal access to user data.

Security attacks are hard to prevent entirely, so MaxCompute uses security monitoring to detect problems in time. Once a security problem is found, the corresponding handling process is started to minimize user losses.

Conclusion: Although there is no absolute security, security has the highest priority in the design and implementation of MaxCompute products. The MaxCompute team has gathered security experts from various fields to ensure user data security. At the same time, we welcome more security experts to join us to jointly enhance MaxCompute's data security.
