Big Data Rights Management Tool - Ranger

1. Introduction

Ranger is an important part of security management in the HDP platform. It provides fine-grained permission control over specific resources (for example, individual tables in HBase), access monitoring, and data encryption.

2. Component introduction

2.1 Overall description

Ranger consists of three parts: Ranger Admin, Ranger UserSync, and the Ranger plugins. Their relationship is as follows:
[Figure: component relationship]

Ranger's official website describes these three components as follows:

Component name | Description
Admin | The Ranger Admin Portal is the central interface for security management. Users can create and update policies, which are stored in the policy database. Plugins within each component poll these policies periodically. The portal also includes an audit server that sends audit data collected from plugins for storage in HDFS or a relational database.
UserSync | Synchronization utility to pull users and groups from Unix, LDAP, or Active Directory. User and group information is stored in the Ranger portal and used for policy definition.
Plugin | Plugins are lightweight Java programs embedded in each cluster component process. For example, the Apache Ranger plugin for Apache Hive is embedded in HiveServer2. These plugins fetch policies from a central server and store them locally in a file. As user requests pass through the component, the plugins intercept the requests and evaluate them against the security policies. The plugins can also collect data from user requests and send it back to the audit server in a separate thread.

Note that these three components are independent of one another. Each is described in detail below.

2.2 Ranger Admin

Ranger Admin consists of three parts: a web UI, a REST service, and a database. It can be thought of as the central data store: all data is kept here, although the other two components can also run independently. Its specific functions are:
1. Receive user and group information from the UserSync process and save it in the MySQL database. Note: this user information is needed when configuring permission policies. It is like a storekeeper who registers each recipient when handing out materials: someone who is not a member of the company will not be allowed to collect anything.
2. Provide an interface for creating policies.
3. Provide an interface for handling external REST requests.
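
To make point 3 more concrete, here is a minimal sketch of how an external program might build a "create policy" request for Ranger Admin. It only builds the request and does not send it. The endpoint path /service/public/v2/api/policy and the JSON field names follow Ranger's public v2 REST API as I understand it and should be checked against the installed version. The host, port, credentials, policy name, and user are made-up placeholders.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Builds (but does not send) a request that would create an HDFS policy in Ranger Admin. */
class RangerPolicyRequestSketch {

    /**
     * JSON body with one path resource and one policy item granting read access.
     * Field names are assumed from Ranger's public v2 API; values are not JSON-escaped.
     */
    static String policyJson(String service, String policyName, String path, String user) {
        return "{"
            + "\"service\":\"" + service + "\","
            + "\"name\":\"" + policyName + "\","
            + "\"resources\":{\"path\":{\"values\":[\"" + path + "\"],\"isRecursive\":true}},"
            + "\"policyItems\":[{\"users\":[\"" + user + "\"],"
            + "\"accesses\":[{\"type\":\"read\",\"isAllowed\":true}]}]"
            + "}";
    }

    /** POST request with HTTP basic authentication; the endpoint path is an assumption. */
    static HttpRequest createPolicyRequest(String adminUrl, String login, String password, String json) {
        String basic = Base64.getEncoder()
            .encodeToString((login + ":" + password).getBytes(StandardCharsets.UTF_8));
        return HttpRequest.newBuilder(URI.create(adminUrl + "/service/public/v2/api/policy"))
            .header("Content-Type", "application/json")
            .header("Authorization", "Basic " + basic)
            .POST(HttpRequest.BodyPublishers.ofString(json))
            .build();
    }

    public static void main(String[] args) {
        // Placeholder values: replace with a real Ranger Admin URL, account, and service name.
        String json = policyJson("clustersz_hadoop", "test-read", "/test", "alice");
        HttpRequest request = createPolicyRequest(
            "http://ranger-admin.example.com:6080", "admin", "changeit", json);
        System.out.println(request.method() + " " + request.uri());
        System.out.println(json);
    }
}
```

Sending the request would be a matter of passing it to java.net.http.HttpClient (Java 11 or later). The user named in the policy must already be known to Ranger Admin, which is why the user information from point 1 matters.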

2.3 UserSync

UserSync is the user synchronization service provided by Ranger. It can synchronize Linux (Unix) users and LDAP users. The configuration item
SYNC_SOURCE = LDAP/Unix
selects whether LDAP or Unix is used; the default is to synchronize Unix users. Three points about UserSync are worth explaining:
1. Synchronization is one-way.
UserSync only reads the local user information from Unix and uploads it to Ranger Admin. This can also be seen in its code:

    @Override
    public void updateSink(UserGroupSink sink) throws Throwable {
        isUpdateSinkSucc = true;
        buildUserGroupInfo();  // read the user and group information here

        for (Map.Entry<String, List<String>> entry : user2GroupListMap.entrySet()) {
            String       user   = entry.getKey();
            List<String> groups = entry.getValue();

            try{
                sink.addOrUpdateUser(user, groups);  // send the user to Ranger Admin here
            }catch (Throwable t) {
                LOG.error("sink.addOrUpdateUser failed with exception: " + t.getMessage()
                + ", for user: " + user
                + ", groups: " + groups);
                isUpdateSinkSucc = false;
            }
        }
    }

2. UserSync does not synchronize in real time.
A newly created user is not synchronized to Ranger immediately. I asked the Ranger community about this, and their reply was:

Regarding, Is there another way to sync user info automatically?
Ranger-Usersync process syncs the added users in particular time interval which is in minutes; By default it is set to 5 minutes.The property name is “SYNC_INTERVAL”. So it will sync newly added user after that interval or you have restart ranger-usersync.
You can update this value in install.properties file and run the setup.sh script to update the value in the Usersync process, after this you will have to restart ranger-usersync process.

In my tests, however, this was not ideal, because our system needs to create policies in Ranger for a user immediately after the user is created. In the end, our solution was to stop using the UserSync process and call Ranger's synchronization code directly instead. This dependency provides it:

<dependency>
    <groupId>org.apache.ranger</groupId>
    <artifactId>unixusersync</artifactId>
    <version>0.7.0</version>
</dependency>

3. Deleting users is not supported.
Ranger (version 0.7.0) does not support deleting users through synchronization or through code. The solution given by the community:

To delete the user from Ranger database, you will have to delete the user from Ranger UI manually, as the Ranger-Usersync process will only add users to Ranger Database.
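
Points 1 and 3 can be illustrated with a small simulation. This is not Ranger's code: InMemorySink is a hypothetical stand-in for Ranger Admin that, like the sink in the snippet above, only offers an add-or-update operation. The sketch shows why a user removed on the Unix side stays in Ranger after the next sync.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Simulates a one-way, add-only sync from a user source into Ranger Admin's user store. */
class OneWaySyncSketch {

    /** Hypothetical stand-in for the sink side (Ranger Admin); it only supports add/update. */
    static class InMemorySink {
        final Map<String, List<String>> users = new LinkedHashMap<>();

        void addOrUpdateUser(String user, List<String> groups) {
            users.put(user, new ArrayList<>(groups));
        }
    }

    /** Pushes every user currently in the source to the sink; never deletes from the sink. */
    static void sync(Map<String, List<String>> source, InMemorySink sink) {
        for (Map.Entry<String, List<String>> entry : source.entrySet()) {
            sink.addOrUpdateUser(entry.getKey(), entry.getValue());
        }
    }

    public static void main(String[] args) {
        Map<String, List<String>> unixUsers = new LinkedHashMap<>();
        unixUsers.put("alice", List.of("dev"));
        unixUsers.put("bob", List.of("ops"));

        InMemorySink rangerAdmin = new InMemorySink();
        sync(unixUsers, rangerAdmin);

        unixUsers.remove("bob");       // the user is deleted on the Unix side
        sync(unixUsers, rangerAdmin);  // the next sync pass runs

        // bob is still present because the sync only adds or updates
        System.out.println(rangerAdmin.users.keySet());  // prints [alice, bob]
    }
}
```

With this shape of interface, removing stale users needs a separate step outside the sync, which matches the community's advice to delete them manually in the Ranger UI.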

2.3 Plugin

I have only read the plugin code for HDFS, so the HDFS plugin is used for the explanation here.

2.3.1 Workflow of Ranger Plugin

1. When the NameNode starts, an HDFS plugin thread is created. Once started, the thread reads its configuration:
[root@ysbdh03 conf]# cat ranger-hdfs-security.xml 
<configuration>

    <property>
      <name>ranger.plugin.hdfs.policy.cache.dir</name>
      <value>/etc/ranger/clustersz_hadoop/policycache</value>
    </property>
    ... ...
</configuration>

The policy file is stored in this directory. If the policy file does not exist, the plugin sends a request to Ranger Admin to synchronize the policy information, and after synchronization the policies are persisted as a file.
One more point worth explaining is the name of this file. Here the file name is hdfs_clustersz_hadoop, which consists of two parts: hdfs corresponds to the HDFS component, and clustersz_hadoop corresponds to the service defined in Ranger.

2. After the initial synchronization, the HDFS plugin thread polls Ranger Admin at a fixed interval (every 30 s) to synchronize the policy information again.
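
The two steps above can be sketched as follows. This is a simplified model, not the plugin's actual code. PolicySource, PolicySnapshot, and Refresher are hypothetical names, and the version number used to decide whether anything changed is an assumption about how an "only fetch when newer" check could work.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

/** Sketch of the plugin side: where the policy cache lives and how it is refreshed by polling. */
class PluginPolicyRefreshSketch {

    /** Cache file = <cache dir>/<component>_<service name>, following the naming described above. */
    static Path cacheFile(String cacheDir, String component, String serviceName) {
        return Paths.get(cacheDir).resolve(component + "_" + serviceName);
    }

    /** Immutable policy set with a version number (the version field is an assumption). */
    static final class PolicySnapshot {
        final long version;
        final String json;

        PolicySnapshot(long version, String json) {
            this.version = version;
            this.json = json;
        }
    }

    /** Hypothetical stand-in for Ranger Admin: returns policies only if newer than lastKnownVersion. */
    interface PolicySource {
        Optional<PolicySnapshot> fetchIfNewer(long lastKnownVersion);
    }

    static class Refresher {
        private final PolicySource source;
        private long lastKnownVersion = -1;  // -1 means nothing has been cached yet
        private String cachedJson;

        Refresher(PolicySource source) {
            this.source = source;
        }

        /** One polling round; returns true if the local copy was replaced. */
        boolean pollOnce() {
            Optional<PolicySnapshot> update = source.fetchIfNewer(lastKnownVersion);
            if (!update.isPresent()) {
                return false;
            }
            lastKnownVersion = update.get().version;
            cachedJson = update.get().json;  // a real plugin would also write this to the cache file
            return true;
        }

        String cachedPolicies() {
            return cachedJson;
        }
    }

    public static void main(String[] args) {
        System.out.println(cacheFile("/etc/ranger/clustersz_hadoop/policycache", "hdfs", "clustersz_hadoop"));

        PolicySnapshot[] current = { new PolicySnapshot(1, "{\"policies\":[]}") };
        Refresher refresher = new Refresher(known ->
                current[0].version > known ? Optional.of(current[0]) : Optional.<PolicySnapshot>empty());

        System.out.println(refresher.pollOnce());  // first poll downloads version 1
        System.out.println(refresher.pollOnce());  // nothing has changed on the Admin side
        // In a long-running process, pollOnce() would be scheduled at the polling interval (30 s here).
    }
}
```

Keeping a local copy means the plugin can keep evaluating requests from its cache between polls, at the cost of policy changes taking up to one polling interval to reach the NameNode.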

2.3.2 Principles of HDFS Plugin’s permission control

1. Ranger's plugin synchronizes policy information from Ranger Admin and saves it in JSON format.
2. When verifying access, the NameNode calls the checkPermission() function. The process is as follows:
   1. First, look up the groups to which the user belongs.
   2. Take the user information (the Kerberos user, including groups) and the file or folder to be read or written, and check whether the policy file defines access for that user on that path. If it is explicitly defined, the result is returned directly.
   3. If the policy defines nothing for this user and path, the check falls back to native HDFS permissions, the familiar xxx-xxx-xxx triplets: is the user the owner or in the file's group, and does the matching triplet grant the permission?
   Generally, a directory created in HDFS with default permissions looks like:
   drwxr-xr-x - hdfs supergroup 0 2017-07-14 00:32 /test
   Other users have read permission here, so if a folder (or file) is created and its permissions are not modified, all users can read it.
   If this check passes, the call returns normally; otherwise an exception is thrown, indicating that permission is denied.
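
The decision order described above can be sketched as follows. This is a simplified model rather than the plugin's implementation. The policy store is reduced to a hypothetical map from path to principals (users and groups share one namespace here), only read access is modeled, and SecurityException stands in for whatever exception HDFS actually throws.

```java
import java.util.Map;
import java.util.Set;

/** Simplified two-stage check: Ranger policy first, native HDFS permission bits as fallback. */
class HdfsAccessCheckSketch {

    enum Decision { ALLOW, DENY, NOT_DETERMINED }

    /**
     * Hypothetical policy store: path -> (user or group name -> read allowed?).
     * Real Ranger policies are far richer; this only models "explicitly defined or not".
     */
    static Decision rangerDecision(Map<String, Map<String, Boolean>> policies,
                                   String path, String user, Set<String> groups) {
        Map<String, Boolean> entries = policies.get(path);
        if (entries == null) {
            return Decision.NOT_DETERMINED;
        }
        if (entries.containsKey(user)) {
            return entries.get(user) ? Decision.ALLOW : Decision.DENY;
        }
        for (String group : groups) {
            if (entries.containsKey(group)) {
                return entries.get(group) ? Decision.ALLOW : Decision.DENY;
            }
        }
        return Decision.NOT_DETERMINED;
    }

    /** Native fallback: pick the owner, group, or other triplet of a mode such as "rwxr-xr-x". */
    static boolean posixCanRead(String mode, String owner, String group,
                                String user, Set<String> userGroups) {
        if (user.equals(owner)) {
            return mode.charAt(0) == 'r';
        }
        if (userGroups.contains(group)) {
            return mode.charAt(3) == 'r';
        }
        return mode.charAt(6) == 'r';
    }

    /** Throws if read access is denied, mirroring the exception-on-failure behaviour described above. */
    static void checkRead(Map<String, Map<String, Boolean>> policies, String path,
                          String mode, String owner, String group,
                          String user, Set<String> userGroups) {
        Decision decision = rangerDecision(policies, path, user, userGroups);
        boolean allowed = decision == Decision.ALLOW
                || (decision == Decision.NOT_DETERMINED
                    && posixCanRead(mode, owner, group, user, userGroups));
        if (!allowed) {
            throw new SecurityException("Permission denied: user=" + user + ", path=" + path);
        }
    }

    public static void main(String[] args) {
        Map<String, Map<String, Boolean>> policies = Map.of();  // no Ranger policy for /test
        // drwxr-xr-x - hdfs supergroup ... /test : the "other" triplet grants read
        checkRead(policies, "/test", "rwxr-xr-x", "hdfs", "supergroup", "carol", Set.of("analysts"));
        System.out.println("carol can read /test via native HDFS permissions");
    }
}
```

The example in main reproduces the /test case: no policy mentions carol, so the decision falls through to the mode bits, and the r in the last triplet lets her read.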

3. Other

3.1 Kerberos and Ranger

Kerberos and Ranger are both parts of the Hadoop security system, but they have different jobs. When I first came into contact with big data security, I did not understand why two different components were needed for access control, and it took a long time to sink in. An analogy with going to work every day and sitting down in the office may make it clearer.

Before a person enters the company, the security guard checks whether they have a work badge. Without a badge, the guard assumes they are not an employee and will not let them onto the company campus. Once on campus, if they want to visit a particular warehouse, the warehouse door checks whether they have access rights, and only with access rights are they allowed in.
We can compare the company campus to the big data cluster. The work badge corresponds to Kerberos authentication, the warehouse corresponds to a specific HDFS file path, and the access check at the warehouse door is the job Ranger does.
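
The same division of labor can be written as two separate gates. This is a toy model of the analogy only: the ticket set and the path-to-users map are hypothetical stand-ins for Kerberos and Ranger, and neither reflects how those systems store their data.

```java
import java.util.Map;
import java.util.Set;

/** Toy model of the analogy: the badge check (authentication) comes before the door check (authorization). */
class TwoGateSketch {

    /** Gate 1, the work badge: is the caller's ticket one the cluster recognizes? */
    static boolean authenticated(Set<String> validTickets, String ticket) {
        return validTickets.contains(ticket);
    }

    /** Gate 2, the warehouse door: may this user access this path? */
    static boolean authorized(Map<String, Set<String>> pathToUsers, String user, String path) {
        return pathToUsers.getOrDefault(path, Set.of()).contains(user);
    }

    /** Access requires passing both gates, in this order. */
    static boolean canAccess(Set<String> validTickets, Map<String, Set<String>> pathToUsers,
                             String ticket, String user, String path) {
        return authenticated(validTickets, ticket) && authorized(pathToUsers, user, path);
    }

    public static void main(String[] args) {
        Set<String> tickets = Set.of("ticket-alice", "ticket-bob");
        Map<String, Set<String>> doors = Map.of("/warehouse", Set.of("alice"));

        System.out.println(canAccess(tickets, doors, "ticket-alice", "alice", "/warehouse"));  // true
        System.out.println(canAccess(tickets, doors, "ticket-bob", "bob", "/warehouse"));      // false: no door access
        System.out.println(canAccess(tickets, doors, "forged", "alice", "/warehouse"));        // false: no valid badge
    }
}
```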

3.2 Ranger data encryption

Ranger's data encryption corresponds to Hadoop KMS; users who prefer to use Hadoop KMS directly can do so. The following walks through the basic Ranger KMS process (taking Ranger as integrated with HDP as an example).

3.2.1 Ranger encryption process

1. After installing Ranger, open the Ranger Admin UI and log in with the user name keyadmin (the password is also keyadmin).

2. After logging in, first create an encryption policy: select "Encryption" -> "Key Manager".

3. On the page that opens, select the corresponding service and click "Add Policy".

4. Fill in the configuration items in the dialog that pops up, then click "Save". This completes the encryption policy configuration. Next, configure the users for the policy.

5. User configuration is similar to configuring an HDFS policy in Ranger: return to the main page, select the corresponding service, and add the corresponding users.

The configuration on the Hadoop side is the same as Hadoop KMS.

3.3 Audit

Permission management is Ranger's main function; its auditing feels more like a free extra thrown in with the purchase.
Auditing here is mainly log auditing: it records access to each component's data by extracting the relevant login and access information from each component's logs. It is not as powerful as LogSearch (another logging component of HDP), but it is sufficient for users who only need basic access information and do not want to use LogSearch.

Origin blog.csdn.net/eyoulc123/article/details/79414301