Apache Ranger: A powerful tool for operations and maintenance management

Translated literally, "Ranger" means a park keeper or warden. True to its name, Apache Ranger takes on the role of warden for the large "garden" that is Hadoop. Ranger provides a centralized security management framework: by configuring policies in the Ranger console, users can apply fine-grained data access control to Hadoop ecosystem components such as HDFS, Hive, HBase, and YARN.

According to the Apache Ranger official website, Ranger mainly provides the following capabilities:

(1) Centralized security administration: all security tasks are managed through a unified central management interface or through REST APIs;
(2) Fine-grained authorization: specific operations and behaviors on Hadoop components and tools can be controlled at a fine-grained level through the central management interface;
(3) A unified, standardized authorization method across components;
(4) Support for multiple access-control models, including role-based access control and attribute-based access control;
(5) Centralized auditing of user access and of security-related administrative operations.

At the time of writing, the latest Ranger release is 2.1.0, while 1.2.0 is still widely used.

(1) Ranger's architecture

Ranger is mainly composed of the following three components:
(1) Ranger Admin: the core module of Ranger. It has a built-in web management page through which, or through the REST interface, users can define security policies.
(2) Agent Plugin: a plug-in embedded in each Hadoop ecosystem component. It periodically pulls policies from Ranger Admin, enforces them, and records access operations for auditing.
(3) User Sync: synchronizes users and groups (Users/Groups), along with their permission data, from the operating system (or LDAP) into the Ranger database.

The relationship between them is shown in the following figure:
[Figure: Relationship between Ranger Admin, Agent Plugin, and User Sync]

(2) Ranger's workflow

Ranger Admin is the main interface between Apache Ranger and its users. After logging in to Ranger Admin, a user can define different security policies for different Hadoop components. Once the policies are saved, each component's Agent Plugin periodically (every 30 seconds by default) pulls all of the policies configured for that component from Ranger Admin and caches them locally. When a user then requests that component's data service, the Agent Plugin performs the authorization check and returns the result to the component, thereby enforcing access control over the data service. When a policy is modified in Ranger Admin, the Agent Plugin pulls the new version and updates its cache; when a policy is deleted in Ranger Admin, the Agent Plugin stops authorizing requests on the basis of that policy.
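
To make this pull-and-cache mechanism concrete, the sketch below shows what such a refresher loop might look like. It is only a conceptual illustration, not Ranger's actual PolicyRefresher code; the Ranger Admin address, the service name hivedev, the cache-file path, and the policy-download endpoint are assumptions made for the example, while the 30-second interval matches the default mentioned above.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Conceptual sketch only: poll Ranger Admin for policies and cache them locally,
// the way an Agent Plugin's refresher thread does. Names and paths are illustrative.
public class PolicyRefresherSketch {

  private static final String POLICY_URL =
      "http://ranger-admin:6080/service/plugins/policies/download/hivedev"; // assumed address/service
  private static final Path CACHE_FILE = Path.of("/tmp/ranger-policycache/hivedev.json"); // assumed path

  public static void main(String[] args) {
    HttpClient client = HttpClient.newHttpClient();
    ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

    // Pull all policies every 30 seconds (the default interval described above) and cache them.
    scheduler.scheduleAtFixedRate(() -> {
      try {
        HttpRequest request = HttpRequest.newBuilder(URI.create(POLICY_URL))
            .header("Accept", "application/json")
            .GET()
            .build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() == 200) {
          Files.createDirectories(CACHE_FILE.getParent());
          Files.writeString(CACHE_FILE, response.body()); // local JSON cache used for authorization
        }
        // On a non-200 response or an exception, keep authorizing from the last cached copy.
      } catch (Exception e) {
        e.printStackTrace();
      }
    }, 0, 30, TimeUnit.SECONDS);
  }
}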

Take Hive as an example. Hive exposes two interfaces that allow developers to implement their own authorization logic: org.apache.hadoop.hive.ql.security.authorization.plugin.HiveAuthorizerFactory and org.apache.hadoop.hive.ql.security.authorization.plugin.HiveAuthorizer. HiveAuthorizerFactory is responsible for creating HiveAuthorizer instances. (In a Ranger-enabled cluster, Hive's hive.security.authorization.manager property typically points to Ranger's implementation of this factory.) When Ranger's HiveAuthorizer is initialized, it starts a PolicyRefresher thread that periodically pulls all Hive-related policies from Ranger Admin, writes them to a local temporary JSON file, and refreshes the in-memory cache; authorization decisions are then made directly against the cached policies. The overall flow is shown in the figure below:

[Figure: Ranger-Hive authorization workflow]

(3) Ranger in operations and maintenance practice

Users, roles, and permissions have always been a key concern in system design, operations, and maintenance. Without a complete model of users, roles, and permissions, an "unauthorized user" may easily access or even tamper with system resources and data. Compared with Unix/Linux systems, which rely simply on users and groups to set permissions, Apache Ranger provides a friendlier, easier-to-use web interface for building a complete set of user, role, and permission relationships: authorized users can legitimately access the resources and data they are entitled to, while unauthorized users are shut out entirely.

In addition, Ranger supports temporary policies for granting other users short-term access. Once the temporarily authorized user has finished the relevant operations, the temporary policies are simply deleted, making short-term authorization quick and convenient.

Take HDFS as an example. In Ranger Admin, open the Service Manager page and click the HDFS service to enter the HDFS policy editing page, as shown below:
[Figure: The HDFS policy editing page in Ranger Admin]

Clicking the Add New Policy button starts the creation of a new security policy. A specific policy we configured in our production environment is shown in the following figure:

[Figure: An HDFS access policy configured in our production environment]

As the figure above shows, this policy allows application-system users such as admin and test0822-2 to access HDFS paths including /user, /user/rangerpath/, /user/rangerpath/data, and /user/rangerpath/data/allday, and to execute commands under those paths. Note that because we did not turn on the recursive switch when creating the policy, a user granted access to /user is not automatically granted access to the next level of the directory tree; that is why we listed each directory these users may access explicitly.
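
For reference, the same policy can also be expressed as JSON and POSTed to Ranger Admin's public REST API (/service/public/v2/api/policy). The sketch below, written as a Java text block, shows roughly what the request body for the policy above might look like; the service name hdfs_service and the policy name are assumptions, while the paths, users, and the disabled recursive flag follow the description above.

  // Rough sketch of the HDFS policy above as a public v2 API request body (illustrative values).
  String hdfsPolicyJson = """
      {
        "service": "hdfs_service",
        "name": "rangerpath-access",
        "isEnabled": true,
        "resources": {
          "path": {
            "values": ["/user", "/user/rangerpath", "/user/rangerpath/data", "/user/rangerpath/data/allday"],
            "isRecursive": false
          }
        },
        "policyItems": [
          {
            "users": ["admin", "test0822-2"],
            "accesses": [
              {"type": "read", "isAllowed": true},
              {"type": "execute", "isAllowed": true}
            ]
          }
        ]
      }
      """;
  // POST this body to <ranger-admin>/service/public/v2/api/policy (with basic auth) to create the policy.
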
On the Audit tab, we can clearly see audit and status information, such as system login records, policy authorization records, and Agent Plugin status, as shown in the following figure:

[Figure: The Audit tab in Ranger Admin]

Ranger's support for Hive is also very complete: it offers not only table-level access control but also column-level (field-level) access control. Ranger additionally supports column-level masking and row-level filtering, which are well suited to restricting what temporarily authorized users are allowed to see.

Hive policies are created in essentially the same way as HDFS policies. The access policy we configured is shown in the figure below:

[Figure: A Hive access policy configured in Ranger Admin]
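
One difference from HDFS worth noting: a Hive policy's resource section is keyed by database, table, and column, which is what makes column-level control possible. A minimal, illustrative fragment (database and table names are placeholders) might look like this:

  // Illustrative resource section of a Hive access policy; "*" applies the item to all columns.
  String hiveResourceJson = """
      {
        "resources": {
          "database": {"values": ["mydb"]},
          "table":    {"values": ["mytable"]},
          "column":   {"values": ["*"]}
        },
        "policyItems": [
          {
            "users": ["test0822-2"],
            "accesses": [{"type": "select", "isAllowed": true}]
          }
        ]
      }
      """;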

On the Masking tab, you can define column-masking policies, as shown below:

[Figure: A column-masking policy on the Masking tab]

The policy shown above prevents the user damp (through the group it belongs to) from seeing the real data in the lname column of the customer table in the foodmart database; the values in this column are instead displayed to damp as hash values. When damp queries the table, the results look like the following figure:

[Figure: damp's query results, with the lname column shown as hash values]
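
A masking policy like the one above can also be created through the REST API. In Ranger's policy model, masking policies use policyType 1 and a dataMaskPolicyItems section; the sketch below is illustrative, assuming the Hive service is registered as hive_service and using the MASK_HASH mask type, which corresponds to the hash display described above.

  // Illustrative masking policy: show foodmart.customer.lname to user damp only as a hash.
  String maskPolicyJson = """
      {
        "service": "hive_service",
        "name": "mask-customer-lname",
        "policyType": 1,
        "resources": {
          "database": {"values": ["foodmart"]},
          "table":    {"values": ["customer"]},
          "column":   {"values": ["lname"]}
        },
        "dataMaskPolicyItems": [
          {
            "users": ["damp"],
            "accesses": [{"type": "select", "isAllowed": true}],
            "dataMaskInfo": {"dataMaskType": "MASK_HASH"}
          }
        ]
      }
      """;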

On the Row Level Filter tab, we can define row-level filtering policies. We created a simple one: the user damp is not allowed to see rows in the customer table whose fname is Sheri, as shown in the following figure:
[Figure: A row-level filter policy on the Row Level Filter tab]

With this policy in place, the damp user can no longer see the customer rows where fname = 'Sheri':

[Figure: damp's query results, with rows where fname = 'Sheri' filtered out]
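
Row-filter policies follow the same pattern, with policyType 2 and a rowFilterPolicyItems section whose filter expression is effectively appended to the user's queries as a WHERE condition. A rough sketch of the policy above (again assuming the service name hive_service):

  // Illustrative row-filter policy: hide rows with fname = 'Sheri' from user damp.
  String rowFilterPolicyJson = """
      {
        "service": "hive_service",
        "name": "filter-customer-sheri",
        "policyType": 2,
        "resources": {
          "database": {"values": ["foodmart"]},
          "table":    {"values": ["customer"]}
        },
        "rowFilterPolicyItems": [
          {
            "users": ["damp"],
            "accesses": [{"type": "select", "isAllowed": true}],
            "rowFilterInfo": {"filterExpr": "fname <> 'Sheri'"}
          }
        ]
      }
      """;
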
Note that Ranger synchronizes operating-system users to the Hive component through the Usersync component. Therefore, when configuring policies for the Hive service, you need to log in to Ranger Admin with the operating-system account corresponding to the Hive administrator; otherwise the configuration fails with the following prompt:
[Figure: Error prompt when configuring Hive policies without the Hive administrator account]

(4) Batch operations on Ranger policies

As informatization accelerates and deepens, information systems are applied ever more widely, the number of system users keeps growing, and so does the demand for user permissions. At the same time, big-data thinking has become widespread, increasing the demand for temporary access to data. Adding security policies one by one by hand would waste considerable manpower and be error-prone. For this scenario, we use a Java program to add, delete, modify, and query Ranger security policies in batches, which greatly improves operations and maintenance efficiency.

We first implement a base class for accessing the Ranger API; its core method is as follows:

  // Executes a call against the Ranger Admin REST API with HTTP basic authentication.
  // HttpRequest/HttpResponse here come from the HTTP client library used in our project.
  public ApiResult execRangerApi(String url, String method, String requestBody) {
    HadoopConfig.Ranger ranger = this.hadoop.getRanger();
    String baseUrl = ranger.getApiBaseUrl();
    String user = ranger.getUser();
    String password = ranger.getPassword();
    String fullUrl = baseUrl + url;
    // Build the Basic-Auth credential from the configured Ranger account.
    String auth = user + ":" + password;
    String authInfo = DatatypeConverter.printBase64Binary(auth.getBytes());
    HttpRequest request = null;
    if (method.equalsIgnoreCase("GET")) {
      request = HttpRequest.get(fullUrl);
    } else if (method.equalsIgnoreCase("POST")) {
      request = HttpRequest.post(fullUrl);
    } else if (method.equalsIgnoreCase("PUT")) {
      request = HttpRequest.put(fullUrl);
    } else if (method.equalsIgnoreCase("DELETE")) {
      request = HttpRequest.delete(fullUrl);
    }
    request.header("Authorization", "Basic " + authInfo)
        .header("Accept", "application/json")
        .header("Content-Type", "application/json")
        .header("X-XSRF-HEADER", "valid");
    if (requestBody != null && !requestBody.isEmpty())
      request.body(requestBody);
    // Execute the request and wrap the status code and raw body in our ApiResult.
    HttpResponse response = request.execute();
    ApiResult result = new ApiResult(this);
    result.setHttpCode(response.getStatus());
    result.setBodyRaw(response.body());
    return result;
  }
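
With this helper in place, querying an existing policy is a one-line call. For example, the snippet below fetches a policy by service name and policy name through the public v2 API (the names hdfs_service and rangerpath-access are illustrative):

  // Fetch a policy by service name and policy name (illustrative names) via the public v2 API.
  ApiResult result = execRangerApi(
      "/public/v2/api/service/hdfs_service/policy/rangerpath-access", "GET", null);
  if (result.getHttpCode() == 200) {
    System.out.println(result.getBodyRaw()); // the policy, returned as JSON
  }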

Building on this base class, we can then implement classes that add, delete, modify, and query Ranger policies. The core code is as follows:

  // Creates a new policy, or updates an existing policy with the same name.
  public void savePolicy(String policyName, List<String> paths, boolean isPathAdd, String appUser, List<PolicyAccess> accesses, boolean isRecursive) {
    ApiResult result = null;
    // Look up the policy by name to decide between create (POST) and update (PUT).
    Policy policy = getPolicyByName(policyName);
    ......
    Gson gson = new Gson();
    if (isNewPolicy) {
      logger.info("create policy, content:" + gson.toJson(policy));
      result = execRangerApi("/public/v2/api/policy/", "POST", gson.toJson(policy));
      if (result.getHttpCode() != 200)
        throw new DMCException(String.format("create policy failed! ranger return : %d, %s", result.getHttpCode(), result.getBodyRaw()));
      logger.info("create policy ok! " + policyName);
    } else {
      logger.info("edit policy, content:" + gson.toJson(policy));
      result = execRangerApi("/public/v2/api/policy/" + policy.getId(), "PUT", gson.toJson(policy));
      if (result.getHttpCode() != 200)
        throw new DMCException(String.format("edit policy failed! ranger return : %d, %s", result.getHttpCode(), result.getBodyRaw()));
      logger.info("edit policy ok! " + policyName);
    }
  }
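
Deletion completes the set of batch operations. A minimal sketch that reuses the same execRangerApi helper (Ranger Admin typically answers a successful delete with 204 No Content):

  // Delete a policy by its id via the public v2 API.
  public void deletePolicy(long policyId) {
    ApiResult result = execRangerApi("/public/v2/api/policy/" + policyId, "DELETE", null);
    if (result.getHttpCode() != 204 && result.getHttpCode() != 200)
      throw new DMCException(String.format("delete policy failed! ranger return : %d, %s",
          result.getHttpCode(), result.getBodyRaw()));
    logger.info("delete policy ok! id=" + policyId);
  }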

Finally, we can push the security policies we have built to Ranger Admin through a curl command, achieving batch processing of security policies. A reference command is as follows (web-url, token-name, and user-name are placeholders):

curl -H "Content-Type:application/json" -H "X-Token:token-name" -X POST "http://web-url?appUser=user-name" -d "[\"ranger-policy\"]"
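
On the Java side, the same batch effect can be achieved by looping over a list of users or policy definitions and calling the savePolicy method shown earlier for each one. A minimal sketch (the user list, policy naming, and accesses are illustrative):

  // Batch sketch: create one non-recursive policy per application user over the same path list.
  public void applyPoliciesInBatch(List<String> appUsers, List<String> paths, List<PolicyAccess> accesses) {
    for (String appUser : appUsers) {
      // Reuses savePolicy from above; each policy is named after the user it authorizes.
      savePolicy("batch-" + appUser, paths, true, appUser, accesses, false);
    }
  }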

In summary, Apache Ranger supports a wealth of Hadoop components and helps us implement all kinds of security policies. At the same time, Ranger policies are quick and convenient to define, easy to understand, and can be discarded after use, which makes them well suited to temporary-authorization scenarios. We have every reason to believe that as the Hadoop ecosystem continues to expand, Ranger will be favored and used by more and more operations and maintenance personnel.

Reference materials:
1. Apache Ranger official website: http://ranger.apache.org/
2. ZTE, Ranger training materials

The original text comes from: http://dwz-9.cn/3p42b
