This article covers Apache Ranger: business background, current status and requirements, an introduction to and comparison of big data security components, system architecture and practice, Ranger Admin, UserSync, plugins, the permission model, and permission implementation.

26.2.1 Business Background
26.2.1.1 Current Status and Requirements
26.2.2 Introduction and Comparison of Big Data Security Components
26.2.2.1 Kerberos
26.2.2.2 Apache Sentry
26.2.2.3 Apache Ranger
26.2.3 Knox, Ranger, Kerberos, LDAP Integration
26.2.3.1 Kerberos and Ranger
26.3 Apache Ranger System Architecture and Practice
26.3.1 Introduction to Architecture
26.3.2 Introduction to Components
26.3.2.1 Overall Description
26.3.2.2 Ranger Admin
26.3.2.3 UserSync
26.3.2.4 Service Plugin
26.3.2.5 Ranger-SDK
26.3.3 Permission Model
26.3.4 Permission Implementation
26.3.4.1 HDFS Implementation Principle
26.3.4.2 HBase Implementation Principle
26.3.4.3 Hive Implementation Principle
26.3.4.4 YARN Implementation Principle

26.2 Principle and Application Practice of Apache Ranger

Reprinted blog post: https://blog.csdn.net/qq475781638/article/details/90247153

26.2.1 Business Background

The foundation of a big data cluster is its data and the resources used for computing. They are among a company's most valuable assets. We need to manage them well and open the right data and resources to the right users, so that they are not stolen, damaged, and so on. This is the domain of big data security.

26.2.1.1 Current Status and Requirements

Our big data cluster currently runs with no protection at all: anyone who can log in to a Linux machine can perform operations on the cluster.
Cluster security is therefore urgent for us. The main requirements are as follows:
 Support for multiple components, ideally the main components of the company's current technology stack: HDFS, HBase, Hive, YARN, Storm, Kafka, etc.
 Fine-grained access control, down to the Hive column, HDFS directory, HBase column, YARN queue, Storm topology, and Kafka topic.
 Open source with an active community, requiring as few changes to the existing cluster as possible, and in line with industry trends.

26.2.2 Introduction and Comparison of Big Data Security Components

At present, there are mainly three common security solutions:

Kerberos (the most commonly used solution in the industry)
Apache Sentry (the solution chosen by Cloudera, integrated in the CDH distribution)
Apache Ranger (the solution chosen by Hortonworks, integrated in the HDP distribution)

26.2.2.1 Kerberos

Kerberos is an identity authentication protocol based on symmetric keys. As an independent third-party authentication service, Kerberos can authenticate identities on behalf of other services, and it supports SSO (that is, once a client has authenticated, it can access multiple services such as HBase and HDFS).

| Name | Role |
|---|---|
| KDC | The Kerberos server program, used to authenticate every party |
| Client | A user who needs to access a service; both the KDC and the Service authenticate the user's identity |
| Service | A service integrated with Kerberos, such as HDFS, YARN, or HBase |

The Kerberos protocol process mainly has three stages. In the first stage, the Client applies for a TGT from the KDC. In the second stage, the Client applies to the KDC for a ticket for accessing the Service through the obtained TGT. The third stage is that the Client uses the returned ticket to access the Service.
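
The three stages above can be sketched as a toy model. This is only an illustration of the message flow, not the real protocol: actual Kerberos encrypts tickets and uses session keys and authenticators, whereas this sketch merely signs tickets with HMAC. The names `ToyKDC` and `ToyService` and all keys are hypothetical.

```python
import hashlib
import hmac
import time


def _sign(key: bytes, message: str) -> str:
    # HMAC stands in for the encryption real Kerberos applies to tickets.
    return hmac.new(key, message.encode(), hashlib.sha256).hexdigest()


class ToyKDC:
    """Toy stand-in for a KDC (hypothetical, for illustration only)."""

    def __init__(self, service_keys):
        self._tgt_key = b"kdc-secret"        # known only to the KDC
        self._service_keys = service_keys    # each key is shared with one service

    def issue_tgt(self, user, lifetime=3600):
        # Stage 1: the client asks the KDC for a ticket-granting ticket (TGT).
        body = f"tgt|{user}|{int(time.time()) + lifetime}"
        return f"{body}|{_sign(self._tgt_key, body)}"

    def issue_service_ticket(self, tgt, service, lifetime=600):
        # Stage 2: the client presents the TGT to obtain a ticket for one service.
        body, _, sig = tgt.rpartition("|")
        if not hmac.compare_digest(sig, _sign(self._tgt_key, body)):
            raise PermissionError("invalid TGT")
        _, user, expires = body.split("|")
        if int(expires) < time.time():
            raise PermissionError("TGT expired")
        body = f"st|{user}|{service}|{int(time.time()) + lifetime}"
        return f"{body}|{_sign(self._service_keys[service], body)}"


class ToyService:
    """Toy stand-in for a Kerberos-enabled service such as HDFS."""

    def __init__(self, name, key):
        self.name = name
        self._key = key

    def accept(self, ticket):
        # Stage 3: the service checks the ticket with its own key and learns the caller.
        body, _, sig = ticket.rpartition("|")
        if not hmac.compare_digest(sig, _sign(self._key, body)):
            raise PermissionError("invalid service ticket")
        _, user, service, expires = body.split("|")
        if service != self.name or int(expires) < time.time():
            raise PermissionError("ticket not valid for this service")
        return user


keys = {"hdfs": b"hdfs-key"}
kdc = ToyKDC(keys)
hdfs = ToyService("hdfs", keys["hdfs"])
tgt = kdc.issue_tgt("alice")                          # stage 1
service_ticket = kdc.issue_service_ticket(tgt, "hdfs")  # stage 2
caller = hdfs.accept(service_ticket)                  # stage 3
```

Note that the service never talks to the KDC in stage 3; it trusts the ticket because only the KDC and the service know the service key. The short ticket lifetimes in the sketch also hint at the expiry issue listed under the disadvantages.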

Advantages:
Service authentication prevents rogue components such as a broker, DataNode, or RegionServer from joining the cluster under a false identity.
It covers server-to-server authentication as well as client-to-server authentication.

Disadvantages:
Kerberos relies on temporary tickets for security, so authentication information expires. Re-authentication is cumbersome when there are many users.
Kerberos can only allow or deny access to a service as a whole. It cannot control access at a finer granularity, such as a single HDFS path or a Hive table, and it does not implement user-level authentication (this requires LDAP).

26.2.2.2 Apache Sentry

Apache Sentry is an open source Hadoop security component released by Cloudera. It provides fine-grained, role-based authorization.

Advantages:
Sentry supports fine-grained access control on HDFS metadata and column-level access control in Hive.
Sentry simplifies management through role-based authorization: different privilege levels on the same data set can be granted to multiple roles.
Sentry provides a unified platform that is convenient to manage.
Sentry supports Kerberos integration.
Disadvantages:
It only supports Hive, HDFS, and Impala; it does not support HBase, YARN, Kafka, Storm, etc.

26.2.2.3 Apache Ranger

Apache Ranger is an open source Hadoop security component released by Hortonworks.
Advantages:
 Fine-grained access control (down to the Hive column level)
 A permission model based on access policies
 Plug-in based access control with unified, convenient policy management
 Audit logging: records audit logs of all kinds of operations and provides a unified query API and UI
 Rich component support (HDFS, HBase, Hive, YARN, Kafka, Storm)
 Kerberos integration
 A REST interface for secondary development

26.2.3 Knox, Ranger, Kerberos, LDAP Integration

[Figure: Knox, Ranger, Kerberos, and LDAP integration]

26.2.3.1 Kerberos and Ranger

Kerberos and Ranger are both parts of the Hadoop security system, but they play different roles. An analogy with the daily routine of going to work helps here.

Before someone enters the company campus, the security guard checks whether they have a work badge. Without a badge, the guard assumes you are not an employee and will not let you in. Once you are inside, suppose you want to look around a warehouse: the warehouse door checks whether you have been granted access, and only then can you enter.
We can compare the company campus to the big data cluster platform. The work badge corresponds to your Kerberos authentication, the warehouse corresponds to a specific HDFS file path, and the access check at the warehouse door corresponds to Ranger authorization.

26.3 Apache Ranger System Architecture and Practice

26.3.1 Introduction to Architecture

[Figure: Ranger architecture]

26.3.2 Introduction to Components

26.3.2.1 Overall description

Ranger consists of three components: Ranger Admin, Ranger UserSync, and Ranger Plugin.
[Figure: relationship between Ranger Admin, UserSync, and the plugins]
Ranger's official website describes the three components as follows:

| Component name | Description |
|---|---|
| Admin | The Ranger Admin Portal is the central interface for security management. Users can create and update policies, which are stored in the policy database. The plugins in each component poll these policies periodically. The Portal also includes an audit server, which receives the audit data collected by the plugins and stores it in HDFS or a relational database. |
| UserSync | A synchronization utility that pulls users and groups from Unix, LDAP, or Active Directory. User and group information is stored in the Ranger Portal and used for policy definition. |
| Plugin | Plugins are lightweight Java programs embedded in the process of each cluster component. For example, the Apache Ranger plugin for Apache Hive is embedded in HiveServer2. The plugins pull policies from the central server and store them locally in a file. When a user request passes through the component, the plugin intercepts the request and evaluates it against the security policies. The plugin also collects data from user requests and sends it back to the audit server in a separate thread. |

26.3.2.2 Ranger Admin

Ranger Admin consists of three parts: a web UI, a REST request-handling service, and a database. We can think of it as the central data store (all data is stored here, although the other two components can also run independently). Its specific roles are:

  1. Receive user and group information from the UserSync process and save it to the MySQL database. Note: this user information is needed when configuring permission policies. The idea is simple: just as a storekeeper registers whoever collects materials, someone who is not from the company will not be allowed to collect anything.
  2. Provide an interface for creating policies.
  3. Provide an interface for handling external REST requests.
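
As an illustration of the REST interface, the following sketch builds the JSON body for creating an HDFS policy. The field names and the endpoint mentioned in the comment follow the shape of Ranger's public v2 policy API as best recalled here and should be checked against the Ranger version in use. The service name, policy name, path, and the helper `build_hdfs_policy` are hypothetical.

```python
import json


def build_hdfs_policy(service, name, path, users, access_types):
    """Build a policy body in the shape of Ranger's public v2 API (assumed field names)."""
    return {
        "service": service,                  # name of the service registered in Ranger Admin
        "name": name,
        "isEnabled": True,
        "resources": {"path": {"values": [path], "isRecursive": True}},
        "policyItems": [
            {
                "users": list(users),
                "groups": [],
                "accesses": [{"type": t, "isAllowed": True} for t in access_types],
                "delegateAdmin": False,
            }
        ],
    }


policy = build_hdfs_policy(
    "cluster_hadoop", "warehouse-read", "/warehouse/sales", ["alice"], ["read", "execute"]
)
body = json.dumps(policy, indent=2)
# Assumption: the body would be sent with an authenticated HTTP POST to
# <ranger-admin>/service/public/v2/api/policy
```

The sketch stops at building the request body, so it runs without a Ranger Admin instance.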

26.3.2.3 UserSync

UserSync is the user synchronization tool provided by Ranger. It can synchronize either Linux user information or LDAP user information. The configuration item SYNC_SOURCE = LDAP/UNIX selects the source; the default is to synchronize Unix information. Three points about UserSync are worth noting:
1. Synchronization is one-way.
That is, it only reads user information from the source (for example, the local Unix users) and uploads it to Ranger Admin.
2. Synchronization is not real-time.
A newly created user is not synchronized to Ranger immediately.
3. Deleting users is not supported.
Ranger does not support deleting users directly through synchronization or code.
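
The one-way, add-only behaviour described above can be sketched as follows. This is a simplified model, not UserSync's actual code: `parse_passwd`, `sync_once`, and the `min_uid` threshold are hypothetical names chosen for illustration.

```python
def parse_passwd(text, min_uid=500):
    """Return login names from /etc/passwd-style text, skipping low-UID system accounts."""
    users = set()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        # passwd format: name:password:uid:gid:comment:home:shell
        if len(fields) >= 3 and fields[2].isdigit() and int(fields[2]) >= min_uid:
            users.add(fields[0])
    return users


def sync_once(source_users, admin_users):
    """One-way sync: users new in the source are added; nothing is ever deleted."""
    to_add = source_users - admin_users
    return admin_users | to_add, sorted(to_add)


passwd_text = (
    "root:x:0:0:root:/root:/bin/bash\n"
    "alice:x:1001:1001::/home/alice:/bin/bash\n"
    "bob:x:1002:1002::/home/bob:/bin/bash\n"
)
known_to_admin = {"alice", "carol"}   # carol no longer exists on the Unix side
known_to_admin, added = sync_once(parse_passwd(passwd_text), known_to_admin)
```

After the run, `bob` has been added, while `carol` remains known to the Admin side even though she is gone from the source. Because such a sync runs on an interval, a user created between two runs only appears after the next run.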

26.3.2.4 Service Plugin

The plugin is embedded in the execution process of each component. It periodically pulls policies from Ranger Admin, evaluates access requests against them (the access decision tree), and records access audits.

| Plug-in name | Install node |
|---|---|
| Hdfs-Plugin | NameNode |
| Hbase-Plugin | HMaster + HRegionServer |
| Hive-Plugin | HiveServer2 |
| Yarn-Plugin | ResourceManager |
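
The plugin-side behaviour (poll periodically, keep a local copy in a file) can be sketched like this. It is a toy model under stated assumptions: `PolicyRefresher`, the `fetch` callback standing in for the call to Ranger Admin, and the cache file layout are all hypothetical.

```python
import json
import os
import tempfile


class PolicyRefresher:
    """Toy plugin-side policy cache; `fetch(known_version)` stands in for Ranger Admin."""

    def __init__(self, fetch, cache_path):
        self._fetch = fetch
        self._cache_path = cache_path
        self.version = -1
        self.policies = []
        self._load_cache()

    def _load_cache(self):
        # Start from the policies saved by a previous run, if any.
        if os.path.exists(self._cache_path):
            with open(self._cache_path) as f:
                data = json.load(f)
            self.version, self.policies = data["version"], data["policies"]

    def refresh(self):
        """Poll once; return True if a newer policy set was applied."""
        try:
            data = self._fetch(self.version)
        except OSError:
            return False   # Admin unreachable: keep enforcing the cached policies
        if data is None or data["version"] <= self.version:
            return False
        self.version, self.policies = data["version"], data["policies"]
        with open(self._cache_path, "w") as f:
            json.dump(data, f)
        return True


# Usage with a fake Admin that serves policy version 3.
server = {"version": 3, "policies": [{"name": "warehouse-read"}]}
cache = os.path.join(tempfile.mkdtemp(), "hdfs_policies.json")
plugin = PolicyRefresher(lambda known: server if server["version"] > known else None, cache)
first = plugin.refresh()    # downloads version 3 and writes the cache file
second = plugin.refresh()   # nothing newer, so nothing changes
```

In a real deployment `refresh()` would run on a timer inside the component process. The local file is what lets the component keep enforcing the last known policies when Ranger Admin is temporarily unavailable.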

26.3.2.5 Ranger-SDK

Integrates with an open platform to manage users, groups, and policies.

26.3.3 Permission Model

Access control is essentially a matter of defining the relationship between "user", "resource", and "permission". Ranger abstracts this relationship into policies and builds its own permission model on top of it. The three terms are explained below:
User
Expressed as a User or a Group. User is the user who accesses the resource, and Group is the user group the user belongs to.
Resource
Different components have different kinds of business resources, for example:
HDFS: FilePath
HBase: Table, Column-family, Column
Hive: Database, Table, Column
YARN: Queue
Permission
Expressed as (AllowACL, DenyACL), similar to whitelist and blacklist mechanisms. AllowACL describes the cases in which access is allowed, and DenyACL describes the cases in which access is denied. The permission items also differ between components.

| Plug-in | Permission items |
|---|---|
| Hdfs | Read, Write, Execute |
| Hbase | Read, Write, Create, Admin |
| Hive | Select, Create, Update, Drop, Alter, Index, Lock, Read, Write, All |
| Yarn | submit-app, admin-queue |
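
The resource types and permission items listed above can be written down as a small lookup structure and used to sanity-check a policy before it is submitted. This is only a sketch: the lowercase identifiers and the helper `check_policy` are hypothetical and are not Ranger's own service definitions.

```python
# Resource types and permission items per component, taken from the lists above.
SERVICE_MODEL = {
    "hdfs": {"resources": ("path",), "permissions": {"read", "write", "execute"}},
    "hbase": {
        "resources": ("table", "column-family", "column"),
        "permissions": {"read", "write", "create", "admin"},
    },
    "hive": {
        "resources": ("database", "table", "column"),
        "permissions": {
            "select", "create", "update", "drop", "alter",
            "index", "lock", "read", "write", "all",
        },
    },
    "yarn": {"resources": ("queue",), "permissions": {"submit-app", "admin-queue"}},
}


def check_policy(service, resource, permissions):
    """Return a list of problems; an empty list means the policy fits the model."""
    model = SERVICE_MODEL.get(service)
    if model is None:
        return [f"unknown service: {service}"]
    problems = []
    for key in resource:
        if key not in model["resources"]:
            problems.append(f"{service} has no resource type '{key}'")
    for perm in sorted(set(permissions) - model["permissions"]):
        problems.append(f"{service} has no permission '{perm}'")
    return problems


ok = check_policy("hive", {"database": "sales", "table": "orders", "column": "amount"}, ["select"])
bad = check_policy("hdfs", {"path": "/warehouse"}, ["select"])
```

Here `ok` is empty, while `bad` reports that `select` is not an HDFS permission, which reflects the point that each component has its own resource hierarchy and permission vocabulary.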

26.3.4 Permission Implementation

Ranger Admin responsibilities:
The administrator plans the policies for each service and assigns the corresponding resources to the corresponding users or groups. The policies are stored in the database.
Service Plugin responsibilities:
 Periodically pull policies from Ranger Admin.
 Evaluate access requests against the policies (the access decision tree).
 Record access audits in real time.

[Figure: policy execution process]
Policy priority:
 The blacklist takes priority over the whitelist.
 Blacklist exclusions take priority over the blacklist.
 Whitelist exclusions take priority over the whitelist.
Decision delegation:
If no policy decides the access, the request is normally treated as having no permission and is denied, but Ranger can also be configured to delegate the decision to the component's own access control layer.
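
The priority rules and the delegation fallback can be sketched as a small evaluation function. This is a simplified model for a single resource and access type that matches users only; the names `Policy`, `Decision`, and `evaluate` are hypothetical and this is not Ranger's actual policy engine.

```python
from dataclasses import dataclass, field
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"
    DELEGATE = "delegate"   # hand the decision to the component's native access control


@dataclass
class Policy:
    allow: set = field(default_factory=set)              # whitelist
    allow_exceptions: set = field(default_factory=set)   # whitelist exclusions
    deny: set = field(default_factory=set)               # blacklist
    deny_exceptions: set = field(default_factory=set)    # blacklist exclusions


def evaluate(policy, user, fallback_to_native=False):
    """Decide access for `user`; `policy` is None when no policy covers the resource."""
    if policy is not None:
        # The blacklist is checked first; a blacklist exclusion cancels the deny.
        if user in policy.deny and user not in policy.deny_exceptions:
            return Decision.DENY
        # The whitelist comes next; a whitelist exclusion cancels the allow.
        if user in policy.allow and user not in policy.allow_exceptions:
            return Decision.ALLOW
    # No policy decided the request: deny, or delegate if configured to do so.
    return Decision.DELEGATE if fallback_to_native else Decision.DENY


policy = Policy(
    allow={"alice", "bob", "carol", "dave"},
    allow_exceptions={"dave"},
    deny={"bob", "carol"},
    deny_exceptions={"carol"},
)
results = {user: evaluate(policy, user) for user in ("alice", "bob", "carol", "dave", "eve")}
```

In this example `bob` is denied although he is on the whitelist, `carol` is allowed because the blacklist exclusion cancels her deny, and `dave` and `eve` end up undecided, which means deny by default or delegation when `fallback_to_native=True`.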

How the plugins integrate with each component:

| Service | Extensible interface | Ranger implementation class |
|---|---|---|
| HDFS | org.apache.hadoop.hdfs.server.namenode.INodeAttributeProvider | org.apache.ranger.authorization.hadoop.RangerHdfsAuthorizer |
| HBase | org.apache.hadoop.hbase.protobuf.generated.AccessControlProtos.AccessControlService.Interface | org.apache.ranger.authorization.hbase.RangerAuthorizationCoprocessor |
| Hive | org.apache.hadoop.hive.ql.security.authorization.plugin.HiveAuthorizerFactory | org.apache.ranger.authorization.hive.authorizer.RangerHiveAuthorizerFactory |
| YARN | org.apache.hadoop.yarn.security.YarnAuthorizationProvider | org.apache.ranger.authorization.yarn.authorizer.RangerYarnAuthorizer |

Ranger performs authorization checks by implementing the authorization extension interface of each component.

26.3.4.1 HDFS Implementation Principle

After the HDFS plugin is installed, the following configuration in hdfs-site.xml is modified:

<property>
    <name>dfs.permissions.enabled</name>
    <value>true</value>
</property>
<property>
    <name>dfs.permissions</name>
    <value>true</value>
</property>
<property>
    <name>dfs.namenode.inode.attributes.provider.class</name>
    <value>org.apache.ranger.authorization.hadoop.RangerHdfsAuthorizer</value>
</property>
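
One way to confirm that the plugin settings are in place is to parse the configuration file and compare it with the expected values. The sketch below assumes the standard Hadoop layout in which the `<property>` elements sit inside a `<configuration>` root; the helper names `read_hadoop_conf` and `missing_settings` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Settings the HDFS plugin is expected to have written, taken from the fragment above.
EXPECTED = {
    "dfs.permissions.enabled": "true",
    "dfs.namenode.inode.attributes.provider.class":
        "org.apache.ranger.authorization.hadoop.RangerHdfsAuthorizer",
}


def read_hadoop_conf(xml_text):
    """Parse Hadoop-style configuration XML into a {name: value} dict."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def missing_settings(props, expected=EXPECTED):
    """Return the expected settings that are absent or have a different value."""
    return {key: value for key, value in expected.items() if props.get(key) != value}


sample = """
<configuration>
  <property>
    <name>dfs.permissions.enabled</name>
    <value>true</value>
  </property>
</configuration>
"""
problems = missing_settings(read_hadoop_conf(sample))
```

With this sample, `problems` contains only the provider class entry, because the permission switch is set but the Ranger authorizer is not yet configured. The same check can be pointed at the other components by swapping in their expected property names.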

[Figure: HDFS plugin loading process]

26.3.4.2 HBase Implementation Principle

After the HBase plugin is installed, the following configuration in hbase-site.xml is modified:

<property>
    <name>hbase.security.authorization</name>
    <value>true</value>
</property>
<property>
    <name>hbase.coprocessor.master.classes</name>
    <value>org.apache.ranger.authorization.hbase.RangerAuthorizationCoprocessor</value>
</property>
<property>
    <name>hbase.coprocessor.region.classes</name>
    <value>org.apache.ranger.authorization.hbase.RangerAuthorizationCoprocessor</value>
</property>

[Figure: HBase plugin loading process]

26.3.4.3 Hive Implementation Principle

After the Hive plugin is installed, the following configuration in hiveserver2-site.xml is modified:

<property>
    <name>hive.security.authorization.enabled</name>
    <value>true</value>
</property>
<property>
    <name>hive.security.authorization.manager</name>
    <value>org.apache.ranger.authorization.hive.authorizer.RangerHiveAuthorizerFactory</value>
</property>

[Figure: Hive plugin loading process]

26.3.4.4 YARN Implementation Principle

After the YARN plugin is installed, the following configuration in yarn-site.xml is modified:

<property>
    <name>yarn.acl.enable</name>
    <value>true</value>
</property>
<property>
    <name>yarn.authorization-provider</name>
    <value>org.apache.ranger.authorization.yarn.authorizer.RangerYarnAuthorizer</value>
</property>

[Figure: YARN plugin loading process]

Origin blog.csdn.net/toto1297488504/article/details/106505018